Virtualization's forgotten feature: Short reboot times

Summary: How much time do you spend sitting around waiting for physical systems to reboot during maintenance windows?

TOPICS: Virtualization

In many ways, virtual machines are a system administrator's dream come true. Short boot times, though an often forgotten feature, are part of that dream. If you haven't thought about it, how much time do you spend per year rebooting systems and waiting on them? Trust me, it's more than you realize: if you think you spend half your life waiting on systems to reboot, you're not far from the truth. Virtual machines alleviate the long waits, the potential hardware failures, and the endless requests for status updates. They still need rebooting, but without the waiting.

Reboot a physical system and wait. And wait, and wait, while the other callers on the maintenance call sound like young children on a long road trip, "Are we there yet?"

Here's a typical exchange that many of you will recognize from your own experience:

12.18am: Can you go ahead and reboot that system?

Sure, rebooting now.

12.20am: OK, where are we with that system?

It's still shutting down. Some services take a while to close.

12.23am: Can you give us a status on that system?

It's just now going through POST.

12.26am: What's our status?

You reply calmly: It's coming up. Right now, it's still checking hardware.

12.30am: Is it up yet?

No, Windows is starting.

12.33am: Is there a problem with the system?

No, it's almost at a logon prompt.

12.36am: Are we there yet?

Yes, I'm logging on now.

Sound familiar? It should if you're in the business of rebooting systems for maintenance or troubleshooting. Just for fun, here's the same conversation with a virtual machine as the topic:

12.18am: Can you go ahead and reboot that system?

Sure, rebooting now.

12.20am: OK, where are we with that system?

It's still shutting down. Some services take a while to close.

12.23am: Can you give us a status on that system?

12.26am: Are we there yet?

Yes, I'm logging on now.

Many virtual machines have no trouble shutting down or booting up, and a full reboot can take less than four minutes. Those of you who have experience with both types of systems know that I'm not exaggerating. If I've exaggerated anything in this example, it's by shortening the physical system's reboot time and lengthening the virtual system's. The dialog is simply meant to illustrate the difference between the two.

Reboot times are no longer a significant problem during maintenance windows, which shortens those windows considerably. Short reboots also make troubleshooting through multiple reboots a lot less tedious. If you employ a team of system administrators to manage hundreds of servers, each rebooted at least once a month for patching, you've saved a huge amount of time, and that time translates into less downtime.

Time is money.

You're not just saving your system administrator's time. Add up the time saved for everyone on that call, plus time saved for the hardware guy standing by in the datacenter in case of a hardware failure.

You can do the math yourself. The savings are huge.
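Here's one way to do that math, as a minimal Python sketch. It takes the reboot durations from the two dialogs above (12.18am to 12.36am for the physical system, 12.18am to 12.26am for the virtual one) and multiplies the difference across a fleet. The function names and the inputs (200 servers, one patch reboot a month, four people waiting on the call) are hypothetical, and the sketch assumes every reboot is waited on individually by everyone on the call, which is a simplification.

```python
# Back-of-the-envelope estimate of person-hours saved by faster reboots.
# The 18- and 8-minute durations come from the two maintenance-call dialogs;
# server count, reboot frequency, and people waiting are hypothetical inputs.


def clock_to_minutes(clock: str) -> int:
    """Convert a 12-hour clock time such as '12:18am' to minutes after midnight."""
    hours, rest = clock.split(":")
    minutes, suffix = int(rest[:2]), rest[2:].strip().lower()
    hour = int(hours) % 12 + (12 if suffix == "pm" else 0)
    return hour * 60 + minutes


def elapsed_minutes(start: str, end: str) -> int:
    """Minutes from start to end, wrapping past midnight if necessary."""
    return (clock_to_minutes(end) - clock_to_minutes(start)) % (24 * 60)


# Reboot durations from the two dialogs (reboot request to logon).
PHYSICAL_REBOOT_MIN = elapsed_minutes("12:18am", "12:36am")  # 18 minutes
VIRTUAL_REBOOT_MIN = elapsed_minutes("12:18am", "12:26am")   # 8 minutes


def person_hours_saved_per_year(servers: int, reboots_per_month: int,
                                people_waiting: int) -> float:
    """Person-hours saved per year, assuming (hypothetically) that every
    reboot is waited on individually by everyone on the call."""
    saved_per_reboot = PHYSICAL_REBOOT_MIN - VIRTUAL_REBOOT_MIN
    total_minutes = (saved_per_reboot * servers * reboots_per_month
                     * 12 * people_waiting)
    return total_minutes / 60


if __name__ == "__main__":
    # Hypothetical shop: 200 servers, patched monthly, 4 people on each call.
    print(person_hours_saved_per_year(servers=200, reboots_per_month=1,
                                      people_waiting=4))
```

With those hypothetical inputs the script prints 1600.0, or 1,600 person-hours a year. Rebooting servers in parallel or unattended would change the figure, so treat it as an illustration under these assumptions rather than a measurement.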

Converting your physical systems to virtual machines can save money; you already know that. It can save labor, too. But most of all, it can save time.

About

Kenneth 'Ken' Hess is a full-time Windows and Linux system administrator with 20 years of experience with Mac, Linux, UNIX, and Windows systems in large multi-data center environments.

Talkback

25 comments
  • From Power on at the UPS

    From power-on at the UPS, it takes 45 seconds to reach the login, and a total of 1 min 12 sec until Firefox is running.
    Standard WD hard drive, AMD 6 core, Win 7.
    1 min 12 sec to wait from a cold boot including browser up.
    Fine by me.
    MoeFugger
    • I'm talking servers not workstations

      Win 7 is not a server.
      khess
      • Servers are usually redundant anyways . . .

        Servers are usually redundant anyways - virtual or not, another server is gonna take the load while the other one cycles.

        Besides, even a physical server shouldn't be much slower than a virtual server. If MoeFugger can get his workstation to boot up that fast, shouldn't a decent server admin be able to do the same?

        And last I checked, even my VM emulates the POST sequence . . .

        Although to be fair, POST is largely dependent on your vendor, and if you set it up right, most machines can POST pretty instantly.

        And of course, even a VM has to be put onto a physical machine. You still have to consider the physical machine may need a reboot, no?
        CobraA1
        • Ummm.... No .... Hell no.

          Servers have more advanced hardware. Disk arrays have to be powered on and initialized, fans synchronized and checked, cooling assured, memory checked... And the admin can't do a damned thing to speed that up. It also has nothing to do with the operating system.
          mikedees
          • However these tasks don't need to be performed...

            ...for a reboot. Everything should already be initialized during a reboot.
            ye
        • Server grade hardware has a much longer boot process

          Than a workstation. The time it takes to reach POST, and hence for the OS to start, could easily be 4 to 6 minutes. A failover cluster will make sure a reboot of the physical host (the one that hosts the VMs) causes no downtime for the actual VMs, as these are migrated (online) to other hosts in the cluster.
          sjaak327
        • I used to own a Sun Fire 3800 system.

          That system could take quite a while (30-45 minutes) to perform its POST. And that didn't include a huge amount of disk.
          ye
        • Don't think you have much experience with servers

          Workstation operating systems certainly don't have the capacity to host large-scale workloads and load-balancing tasks. How are you going to host a SQL Server instance with a 1.5TB database on your Windows 7 x64 or CentOS desktop?

          Well, technically, you 'can', but it will take a lot of work and wasted time just to 'get it working', and as the article mentioned, in the corporate environment, time IS money.

          In my current job, we have a maintenance window of only 2 hours for downtime and more than 50 servers (all XenApp servers serving 1000+ users), so you can imagine that short boot time IS important!!!
          lord_lad
  • What system are you running?

    20 minutes to shut down and reboot Windows Server??? That's long even by Windows standards. Can you please share which applications you were running (that took so long to shut down), what server you were using, and what special hardware you were using? I could maybe see a high-end SQL engine taking time to disconnect peers, shut down the database, and then let Windows shut down and reboot. I could also possibly see a specialized multipath storage driver taking time to initialize... but 20 minutes?

    I used to see that in mid-range UNIX days (where Sun and HP UNIX machines were very slow to boot). I haven't seen long reboots with Linux (even Linux databases) in years.
    lkarnis@...
  • Aside from avoiding POST how do VM's reboot faster?

    The OS still has to perform the same shutdown and startup process regardless of whether it's hosted on a physical server or a virtual server. Likewise, a reboot doesn't usually involve POST; that's typically a function of powering on the system.
    ye
    • Intel's server class systems do a complete POST on every boot.

      I think these are the systems he is talking about. Real server class systems have multiple SCSI and RAID subsystems to initialize and they do it every time. It can take up to 5 minutes, but it seems a lot longer when someone is waiting, and calling.
      anothercanuck
      • I know some systems do and some don't.

        I'll have to check it out on one of my x4100 and T1000 systems as it's been a while since I actually observed a reboot.
        ye
        • My x4100 performs POST after a reboot.

          My T1000s and V210s do not. Thus, rebooting these SPARC-based systems would probably take no longer than booting a virtual system.
          ye
      • Yep

        You got it.
        khess
    • Wrong

      Sorry, but physical systems do a lot more.
      khess
      • Such as?

        When I first power up my various Sun systems the ILOM/ALOM boots up and runs through its various tests. Once up I can then power on the system itself which runs some form of POST.

        When I reboot these systems the ILOM/ALOM doesn't need to run its self tests. Can't recall if the system itself does or not.
        ye
        • If you take your average Dell or HP blade server

          The time it takes to reach POST is about 4-6 minutes, and as mentioned, there's no way to speed this up. Now, this isn't a problem, as we use these servers for virtualisation, and since we have several of them in a cluster, these systems rebooting causes no visible downtime for anybody but me, as the workload is drained to the other servers.
          sjaak327
          • Ken was referring to rebooting.

            Not initial power on. My testing shows my SPARC based systems do not perform POST when rebooted. However my Intel based systems do.
            ye
      • Here is the boot sequence of a T1000

        This is from initial power on (of the server, not the ALOM) to the login prompt and then a reboot (the "init 6" command) back to the login. As you can see the POST was not performed during the reboot:

        sc> poweron
        sc> console -f
        SC Alert: Host System has Reset
        Enter #. to return to ALOM.
        0:0:0>
        0:0:0>Sun Fire[TM] T1000 POST 4.30.4.b 2010/07/09 14:25
        /export/delivery/delivery/4.30/4.30.4.b/post4.30.4-micro/Niagara/erie/integrated (root)
        0:0:0>Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
        0:0:0>VBSC cmp0 arg is: ffffffff.00000211
        0:0:0>POST enabling threads: 00000000.ffffffff
        0:0:0>VBSC cntl arg is: ffffffff.00000211
        0:0:0>VBSC selecting POST MAX Testing.
        0:0:0>VBSC setting verbosity level 2
        0:0:0>Start Selftest.....
        0:0:0>Master CPU Tests Basic....Done
        0:0:0>Init MMU.....
        0:0:0>L2 Tests....Done
        0:0:0>Test Memory....Done
        0:0:0>Setup POST Mailbox ....Done
        0:0:0>Extended CPU Tests....Done
        0:0:0>Scrub Memory....Done
        0:0:0>Functional CPU Tests....Done
        0:0:0>Extended Memory Tests....Done
        0:0:0>IO-Bridge Tests....Done
        2013-05-11 09:10:51.418 0:0:0>INFO:
        2013-05-11 09:10:51.425 0:0:0> POST Passed all devices.
        2013-05-11 09:10:51.445 0:0:0>POST: Return to VBSC.
        2013-05-11 09:10:51.464 0:0:0>Master set ACK for vbsc runpost command and spin..
        .
        -
        SC Alert: Host system has shut down.
        \
        SC Alert: Host System has Reset
        /

        Sun Fire(TM) T1000, No Keyboard
        Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
        OpenBoot 4.30.4.d, 16256 MB memory available, Serial #73263168.
        Ethernet address 0:14:4f:5d:cb:75, Host ID: 845dcb75.


        Boot device: disk File and args:
        SunOS Release 5.10 Version Generic_142909-17 64-bit
        Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
        Hostname: t1000
        Reading ZFS config: done.
        Mounting ZFS filesystems: (5/5)

        t1000 console login: root
        Password:
        May 11 09:13:08 t1000 login: ROOT LOGIN /dev/console
        Last login: Thu May 2 13:11:59 on console
        May 11 09:13:08 t1000 sendmail[379]: My unqualified host name (t1000) unknown; sleeping for retry
        Oracle Corporation SunOS 5.10 Generic Patch January 2005
        root@t1000# init 6
        root@t1000# svc.startd: The system is coming down. Please wait.
        svc.startd: 79 system services are now being stopped.
        May 11 09:13:29 t1000 syslogd: going down on signal 15
        svc.startd: The system is down.
        syncing file systems... done
        rebooting...

        SC Alert: Host System has Reset
        -

        Sun Fire(TM) T1000, No Keyboard
        Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
        OpenBoot 4.30.4.d, 16256 MB memory available, Serial #73263168.
        Ethernet address 0:14:4f:5d:cb:75, Host ID: 845dcb75.



        Boot device: disk File and args:
        SunOS Release 5.10 Version Generic_142909-17 64-bit
        Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
        Hostname: t1000
        Reading ZFS config: done.
        Mounting ZFS filesystems: (5/5)

        t1000 console login:
        ye