A lesson for the cloud: 100 percent uptime achieved -- for 16 years

Summary: It's all about expecting reliability.

For more than 16 years, a NetWare 3.12 server had been doing its job. Its current administrator, who was just a child when the server first went into service, was faced with the prospect of decommissioning it, as its antique 5 1/4" full-height 800 MB hard drives were finally dying. As reported by Ars Technica, the administrator decided it was finally time to put the server to rest last Friday.

Today, the achievement seems amazing: more than 16 years without a glitch. But the remarkable part is that the company still had a use for the server for the last decade or so, not that the operating system kept running, because that is simply what NetWare did. It just worked. And while this story is striking for how long the physical hardware survived, it’s only the most recent story of NetWare doing something that seems amazing in a world where servers are reset, rebooted, and reconfigured at a rapid pace.

In 2001, a university found a NetWare server that had been lost for four years. IT knew it existed; staff could see it and manage it on the network, but no one knew where it was physically located. It was eventually discovered during renovations, when a hole was punched in a wall and it turned out that an earlier renovation had sealed off the closet where the server sat.

Back in the NetWare days, servers staying up for years were not uncommon. Novell even kept a page of customer-submitted console screen shots showing extended uptimes. Domain controllers, file and print servers, application servers: it usually didn’t matter. The software just didn’t crash; the majority of users surveyed back then said their servers were brought down only when they needed to be updated.

That’s the level of reliability that cloud service providers need to strive for. For users to depend on their services, those services need to just work. No excuses, no finger pointing, no questions. When clients boot up their systems, the provider’s services just need to be available.

We’re getting there, but unplanned, unexpected, unprepared-for outages still plague the business, as does a general unwillingness among providers to just say, “we screwed up; we will fix it and make it better.”


Talkback

  • No race for "Latest and greatest"

    A major difference between the NetWare era and now is that there was no race for the "latest and greatest". Hardware and software were installed, connections were set up, and people used the network. People weren't constantly making new and often ridiculous demands for features that few people actually wanted and even fewer actually needed.

    Our law firm is still using a DOS-based database system written around 1995. We've looked into porting it to a Windows database but both I and the outside consultant who originally wrote the system have told the firm's owner (who is fairly tech-savvy for a lawyer who hasn't worked in tech), "If all you're going to do is keep the same functionality you have now but move it to a Windows database program, it will be a huge waste of time and money. The only advantage you'll get is the ability to cut and paste easier."

    (I'm not saying added functionality like storing generated documents, etc., wouldn't be nice, but with a total of about 10 people using the system, a huge outlay for new custom software isn't realistic.)
    Rick_R
  • The world has changed.

    These days servers are constantly patched to deal with security threats: Windows more than Linux, and Linux more than Unix. But reboots are inevitable. And it's no big deal. See, they have this technology called "clustering" which allows you to take down individual machines without taking down an application or service (a minimal sketch of that idea appears at the end of this Talkback section). Any cloud provider that doesn't have this sort of redundancy needs to pack it in.
    mikedees
    • These days versus those days

      Ten years ago, I worked for a contractor that had the IT support contract for the US Senate. Most senators, committees, and sub-committees had their own servers, and the majority ran NetWare and cc:Mail - also Lotus 1-2-3 and WordPerfect.

      Why? Senators received more email than people in most other occupations, and were definitely at the top of the list when it came to email from irate people. That made them high-profile targets for hackers, script kiddies, etc.

      (Yeah, we had the IT support contract when someone mailed Anthrax to the Senate via snail mail. Talk about a SNAFU. We had to reconstruct the entire IT infrastructure in the temporary offices - YESTERDAY!!! Imagine hundreds of office managers screaming to be moved to the top of the list.

      "Do you know who I work for? A US Senator." "Yeah, you and everybody else on this list!"

      "There are some very sensitive files that are only on CD-ROM in my desk. I need you to retrieve them." "Why can't you get them?" "The building is contaminated with anthrax." "Duh!")

      Anyhow, David is not reporting on hardware reliability. Netware and cc:Mail were robust and reliable pieces of software.
      Steve Webb
      • Internet facing

        The problem today is that more and more devices are either internet-facing or on networks where other devices are on the internet.

        That means they need to be regularly patched, so that they can't be compromised.

        With old NetWare networks, a lot of them never got anywhere near an external network, so there wasn't the need for security patching.

        We have servers that go for months between reboots. They used to go years between reboots, but that is no longer possible, unless you isolate your network from the internet.
        wright_is
    • Reboots are not inevitable

      Ksplice pretty much eliminates the need for reboots when security patching Linux servers.
      fitzgerrell
  • Lessons learnt? None.

    That 1 in a million servers lasted a million hours tells us:
    - nothing
    - that Chernicoff knows zip about probability and statistics

    Gee, we all want our stuff to be 100% reliable.
    What a surprise!

    Post again when you have some practical advice to give, other than buying a lottery ticket :-(
    jacksonjohn
  • Scaling up may actually be part of the problem.

    Scaling up may actually be part of the problem. As things get larger, they become more complex. And as they become more complex, they become more difficult to keep stable. All the talk about redundancy does very little to ultimately solve the complexity problem.
    CobraA1
  • I've got a similar story with a Netware 5 system...

    It was installed in 2000. 24x7, 13 years old and still running today. All original Dell hardware, including the hard drives, all 9 GB of them. Only used to access old email stored in GroupWise, but it's still going strong.
    corton
  • corton: send it to Cool Solutions

    At Novell Cool Solutions we're looking for NetWare uptime champs. You should enter your NetWare 5 system at http://www.novell.com/communities/node/14134/16-years-uptime-can-you-top
    ssalgy
  • What, No Hot-Swap?

    No ability to swap hard drives without a shutdown? How sad...

    Also have a look at the kexec facility for Linux, which lets you do a kernel update without a full reboot.

    Put it all together, and you can have 100% uptime WITHOUT being stuck with an outdated OS.
    ldo17
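
A rough illustration of the clustering point mikedees raises above: the Python sketch below has a client try a list of redundant nodes and use the first one that answers, which is why an individual machine can be rebooted or patched without the service as a whole going down. The hostnames, health-check URLs, and timeout are assumptions made purely for illustration; they are not details taken from the article or the comments.

```python
# Minimal client-side failover sketch: with redundant nodes behind one
# service, taking a single machine down for patching does not take the
# service down. Endpoint URLs and the timeout are illustrative assumptions.
import urllib.request
import urllib.error

# Hypothetical redundant nodes that all offer the same service.
ENDPOINTS = [
    "http://node1.example.com/health",
    "http://node2.example.com/health",
    "http://node3.example.com/health",
]

def first_healthy(endpoints, timeout=2.0):
    """Return the first endpoint that answers with HTTP 200, or None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            # Node is down for a reboot or patch; try the next one.
            continue
    return None

if __name__ == "__main__":
    node = first_healthy(ENDPOINTS)
    if node:
        print(f"Routing requests to {node}")
    else:
        print("No healthy node found; the whole cluster is unreachable")
```

In practice this check runs continuously inside a load balancer or cluster manager rather than in the client; the sketch is only meant to show why a rebooted node need not translate into downtime for users.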