Online storage service MediaMax, also called The Linkup, went out of business following a system administration error that deleted active customer data. The defunct company leaves behind unhappy users and raises questions about the reliability of cloud computing.
Streamload offered unlimited, and later 25 GB of, free storage for quite some time. The result was a tremendous amount of data stored in a few million free, inactive accounts over [nine years]. Streamload was effectively paying to store hundreds of terabytes of old, inactive data for former users, free of charge. In preparation for the split of the two companies, and the subsequent move of the MediaMax application to SAVVIS, it was decided that inactive data from former users would be purged from the Streamload/MediaMax storage system, shrinking the overall storage needs and costs of the new MediaMax company. During this process, a system administrator ran a script that misidentified active account data and disassociated physical files from their owners.
Although The Linkup lost a ton of customer data, CEO Steve Iverson told Network World he's unsure how much is gone:
Iverson says at least 55% of the data was safe. How much of the remaining 45% was saved is not clear, he says. "We know there was definitely a lot of customer problems, and when we looked at some individual accounts, some people didn’t have any files, and some people had all their files."
THE PROJECT FAILURES ANALYSIS
As with most failures, this story is fraught with complications and contradictions. Beyond the finger-pointing and backbiting, which I suppose is to be expected, confusing corporate relationships, coupled with a seemingly bizarre level of process and technical carelessness, lend a strange flavor to the whole mess.
The human drama is documented in links from this post; more importantly, two significant and highly connected issues were at play:
- Business process failures. Apparently, the company allowed a lone system administrator to perform tasks affecting its core business without sufficient dry runs. The point is self-evident: scenario planning is critical whenever IT handles irreplaceable data. Management is responsible for establishing operating plans and contingency procedures before IT executes any data-threatening procedure.
- Technical failures. Given the high stakes and the script's destructive purpose, the company should have tested it intensively ahead of time.
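To make the dry-run point concrete, here is a minimal sketch of how a purge script can default to reporting rather than deleting. The account names, dates, and cutoff are hypothetical, invented purely for illustration; nothing here reflects MediaMax's actual code.

```python
from datetime import datetime, timedelta

# Hypothetical account records: account_id -> last login (illustrative data only).
ACCOUNTS = {
    "alice": datetime(2008, 5, 1),   # recently active
    "bob": datetime(2001, 3, 15),    # idle for years
    "carol": datetime(1999, 7, 4),   # idle for years
}

# Assumed policy: anything idle for ~3 years before 2008 is a purge candidate.
CUTOFF = datetime(2008, 1, 1) - timedelta(days=3 * 365)

def purge_inactive(accounts, cutoff, dry_run=True):
    """Select accounts idle since `cutoff`; delete only when dry_run=False."""
    targets = [acct for acct, last_login in accounts.items()
               if last_login < cutoff]
    if dry_run:
        # Report what WOULD be deleted so a human can review the list first.
        print(f"DRY RUN: would purge {len(targets)} account(s): {targets}")
        return targets
    for acct in targets:
        del accounts[acct]  # the destructive step runs only after review
    return targets

# First pass: the dry run produces a reviewable list and deletes nothing.
purge_inactive(ACCOUNTS, CUTOFF)
# Second pass, only after a human has reviewed that list:
purge_inactive(ACCOUNTS, CUTOFF, dry_run=False)
```

The key design choice is that the destructive path requires an explicit `dry_run=False`; running the script carelessly produces a report, not a deletion.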
Beyond what legitimate business and regulatory requirements demand, retaining years of old, inactive data adds unnecessary risk and cost.
There was also a process failure. NewsGator takes active steps to isolate problems and prevent this type of damage: in addition to sandbox testing, which is computer science 101, we require two-key authorization, meaning the system administrator can run these types of scripts only after a second person has given approval. A well-defined system of checks and balances prevents problems.
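The two-key idea described above can be sketched in a few lines. This is not NewsGator's implementation; the function names, the in-memory approval set, and the operation label are all assumptions made for illustration.

```python
# Hypothetical approval store: (operation, approver) pairs (illustrative only).
APPROVALS = set()

def approve(operation, approver):
    """Record one person's sign-off on a named destructive operation."""
    APPROVALS.add((operation, approver))

def run_destructive(operation, requested_by, action):
    """Run `action` only if someone other than the requester approved it."""
    approvers = {who for (op, who) in APPROVALS if op == operation}
    approvers.discard(requested_by)  # the requester cannot be the second key
    if not approvers:
        raise PermissionError(f"{operation}: needs sign-off from a second person")
    return action()

# One admin signs off; a different admin then runs the purge.
approve("purge-inactive-accounts", "admin_a")
result = run_destructive("purge-inactive-accounts", "admin_b",
                         lambda: "purge executed")
```

The check deliberately discards the requester from the approver set, so one person acting alone, exactly the failure mode in this story, cannot satisfy it.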
While this case is an interesting footnote in the history of IT failures, the larger implications relate to cloud computing. On this subject, Larry Dignan says:
[The cloud's] growing pains, which are more evident each day that we rely more on service-based software efforts, indicate that you can’t really trust the cloud at this juncture. It’s too early and providers are learning as they go.
Although this was a cloud-based failure, the underlying problem was human error and poor judgment. In that respect, it is no different from any other IT failure in which immature process, coupled with lax management oversight, resulted in catastrophic meltdown.