Will cloud survive outage storm?

First it was Amazon. More recently, Sony.

The tech world was recently rocked by news that the two giants had suffered, respectively, a massive service outage and a database breach.

On Apr. 21, Amazon's Elastic Compute Cloud (EC2) service at the company's northern Virginia site, which handles Amazon Web Services (AWS) operations for the U.S. East Coast, went down, making it difficult for customers to connect to its servers over the network. Quora, Foursquare, Reddit and Hootsuite were among the companies affected by the outage.

Amazon sorted things out eventually, but the complexity of the problems meant the stricken public cloud service provider took much longer than expected to resolve them.

In Sony's case, data from more than 70 million customers was stolen on Apr. 26 from its PlayStation and PC games network, including customer names, addresses, e-mail addresses, birthdays, PlayStation Network and Qriocity passwords, online user handles and usernames.

To date, the Japanese electronics giant has yet to pinpoint the exact attack scenario, though it had promised that most services on the PlayStation Network (PSN) would be restored by last week.

But a blog post last Sunday on Sony's PlayStation.Blog revealed that the corporation needed "more time for testing" before relaunching its online game play for the PlayStation 3 and PlayStation Portable, as well as chat functions.

With all the negative publicity in the media, some have begun to doubt whether the public cloud computing model is the way forward, with Reuters reporting that some businesses may even put the brakes on plans to move their operations to the cloud.

Over the past couple of months, I've been tracking the cloud computing issue right here on ZDNet Asia, having covered the topic in three distinct verticals: small and midsize businesses, financial services and the public sector.

There are several key takeaways that I think are now pertinent to consider in light of what has happened over the past few weeks.

First, cloud computing is here to stay, whether we like it or not. The fact that AWS was hit with a major outage, and that Sony's defenses were breached and its customers' data stolen, isn't going to kill the cloud computing model.

The cloud proposition is undeniably valuable. The ability to rent hardware and software on demand and in a scalable way, the reduced dependence on in-house servers, storage, networking and the expertise to manage it all, and the shift from a capital-intensive way of meeting IT needs to one based on operational expenses remain fundamentally good propositions for many enterprises.

But, while the foundations of the cloud proposition are sound, there are several other points that must be made.

The first is that nothing is foolproof 100 percent of the time. I'm sure AWS customers do not expect its servers and systems never to fail, as that's an impossible promise to fulfill. But in Amazon's case, Computerworld reported that the company had made a "configuration error" during a network upgrade and that, during this configuration change, a traffic shift "was executed incorrectly".

The questions to ask are how this error happened and why Amazon was unable to keep its systems running after it took place. Where were the backup systems to ensure services could carry on with as little interruption as possible?

In Sony's case, the company may have been aware of the "insecure" state of the application servers that were attacked, but did not act on it. In a ZDNet Asia report, Guillaume Lovet, senior manager of FortiGuard Labs' threat response team, was quoted as saying that such decisions [not to act on insecure states] occur "more frequently than we think among companies, as it is for performance and service continuity reasons".

Both cases demonstrate that companies offering public cloud services must simply be beyond reproach when it comes to resiliency, security and privacy matters.

This does not necessarily mean 100 percent uptime and hardware availability, but it certainly means there must be redundant backup procedures to prevent systems from staying down longer than necessary.

It means processes must be refreshed continually and that security and privacy policies be executed without compromise. Often it's the people and the processes that get in the way and fail, not technology per se.

The flip side is that companies touting cloud computing services mustn't oversell the cloud proposition to the point that customers believe nothing will ever go wrong in the cloud. In their marketing campaigns, they have to be as honest about the weaknesses of the cloud as they are about its strengths.

Customers, on the other hand, should not shy away from the cloud just because of these high-profile failures. It does mean that, before using the cloud, enterprises must understand its technical architecture and assess which parts of their IT infrastructure are suited to it.

It also means that companies need to invest in people who can guide decision makers through this new paradigm in computing.

And finally, enterprises need to go through the complex service level agreements with a fine-tooth comb and understand exactly what they are, and are not, buying.

Because cloud computing is so new, there will be growing pains in this evolution. The best practices for this sector are evolving too, and while there are no mature universal practices for now, there are some key ones, which a ZDNet blogger has summarized well.

And while the cloud may have gone through a storm front with these recent events, I do believe it will recover and see sunshine again as the storm clouds recede.