Dave LeClair, Senior Director of Cloud Strategies at Stratus Technologies, dropped by a while ago to discuss how the company was doing and share some thoughts about cloud computing and availability. As always, it was interesting to think about a topic not always mentioned in cloud computing implementations — workload availability.
LeClair started the conversation with a quick review of how Stratus Technologies has fared since our last conversation. The following bullets summarize that portion of the conversation:
The conversation then turned to Stratus' current discussion points, which center on the idea that cloud computing users are best served if they consider outages in processing, networking and storage during the design phase of their cloud workloads.
As expected, he pointed out recent high-profile examples of outages experienced by the major suppliers of cloud infrastructure, platform and software services. Some customers were hurt because they didn't consider where and how their workloads would fail over to other resources when a failure occurs. The key point LeClair wanted to get across is that customers should really be thinking about the business requirements for each of their cloud workloads and where redundant hardware and software must be deployed to address potential outages.
To that end, LeClair discussed Stratus' work with the OpenStack community to help implement what he described as "software defined availability."
Stratus is one of the few computer companies still standing that offers hardware- and software-based continuous processing/non-stop/fault-tolerant computing environments. Most suppliers have chosen to implement availability in layers of software using various types of processing, storage and network virtualization technology. While Stratus makes it possible to use those types of technology, it is one of two remaining suppliers of fault-tolerant hardware: systems that have redundant components built in and firmware-level failover control.
While using software redundancy is fine for stateless, Web-based application architectures, it isn't always the best choice for traditional commercial workloads that simply cannot be seen to fail. It can take too long for processes to restart, or transactions can be lost during the failover period. In that case, Stratus would point out that fault-tolerant hardware is the best choice.
I've always thought Stratus had a number of very good points when they talk about planning for availability along with planning for needed capacity, performance, management and security. I've been known to suggest Stratus to clients when it appeared that they were just accepting the availability claims made by processing virtualization suppliers, such as Microsoft, VMware and Citrix, when workload migration was discussed.
Stratus really needs to make more noise about this topic as organizations increasingly look to the cloud. Unfortunately, the company's messages are being drowned out by those offered by bigger competitors.