In 2011, Netflix's online video rental service regularly accounted for 30 percent of all US download internet traffic.
To support this combination of huge traffic and unpredictable demand spikes, Netflix has spent the past few years developing a global video distribution system using the Amazon Web Services (AWS) cloud.
By outsourcing to Amazon, the company says it has been able to save on the costs of maintaining and updating a datacentre infrastructure and can react better to demand. However, Netflix cloud architect Adrian Cockcroft still has a number of items on his wishlist — chiefly, faster input-output mechanisms — which are lacking in the cloud.
Before taking on responsibility for overseeing and developing Netflix's cloud, Cockcroft worked at Sun and eBay, where he helped found eBay Research Labs.
According to Cockcroft, moving to the cloud lets an organisation change how its developers work and enables it to dispense with IT operations — a concept he calls 'no-ops'. But he believes companies must still be highly critical in evaluating the types of technology on offer in the cloud.
Q: At the end of February Microsoft's Azure cloud had a severe outage and Amazon has had trouble in the past. How can you be so confident in depending on a single cloud?
A: I think there're some architectural differences — the way that Microsoft has built their cloud, they have much more linkage between regions. They have data replication across the country that is centrally managed so they have to have services that span everything. We haven't used their architecture in any real sense other than looking at it for some storage purposes.
Amazon is very anal about having regions be very separate, so the US east, west and central regions are managed independently of one another. They don't talk to each other at all, which is actually a pain because there are services we'd like to have cross-region but we can't because they don't want to do the coupling. They also have [separate] availability zones [within each region]. They've had control failures for [Elastic Block Store (EBS)] zones but they've redesigned EBS to stop that.
What are your greatest concerns when it comes to designing a cloud-based architecture?
When we first went to the cloud we started off with a series of pathfinder projects and benchmarks — what is this beast, how does it behave, which facilities are mature, how does this scale and how does it work?
The Netflix architecture is based on the stuff we found that works, and we tended to avoid some of the things that didn't work as well. That is why we don't have a strong dependency on EBS: it has always had performance variance, and there have been a number of outages that have helped us say, 'It's something not to use'. It's relatively low-performing, one of the weak spots in the [AWS] cloud.
The instances available from AWS have similar CPU, memory and network capacity to instances available for private datacentre use, but are currently much more limited for disk I/O. They typically have two internal disks and there are network attached storage options like EBS which can provide a few hundred I/O per second. It's easy in the datacentre to provide thousands or tens of thousands of I/O per second. So that is a gap in cloud offerings from AWS.
The hard thing to do in the cloud is to do high-performance IO [input-output], but that is starting to change as third-party vendors are figuring out ways of connecting high-performance IO externally, and we've worked around it with our [Cassandra] data store architecture.
Amazon themselves now have DynamoDB with solid-state disks behind it which is a very encouraging sign for me — I've been asking for SSDs in the cloud for some time. We're hoping that eventually we can get more access to them than just through DynamoDB.
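[Editor's illustration, not part of the interview: the I/O gap Cockcroft describes can be made concrete with some rough arithmetic. The sketch below assumes a hypothetical workload of one million random I/O operations and picks 300 and 10,000 operations per second as stand-ins for "a few hundred" and "tens of thousands". Those figures and the function name are assumptions chosen for illustration, not measurements from Netflix or AWS.]

```python
def seconds_to_complete(total_ops: int, iops: int) -> float:
    """Return the time needed to finish total_ops I/O operations
    at a sustained rate of iops operations per second."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return total_ops / iops


# Hypothetical figures, chosen only to illustrate the orders of magnitude
# mentioned in the interview; they are not benchmark results.
TOTAL_OPS = 1_000_000          # assumed workload size
NETWORK_VOLUME_IOPS = 300      # stand-in for "a few hundred I/O per second"
DATACENTRE_IOPS = 10_000       # stand-in for "tens of thousands of I/O per second"

slow = seconds_to_complete(TOTAL_OPS, NETWORK_VOLUME_IOPS)
fast = seconds_to_complete(TOTAL_OPS, DATACENTRE_IOPS)

print(f"network-attached volume: {slow / 60:.1f} minutes")
print(f"datacentre storage:      {fast / 60:.1f} minutes")
print(f"ratio:                   {slow / fast:.0f}x")
```

[Under these assumed numbers the same workload takes roughly 56 minutes on the slower volume and under two minutes on datacentre-class storage, which is the kind of gap that makes SSD access attractive.]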
Many enterprises seem keen on SSDs, so why do you think it has taken Amazon a while to roll them out?
It's purely scale for them. For Amazon to do something they have to do it on a scale that's really mind-boggling. If you think about deploying an infrastructure service with a new type of hardware — if they got it wrong, they can't turn it back out and do it again differently. So they have to over-engineer what they do.
In some ways there are parallels between Apple and Amazon. Apple builds products that take a long time and when they come out they are very well polished. With Amazon they take a long time to get stuff done but when it comes out it is very large scale. There is a long lead time for everything they do, but they have enormous resources and are starting work on these projects earlier than other people and they're having more people working on things.
What we're doing at Netflix is leveraging that investment. Amazon has thousands of people working on AWS and way more engineers than we have working on everything at Netflix. We're able to leverage [that investment] by using the APIs and telling them what we want.
It seems as though the major consumers of cloud are either technology-oriented start-ups or large companies, such as Netflix. What about medium-sized businesses?
Well, most of the people using clouds are start-ups, using five or 10 machines. We started there. Two years ago our production system was...