How the cloud will save -- and change -- disk drives

Google has changed many aspects of computer infrastructure, including power supplies and scale-out architectures. Now they're asking vendors to redesign disks for cloud use. How will that affect you?
Written by Robin Harris, Contributor

At a past USENIX FAST conference, Eric Brewer, Google's VP of Infrastructure, gave a keynote address, Disks and their Cloudy Future. For cost reasons Google and the other big internet data centers remain committed to disk drives, but they'd like some changes to make disks cloud-friendly, at the cost of making them less consumer-friendly.

The cloud and SSDs have already forced big changes on disk drives. Mobile computing is mostly SSD-based today, and will only get more so. High-performance drives (the 10k and 15k RPM models) are still being sold, but new models are not coming out. The industry is changing anyway, so why not listen to Google?

Google's problem

Google's data centers are different from your enterprise. First, they run many tens of thousands of servers with software designed to automatically recover from server failures. Second, they have a high degree of data redundancy, so the failure of a single disk, or a single server, doesn't affect data availability.

Nor do they much care about write latency, because most of their workload is reads. And since the data is spread among multiple servers and drives, they don't care about features that optimize a drive for the server.

Their big concern is long-tail read latency. The mean read time is about 10ms, but the 99th percentile is 100ms, and the 99.9th percentile is measured in seconds. That's too long.
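To see why the mean hides the problem, here is a minimal sketch of a tail-latency calculation. The sample data is invented for illustration (it is not Google's), and `percentile` is a hypothetical helper using the nearest-rank method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical read latencies in milliseconds (made-up numbers):
# most reads are fast, a few are slow, a handful take seconds.
latencies_ms = [10.0] * 9885 + [100.0] * 100 + [2000.0] * 15

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
p999_ms = percentile(latencies_ms, 99.9)

print(f"mean={mean_ms:.3f}ms p50={p50_ms}ms p99={p99_ms}ms p99.9={p999_ms}ms")
```

In this made-up sample the mean stays close to the typical read, while the 99.9th percentile lands on the multi-second outliers. A request that fans out to many disks is likely to hit at least one of those outliers.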

What Google wants

Google wants lower latency, which comes from a combination of more IOPS and fewer interruptions to the disk's I/Os. But how do you get there?

Since all the data is replicated, Google doesn't need elaborate methods to achieve the unrecoverable read error rate of fewer than 1 error per 10^15 bits read that is common on today's server drives. Because those error rates are achieved by heroic retry efforts and lengthy, capacity-consuming ECC, tolerating a higher unrecoverable read error rate would reduce latency and increase drive capacity.
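The arithmetic behind that trade-off can be sketched roughly as follows. The 10 TB drive size and the relaxed 1-in-10^13 rate are illustrative assumptions, not figures from the keynote; only the 1-in-10^15 rate comes from the discussion above. The probability uses a simple Poisson approximation.

```python
import math


def expected_unrecoverable_errors(bytes_read, errors_per_bit):
    """Expected number of unrecoverable read errors for a given read volume."""
    return bytes_read * 8 * errors_per_bit


def prob_at_least_one_error(bytes_read, errors_per_bit):
    """Poisson approximation: P(at least one error) = 1 - exp(-expected)."""
    return -math.expm1(-expected_unrecoverable_errors(bytes_read, errors_per_bit))


DRIVE_BYTES = 10e12     # assumed 10 TB drive, read end to end once
SERVER_CLASS = 1e-15    # 1 error per 10^15 bits read
RELAXED = 1e-13         # hypothetical relaxed rate, for comparison only

for label, rate in (("server-class", SERVER_CLASS), ("relaxed", RELAXED)):
    exp_errs = expected_unrecoverable_errors(DRIVE_BYTES, rate)
    p_any = prob_at_least_one_error(DRIVE_BYTES, rate)
    print(f"{label}: expected errors={exp_errs:.2f}, P(at least one)={p_any:.3f}")
```

With replicas elsewhere, a failed read becomes a fetch from another copy rather than lost data, which is why a higher per-drive error rate can be acceptable at this scale.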

Vendors could also dispense with remapping bad blocks and maintaining spare blocks, since Google treats all disks as part of a giant block pool. They don't need all drives to be the same size or any particular size, unlike RAID arrays.
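A toy sketch of the block-pool idea, under loud assumptions: `BlockPool`, its methods, and the "most free space first" placement rule are all hypothetical, chosen only to show that drives of mixed sizes can share one pool and that a lost drive simply leaves the pool.

```python
class BlockPool:
    """Toy pool of drives with different capacities, measured in blocks."""

    def __init__(self, capacities):
        # capacities: dict mapping drive id -> number of free blocks
        self.free = dict(capacities)

    def place(self, replicas=3):
        """Pick `replicas` distinct drives, preferring those with most free blocks."""
        candidates = [d for d, blocks in self.free.items() if blocks > 0]
        if len(candidates) < replicas:
            raise RuntimeError("not enough drives with free space")
        # Sort by free space (descending), then drive id for a stable order.
        chosen = sorted(candidates, key=lambda d: (-self.free[d], d))[:replicas]
        for d in chosen:
            self.free[d] -= 1
        return chosen

    def fail_drive(self, drive_id):
        """A failed drive drops out of the pool; no spare-block handling needed."""
        self.free.pop(drive_id, None)


pool = BlockPool({"a": 4, "b": 2, "c": 8, "d": 1})
first = pool.place()      # drives with the most free space
pool.fail_drive("c")      # largest drive dies
second = pool.place()     # placement carries on with what is left
print(first, second)
```

Nothing here requires the drives to match in size, which is the contrast with a RAID array drawn above.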

Google would also like disk drives to be smarter about I/O. They label all I/Os as one of a) low latency, b) max bandwidth, or c) best effort. They'd like the disk drive to understand those labels and to perform I/Os accordingly, since the disk knows best where the data and its heads are.
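As a rough illustration of label-aware scheduling, the sketch below serves requests strictly by label and first-come-first-served within a label. The names `IoClass` and `LabeledQueue` are hypothetical, and the strict ordering is a simplification: a real drive would also weigh head position and rotational delay, which this ignores.

```python
import heapq
import itertools
from enum import IntEnum


class IoClass(IntEnum):
    """The three labels described above; lower value is served first."""
    LOW_LATENCY = 0
    MAX_BANDWIDTH = 1
    BEST_EFFORT = 2


class LabeledQueue:
    """Serve I/Os by label, then in arrival order within a label."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter, breaks ties FIFO

    def submit(self, io_class, lba):
        heapq.heappush(self._heap, (io_class, next(self._seq), lba))

    def next_io(self):
        """Return (label, lba) for the next request, or None if idle."""
        if not self._heap:
            return None
        io_class, _, lba = heapq.heappop(self._heap)
        return io_class, lba


q = LabeledQueue()
q.submit(IoClass.BEST_EFFORT, 100)
q.submit(IoClass.LOW_LATENCY, 200)
q.submit(IoClass.MAX_BANDWIDTH, 300)
q.submit(IoClass.LOW_LATENCY, 400)

order = []
while (item := q.next_io()) is not None:
    order.append(item)
print(order)
```

The low-latency reads jump ahead of the best-effort request even though it arrived first, which is the behavior the labels are meant to enable.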

Also, Google would like specialized APIs so it can manage disks from above the server level. Remember, any server can fail without data loss, so Google needs a way to manage disks as a giant block pool.

The Storage Bits take

Vendors got right on Google's demand for efficient power supplies, and the rest of us benefited. But Google's desires for disk drives aren't as good for consumers.

My guess is that drive vendors are working on stripped-down firmware for their high-volume drives. It would take out lots of features that enterprises and consumers like, such as low error rates and bad block replacement, and add in the APIs that Google wants.

The name of the disk manufacturing game is volume. If the cloud vendors keep buying high-capacity disks, it doesn't matter what version of firmware they run. We consumers will be able to join in the fun of reliable, high-capacity and low-cost disk storage for years to come.

Courteous comments welcome, of course.
