Deep within the technology of Glacier, Amazon's newly-announced cloud archival service, lies a set of technologies that could have a significant impact on storage as we know it.
That's the conclusion that I've come to after putting what we know about Glacier together with information from a former Amazon S3 employee, as well as some data that recently came out about Facebook's own storage plans.
Based on these three sources, it seems Amazon and Facebook are independently working on a novel approach to storage that could massively cut the cost of backing up data. Amazon's technology is already running in production, and Facebook's is at the prototype stage.
What we know:
- Glacier is tapeless and runs on "inexpensive commodity hardware components".
- There is no limit on the amount of data that can be stored, though the maximum size per data archive tops out at 40TB.
- According to Wired Enterprise, Facebook is building a miniature datacentre to house an experimental hard-disk storage server that shuts down drives when they are not in use.
- Facebook's current storage servers burn 4.5kW and Facebook hopes to use the experimental equipment to drive this down to around 1.5kW.
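To put those figures in context, here is a rough back-of-the-envelope calculation of what the drop from 4.5kW to 1.5kW per server could be worth. The electricity price is an illustrative assumption, not a figure from either company:

```python
# Rough annual running-cost comparison for one storage server, using
# the power figures reported for Facebook's hardware (4.5kW today,
# 1.5kW targeted). The electricity price is an assumed, illustrative
# value -- real datacentre rates vary by region and contract.

HOURS_PER_YEAR = 24 * 365   # 8,760 hours
PRICE_PER_KWH = 0.10        # assumed USD per kWh

def annual_cost(kilowatts):
    """Yearly electricity cost for a server drawing `kilowatts` continuously."""
    return kilowatts * HOURS_PER_YEAR * PRICE_PER_KWH

current = annual_cost(4.5)  # today's storage server
target = annual_cost(1.5)   # experimental spin-down server

print(f"current: ${current:,.0f}/yr, target: ${target:,.0f}/yr")
print(f"saving:  ${current - target:,.0f}/yr per server")
```

At these assumed rates the saving works out to roughly two-thirds of each server's annual power bill, before counting the knock-on reduction in cooling costs.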
What we think we know:
A post on Hacker News by a former S3 employee gives a lot more information on the Glacier technology. The user (whose identity was confirmed by ZDNet) says:
- Glacier's hardware is based on low-RPM custom hard drives built by "a major hardware manufacturer" for Amazon.
- These drives are put in custom racks with custom logic boards, and only a percentage of a rack's drives can be spun at full speed at any one time due to some type of constraint within the system.
- The reason for the 3-5 hour lag in accessing stored data in Glacier is that it must be taken out of these systems and moved to staging storage before it can be downloaded by the client.
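The per-rack spin constraint described above might look something like the following sketch. To be clear, this is entirely hypothetical -- the real controller logic, the interface names, and the active-drive fraction are all assumptions, not details from the Hacker News post:

```python
from collections import deque

class RackController:
    """Hypothetical sketch of a rack in which only a fraction of the
    drives may spin at full speed at any one time, as described by the
    former S3 employee. Reads against drives over the quota queue up
    until an active drive spins back down."""

    def __init__(self, num_drives, max_active_fraction=0.2):
        self.num_drives = num_drives
        self.max_active = int(num_drives * max_active_fraction)
        self.active = set()     # drives currently spun up
        self.waiting = deque()  # drive IDs waiting for a spin-up slot

    def request_read(self, drive_id):
        """Ask to read from a drive; spin it up only if quota allows."""
        if drive_id in self.active:
            return "reading"
        if len(self.active) < self.max_active:
            self.active.add(drive_id)
            return "spinning up"
        self.waiting.append(drive_id)
        return "queued"

    def release(self, drive_id):
        """Spin a drive back down and admit the next queued request."""
        self.active.discard(drive_id)
        if self.waiting and len(self.active) < self.max_active:
            self.active.add(self.waiting.popleft())
```

A scheme like this would cap the rack's peak power draw at a known ceiling regardless of how many read requests arrive at once, which would also explain why retrievals have to be queued rather than served immediately.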
What this means for storage:
When you put this all together, it seems Amazon and Facebook are independently trying to cut the cost of storage by creating data backup/archival servers that consume dramatically less power.
Both approaches rely on hard drive technologies that can alter each drive's power consumption according to use -- a drive spinning at a lower rate draws less power -- while Facebook's drives are also designed to power down completely when not in use.
We can assume Amazon is trying to do the same thing, as when a user makes a data request from Glacier the data is loaded into staging servers. This implies Amazon is trying to minimise the amount of time Glacier drives must be powered on to deliver data, hence the staging servers (we've asked Amazon to confirm whether that's the case, but the company has yet to respond).
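If that reading is right, a staged retrieval would work roughly like the sketch below. Everything here -- the `Archive` interface, the spin-up calls, the dictionary used as a staging tier -- is an assumption for illustration, not Amazon's actual design:

```python
class Archive:
    """Toy stand-in for a spun-down archive rack (assumed interface)."""
    def __init__(self, data):
        self.data = data
        self.powered = False

    def spin_up(self):
        self.powered = True   # the slow, power-hungry step

    def spin_down(self):
        self.powered = False

    def read(self, key):
        assert self.powered, "drive must be spinning to read"
        return self.data[key]

def retrieve_to_staging(archive, staging, keys):
    """Batch the pending requests, spin the archive up once, copy
    everything to always-on staging storage, then spin straight back
    down. Clients later download from staging, so the archive drives
    stay powered off most of the time -- which would account for the
    3-5 hour retrieval lag."""
    archive.spin_up()
    try:
        for key in keys:
            staging[key] = archive.read(key)
    finally:
        archive.spin_down()
    return staging
```

The design choice is the interesting part: by decoupling the moment data is requested from the moment it is delivered, the archive tier only pays for power in short bursts, and the cheap always-on staging tier absorbs the client-facing traffic.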
Still, from what we know, it seems Amazon and Facebook are both trying to cut the cost of their data storage appliances. In Amazon's case, this is probably so it can reduce the price of its services for customers and therefore encourage more users to adopt its cloud; for Facebook, it is to reduce its ongoing datacentre operational expenditure.
What this means for enterprise IT is that, if Amazon gets the technology right, it could pass these power savings on to customers in the form of price cuts.
Facebook, on the other hand, could choose to publish schematics and architectural information about its experimental storage technology to the Open Compute Project, as it has done with its server and rack designs in the past. Ultimately, this means in a few years non-web companies could buy storage technologies based on Facebook's designs, letting them cut their power costs.
It's a fascinating time in the cloud, and I think that the technologies being worked on by Amazon and Facebook point to a better future for data backups for everyone. The main question is whether both of these companies will be willing to share their technologies with others anytime soon.