Could the tech beneath Amazon's Glacier revolutionise data storage?

Summary: Several sources point to Amazon and Facebook independently working on technologies that could shake up the future of enterprise data storage

TOPICS: Cloud, Storage

Deep within the technology of Glacier, Amazon's newly announced cloud archival service, lies a set of technologies that could have a significant impact on storage as we know it.

That's the conclusion that I've come to after putting what we know about Glacier together with information from a former Amazon S3 employee, as well as some data that recently came out about Facebook's own storage plans.

Based on these three sources, it seems Amazon and Facebook are independently working on a novel approach to storage that could massively cut the cost of backing up data. Amazon's technology is already running in production, and Facebook's is at the prototype stage. 

What we know:

- Glacier is tapeless and runs on "inexpensive commodity hardware components". 
- There is no limit on the amount of data that can be stored, though the maximum size per data archive tops out at 40TB. 

- According to Wired Enterprise, Facebook is building a miniature datacentre to house an experimental hard-disk storage server that shuts down drives when they are not in use.
- Facebook's current storage servers burn 4.5kW, and Facebook hopes to use the experimental equipment to drive this down to around 1.5kW, a reduction of roughly two-thirds.

What we think we know:

A post on Hacker News by a former S3 employee gives a lot more information on the Glacier technology. The user (whose identity was confirmed by ZDNet) says:
- Glacier's hardware is based on low-RPM custom hard drives built by "a major hardware manufacturer" for Amazon.
- These drives are put in custom racks with custom logic boards, and only a fraction of a rack's drives can be spun at full speed at any one time, due to some type of constraint within the system.
- The reason for the 3-5 hour lag in accessing stored data in Glacier is that data must be copied out of these systems into staging storage before the client can download it.
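To make the claimed architecture concrete, here is a toy sketch of how a rack with a spin-up quota and a staging tier might schedule retrievals. All names and the 10 per cent quota are illustrative assumptions on my part, not Amazon's actual design; the point is only that clients read from staging, never directly from the archive drives.

```python
from collections import deque

class ArchiveRack:
    """Toy model of a rack where only a fraction of drives may spin at once.

    The 10% spin-up quota and all names are illustrative guesses, not
    anything confirmed about Glacier's internals.
    """

    def __init__(self, num_drives, max_spinning_fraction=0.1):
        self.max_spinning = max(1, int(num_drives * max_spinning_fraction))
        self.spinning = set()
        self.pending = deque()   # retrieval jobs waiting for a spin-up slot
        self.staging = {}        # archive_id -> data, readable immediately

    def request(self, archive_id, drive):
        """Queue a retrieval; it completes hours later, into staging."""
        self.pending.append((archive_id, drive))

    def tick(self):
        """One scheduling step: spin up drives within the quota, copy
        their archives to staging, then let the drives power down."""
        while self.pending and len(self.spinning) < self.max_spinning:
            archive_id, drive = self.pending.popleft()
            self.spinning.add(drive)
            self.staging[archive_id] = f"data-from-drive-{drive}"
        self.spinning.clear()  # drives spin back down after the copy

    def download(self, archive_id):
        """Clients only ever read from staging storage."""
        return self.staging.get(archive_id)
```

Under a model like this, the 3-5 hour lag falls out naturally: a request has to wait its turn for one of the scarce spin-up slots before its data ever reaches staging.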

What this means for storage:

When you put this all together, it seems Amazon and Facebook are independently trying to cut the cost of storage by creating data backup/archival servers that consume dramatically less power.

Both approaches rely on hard drive technologies that can alter the power consumption of each drive according to use -- a hard drive that rotates at a lower rate consumes less power -- while Facebook's drives are also designed to power down completely when not in use.
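The savings from lower rotation speeds are steeper than they might look, because spindle power is widely reckoned to scale super-linearly with RPM (drag on the platters rises rapidly with speed). The back-of-envelope sketch below uses an assumed 2.8 exponent and an assumed 8W baseline for a 7,200 RPM drive; both figures are illustrative, not vendor specifications.

```python
def spindle_power(rpm, ref_rpm=7200.0, ref_watts=8.0, exponent=2.8):
    """Rough spindle-motor power estimate.

    Assumes power scales as rpm**exponent due to drag on the platters.
    The 8 W baseline at 7,200 RPM and the 2.8 exponent are illustrative
    assumptions, not measured figures for any real drive.
    """
    return ref_watts * (rpm / ref_rpm) ** exponent

full = spindle_power(7200)   # baseline drive
slow = spindle_power(4200)   # hypothetical low-RPM archival drive
print(f"{slow:.1f} W at 4200 RPM vs {full:.1f} W at 7200 RPM, "
      f"a {100 * (1 - slow / full):.0f}% spindle-power saving")
```

Even with these rough numbers, dropping from 7,200 to 4,200 RPM cuts modelled spindle power by well over half, which is why low-RPM custom drives would be an attractive building block for an archival tier.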

We can assume that Amazon is trying to do the same thing: when a user makes a data request from Glacier, the data is loaded onto staging servers, which implies Amazon is trying to minimise the amount of time Glacier drives have to be powered on to deliver data (we've asked Amazon to confirm whether that's the case, but the company has yet to respond).

Still, from what we know, it seems Amazon and Facebook are both trying to cut the cost of their data storage appliances. In Amazon's case, this is probably so it can reduce the price of its services for customers and thereby encourage more users onto its cloud; for Facebook, it is to reduce its own ongoing datacentre operating costs.

What this means for enterprise IT is that Amazon, if it gets the technology right, could pass these power savings on to customers in the form of price cuts.

Facebook, on the other hand, could choose to publish schematics and architectural information about its experimental storage technology to the Open Compute Project, as it has done with its server and rack designs in the past. Ultimately, this means in a few years non-web companies could buy storage technologies based on Facebook's designs, letting them cut their power costs.

It's a fascinating time in the cloud, and I think that the technologies being worked on by Amazon and Facebook point to a better future for data backups for everyone. The main question is whether both of these companies will be willing to share their technologies with others anytime soon. 


Jack Clark

About Jack Clark

Currently a reporter for ZDNet UK, I previously worked as a technology researcher and reporter for a London-based news agency.



  • LOL

    After reading the headline I pictured an alien spacecraft built with technology far more advanced than anything seen on Earth, buried underneath a glacier in the Amazon... but of course, there wouldn't be a glacier in the Amazon, would there? Maybe in Patagonia, but not the Amazon.
  • Not a replacement

    For major data needs, tape is still the cheaper backup. It will make inroads into companies that don't have the capital to invest in a good tape backup system, those with smaller needs, and those that need faster restoration. It will have limited success.
  • I don't know. 3-5 hr access time?

    That has got to be for extremely long term storage with rare access. As previously mentioned tape is best for that. How long ya gonna store it? 100 years?
  • Hardly revolutionary.

    What you're describing sounds a lot like the old MAID (Massive Array of Idle Disks) technology that burned brightly then faded a few years back. I'm pretty sure that Nexsan are still in business; the tech sounds a lot like the stuff they were (are??) peddling.

    Interesting, but not revolutionary ... the biggest problem with them was aligning the data consumption application with the limitations of the technology which limited them to backup use-cases, and that got killed by deduping VTLs like data domain.