The life story of a piece of business information is generally one of decreasing temperature: when first created, it is often 'hot', needing to be accessed quickly and frequently, and so ideally it's placed on the fastest media, close to the compute resources needed to process it. While it's still 'warm', the data should be backed up and rapidly accessible, in order to facilitate business continuity should disaster strike the primary copy. Finally, when the data is 'cold' (that is, it's no longer going to change, and rarely needs to be accessed), it's ripe for archiving.
Archiving involves a lot more than dumping vast amounts of data into the cloud, or onto an on-premises tape library. For a start, it only makes sense to archive data that's likely to have some future value, or is required to be retained for compliance purposes. Beyond data curation, there are also issues surrounding discovery, file-format preservation and media longevity to consider, among others.
One UK company specialising in archiving as a service (AaaS) is Arkivum, a startup spun out of the University of Southampton's IT Innovation Centre in 2011. ZDNet recently met with CEO Jim Cook and CTO Matthew Addis at the launch of two new services, Arkivum 1+1 and Arkivum OnSite.
Arkivum specialises in long-term data preservation -- currently serving mostly higher education and life sciences customers -- using an on-premises (physical or virtual) gateway appliance to copy encrypted data to tape libraries in its data centres. Offline copies are also held, either in escrow by a third party or by the customer, allowing easy data retrieval should customers need to dispense with Arkivum's services.
Storing and managing multiple copies of customers' data in geographically separate locations allows Arkivum to guarantee 100 percent data integrity irrespective of the data volume or retention period -- which can scale from terabytes to petabytes and range between 10 and 25 years respectively.
"Customers will often pay in advance to store 100 terabytes for 25 years", says CEO Jim Cook, "which gives us a very interesting business model." Cash upfront is good, but as Cook notes "actually that's all liability, because we have to deliver that over time."
Arkivum's deep knowledge of tape storage technology sets it apart, says Cook, from "a simple cloud MSP who might be using shared infrastructure or going out and buying a whole load of JBODs and pulling together something that's going to work for five years and then be thrown away. We're not about that; we're about long-term commitment to our customers."
The LTO (Linear Tape Open) roadmap caters for impressive capacity increases: the current generation, LTO-6, supports compressed (at 2.5:1) capacity up to 6.25TB per cartridge, while generations 9 and 10 will, in due course, support up to 62.5TB and 120TB respectively.
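To put those cartridge capacities in context, here is a rough back-of-the-envelope sketch in Python. It derives the native (uncompressed) capacity from the compressed figure and counts the cartridges needed for a single copy of a 100TB archive, the size Cook cites as a typical order. The function names are illustrative, and applying the 2.5:1 ratio to generations 9 and 10 is an assumption: the figures above state it only for LTO-6.

```python
import math

# Ratio quoted for LTO-6; ASSUMED here to hold for generations 9 and 10 too.
COMPRESSION_RATIO = 2.5

# Compressed capacity per cartridge in TB, as quoted in the article.
COMPRESSED_TB = {"LTO-6": 6.25, "LTO-9": 62.5, "LTO-10": 120.0}


def native_tb(compressed_tb, ratio=COMPRESSION_RATIO):
    """Uncompressed capacity implied by a compressed figure and a ratio."""
    return compressed_tb / ratio


def cartridges_needed(archive_tb, cartridge_tb):
    """Whole cartridges required to hold one copy of an archive."""
    return math.ceil(archive_tb / cartridge_tb)


if __name__ == "__main__":
    archive_tb = 100  # single copy; a three-copy service needs three times this
    for generation, compressed in COMPRESSED_TB.items():
        native = native_tb(compressed)
        print(
            f"{generation}: native {native}TB, "
            f"{cartridges_needed(archive_tb, compressed)} cartridges if data compresses at 2.5:1, "
            f"{cartridges_needed(archive_tb, native)} if it does not compress at all"
        )
```

Real-world compression depends heavily on the data: files that are already compressed or encrypted will land much closer to the native figure.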
This means that tape -- especially when combined with developments like LTFS (Linear Tape File System) and the use of flash for fast access to metadata -- will remain the most cost-effective medium for long-term data storage, as a 2013 chart from Wikibon makes clear.
"For data that isn't being really frequently accessed and is not going to change, and with very high safety requirements, tape is a really good fit," says CTO Matthew Addis.
You can, of course, access your tape-archived data -- up to a point: "We have a 'fair usage' clause where customers are not charged for accessing up to five percent per month," says Cook. "Above that, we reserve the right to talk to them to see what we should be doing, because if they're pulling back more data than that then it's not really an archive," he adds.
Arkivum offers three services, the flagship being Arkivum 100 (A/100), launched in 2012. This offers a 100 percent insurance-backed data integrity guarantee, based on the existence of three copies of your data -- two stored in geographically separate data centres and a third held offline in a third-party escrow service. You can provision from 1TB of storage, for a minimum term of one year, with A/100, using either an upfront (CapEx) or PAYG (OpEx) payment model.
Arkivum 1+1 (A/1+1), launched earlier this year, is designed for situations where you need to provision large volumes (250TB+) of infrequently accessed storage for at least five years, but don't require the 100 percent data integrity guarantee offered with the (more expensive) A/100 service -- the key difference is that A/1+1 only holds one online copy of the data rather than two.
Also launched this year, to cater for customers who want or need to keep their data on-premises, is Arkivum OnSite (A/OnSite). Arkivum supplies the gateway appliance(s), tape libraries and media, and offers its 100 percent data integrity guarantee when (a) three copies of the data are held (two online, one offline) and (b) Arkivum is able to remotely monitor and manage the system in partnership with the customer.
Although Arkivum is currently focused on tape technology, Cook points out that "the service-level agreement we work with is entirely agnostic to the medium we're using; as far as customers are concerned, we could be writing data to stone tablets for all they care."
CTO Addis elaborates: "It's almost a certainty that humanity is going to generate ever more vast amounts of data, which means it's equally certain that people are going to develop new technologies for storing it, and the cost of those technologies will come down. Commodity IT storage -- whether it's tape, disk, flash or whatever else comes after -- is what we can build our service on."
Arkivum is not the only archive-as-a-service solution, of course, and one of the most prominent in the pure-cloud space is Amazon's Glacier. You can store an unlimited amount of data in Glacier, with prices starting at $0.011 per gigabyte per month. Upload and retrieval requests cost from $0.055 per 1,000 requests, while data transfer between AWS regions starts at $0.02 per GB. Amazon does not divulge the technologies on which Glacier runs (tape, optical drives and ageing hard drives have all been mentioned by observers), but quoted retrieval times as high as 3-5 hours have prompted speculation that tape is the primary medium. For more details on Glacier's operation and pricing, see Amazon's FAQs.
AWS's traditional attraction is its low pricing, but that's exactly where database giant Oracle has recently chosen to attack, with its Archive Storage Cloud Service (part of a range of new platform and infrastructure services). "Our new Archive Storage service goes head-to-head with Amazon Glacier and is one-tenth their price," said Oracle's executive chairman and CTO Larry Ellison at the launch. Oracle's archive storage costs $0.001/GB/month, with retrieval costing $0.005/GB, while requests (PUT, COPY, POST or LIST) cost $0.005 per 1,000 (see Oracle's website for full pricing details).
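For a sense of scale, the following Python sketch applies the two headline storage rates quoted above to a 100TB archive held for 25 years. It is a simplified illustration, not a pricing calculator: it assumes decimal terabytes (1TB = 1,000GB), ignores retrieval, request and data-transfer charges, and assumes prices stay fixed for the whole term, which is unlikely. The function and variable names are invented for the example.

```python
from decimal import Decimal

GB_PER_TB = 1000  # ASSUMPTION: decimal terabytes

# Storage prices in US dollars per GB per month, as quoted in the article.
PRICE_PER_GB_MONTH = {
    "Amazon Glacier": Decimal("0.011"),
    "Oracle Archive Storage": Decimal("0.001"),
}


def storage_cost(price_per_gb_month, terabytes, months=1):
    """Storage-only cost; excludes retrieval, request and transfer fees."""
    return price_per_gb_month * terabytes * GB_PER_TB * months


if __name__ == "__main__":
    archive_tb = 100
    term_months = 25 * 12
    for provider, price in PRICE_PER_GB_MONTH.items():
        monthly = storage_cost(price, archive_tb)
        total = storage_cost(price, archive_tb, months=term_months)
        print(f"{provider}: ${monthly:,.2f} per month, ${total:,.2f} over 25 years")
```

At these quoted rates Oracle's storage price is one-eleventh of Glacier's, which is in line with Ellison's "one-tenth" claim.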
Specialist archive-as-a-service providers like Arkivum still have plenty to offer, but they're going to have to cope with the fallout from a price war among cloud infrastructure giants such as Amazon and Oracle (and presumably, in due course, Google and Microsoft).