Many companies want to move their processing into the cloud, but one of the biggest headaches can be getting huge volumes of data out of their own systems first.
For companies with vast amounts of rapidly shifting, transactional data -- like customer orders -- it is hard to move from one system to another without a significant interruption to service, a problem which UK company WANdisco says its technology can help solve.
"The cloud is now fully arrived and it is disrupting virtually everything," said WANdisco CEO and co-founder David Richards.
WANdisco says its Fusion technology can transfer data as it changes in on-premise file systems, Hadoop clusters or other cloud environments without interrupting operations.
"About six months ago a big travel company came to us and said 'What we want to do is move data from on-premise into the cloud, but it's transactional data meaning that it changes all the time, so we can't actually move it'," he said.
"It's a bit like grabbing files from the desktop and dropping them into a network share folder; while those files are being moved you can't edit them or change them. When you have storage arrays with 100 petabytes of data -- and that's the quantity of data people are moving into the cloud by the way -- that's a really hard problem."
He said moving 100 petabytes of data would take around six months -- far too long for companies to cope with. "We realised this was a massive problem in the marketplace.
"If you are going to migrate to cloud, the migration by definition is hybrid cloud; you have some data that lives on-premise and data that is living -- beginning to live -- in the cloud, and it would be nice if you can keep on using that data while that move was taking place."
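As a rough check on the scale Richards describes, the sketch below works out the sustained bandwidth that moving 100 petabytes in six months would imply. The decimal petabyte (10^15 bytes), the 182-day half-year and the function name are assumptions made for illustration; none of them comes from WANdisco.

```python
# Back-of-the-envelope: sustained bandwidth needed to move 100 PB in six months.
# Assumptions (not from the article): decimal petabytes, a 182-day half-year.

PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def required_gbit_per_second(petabytes: float, days: float) -> float:
    """Average throughput in gigabits per second for a bulk transfer."""
    total_bits = petabytes * PETABYTE_BYTES * 8
    return total_bits / (days * SECONDS_PER_DAY) / 1e9


if __name__ == "__main__":
    rate = required_gbit_per_second(100, 182)
    print(f"Sustained rate needed: about {rate:.0f} Gbit/s")
```

Under those assumptions the transfer would need roughly 50 Gbit/s held continuously for the whole period, which gives a sense of why a one-off bulk copy of changing data is impractical.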
In February, the company signed deals with Google and Amazon to offer its active-transactional replication technology on their cloud platforms. Last week, it was also announced that the WANdisco Fusion data replication offering will be sold as an IBM-branded component in IBM storage and analytics products -- covering Hadoop, on-premise or cloud environments.
Fusion lets cloud-based and on-premise systems operate in parallel during migration, so companies can move data and applications gradually. According to WANdisco, the same technology can also be used to move data between cloud providers, and to use the cloud for offsite disaster recovery without data loss.
"We also have a number of deals where people want the ability to move between clouds, so they'll arbitrage between cloud vendors. We can replicate between Azure and AWS, for example, which is quite a big use case where people don't want to be locked into a single cloud vendor. Or they may want to run Elastic MapReduce in Amazon and then maybe use some of the great analytics tools in Azure and you can do that by replicating between both," he said.
Richards said WANdisco recently did a trial run for a company with one billion files, which wants to use the cloud for disaster recovery. "The time it previously took to move two million files was close to three days -- it took us 11 minutes, so that's quite a big difference, which means it's feasible for them to move that scale of files."
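The figures Richards quotes can be put side by side with a little arithmetic. The sketch below treats "close to three days" as exactly three days and assumes transfer time scales linearly with file count; both are simplifying assumptions rather than claims made by WANdisco, and the function names are illustrative.

```python
# Compare the two quoted transfer times and project them to one billion files.
# Assumptions (not from the article): "close to three days" == 3 days exactly,
# and transfer time grows linearly with the number of files.

MINUTES_PER_DAY = 24 * 60


def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the new transfer is than the old one."""
    return old_minutes / new_minutes


def days_to_move(total_files: float, batch_files: float, batch_minutes: float) -> float:
    """Projected days to move total_files, given the time for one batch."""
    return total_files / batch_files * batch_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    old_batch = 3 * MINUTES_PER_DAY  # previous time for two million files
    new_batch = 11                   # quoted time for the same two million files

    print(f"Speed-up: about {speedup(old_batch, new_batch):.0f}x")
    print(f"Old rate, 1bn files: {days_to_move(1e9, 2e6, old_batch):.0f} days")
    print(f"New rate, 1bn files: {days_to_move(1e9, 2e6, new_batch):.1f} days")
```

On these assumptions the old rate would take around 1,500 days, or about four years, to cover a billion files, while the quoted rate would take under four days.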
Options for moving huge amounts of data into the cloud are limited. For example, Amazon Web Services will ship a ruggedised hard drive -- known as Snowball -- to customers who want to move large amounts of data (say, more than 10TB) into its cloud services. Customers order one, fill it with data, and then send it back to Amazon, which uploads the contents, although clearly this involves some delay.