Santa Clara-based storage company Hedvig has launched what it believes is the first universal answer for storage issues -- the Universal Data Plane.
Intended to overcome the "rigidity, economics, and lack of data portability endemic to traditional storage", it offers a single, programmable data management layer that "spans across workloads, clouds, and storage tiers", the company says.
It does this by running on commodity servers in the public cloud and offering a virtualised abstraction layer that enables any workload to store and protect its data across any location.
The company is the brainchild of its CEO Avinash Lakshman, a man with an impressive track record in developing new storage systems.
ZDNet: Could Hedvig's data plane be the answer to data storage issues?
Lakshman: Before founding Hedvig, my background was primarily in distributed systems. In 2004, I joined Amazon and was one of the three core inventors of Amazon Dynamo. As you may know, Dynamo was the genesis for the entire NoSQL movement as we know it today.
The reason we invented Dynamo was that Amazon was not getting the kind of availability guarantees that it needed from its existing infrastructure, which was mainly Oracle.
If you go to Amazon.com today, your shopping cart is being served out of the Dynamo system. It was previously stored on Oracle, and we did a lot of studies at that time and found that traffic in an Amazon environment typically ramped up in the fourth quarter. Any downtime in any of the Oracle systems could lead to millions of dollars of lost revenue. But that was some years ago and nowadays, of course, it would be much, much more.
So back then we decided to go a different route and build a custom solution for the shopping carts: so Dynamo came to be.
Some of the key things were that, number one, we had to solve the availability problem. This implicitly meant that the solution had to run across multiple datacentres, and it had to natively provide multi-site replication through the platform. So, that was the platform we built.
Then in mid-2007 I left Amazon to join Facebook, and I was the founder of Cassandra. That was my brainchild.
The reason for Cassandra was that we were trying to provide storage capability for Facebook messages.
To give you some context, when we started looking at building the system, we were looking at 45 to 50 million users. We did some back-of-the-envelope calculations to provide this capability on the existing infrastructure and it turned out to be $3m to $5m in total hardware spend.
So we went ahead and built this system called Cassandra, and one year later, we released it into the public domain and bootstrapped it with 55 million users, and transferred it onto our existing hardware. Then we scaled it to about 700 million users. This was in about 2011.
You can see we are talking about systems that are really addressing internet scale. So, to step back, as my time at Amazon was coming to an end, I had taken a really hard look at what was happening in the storage infrastructure space in the modern datacentre.
It struck me that there had been no fundamental innovation in storage in the last 10 to 15 years. What was happening was good, but very incremental in nature.
The skill set I had garnered through the systems that I had built, together with the experience of running them operationally, was germane to the problems that I wanted to solve in storage. And that is how Hedvig came to be.
When we started building the Hedvig platform, our vision was to address how to collate multiple data spheres into the one platform.
We are talking about Tier 1, Tier 2, and Tier 3. Now, I am going to leave out the Tier 3 form, which is where people typically use all-flash arrays, because we are not going after that market at this point.
Along the way, we realised that when you look at what happens in an enterprise, they typically go for only one vendor for each platform. They go to one vendor for their SAN infrastructure, a totally different vendor for their NAS, and everybody, at that time at least, was typically looking at what their strategy should be.
I very strongly believe that you could build one platform that could cater to all these different architectures. I believe the Universal Data Plane is the answer to that.
Now, the move to the cloud has already begun, and ultimately everything will be in the public cloud, but [there's] one problem, which is that putting all of your bets on a single cloud provider is one massive vendor lock-in.
Typically, when you think of how applications are deployed, it is not only compute, it is also storage. Tomorrow if, for whatever reason, you want to move your entire application from one cloud vendor to another, then if you want to move your data along with it, you are going to find it really hard.
It's very hard to move terabytes or petabytes of data and, above all, the cost to move data out of one particular cloud vendor is excruciatingly high.
If you take what we do, then understand that all of my background has been in large, multi-site systems that have been built natively. Now we build a platform that builds an alliance between your site and the public cloud. It doesn't matter what type, because it is all the same to us. We are site-agnostic from a cluster perspective.
So, what if you could run your platform in such a way that it spans multiple clouds? And you can have data replicating across different clouds in such a way that there is no problem of moving between different clouds?
So does that mean that you can run your applications seamlessly across different clouds and how can you guarantee that it will be seamless?
To take one of our customers, they have one cloud service running across four different datacentres in two countries -- two in London and two in Paris.
As far as we are concerned, if you take two datacentres, one of them could be AWS and one of them could be Google Cloud, it really doesn't matter to us, you can be replicating data across these datacentres, seamlessly.
Now, with the granularity that we provide for these storage assets, you could create a volume and specify which datacentres you want to replicate it into. Since we have that level of control over the provisioning, you can imagine what you can do.
Let's say you have an app that you want to replicate across Google, AWS, and another: it is just a matter of creating a volume in one of our systems and specifying the three locations into which you want to replicate it, and the system can automatically do that for you.
As far as I know, we are the only ones who can provide that capability today.
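The per-volume replication choice described above can be pictured with a small toy model. This is only an illustrative sketch, not Hedvig's actual interface: the `Cluster` and `Volume` classes, the `create_volume` method, and the site names are all hypothetical, and the model does nothing more than record which sites a volume should be replicated to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical volume record: a name plus the sites it replicates to."""
    name: str
    replica_sites: tuple[str, ...]


@dataclass
class Cluster:
    """Toy stand-in for one storage cluster that spans several sites."""
    sites: frozenset[str]
    volumes: dict[str, Volume] = field(default_factory=dict)

    def create_volume(self, name: str, replica_sites: list[str]) -> Volume:
        # Reject any replication target the cluster does not span.
        unknown = set(replica_sites) - self.sites
        if unknown:
            raise ValueError(f"unknown sites: {sorted(unknown)}")
        if name in self.volumes:
            raise ValueError(f"volume already exists: {name}")
        volume = Volume(name, tuple(replica_sites))
        self.volumes[name] = volume
        return volume


# Site names are illustrative; to the cluster they are all just "sites".
cluster = Cluster(sites=frozenset({"aws", "google", "azure", "on-prem"}))
app_volume = cluster.create_volume("app-data", ["aws", "google", "azure"])
print(app_volume.replica_sites)
```

The point of the sketch is that the replication targets are a property of the individual volume rather than of the whole cluster, which is the granularity the interview describes.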
But what about updates? When one updates, does that automatically update the other two?
Well, yes and no. When people talk about cloud, cloud means a lot of things to a lot of people. To me, cloud is virtualisation, be it hypervisor-based or container-based, and self-service. Those two, coupled in my mind, are what the cloud is.
So, if we stick with hypervisor-based virtualisation, a VM typically needs some storage. Now, your applications are going to be running over that VM, and that VM is going to be doing writes at any given instant of time at only one location.
So let's say you want your storage running in the cloud and spanning Google, Amazon, and Microsoft, and let's say that, for whatever reason, your application is running in an AWS environment. That is where the writes are coming from and, when we write, the data is going to all three locations.
So now, when you have to move your application from AWS to Microsoft, then there is some work that needs to be done outside of storage. By the time you have moved your app to the Microsoft environment, the data is ready for you because the automatic replication will have already kicked in. That's because it gets the replication as and when you do the writes.
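A minimal sketch of that write path, under stated assumptions: the `ReplicatedVolume` class and its methods are hypothetical, each site is modelled as an in-memory dictionary, and every write is applied to all replicas in a single step. The interview does not say how the real system orders or acknowledges replica writes, so this is an illustration of the idea, not a description of the product.

```python
class ReplicatedVolume:
    """Toy volume that keeps one copy of its data per site."""

    def __init__(self, sites):
        # One independent key-value store per site (hypothetical model).
        self.replicas = {site: {} for site in sites}

    def write(self, origin, key, value):
        # Writes arrive from the single site where the application runs...
        if origin not in self.replicas:
            raise ValueError(f"unknown site: {origin}")
        # ...and are applied to every replica as part of the same write.
        for store in self.replicas.values():
            store[key] = value

    def read(self, site, key):
        return self.replicas[site][key]


# "azure" stands in for the Microsoft environment mentioned in the text.
volume = ReplicatedVolume(["aws", "google", "azure"])
volume.write("aws", "cart:42", "3 items")   # the app is running in AWS

# Later the app moves to the Microsoft environment; the data is already there.
print(volume.read("azure", "cart:42"))
```

Because replication happens as each write is made, there is no bulk copy left to do when the application changes site, which is the behaviour described in the answer above.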
So, all this ease of replication and the fact that you can run it across multiple sites and get automatic updating sounds fine, but how do you make it seamless?
A lot of this goes back to my background, the kind of systems that I have been involved with. If you look at [Amazon] Dynamo, we published a paper and it took off. Fortunately, or unfortunately, a lot of the research and a lot of the work had been known in academia for some time.
Now, there were a lot of papers out there, and most of the time when you are trying to build a system, you don't just read one paper, you probably read a hundred. Then you have to figure out how much of it is actually garbage that cannot translate into a real system. You have to filter.
Most of all, I believe that operational experience goes a long way and I had that both at Amazon and at Facebook. For right or wrong reasons, when you build a service you are responsible for running it operationally -- you can't just throw it over the wall and have someone else run it for you.
Now, when you run it operationally, trust me, it is never smooth. There are a lot of mistakes that might not have been apparent when designing and implementing it. Things like going back and retrofitting the architecture to make things easy for yourself for debugging, for upgrading, and so on. There are a lot of things you learn when you run these things operationally.
So, a lot of that experience has contributed to being able to build this, to make it seamless. To make it work in such a way that the app user has no idea of the capability that is running underneath.
What about security?
We will have full encryption. This is not a property we have now, but we will have it in our next release. Again, it is a property that you can just turn on and turn off at the granularity of a storage asset.
Because you can run everything in the public cloud, you can turn on the public cloud, and then you can turn on encryption and have everything in an encrypted format. That way security is implicitly guaranteed.
Now, in the absence of encryption, security is guaranteed by the different cloud vendors. One thing that people could do in the interim, with our platform, is to perhaps use self-encrypting drives.
I don't know if the cloud providers today provide that, but that is one way to get some degree of security today. In the future, our plan is to provide encryption support in software so that it can be a property that you turn on in your storage assets and you are as secure as one can get.