Scality: Why object storage has a nice ring to it

Q&A with Jérôme Lecat, CEO of object-based storage company Scality.
Written by Colin Barker, Contributor

Scality's software-defined storage RING allows customers to store and access billions of objects, or even petabyte-sized objects, across standard hardware.

The company's proposition is that it can do this with very large files (it claims to have over 800 billion objects stored on its servers) and at lightning speed.


The French company has spent seven years first developing and then delivering the product. Users and partners include blue-chip companies from Comcast to Time Warner Cable, Orange to RTL, and Renault to the Los Alamos National Laboratory.

We talked to CEO Jérôme Lecat to find out what goes into its secret sauce.

ZDNet: How would you describe the company's model?

Lecat: We are selling the software and our customers deploy it on servers to build an object storage platform. Where it becomes more complicated is that, if you take most of our competitors, their object storage software is targeted at archival storage -- cold storage.

Our software, although it is object storage technology and it can look like static object technology, is different. From an application standpoint, we can take primary workloads on the consumer internet.

So now you have an object store that is not only an object store but also a file store, and not only a file store: it can actually serve very active workloads with lots of operations per second and lots of throughput. None of the other object storage vendors can do that.

So where did you start from?

Well, I started with another company that was a large-scale email platform for service providers [Bizanga, bought by Cloudmark in 2010]. Our customers were some of the biggest telcos all around the world.

While I was there I was talking with two customers about the need for a new generation of storage that was completely software-based, extremely scalable, and able to respond to users' requests over the internet in near real-time.

They both needed performance, but performance that could tolerate a wait of a few tens of milliseconds. If the wait is only that long, you don't notice it.

So they wanted high performance, but not the high performance of a system built around a small database; they wanted near real-time responses from a system handling thousands of requests over the internet. That, of course, is a very different performance profile.

So we developed an architecture that is based on that and then built up the business from there. And the business splits into two: half is with service providers and half enterprise.


Lecat: "We have designed out product to work on a very large scale and many of our customers deploy on hundreds of machines."

Image: Scality

They were completely separate businesses but now our enterprise customers are realising that, as they transition to deal with small businesses, they need exactly the same thing that service providers need.

That is our business model and our business. We have 108 customers to date, many of them large projects: 25 percent of our customers have spent more than $1m with Scality, and that's on software alone. We have distribution agreements with HP and Dell, along with a partnership agreement with Cisco.

We are active around the world and half of our revenues come from the US and 35 percent from Europe -- the UK, Germany, and France -- and 15 percent of our revenue comes from Japan, where we have a very strong presence.

So you must be dealing with large amounts of data in the gigabytes and terabytes?

Yes, we do. We can easily saturate the network, so you need a good network design, good switches that have phenomenal capacity, and so on. That is all part of the equation.

And you have a new version out?

Yes. The first part is our support for [Microsoft] Active Directory. It is not just support for Active Directory but support for IAM [Identity and Access Management] from Amazon Web Services.

IAM is the engine that supervises our security policies and our access management, and it is completely compatible with the Amazon approach.
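For readers unfamiliar with the AWS model Lecat is referring to, IAM policies are JSON documents that allow or deny specific actions on specific resources. Below is a minimal, made-up example of a read-only policy in that format, shown as a Python dict; the bucket name and permissions are purely illustrative and are not taken from Scality's product.

```python
# Illustrative AWS-style IAM policy, expressed here as a Python dict.
# The bucket name and actions are made up to show the policy format only.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}
```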

The second feature is that we now have an option for compliance which is especially good for industries like healthcare.

The third thing is the release of our S3 server, which supports the AWS S3 API and is especially good for working with Docker. You can download a Docker image and run it on your laptop, and that essentially gives you S3 on your laptop.

Once you have it running on your laptop you can put it on a server -- it's production-level code, so you could have a one-machine micro-cloud and run it in production.

More and more people are doing this and using it for back-up so that they can run on a single machine backed up in the cloud.
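As a rough illustration of what "S3 on your laptop" means in practice, any S3 client library can be pointed at a locally running S3-compatible endpoint. The sketch below uses Python's boto3; the endpoint URL, port, and credentials are placeholders for whatever a local instance is configured with, not documented defaults.

```python
# Minimal sketch: talking to a locally running S3-compatible endpoint
# (for example, one started from a Docker image on a laptop).
# The endpoint URL, port, and credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",    # local S3-compatible endpoint
    aws_access_key_id="accessKey1",           # placeholder credentials
    aws_secret_access_key="verySecretKey1",
)

s3.create_bucket(Bucket="laptop-test")
s3.put_object(Bucket="laptop-test", Key="hello.txt", Body=b"hello from my laptop")
print(s3.list_objects_v2(Bucket="laptop-test")["KeyCount"])  # -> 1
```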

When they want to upgrade they can transition to the Ring, which is our main system and it is open source. At that point they need to deploy several machines because the Ring is a distributed system.

Can you take me through the Ring and how it works?

The Ring is a distributed system, so we need multiple machines, and what makes our system different is that no machine is specific in any way. So a machine can break and it will not impact the overall service at all.

We have designed our product to work on a very large scale and many of our customers deploy on hundreds of machines.

At the very core of the product is a peer-to-peer network between all the hard drives of the machines on the network. So, assuming you have 10 hard drives per machine on six machines, you have 60 hard drives, and a peer-to-peer network between those 60 drives. And whenever you want to store something, the system will give that something a key.

If you are using the S3 API, for example, the hash of the URL will become the key -- the address of what you want to store on the system. Then the peer-to-peer algorithm will find out which hard drive is responsible for this key.

There is no table. In a file system you would normally have a file allocation table (FAT), but in our system there is no FAT; there is just the key derived from the address, which tells you where the data is and on which hard drive.

The data will be placed there. You just take your piece of data and it finds the right place.

The advantage of this is that it will find the data, and the number of steps it will need to do this is limited, even in very, very large systems.
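The general idea Lecat describes -- hashing an address to a key and letting the drives themselves work out who owns it, with no allocation table -- is the same family of technique as consistent hashing. The following is a minimal, purely illustrative sketch of that idea in Python; it is not Scality's actual peer-to-peer algorithm, and all names in it are made up.

```python
import hashlib
from bisect import bisect_left

def hash_key(value: str) -> int:
    """Derive a numeric key from a string such as an object's URL."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: each drive owns a slice of the key space."""

    def __init__(self, drives):
        # Place every drive on the ring at the hash of its name.
        self.ring = sorted((hash_key(d), d) for d in drives)

    def drive_for(self, object_url: str) -> str:
        """Find the drive responsible for a key -- no allocation table needed."""
        key = hash_key(object_url)
        points = [p for p, _ in self.ring]
        idx = bisect_left(points, key) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, drive: str):
        """Drop a failed drive; its keys fall to the next drive on the ring."""
        self.ring = [(p, d) for p, d in self.ring if d != drive]

# 10 drives per machine on six machines -> 60 drives, as in the example above.
drives = [f"machine{m}-disk{d}" for m in range(6) for d in range(10)]
ring = HashRing(drives)
print(ring.drive_for("http://example.com/bucket/photo.jpg"))
```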

It is also a very stable algorithm. It is peer-to-peer so if there is a hard drive that dies for whatever reason, the system will reconfigure itself automatically, without any human intervention.

We also make sure that there are multiple representations of an object: we protect the object through replication, or we protect it using erasure coding. And if a hard drive is lost, the system will automatically reconfigure itself, take into account that it has lost an element, and rebuild the information on a different hard drive.

And it will do it automatically on six hard drives -- which is why we have six servers as a minimum. That way, if a hard drive dies, you don't have to act on the system right away; it can wait. The system is stable.
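Continuing the toy hash ring from the sketch above (again illustrative only, not Scality's implementation, and reusing the hash_key function and ring object defined there), the same walk around the ring can pick several distinct drives to hold copies of an object, and removing a dead drive automatically yields a new placement for the same key -- the essence of the self-healing behaviour Lecat describes.

```python
from bisect import bisect_left

def place_replicas(ring, object_url, copies=3):
    """Pick `copies` distinct drives clockwise from the object's position on the ring."""
    key = hash_key(object_url)
    points = [p for p, _ in ring.ring]
    start = bisect_left(points, key) % len(ring.ring)
    return [ring.ring[(start + i) % len(ring.ring)][1] for i in range(copies)]

url = "http://example.com/bucket/photo.jpg"
before = place_replicas(ring, url)

# Simulate a dead drive: the ring reconfigures itself and the placement for the
# same key now lands on a different set of drives, so the lost copy can be
# rebuilt elsewhere without human intervention.
ring.remove(before[0])
after = place_replicas(ring, url)
print(before, after, sep="\n")
```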

That is the core of our technology and then, on top of this, we have built a distributed database for the metadata. Building an object store the way I have described it is actually very simple, and you can find several of them in open source.

What we are adding to this is that, first, we have a representation that offers very high performance.

Secondly, we have added to that our own metadata database, whereas most of our competitors take a database from the open source community.

While the object store is immutable [objects cannot be modified after they have been created], if you want to offer a file system you have to be able to rename a file, modify a file, it needs to be POSIX compliant, and so on.

A lot of these properties are not really object-store properties, so it is really thanks to our object-level database that we are able to do this.
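To make the point about the metadata database concrete: because objects are immutable, a file-system operation such as rename can only be offered by changing metadata, not the object itself. Below is a minimal, made-up sketch of that separation; it is not Scality's schema, and all names in it are hypothetical.

```python
# Toy sketch of file-system semantics layered over immutable objects:
# the object's key and content never change; rename only updates metadata.
class MetadataDB:
    def __init__(self):
        self.paths = {}  # file path -> key of an immutable object

    def create(self, path: str, object_key: str):
        self.paths[path] = object_key

    def rename(self, old_path: str, new_path: str):
        # The stored object is untouched; only the path-to-key mapping moves.
        self.paths[new_path] = self.paths.pop(old_path)

db = MetadataDB()
db.create("/reports/2016.pdf", object_key="9f86d081884c7d65")  # made-up key
db.rename("/reports/2016.pdf", "/archive/2016.pdf")
print(db.paths)  # {'/archive/2016.pdf': '9f86d081884c7d65'}
```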

What stage is Scality at?

We have done that and so far we have raised $93m. We are funded by US and European VCs and by some major corporations -- HP is one. We are at 190 employees.

I call our stage a 'scale-up'. We are not a startup and we are not an independent enterprise, yet. We are still investing very heavily so, for example, we have 72 development engineers.
