Vast Data: A smart new architecture for cloud scale storage

As enterprise capacity requirements continue their inexorable rise into the exabyte range, the problem of persisting data economically grows more pressing. Vast Data is pioneering a smart new architecture that goes beyond advanced erasure codes to ensure reliability, low cost, and high performance.
Written by Robin Harris, Contributor

Vast Data is the newest storage startup to reach a billion-dollar valuation with a radical take on high-scale storage. How high? A petabyte (1,000 TB) is entry level, and systems scale into the million-plus terabyte range, at a cost that rivals disk-based systems.

The system uses new technologies -- NVMe-over-Fabric (NVMe-oF), quad-level cell (QLC) flash, and 3D XPoint NVRAM -- to create a novel storage system. There's a lot of innovation in the Vast system, but the deduplication/compression is especially interesting because it shows how scale changes everything, especially in storage.

Let's get physical

In a Vast system, the "storage server" is containerized software, residing on the application server, and serving files and objects to applications. All the intelligence of the system resides in Vast's software.

The physical servers are connected to the Vast storage enclosures through NVMe-oF, which can run over either 100Gb/s Ethernet or Infiniband. The Vast servers share a global address space and, through the fabric, the entire storage capacity of the system appears local to every server.

The enclosures contain the 3D XPoint NVRAM packaged into Optane drives, and the QLC flash. The pizza box enclosures are just several hundred TB of flash and NVRAM, with four 100Gb/s links to the fabric.

Follow the yellow brick data road

When the application issues a write, the data is triplicated across the fast Optane NVRAM drives before the write is confirmed, a sub-millisecond process. Writes are kept in Optane drives until a full RAID stripe, made up of individual full flash erase blocks -- typically 256KB to 1MB -- can be written, minimizing write amplification and wear on the flash. The often-updated metadata is kept on the Optane drives as well.

But before the data is written to the QLC flash, which is specified for only about 500 lifetime writes, the Vast software performs data reduction, predictive data placement, and data protection before writing the full stripe. The goal is to perform as few writes as possible to maximize the life of the fragile QLC flash.
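The buffering step above can be sketched in a few lines of Python. This is a minimal illustration, not Vast's actual implementation: the class name, sizes, and counters are assumptions chosen to show why staging writes in NVRAM and flushing only full erase-block-aligned stripes avoids read-modify-write cycles on the flash.

```python
# Illustrative sketch of full-stripe write buffering (sizes and names
# are assumptions for the demo, not Vast's implementation).

ERASE_BLOCK = 256 * 1024           # one flash erase block, e.g. 256 KB
STRIPE_BLOCKS = 4                  # erase blocks per stripe (tiny for demo)
STRIPE_SIZE = ERASE_BLOCK * STRIPE_BLOCKS

class StripeBuffer:
    """Accumulate writes in 'NVRAM', flush only full stripes to 'flash'."""
    def __init__(self):
        self.nvram = bytearray()   # fast persistent staging area
        self.flash_writes = 0      # count of full-stripe flushes

    def write(self, data: bytes):
        # Write is acknowledged once it is durable in NVRAM.
        self.nvram.extend(data)
        while len(self.nvram) >= STRIPE_SIZE:
            self._flush_stripe()

    def _flush_stripe(self):
        # One large sequential write per stripe; no partial-block rewrites,
        # so the flash sees the minimum possible program/erase cycles.
        stripe, self.nvram = self.nvram[:STRIPE_SIZE], self.nvram[STRIPE_SIZE:]
        self.flash_writes += 1

buf = StripeBuffer()
for _ in range(100):
    buf.write(b"x" * 64 * 1024)    # 100 small 64 KB application writes
print(buf.flash_writes)            # only 6 full-stripe flashes writes occur
```

A hundred small writes collapse into six large stripe writes, with the remainder still staged in NVRAM; that reduction in program/erase cycles is what makes low-endurance QLC flash viable.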

Global data reduction

Small chunks of data can only be compressed about 50%, but with large blocks of data, especially similar data, much larger compression ratios can be achieved. Big data applications are typically storing many instances of similar info, such as genomics data.

Vast runs each file's blocks through a fingerprinting hash. Blocks with similar hashes are then clustered together. A single reference block is chosen from the cohort of similar blocks, and the byte-level differences between it and its similar blocks are computed and stored, dramatically compressing the data.
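The cluster-and-delta idea can be demonstrated with a toy sketch. The fingerprint below is a crude stand-in (real systems use locality-sensitive hashing), and the delta is a simple XOR against the reference block, which is mostly zeros for similar blocks and so compresses extremely well. Everything here is an assumption for illustration, not Vast's algorithm.

```python
# Toy sketch of similarity clustering plus reference-block delta encoding.
# The fingerprint and delta scheme are illustrative assumptions only.
import zlib
from collections import defaultdict

def fingerprint(block: bytes) -> int:
    # Crude similarity hash: sample every 97th byte. A real system would
    # use a locality-sensitive hash; this just groups near-identical blocks.
    return sum(block[::97]) % 1024

def reduce_blocks(blocks) -> int:
    """Return bytes stored after clustering, delta encoding, compression."""
    clusters = defaultdict(list)
    for b in blocks:
        clusters[fingerprint(b)].append(b)
    stored = 0
    for group in clusters.values():
        ref = group[0]                       # one reference block per cluster
        stored += len(zlib.compress(ref))
        for b in group[1:]:
            # XOR delta: zero everywhere the blocks agree, so it's
            # almost all zeros and compresses to almost nothing.
            delta = bytes(x ^ y for x, y in zip(ref, b))
            stored += len(zlib.compress(delta))
    return stored

base = bytes(range(256)) * 128               # a 32 KB block
blocks = [bytearray(base) for _ in range(8)] # 8 near-identical copies
for i, b in enumerate(blocks):
    b[i * 100 + 1] ^= 0xFF                   # each copy differs by one byte
blocks = [bytes(b) for b in blocks]

raw = sum(len(b) for b in blocks)
print(raw, reduce_blocks(blocks))            # stored size is a tiny fraction of raw
```

Eight near-identical 32 KB blocks shrink to one compressed reference plus seven near-empty deltas, which is why similar data (genomics runs, VM images, log files) reduces so much better at scale than block-local compression alone.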

Prediction power

The Vast system sees how each application is handling its data, and can quickly determine the expected lifetime of a block. In big data apps, most data is written once and read many times, but some data may be rewritten monthly, weekly, or even daily. The Vast system spreads the likely-to-be-rewritten data across the QLC drives to ensure long drive life, key to the 10-year warranty on their systems.

RAID power

Vast uses a form of RAID to protect the data, but RAID at enormous scale. A common RAID 6 array might use 8 data drives and 2 parity drives, protecting against two simultaneous drive failures.

Vast goes big. Imagine 150, or even 500, data blocks' worth of capacity with 20 parity blocks in a stripe. The system could lose 20 drives (or blocks) and still recover all the data.

This level of RAID redundancy is possible because the Vast software can see and access every block of storage as if it were local through the NVMe fabric.
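The arithmetic behind those wide stripes is worth making explicit: widening the stripe raises fault tolerance while *lowering* the fraction of capacity spent on parity. A quick sketch (the stripe geometries are the ones mentioned above; the function is just illustrative arithmetic):

```python
# Parity overhead for a few stripe geometries. The 150+20 and 500+20
# figures come from the article; the function is simple arithmetic.

def overhead(data_blocks: int, parity_blocks: int) -> float:
    """Fraction of raw capacity consumed by parity."""
    return parity_blocks / (data_blocks + parity_blocks)

print(f"RAID 6 (8+2):   {overhead(8, 2):.1%} overhead, 2 failures tolerated")
print(f"Wide (150+20):  {overhead(150, 20):.1%} overhead, 20 failures tolerated")
print(f"Wider (500+20): {overhead(500, 20):.1%} overhead, 20 failures tolerated")
```

A conventional 8+2 RAID 6 array spends 20% of its capacity on parity and survives two failures; a 500+20 stripe spends under 4% and survives twenty. That trade only works when every block in the stripe is reachable at local-like latency, which is what the NVMe fabric provides.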

The take

There is much more to the Vast architecture than I've covered here. If you're in the market for a few petabytes of storage, you should check them out.

What I like is how the Vast architecture makes creative use of the newest technologies to build an on-prem, cloud-scale, single tier storage system. I'm not sure about their claim that the system doesn't need to be backed up. While the architecture supports that claim, I'd like to see a few years of operational data before unplugging my tape silo.

Vast uses the scale of big data apps, fabric connectivity, 3D XPoint speed, and the density of flash storage to fundamentally rethink how to build high-performance, low-cost storage. Their billion-dollar valuation looks to be justified.

Comments welcome.
