
Symbolic IO and the limits of compression

A start-up offering a "computational defined storage solution" has hit the market, claiming to be the "fastest, most dense, portable and secure, media and hardware agnostic" storage solution. Too good to be true? I look at their patents to divine the secret sauce.
Written by Robin Harris, Contributor
[Image: Compress this! Credit: Robin Harris]

Here's the hype:

Symbolic IO is the first computational defined storage solution solely focused on advanced computational algorithmic compute engine, which materializes and dematerializes data -- effectively becoming the fastest, most dense, portable and secure, media and hardware agnostic -- storage solution.

Dematerializes data? That is a feature I've never heard anyone ask for. Sounds like a few steps up from "cloud".

Patents

I haven't spoken to anyone at the company, so I started by looking at their patents. The founder of Symbolic IO, Brian Ignomirello, has an uncommon name, so finding his patents is easy. Two of particular interest: Method and apparatus for dense hyper io digital retention and Bit markers and frequency converters.

If the product follows the patents, there are two key elements to Symbolic IO's system:

  • An efficient encoding method for data compression.
  • A hardware system to optimize encode/decode speed.

As you may recall from Claude Shannon's seminal A Mathematical Theory of Communication (it's amazing how fast that title can empty a room):

The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%.

Some assume that compression beyond 50 percent -- cutting the length in half -- is therefore infeasible. But that's wrong, as de-duplication demonstrates: it achieves compression ratios as high as 25 to 1 in some production environments.
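To make Shannon's figure concrete: a simple first-order estimate counts single-letter frequencies and measures how far the result falls below the maximum possible entropy. (This is a sketch, not Shannon's full method -- his 50 percent figure also accounts for statistical structure across runs of up to about eight letters, which pushes redundancy well beyond what single-letter counts reveal.)

```python
import math
from collections import Counter

def entropy_bits_per_char(text: str) -> float:
    """First-order (single-letter) entropy estimate, in bits per character."""
    freq = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in freq.values())

# Illustrative sample: 26 lowercase letters plus space, so the ceiling
# for a uniform distribution over this alphabet is log2(27) ≈ 4.75 bits.
sample = "the quick brown fox jumps over the lazy dog " * 20
h = entropy_bits_per_char(sample)

# Redundancy = how far the measured entropy falls below the ceiling.
redundancy = 1 - h / math.log2(27)
```

Longer-range structure (digrams, words, grammar) only lowers the true entropy further, which is how Shannon arrives at roughly 50 percent for ordinary English.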

Symbolic IO, evidently, seeks to achieve very high compression with a different, hardware-accelerated technique. More about the hardware in a moment.

Their system analyzes raw data to create a frequency chart of repeated bit patterns or vectors. These vectors are then assigned bit markers, with the most common patterns getting the shortest bit markers.

In addition, these markers are further shortened by assuming a fixed length and, say, trailing zeros. Ideally, you might replace 4k bytes with a 4 byte marker, for a massive compression ratio and much higher effective bandwidth.
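The scheme described above can be sketched as a frequency-ranked substitution table. This is my illustrative reading of the patents, not Symbolic IO's actual implementation -- the vector size, marker widths, and table size are all assumptions here:

```python
from collections import Counter

CHUNK = 4  # vector size in bytes (illustrative; the patents don't mandate one)

def build_marker_table(data: bytes, n_markers: int = 255) -> dict:
    """Frequency chart of repeated vectors; most common get the shortest markers
    (modeled here as the smallest integers)."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data) - CHUNK + 1, CHUNK)]
    freq = Counter(chunks)
    return {vec: marker for marker, (vec, _) in enumerate(freq.most_common(n_markers))}

def encode(data: bytes, table: dict) -> list:
    out = []
    for i in range(0, len(data), CHUNK):
        vec = data[i:i + CHUNK]
        # emit a short marker for known vectors, the raw bytes otherwise
        out.append(('M', table[vec]) if vec in table else ('R', vec))
    return out

def decode(stream: list, table: dict) -> bytes:
    inverse = {marker: vec for vec, marker in table.items()}
    return b''.join(inverse[v] if kind == 'M' else v for kind, v in stream)
```

On highly repetitive data this replaces each 4-byte vector with a single small integer; on random data almost nothing matches the table and the output stays raw, which previews the objection below about already-compressed content.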

Since the frequency of bit patterns may change over time, there is provision for replacing the bit markers to ensure maximum compression with different content types. Bit markers may be customized for certain file types, such as mp3, as well.
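One way to decide when to swap in a new marker table is to track how much of the incoming data the current table still covers. The threshold and the refresh policy below are my assumptions, not anything stated in the patents:

```python
def marker_hit_rate(data: bytes, table: dict, chunk: int = 4) -> float:
    """Fraction of fixed-size vectors in new data that the current table covers."""
    total = max(len(data) // chunk, 1)
    hits = sum(data[i:i + chunk] in table
               for i in range(0, len(data) - chunk + 1, chunk))
    return hits / total

# Hypothetical policy: re-profile the data and rebuild the marker table
# whenever coverage drops below this threshold.
REFRESH_THRESHOLD = 0.5
```

Per-file-type tables (the mp3 example) fit the same mold: ship a pre-built table profiled on representative files of that type, and fall back to re-profiling when the hit rate says it no longer fits.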

Maintaining these data structures takes a lot of I/O. Keeping this I/O from bottlenecking the entire system is key. Which gets us to the hardware.

Hardware to the rescue

Symbolic IO's patent for digital retention discusses how servers can be optimized for their encoding/decoding algorithms. The solution includes:

  • A specialized driver, of course.
  • A DIMM-slot based hardware controller.
  • A box of RAM controlled by the DIMM controller, reached by a unique memory interface.
  • Supercaps to maintain power to the RAM in case the lights go out.

Re-hydrating the data takes lots of lookups, so RAM is the obvious answer. Adding intelligence to a DIMM slot offloads the work from the server CPU, while giving you the fastest and most consistent I/O possible -- much better than any PCIe or NVMe bus.

The Storage Bits take

While I like the clever DIMM slot controller, I'm not sold on Symbolic IO's claims. Why? Because much of the bulkiest data is already compressed -- video, for example -- and if the compression is thorough the data should be nearly random, making it difficult to find vectors common enough for further compression.

Also, much depends on the stability of the bit patterns over time, otherwise you'll be generating a new frequency chart every few days, generating considerable overhead. And the data structures need to be bulletproof, or all your data could go poof in a millisecond.

But overall, a refreshingly creative data storage architecture.

Courteous comments welcome, as always. Parts of this post appeared first on StorageMojo.com
