Vertica 7 to NoSQL DBs: Drop dead

With its new Flex Zone product, HP's Vertica wants to take on your unstructured data, without the NoSQL middleman.
Written by Andrew Brust, Contributor

It's no secret that Big Data, NoSQL and relational products are becoming increasingly interoperable; that trend has been in motion for a while.  But the game changer now is that products in each of these categories are assuming attributes of products in the others, in an effort to become one-stop shops, and not just part of customers' best-of-breed line-ups.

For example, most Hadoop distributions now include some kind of interactive SQL layer, and operational and data warehouse RDBMS products like IBM's DB2 and Teradata now accommodate semi-structured JSON (JavaScript Object Notation) data.  And now, with the latest release of the Vertica MPP (massively parallel processing) data warehouse, the trend continues, in a very interesting way.

Structure is a relative term
The Vertica Analytics Platform's new "Flex Zone" product allows ingestion of schema-less data (from log files, delimited text files, machine-generated data and the like) into special "Flex Tables" in a Vertica relational database.  Rather than forcing you to declare any schema up-front, Vertica works with Flex Tables in an accommodating way, offering a compelling hybrid database architecture approach.

First, Flex Zone overlays some very minimal structuring, essentially interpreting the raw data as a series of key-value pairs (the core data structure of most NoSQL databases).  From there, the raw data can be queried with SQL, either directly, or through any number of BI and reporting tools.  This allows for helpful, if imprecise, perusal of the data.  
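To make the key-value idea concrete, here is a minimal Python sketch.  It is not Vertica's API or implementation; the sample records, the function names and the `select` helper are all made up for illustration.  It only shows the general shape of the approach: read each raw record as a map of keys to values, then ask column-style questions of it, accepting that some records simply won't have the key.

```python
import json

# Hypothetical raw, schema-less input: one JSON document per line,
# with fields that vary from record to record.
RAW_LINES = [
    '{"user": "ann", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "search", "query": "vertica"}',
    '{"host": "web01", "status": 500}',
]


def to_key_value_pairs(line):
    """Interpret one raw record as a flat map of key -> value."""
    return dict(json.loads(line))


def select(rows, key, where=None):
    """Rough analogue of SELECT key FROM rows WHERE ...

    Records that lack the key yield None, much as a missing
    value would surface as NULL in a SQL result.
    """
    return [row.get(key) for row in rows if where is None or where(row)]


rows = [to_key_value_pairs(line) for line in RAW_LINES]

users = select(rows, "user")
slow = select(rows, "user", where=lambda r: r.get("ms", 0) > 100)

print(users)  # ['ann', 'bob', None]
print(slow)   # ['ann']
```

The third record has no "user" key at all, which is exactly the "helpful, if imprecise" quality of browsing data this way.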

On its own that's very cool, but if Flex Zone's capabilities ended there, the novelty would probably wear off.  Luckily it goes much further.

Column promotion
Flex Zone's key-value mode is just the tip of the iceberg, because it facilitates review of the data, letting DBAs or analysts discern some or all of its inherent structure.  From there, they can "promote" sections of the data to be visible as true columns, with Flex Zone taking care of assigning the appropriate data type to the column.  From here the data becomes digestible by conventional tools in a conventional manner, and yet the table can continue to be consumed in its initial unstructured state as well.
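As a rough illustration of what promotion with automatic typing could look like, consider the Python sketch below.  The inference rule (try integer, then float, else fall back to string) and the names `infer_type` and `promote` are assumptions made for this example; the article's sources don't describe how Flex Zone actually picks a data type.

```python
# Hypothetical key-value rows, with every value still held as raw text.
rows = [
    {"status": "200", "ms": "12.5"},
    {"status": "500"},
    {"status": "404", "ms": "7", "note": "retry"},
]


def infer_type(values):
    """Pick the narrowest of int, float, str that fits every present value."""
    present = [v for v in values if v is not None]
    for candidate in (int, float):
        try:
            for v in present:
                candidate(v)
            return candidate
        except (TypeError, ValueError):
            continue
    return str


def promote(rows, key):
    """Expose one key as a typed column; rows lacking the key get None."""
    raw = [row.get(key) for row in rows]
    col_type = infer_type(raw)
    return col_type, [None if v is None else col_type(v) for v in raw]


status_type, status_col = promote(rows, "status")
ms_type, ms_col = promote(rows, "ms")

print(status_type.__name__, status_col)  # int [200, 500, 404]
print(ms_type.__name__, ms_col)          # float [12.5, None, 7.0]
```

The original key-value rows are left untouched, which mirrors the point above: the promoted columns and the raw view can coexist.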

Of course, the premise of explicitly defined columns in unstructured and semi-structured data is risky, because even explicit columns may not be present in each row.  But Vertica is a column store database, so that's just fine.  Since data is stored column-wise, rather than row-wise, missing column values don't take up any space.  Effectively, Flex Zone leverages Vertica's MPP column store architecture and morphs it into that of a column family NoSQL database like HBase or Cassandra.  This is a fascinating hybrid approach.
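The storage argument can be sketched in a few lines of Python.  This is a deliberately simplified model, not a description of Vertica's on-disk format (real column stores add encoding and compression on top); the `to_column_store` function and the sample rows are invented for the example.  Each column holds only the cells that actually exist, keyed by row number, so an absent value costs nothing.

```python
from collections import defaultdict

# Hypothetical sparse rows: no two records share the same set of keys.
rows = [
    {"user": "ann", "ms": 120},
    {"user": "bob", "query": "vertica"},
    {"host": "web01"},
]


def to_column_store(rows):
    """Store data column-wise: column name -> {row id: value}.

    Only values that are present get an entry, so a row that
    lacks a column adds nothing to that column's storage.
    """
    columns = defaultdict(dict)
    for row_id, row in enumerate(rows):
        for key, value in row.items():
            columns[key][row_id] = value
    return dict(columns)


store = to_column_store(rows)

stored_cells = sum(len(column) for column in store.values())
dense_cells = len(rows) * len(store)  # cells a fully populated row-wise grid would need

print(store["user"])              # {0: 'ann', 1: 'bob'}
print(stored_cells, dense_cells)  # 5 12
```

Here 5 cells are stored where a dense rows-by-columns grid would reserve 12, and the gap widens as the data gets sparser.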

And more
HP takes Flex Zone one step further, by flipping its features on their head: the key-value view of data can also be retroactively applied to tables that are fully structured and relational.  This application of the feature isn't implemented for data browsing, but rather for optimizing simple key-based lookups, thereby, says HP, providing the same kind of read access scalability that NoSQL databases tout.

The Vertica 7 wave brings in more than just Flex Zone.  For example, it adds optimization to query parallelization, both between nodes in the cluster and amongst cores within a single node.  The product also has improvements to tuning and enhancements to Kerberos-based Enterprise security.  And just to add to the category convergence, Vertica now ships a connector to Apache HCatalog, which provides a unified repository view of Pig, Hive and raw HDFS data.

I haven't been hands-on with Flex Zone, and I suppose it's possible that HP's claims of the product's data structure versatility are exaggerated.  But even if that were the case, the product would still shine a light on an important trend in the data world: the way we have taxonomized database products is becoming outdated.  The categories are becoming features.  And, ultimately, the NoSQL and Big Data revolutions' biggest win may be getting relational DBMS vendors off their butts and into the realm of innovation.  Really, the whole industry has just entered the Flex Zone.
