SQL-H tightly binds the schema metadata of Hadoop and that of the Teradata Aster Database. The enabling technology on the Hadoop side is the Apache HCatalog metadata store, which creates a unified storage abstraction layer over Hadoop data accessed through Pig, Hive, and raw HDFS files. Hortonworks is a major source code contributor to the Apache HCatalog incubator project.
While Hive provides SQL-like querying natively, it does so only for its own tables. HCatalog extends Hive's metastore to data in other formats, introducing a consistent schema and data type standard across them all. SQL-H can then address any data registered in HCatalog and make it available to the Aster Database.
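As a sketch of the consistency HCatalog provides (the table and column names here are hypothetical), a table defined once in Hive's metastore becomes readable from Pig through HCatalog's loader, with the schema and data types resolved automatically rather than re-declared:

```sql
-- Hive DDL: register a table in the Hive/HCatalog metastore
CREATE TABLE web_logs (
  user_id BIGINT,
  url     STRING,
  ts      STRING
)
STORED AS RCFILE;
```

```
-- Pig Latin: read the same table via HCatalog; no AS (schema) clause needed,
-- because the column names and types come from the shared metastore
logs   = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
recent = FILTER logs BY user_id IS NOT NULL;
```

Without HCatalog, the Pig script would have to restate the schema by hand and keep it in sync with the Hive definition; with it, both tools work from one source of truth.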
With SQL-H, data registered in HCatalog can be queried as if it were local to Aster. Queries can run in a "one-off" mode, executing remotely and returning the appropriate result sets, or they can be configured to persist the resulting data locally in the Aster Database.
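A sketch of what such a query might look like, following Aster's SQL-MR function convention (the `load_from_hcatalog` function name is from Aster's SQL-H feature, but the exact parameter names here are illustrative assumptions, not definitive syntax):

```sql
-- One-off mode: pull rows from a Hadoop table registered in HCatalog
SELECT user_id, url
FROM load_from_hcatalog (
  ON mr_driver
  SERVER    ('hadoop-namenode')   -- hypothetical host name
  DBNAME    ('default')
  TABLENAME ('web_logs')
)
WHERE user_id IS NOT NULL;

-- Persisting mode: materialize the remote result locally in Aster
CREATE TABLE local_web_logs
DISTRIBUTE BY HASH (user_id) AS
SELECT * FROM load_from_hcatalog (
  ON mr_driver
  SERVER    ('hadoop-namenode')
  DBNAME    ('default')
  TABLENAME ('web_logs')
);
```

The point of the two forms is the trade-off the article describes: the first pays Hadoop-side latency on every query, while the second pays it once and then serves subsequent queries from Aster's local storage.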
The raw power of Hadoop has made users forgiving of its sometimes low degree of fit and finish, especially between components like Hive and Pig. But as Hadoop becomes more mainstream, its own idiosyncrasies, and the inconsistencies across its ecosystem components, need to be straightened out.
Businesses won't tolerate for much longer a double standard between their established database and BI tools on the one hand, and their Big Data technology on the other. Most Big Data companies know this. Teradata Aster is taking a sensible approach to the problem: it's leveraging the open source HCatalog project to eliminate the double standard, and to have multiple technologies coalesce around a single standard.