Dremio 2.0 adds Data Reflections improvements, support for Looker and connectivity to Azure Data Lake Store

Dremio, the Apache Arrow-based BI data fabric, gets smarter and faster Data Reflections, and makes cloud data lake stores first-class.
Written by Andrew Brust, Contributor

Dremio is a data virtualization and query acceleration layer that interfaces standard BI tools with collections of relational, NoSQL and cloud data sources, as well as various large-scale file systems. The product leverages technology from Apache Arrow in the creation of so-called Data Reflections, which greatly accelerate queries without replicating their data.

The founders of the company, CEO Tomer Shiran and CTO Jacques Nadeau, both hail from MapR, and, in addition to holding leadership roles on the Apache Arrow project, both were heavily involved in the Apache Drill project.

Arrow offers a unified format for representing columnar data in memory, allowing applications that support Arrow to share such data without it needing to be converted from one app's columnar format to a row store format, before being re-encoded in the other app's columnar representation.
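
To make the conversion overhead concrete, here is a minimal Python sketch. It does not use Arrow's API; plain lists stand in for in-memory buffers, and the function and variable names are hypothetical, chosen only for illustration. It contrasts a hand-off that round-trips through rows with one where both tools read the same columnar object.

```python
# Illustrative only: plain Python lists stand in for in-memory buffers.
# None of these names come from Arrow's API.

# Row-oriented data: one record per entry.
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 250},
    {"region": "east", "sales": 75},
]


def rows_to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns):
    """Pivot a dict of column lists back into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Without a shared format, a hand-off between two columnar tools goes
# columnar (tool A) -> rows -> columnar (tool B), copying every value twice.
tool_a_columns = rows_to_columns(rows)
handoff_rows = columns_to_rows(tool_a_columns)
tool_b_columns = rows_to_columns(handoff_rows)

# With an agreed columnar layout, tool B reads the same columns directly.
shared_columns = tool_a_columns
total_sales = sum(shared_columns["sales"])
```

Both paths end with the same columns; the difference is the two full passes over the data that the shared layout skips.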

Data Reflections, in essence, work a bit like indexes do in a relational database, providing a columnar summary of the data for aggregation analysis.
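
The general idea of answering an aggregation from a prepared summary, rather than from the raw data, can be sketched in a few lines of Python. This is not Dremio's implementation; the data, the `build_summary` helper and the two query functions are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical raw data: (region, month, sales).
raw_rows = [
    ("east", "2018-01", 100),
    ("west", "2018-01", 250),
    ("east", "2018-02", 75),
    ("west", "2018-02", 50),
]


def build_summary(rows):
    """Pre-aggregate sales by region once, ahead of query time."""
    summary = defaultdict(lambda: {"total": 0, "count": 0})
    for region, _month, sales in rows:
        summary[region]["total"] += sales
        summary[region]["count"] += 1
    return dict(summary)


def total_by_region_scan(rows, region):
    """Answer the query the slow way: scan every raw row."""
    return sum(sales for r, _month, sales in rows if r == region)


def total_by_region_summary(summary, region):
    """Answer the same query from the prepared summary: one lookup."""
    return summary[region]["total"]


summary = build_summary(raw_rows)
```

Both query functions return the same total for a region, but the second touches one summary entry instead of every row, which is where the speed-up for aggregation queries comes from.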

Reflect on this
In this new version of the product, Reflections are improved in several ways. First off, they can now recognize and optimize for data stored in star or snowflake schemas (wherein metrics are stored in a single "fact table" and drill-down categories are stored in their own, related "dimension tables") in source data systems. This improvement allows Dremio to accelerate queries against a collection of such fact and dimension tables, related through joins, rather than only optimizing against individual tables. This new Dremio version also adds vector processing capabilities for even faster performance.
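
For readers unfamiliar with star schemas, the following Python sketch uses the standard library's sqlite3 module to build a tiny one and run the kind of join-plus-aggregate query described above. The table names, column names and values are invented for illustration, and SQLite stands in for whatever source system holds the data; nothing here is specific to Dremio.

```python
import sqlite3

# In-memory database with one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'bikes'), (2, 'helmets');
    INSERT INTO fact_sales VALUES (1, 1, 500), (2, 1, 700), (3, 2, 40);
""")

# Metrics live in the fact table; the drill-down category lives in the
# dimension table, so the query joins the two before aggregating.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

conn.close()
```

A query like this spans two tables, which is why acceleration that only understands individual tables cannot help with it.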

The Dremio Learning Agent adds a new learning engine which observes queries executed by users and, based on common observed patterns, can recommend bringing certain other tables into a given query. And, by adding support for Azure Data Lake Store and Amazon S3, the product makes good on its promise of analyzing cloud-based data lakes together with other database and data storage products.

Lookie here
Dremio 2.0 also adds explicit support for Looker as a front-end. So while you can still use tools like Tableau, Power BI and Qlik, the new visual data darling of the industry will work too. This work is based on a formal partnership between the two companies.


A Looker dashboard connected to data in Dremio.

Credit: Looker

Security, connectivity and more!
Other new features include integration of Dremio's own role-based access control security with LDAP stores, like Microsoft Active Directory. This allows access controls, down to the level of permissions on specific rows or columns of data, to be stored in Dremio's own user management system while remaining integrated with the users and groups created in Active Directory and other such stores. Dremio 2.0 also adds support for Elasticsearch and MongoDB.

That's a lot for one release, but remember, this is a real upgrade, not just a security patch. It will take a little longer to download and install, but it should be well worth it.
