Dremio, a company long focused on accommodating business intelligence workloads on data lakes, is today launching Dremio Cloud, a managed service for doing just that on data stored in Amazon Web Services (AWS) S3. Dremio Cloud builds atop the AWS Edition that Dremio announced last month, but it is a full SaaS implementation and adds a number of unique features.
ZDNet spoke with Dremio's founder and chief product officer, Tomer Shiran, who explained that the paradigm for Dremio Cloud is one of a global control plane, with a centralized query planner that dispatches queries on S3 data stored across Amazon regions, and T-shirt-sized data "engines" (clusters, really) that carry the queries out. The engines can be replicated on an auto-scaling basis to support what Dremio calls "infinite concurrency."
Another feature unique to Dremio Cloud is single sign-on, with support for a number of enterprise and consumer/social identity providers, including Azure Active Directory, Okta, Ping and Google Identity. In a similar fashion, users of Tableau and Microsoft's Power BI can sign on to Dremio with the same credentials they use to log in to those BI tools.
Billing and availability
Billing for Dremio Cloud is usage-based, with engine compute resources being the unit of billing. When numerous engine replicas are instantiated, rather than just one, the bill will be higher. When no queries need to be accommodated, all engines spin down and the customer will not incur any charges. In other words, neither the control plane nor idle engine resources are billed (especially since the latter effectively disappear when not in use).
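To make the billing arithmetic concrete, here is a minimal sketch of a usage-based charge in Python. It is not Dremio's actual pricing formula: the `EngineUsage` record, the `usage_charge` function and the hourly rate of 4.0 are all hypothetical, chosen only to illustrate that charges scale with the number of running replicas and fall to zero when engines are spun down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineUsage:
    """One interval during which an engine ran with a fixed replica count."""
    hourly_rate: float  # hypothetical price per replica-hour for this engine size
    replicas: int       # replicas running in the interval (0 when spun down)
    hours: float        # length of the interval, in hours


def usage_charge(intervals):
    """Sum compute charges across intervals.

    Intervals with zero replicas contribute nothing, and the control plane
    is not modeled at all, mirroring the claim that neither is billed.
    """
    return sum(u.hourly_rate * u.replicas * u.hours for u in intervals)


# Illustrative numbers only; the rate is an assumption, not a published price.
intervals = [
    EngineUsage(hourly_rate=4.0, replicas=1, hours=2.0),   # steady load, one replica
    EngineUsage(hourly_rate=4.0, replicas=3, hours=0.5),   # burst, scaled to three replicas
    EngineUsage(hourly_rate=4.0, replicas=0, hours=10.0),  # idle, engines spun down
]
total = usage_charge(intervals)
print(total)  # 14.0
```

In this sketch the half-hour burst on three replicas costs almost as much as two hours on one, while the ten idle hours cost nothing.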
Dremio Cloud is now in "limited availability" (i.e., offered by invitation), but Shiran says the release is GA-caliber and that the service has already been in beta for quite some time. The service is launching exclusively on Amazon Web Services, but Shiran said the company expects to launch on Microsoft's Azure cloud later this calendar year and on Google Cloud next calendar year.
Dremio calls its platform a "SQL Lakehouse," which sounds similar to Databricks' Data Lakehouse branding. In fact, when I wrote about Dremio's Dart initiative last month, I said the company's platform was really a full data warehouse that just happened to operate on data stored in open formats, on cloud (or on-premises) object storage. Shiran pointed out that this is no small distinction: leaving data in its native format in the lake means data science, data lake and other specialized engines can operate on that very same data, with no moving or copying required, while still allowing BI workloads to execute.
I would also point out that, beyond using proprietary storage formats, data warehouses typically hold data that is more curated and less inclusive than data in the lake, whereas Dremio is enabling BI-style analysis on data caught by the data lake's wider net. Regardless, we are again seeing the consolidation of the data warehouse and data lake models, and witnessing how the popularity of cloud data warehouses and cloud data lakes is increasingly leading that consolidation to take place in a cloud context.