Dremio, the backer of Apache Arrow, whose platform focuses on brokering data lake queries for BI tools, has introduced a new Amazon Web Services (AWS) edition of its product. The new edition is tailored to Amazon Simple Storage Service (S3) data lakes, is available in the AWS Marketplace, and introduces two new features the company calls Elastic Engines and Parallel Projects.
Parallel Projects provide a multi-tenant-like approach to deployments of the Dremio platform. As the name implies, they allow per-project deployments, all under a single customer account. Elastic Engines, meanwhile, allow for multiple deployments within a project, based on user-specified templates that define the number of nodes in the "engine" (which is really a cluster) and an inactivity period after which the engine automatically shuts down. Moreover, since Dremio's "reflections" (a pre-aggregation structure based on Apache Parquet and Apache Arrow) are persisted to S3, they can accelerate queries even on engines that have been stopped and restarted.
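The template idea described above can be illustrated with a minimal Python sketch. This is not Dremio's API: the names `EngineTemplate`, `Engine`, `run_query` and `tick` are hypothetical, and the model assumes only what the description states, namely that a template fixes a node count and an inactivity period, and that an idle engine stops on its own and can be started again later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineTemplate:
    """Hypothetical template: how many nodes, and how long an engine may sit idle."""
    name: str
    node_count: int
    idle_timeout_s: float


class Engine:
    """Hypothetical engine (really a cluster) created from a template."""

    def __init__(self, template: EngineTemplate) -> None:
        self.template = template
        self.running = False
        self.last_activity: Optional[float] = None

    def run_query(self, now: float) -> None:
        # A query starts the engine if it is stopped and resets the idle clock.
        self.running = True
        self.last_activity = now

    def tick(self, now: float) -> None:
        # Periodic check: stop the engine once it has been idle for the full timeout.
        if self.running and self.last_activity is not None:
            if now - self.last_activity >= self.template.idle_timeout_s:
                self.running = False


# Usage: an engine with a five-minute inactivity period (times in seconds).
template = EngineTemplate(name="reporting", node_count=20, idle_timeout_s=300)
engine = Engine(template)
engine.run_query(now=0)
engine.tick(now=100)   # still within the inactivity period
engine.tick(now=300)   # idle for the full period, so the engine stops
engine.run_query(now=400)  # a later query starts it again
```

Because the reflections live in S3 rather than on the engine's nodes, a restart like the last line would not, per the description above, lose the acceleration they provide; the sketch does not model that part.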
The goal with Elastic Engines is to provide both high performance and optimized billing of cloud resources, through the definition of high-scale and more modest engine profiles, based on the expected user role and associated workload demands. Certain roles can be assigned to engines defined with a large number of nodes, which execute their query work quickly by parallelizing the workload and then shut down. Roles associated with smaller, ad hoc workloads can be assigned to smaller engines, which may run longer but use fewer billable Amazon Elastic Compute Cloud (EC2) resources.
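The billing trade-off between the two kinds of engine comes down to simple arithmetic on node-hours. The sketch below is illustrative only: the node counts, runtimes and the hourly rate are made-up figures, not Dremio or AWS pricing, and the function names are hypothetical.

```python
def node_hours(node_count: int, runtime_hours: float) -> float:
    """Billable node-hours for an engine that runs for runtime_hours."""
    return node_count * runtime_hours


def cost(node_count: int, runtime_hours: float, rate_per_node_hour: float) -> float:
    """Cost at an assumed flat hourly rate per node."""
    return node_hours(node_count, runtime_hours) * rate_per_node_hour


# Assumed figures for illustration only.
RATE = 1.0  # hypothetical currency units per node-hour

# A large engine that finishes a heavy job quickly, then shuts down.
large = cost(node_count=20, runtime_hours=0.5, rate_per_node_hour=RATE)

# A small engine that stays up through a working day of ad hoc queries.
small = cost(node_count=2, runtime_hours=8, rate_per_node_hour=RATE)

print(f"large engine: {large} units, small engine: {small} units")
```

Under these assumed numbers the large engine accrues 10 node-hours and the small one 16, which shows why an automatic shutdown after inactivity matters: what is billed is the product of nodes and running time, not the node count alone.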
As a Marketplace product, Dremio AWS Edition is not a Software as a Service (SaaS) offering. But with a multi-tenant, multi-cluster approach, and simple portal experience for deployment, Dremio is intentionally creating a SaaS-like experience. Furthermore, with the granularity provided by Elastic Engines, the company can get its customers close to a pay-for-what-you-use style billing model.
Dremio's evolving orientation to cloud models, first with respect to cloud data lakes and now with multi-tenancy, ease of provisioning and more granular cloud billing, conforms to trends in the broader data and analytics space. Customers have moved to cloud object storage for their data lakes and even data warehouses, and want platforms that are either serverless or premised on ephemeral computing resources, to enable usage-based billing.
For independent companies to compete with services offered by the cloud providers themselves, pivoting to cloud provisioning and billing models is a must. Dremio has done what it had to do here and will likely do it again for other cloud platforms. Beyond that, further automation, such as defining and assigning engines implicitly, will add even more value.