Amazon re:Invent Preview: Rockset and Yotascale take their bows

A couple of vendors making their debuts at re:Invent this week show how the cloud is changing the design and functionality of databases and systems management.

With re:Invent dominating the airwaves this week, Amazon's announcements will understandably garner the spotlight. But behind the scenes, some of the most significant announcements may come from the ecosystem that is building on the AWS cloud. In the past few weeks, we had a chance to get acquainted with a couple of companies that just emerged from stealth: Rockset, a search-oriented serverless database that exposes variably structured data through SQL; and Yotascale, a tool that adds machine learning on top of cloud ops tools like Amazon CloudWatch or third-party offerings like Datadog or New Relic.

They provide good examples of how the cloud offers the opportunity to rethink software design. Rockset, cofounded by a Facebook alum who built the social graph, is a cloud-based, search-based analytic database. It operates directly on raw data in JSON, XML, CSV, and Parquet and makes it queryable through SQL, or through APIs for Python, Java, and JavaScript. In essence, it is built like a NoSQL database but operates like a SQL one. As with NoSQL databases, developers don't have to design a schema; as with SQL databases, they don't have to denormalize data, and they can use ordinary SQL syntax and perform operations like joins.

It follows in the footsteps of Elasticsearch, which rethought the search index for a world of scale-out big data clusters. Rockset takes the next step by building a search-based analytics database natively on cloud storage and serverless architecture. It differentiates itself from Elasticsearch by targeting the large base of SQL developers.

Rockset characterizes its schema as a "relational document model," and claims the term is not a contradiction. That's due to the way that Rockset decomposes and automatically indexes data. There are parallels with Azure Cosmos DB in the automation and flexibility of indexing. As Rockset ingests data, it decomposes it into a series of key-value entities that are stored in the underlying open source RocksDB key-value engine -- so we know how the database got its name (and how it differentiates itself from Elasticsearch, which uses Lucene). It then introspects the values to determine the data type, and indexes them accordingly. You can guide the process along by specifying what type of values you're looking to analyze, and you can selectively mask or bypass specific fields and set data retention periods.
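To make the decompose-introspect-index idea concrete, here is a minimal Python sketch. It is not Rockset's implementation: the function names (`decompose`, `infer_type`, `build_index`), the dotted field-path convention, and the in-memory dictionary standing in for a key-value store are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


def infer_type(value: Any) -> str:
    """Introspect a leaf value and name its type (hypothetical type labels)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    return "string"


def decompose(doc: Any, path: str = "") -> Iterator[Tuple[str, str, Any]]:
    """Yield (field_path, inferred_type, value) for every leaf in a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            child = f"{path}.{key}" if path else key
            yield from decompose(value, child)
    elif isinstance(doc, list):
        # All elements of an array share one path so they can be searched together.
        for value in doc:
            yield from decompose(value, f"{path}[]")
    else:
        yield path, infer_type(doc), doc


def build_index(docs: Dict[str, Any]) -> Dict[Tuple[str, Any], List[str]]:
    """Map each (field_path, value) pair to the ids of the documents containing it."""
    index: Dict[Tuple[str, Any], List[str]] = {}
    for doc_id, doc in docs.items():
        for path, _type, value in decompose(doc):
            index.setdefault((path, value), []).append(doc_id)
    return index


# Two documents with no declared schema; "age" even changes type between them.
docs = {
    "d1": {"user": {"name": "ana", "age": 31}, "tags": ["iot", "edge"]},
    "d2": {"user": {"name": "bo", "age": "unknown"}, "tags": ["iot"]},
}
index = build_index(docs)
print(index[("tags[]", "iot")])  # ids of documents tagged "iot"
```

Because every field is indexed on arrival, a lookup such as "all documents tagged iot" needs no schema design up front, which is the trade the article describes: more storage spent on indexes in exchange for less preparation work.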

Rockset's generous indexing scheme, like that of Cosmos DB, is based on the design assumption that cloud-based storage is cheap, abundant, and readily scaled. The automated indexing is key to another Rockset differentiator: the data is made queryable without the need to build data transformation pipelines or conduct data preparation.

As an Amazon partner, Rockset is designed to work off data stored in S3 buckets and works with Amazon's Identity and Access Management (IAM) tooling. It is targeting customer 360, IoT, and cybersecurity use cases.

Yotascale is also making its debut at re:Invent this week. It picks up where your systems and application management tools leave off, adding a layer of machine learning to detect and predict cost and usage trends. It bills itself as an "autonomous cloud operations" tool for managing and optimizing performance. It ingests cloud provider data covering cost, utilization, inventory, logs, and containers, and correlates it with third-party performance, memory, and application configuration data.

The company doesn't claim to be the only systems management player to apply machine learning -- its differentiator is that it can synthesize the big picture from multiple sources. So, it takes in metrics from Amazon CloudWatch, from third-party tools like Datadog or New Relic that show application context, and from configuration management tools like Chef or Puppet that tell you when the software on the servers was last patched or updated.

At this point, Yotascale is not a tool that provides lights-out operation of your cloud implementation. Instead, it focuses on cloud spending and anomaly detection, and it provides recommendations for improving resource allocation and reducing cloud spend. It automatically tags cloud infrastructure to provide visibility, and it picks up where cloud reporting tools leave off by turning that visibility into recommendations. On the roadmap, Yotascale is planning to add predictive capacity planning, which will let you project 3-6 months ahead to estimate the resources you will need based on past history or specific scenarios, especially for businesses that have seasonal variations.
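For readers wondering what anomaly detection on cloud spend looks like in its simplest form, the following Python sketch flags days whose cost strays far from the recent average. Yotascale has not described its models, so this is a generic trailing-window z-score, not the company's method; the function name, the seven-day window, the threshold of three standard deviations, and the sample figures are all assumptions.

```python
from statistics import mean, stdev
from typing import List


def flag_cost_anomalies(
    daily_cost: List[float], window: int = 7, threshold: float = 3.0
) -> List[int]:
    """Return the indexes of days whose cost deviates from the trailing-window
    mean by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(daily_cost)):
        history = daily_cost[i - window:i]  # the `window` days before day i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if daily_cost[i] != mu:
                anomalies.append(i)
            continue
        if abs(daily_cost[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily spend in dollars, with one obvious spike on day 7.
costs = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(flag_cost_anomalies(costs))
```

A real tool would need more than this, since a single spike inflates the window's standard deviation and can hide later anomalies, and seasonal businesses would need a baseline that accounts for their cycles.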

There will be many more such startup stories in Vegas this week.