Looking under the hood at Amazon Neptune

At re:Invent, Amazon Web Services announced the preview of the latest addition to its cloud database family: Amazon Neptune, a graph database that, unlike most rivals, lets you model graph data both ways.
Written by Tony Baer (dbInsight), Contributor

A few weeks back, my Big on Data bro George Anadiotis provided the deep dive on why graph platforms may become the next database in your portfolio, regardless of whether you know what they are. At their simplest, graph databases excel at representing the many-to-many relationships that are difficult, if not impossible, to model in relational and NoSQL databases.

What makes the case for graph databases so compelling is that they model the real world the way it actually exists. Outside of transaction systems, people tend to be interconnected in multiple social tribes that may or may not be interlocking. Product catalogs are typically composed of families of SKUs that mix and match different grab bags of features. Cyber threats often arise from a multitude of players that may or may not be interrelated, with characteristics that can often betray lineage. You could model these in a relational database, if you don't mind databases with hundreds of tables (or more) and queries requiring dozens or even hundreds of joins.

Welcome to the graph database, which is designed to represent the tangled webs of relations that often exist in real life.

Not surprisingly, virtually every data platform household name has added graph support to its portfolio. Recently, Amazon Web Services joined the fray by announcing the public preview of Amazon Neptune late last year at its re:Invent conference. Now four months into its public preview, AWS is aiming for general release later this year. We had a chance this week to sit down with AWS and get a peek under the covers on how Neptune works while gaining some insights on early adoption.

A distinguishing feature of Neptune is that, unlike most graph data platforms, it supports both Resource Description Framework (RDF) graphs and property graphs. AWS cites different use cases for each form. Customers engaging in standardized data interchange prefer the more rigidly structured RDF triples; this is especially useful when using data sources, such as knowledge graphs or clinical data stores, that lend themselves well to the triple model. Conversely, when contending with variably structured data sources such as social media, property graphs can be more practical.
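To make the difference between the two models concrete, here is a minimal sketch in plain Python. It does not use Neptune or any graph library; the identifiers (`ex:alice`, `follows`, `since`) and both helper functions are made up for illustration. The same "Alice follows Bob" relationship is stored first as RDF-style triples and then as a property graph.

```python
# Illustrative only: plain Python data structures, not Neptune's API.
# All names (ex:alice, follows, since) are hypothetical.

# RDF model: every fact is a (subject, predicate, object) triple.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:follows", "ex:bob"),
]

# Property graph model: vertices and edges carry a label plus
# free-form key/value properties.
vertices = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
}
edges = [
    # The edge itself holds a property ("since"), which a single
    # plain triple cannot express directly.
    {"from": "alice", "to": "bob", "label": "follows", "since": 2017},
]


def rdf_followed_by(subject):
    """Return the objects of every ex:follows triple for a subject."""
    return [o for s, p, o in triples if s == subject and p == "ex:follows"]


def pg_followed_by(vertex_id):
    """Return target vertex ids of every 'follows' edge leaving a vertex."""
    return [
        e["to"]
        for e in edges
        if e["from"] == vertex_id and e["label"] == "follows"
    ]
```

In practice the RDF side would be queried with SPARQL and the property graph side with Gremlin; the point here is only the shape of the data. Uniform triples suit interchange, while per-edge properties suit loosely structured sources.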

Amazon Neptune allows you to make the choice, by declaring which model to use, but at this point, property graph and RDF data are not interoperable. That is, you can't form a single query that would span both types of data. We wouldn't be surprised if some form of federated query capability could bridge this gap in the future.

Another differentiator for Neptune is an obvious one: it leverages the same back-end storage technology as other instance-based AWS database platforms. So, just like platforms such as Aurora and DynamoDB, Neptune automatically replicates six read-only copies across three availability zones (providing the option for customers to replicate up to 15 copies). Likewise, Neptune supports encryption at rest (using customer-managed keys) and in transit. And it provides a similar ACID transactional model featuring a write master (that provides immediate consistency), with transactions committed on distributed replicas (slaves) once at least four of them have completed updates. We believe that the exception to this practice would have to be Neptune's bulk load feature, which would suspend ACID guarantees to enable higher write throughput rates.
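The four-of-six commit rule described above is a quorum check. The sketch below shows only that arithmetic, assuming six copies and a write quorum of four as stated in the paragraph. The constant and function names are hypothetical and say nothing about how AWS actually implements its storage layer.

```python
# Illustrative quorum check; names are hypothetical, not an AWS API.
TOTAL_COPIES = 6   # copies spread across three availability zones
WRITE_QUORUM = 4   # acknowledgements needed before a write commits


def is_committed(acks):
    """Return True when enough copies have acknowledged a write.

    `acks` holds one boolean per copy: True if that copy has
    completed the update, False otherwise.
    """
    acks = list(acks)
    if len(acks) != TOTAL_COPIES:
        raise ValueError(f"expected {TOTAL_COPIES} acknowledgement flags")
    return sum(acks) >= WRITE_QUORUM
```

With four acknowledgements out of six, `is_committed` returns True even if two copies lag or are unreachable; with only three, the write stays uncommitted.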

Although Neptune supports transaction (OLTP) and analytic (OLAP) queries, the emphasis for now is on transactional use cases for interactive graph applications. Not surprisingly, these use cases are the sweet spot for most early adopters in preview. As this offering is in preview, Amazon is not publishing performance benchmarks at this time.

What's interesting is how Amazon's approach to graph differs from Microsoft's. Where AWS supports graph in a dedicated platform, Microsoft supports it as part of a multi-model approach. SQL Server 2017 (and Azure SQL Database in the cloud) added support for graph tables (which are restricted to a single graph model) as an extension to SQL; here, developers need not learn specialized languages like Gremlin or SPARQL. For more complex graph scenarios, there is Cosmos DB, where SQL, JSON, key/value, and graph are made first-class citizens through an API layer.

Down the road, we'd like to see Amazon take a cue from Microsoft by launching SQL developers on a direct journey to Neptune. The use cases are clearly there. Imagine a customer segmentation query made to Amazon's Redshift data warehouse that could be enriched with social graph data stored in Neptune. There is precedent for Amazon to make such an integration. It has opened up Redshift, for instance, with Redshift Spectrum, providing the ability to directly query data stored in S3 cloud storage without the need to move data. Likewise, there is Amazon Athena, which provides a serverless ad hoc SQL query service that also directly accesses S3. While the mode by which Redshift would query graph data stored in Neptune might differ, the notion is hardly out of the question.
