Neo4j Aura, a fully managed native graph Database as a Service (DBaaS), has just been released. The key points Neo4j emphasizes about Aura are always-on availability, on-demand scalability, and a developer-first approach. With Aura, Neo4j and graph databases enter the cloud era.
Aura has been in the works, and we have known about it, for at least a couple of years now. Initially, Aura will run on Google Cloud, and we have also known about Neo4j's partnership with Google since April 2019. So, when discussing Aura with Neo4j CEO Emil Eifrem and Neo4j Cloud product management director Kurt Freytag, these were the key questions:
What took Aura so long, and where does it stand in relation to the Google Cloud partnership?
Aura, Neo4j, and Google Cloud
Cloud has been a strategic priority for Neo4j for a while now, and one that Eifrem shared in previous conversations, too. As Eifrem put it, sometimes calling out where things are going is the easy part -- the hard part is figuring out exactly when things will happen. So, that is one part of the answer; Neo4j wanted to sync with the market.
The other part has to do with the technical complexity that underpins Aura. As Freytag put it, they wanted to do it the right way. The way Neo4j has chosen to implement Aura is based on Kubernetes. Managing stateful workloads, which is a requirement for bringing any database to the cloud, only became possible with Kubernetes recently.
Interestingly, a Kubernetes-based implementation means that, although Aura initially runs on Google Cloud, it can also run on AWS and Microsoft Azure. This is part of the reason Kubernetes was chosen, and AWS and Azure will be supported eventually, too. Which brings us to the second point: Where does Aura stand in relation to the Google Cloud partnership?
Aura does not equal the Neo4j-Google partnership. According to Eifrem, Neo4j is working on deeply integrating with GCP, and there will be another announcement on that soon. The goal is to have two entry points for Aura. Google Cloud customers will eventually be able to use Aura just like they use Google products such as Spanner or Bigtable, and even integrate with them.
Neo4j customers, on the other hand, will eventually be able to use Aura from one central point and choose which cloud provider to use for their workloads. Today, however, Aura itself is the only entry point, and GCP is the only cloud vendor supported. Speaking of workloads, this brought up another interesting point: Zero administration.
In announcing Aura, Neo4j emphasized some key points, such as "always-on" availability (managing complex processes such as tuning, security patches, software updates and configuration changes with zero downtime), on-demand scaling (automatically resizing the database), security, ACID transactions, simple pricing, and zero administration.
Never worry about servers again. This was the phrase used in the announcement, and it made us wonder if Neo4j has gone down the "self-driving" database path. Eifrem distanced Neo4j from Oracle's definition. Discussing on-demand scaling, for example, he said that while Aura automatically resizes the database, the user is still in control, as this may impact resources and billing.
Graph database adoption in the enterprise and beyond
Aura, simply put, is Neo4j's way of getting to the mass market. Since we've been covering the graph database market for a while now, seeing various adoption metrics and prognostications, and getting insights from the field, too, we've been sincerely wondering how much of this adoption is projected, and how much of this is real.
Are 50% of organizations really using graph databases today? Or is it more like 12.5%? It's hard to tell. First off, as Eifrem put it, it depends on what you're counting: does a proof of concept somewhere in the organization count as adoption, in the same way a mission-critical project does? And then, what organizations are we talking about?
Eifrem talked about Neo4j's adoption in the enterprise, mentioning, for example, that eight of the 10 biggest insurance companies and all of the 20 biggest banks in the world are Neo4j clients. Beyond the enterprise, however, measuring is harder, and adoption is lower. Metrics such as downloads are only proxies. This is where Aura comes in.
Eifrem said making Neo4j open source was their way of giving developers a database that was very powerful, flexible, and accessible to all. The vast majority of Neo4j's paying customers started with individual developers who downloaded Neo4j, experimented with it, and realized graphs were an ideal way to model and traverse connected data.
"However, only a few of those developers had direct access to a budget to make the leap to our Enterprise Edition. Neo4j Aura bridges that gap for individuals, small teams and established startups. I believe this is the next logical step in Neo4j's vision to help the world make sense of data."
This makes sense, as it is in sync with a world in which open-source and cloud are becoming the norm. But Neo4j is not the only option when it comes to graph, so it's also worth seeing what Aura means for the graph database world at large.
The bigger picture
First off, Aura is not the only cloud graph database in town. Both AWS and Azure have their own offerings, Neptune and Cosmos DB, respectively. There are technical differences of course, but perhaps more importantly, Neptune and Cosmos DB are restricted to running in their respective clouds, while Aura is not.
It makes sense for Google to partner with Neo4j, as it does not have an in-house graph database offering. It also makes sense for Neo4j to partner with Google, as it won't face in-house graph database competition there, and it gets the chance to offer deep integration on an up-and-coming cloud vendor. But cloud vendors are not the only ones offering a fully managed graph database.
TigerGraph also announced such an offering in September 2019. Unlike Neo4j, however, TigerGraph is not open-source. Hence, its way of onboarding users is to offer a free tier, which Neo4j does not do for Aura. While we find dates are of little significance beyond bragging rights, the point here is that we expect others to follow, too.
In a market where even offering an image for users to deploy themselves on the cloud was not a given until recently, fully managed cloud versions signify great progress. Another development which we enthusiastically hail is the move toward graph query language standardization. Let's briefly recap.
Graph databases come in two flavors: RDF and property graph. In the RDF world, standards exist both for the query language (SPARQL) and for data format and schema (RDF and RDFS / OWL, respectively). Property graphs did not have any of that, so Neo4j put forward a proposal to work on this, under the auspices of W3C, and other vendors from both camps joined.
The first steps were successful, and recently a proposal for a standard query language for property graphs called GQL has gone through the ISO/IEC's Joint Technical Committee 1, responsible for IT standards. This is good news, but there are a couple of points worth emphasizing here. First, this is just a preliminary step. There is a lot of work, and negotiation, to be done before we can safely say a standard is there.
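To make the difference between the two flavors concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the `Node` and `Relationship` classes, the `to_triples` helper, and the `http://example.org/` namespace are hypothetical and do not correspond to any vendor's API. It maps a tiny property graph onto RDF-style triples and reports what does not fit into a single triple.

```python
from dataclasses import dataclass, field

EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    type: str
    end: str
    properties: dict = field(default_factory=dict)


def to_triples(nodes, relationships):
    """Map a tiny property graph onto (subject, predicate, object) triples.

    Returns the triples plus the relationship properties that were left out,
    because a single triple has no slot for key-value pairs on an edge.
    """
    triples, left_out = [], []
    for n in nodes:
        triples.append((EX + n.id, RDF_TYPE, EX + n.label))
        for key, value in n.properties.items():
            triples.append((EX + n.id, EX + key, value))
    for r in relationships:
        triples.append((EX + r.start, EX + r.type.lower(), EX + r.end))
        left_out.extend((r.type, key, value) for key, value in r.properties.items())
    return triples, left_out


nodes = [
    Node("alice", "Person", {"name": "Alice"}),
    Node("bob", "Person", {"name": "Bob"}),
]
relationships = [Relationship("alice", "KNOWS", "bob", {"since": 2015})]

triples, left_out = to_triples(nodes, relationships)
for triple in triples:
    print(triple)
print("not expressible as a single triple:", left_out)
```

In this sketch, node properties translate directly into triples, while the `since` property on the relationship does not. In plain RDF, such edge properties are usually modeled with an extra node or with reification, which is one of the practical differences between the two models.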
Second, we feel this could be bigger, and better, with a little bit of extra effort. As mentioned, both RDF and property graph vendors and independent experts participate in that effort. There is a proposal, called RDF*, which could bridge technical differences between them, and lead to common standards.
As is often the case, however, technical aspects may not be the most important ones in bridging this gap. RDF has been around for longer, and there is a body of work there which does not exist for property graphs. Why reinvent the wheel, or come up with proprietary solutions, when bridging/adopting existing work is possible and could bring benefits?
Doing it wrong
A data format, which is part of a migration path for graph databases, is a good showcase. Currently, the way to import data into Neo4j is via CSV. While CSV is universal, it's far from elaborate, it can't easily capture the specifics of graph data, and the import process is proprietary. Both Eifrem and Freytag admitted this is an area that needs work.
So, here's a crazy idea: how about using RDF as a standard for data exchange between databases, graph and otherwise? It's a great match for this, and it can work today even, with a little bit of effort. Eifrem mentioned MongoDB in our conversation a couple of times, as a kind of yardstick to measure Neo4j against.
MongoDB users wanting to import JSON data into Neo4j today will find this is not a very straightforward process. There could be another way, using RDF. JSON can be converted to RDF, and RDF can be imported into Neo4j. Neo4j has a plugin for that, but it's not a first-class citizen. It's something Neo4j's Jesus Barrasa built and open-sourced, not something Neo4j officially sponsors or promotes.
Why is that? The answer Eifrem gave was that although RDF got many things right, it's not a priority for Neo4j. We suspect part of this is a business strategy, and another part may have to do with what Dan Brickley, one of RDF's key figures and Google's schema.org mastermind, has called "Semantic Web fundamentalism" -- being harangued for doing RDF wrong.
RDF comes with a body of work, but also with some baggage. An academic attitude, in the bad sense, has been part of that. While we disapprove of the attitude (and have suffered from it, too), we see that changing. And we should also point out that things such as solid foundations for semantics would benefit property graphs, too.
Case in point: Neo4j's Alastair Green, who leads the GQL standardization efforts for Neo4j, and other Neo4j team members have been working with academic researchers on the semantics of Neo4j's query language, stating that their understanding of updates in the popular graph database model is still very rudimentary.
Long story short: Property graphs started with adoption in mind, and things such as standardization came as an afterthought. For RDF, the opposite holds. But now seems like a good time to meet in the middle, and that would be a good thing for everyone, creating a bigger and better ecosystem. Attitudes can change, and Neo4j is a good example of this, too.
In the past, Eifrem had stated that Neo4j never produces benchmarks. When asked how that squares with recent openings for benchmarking engineers at Neo4j, Eifrem said something a bit different: Neo4j does benchmarks, and it has a team for this; it just does not use benchmarks in its marketing strategy. So, who knows, if benchmarks are not that bad, RDF could be next.