Neo4j Aura, a fully managed native graph Database as a Service (DBaaS), has just been released. The key points Neo4j emphasizes about Aura are always-on availability, on-demand scalability, and a developer-first approach. With Aura, Neo4j and graph databases enter the cloud era.
Cloud has been a strategic priority for Neo4j for a while now, and one that Eifrem shared in previous conversations, too. As Eifrem put it, calling out where things are going is sometimes the easy part; the hard part is figuring out exactly when things will happen. So that is one part of the answer: Neo4j wanted to time its move to match the market.
The other part has to do with the technical complexity that underpins Aura. As Freytag put it, they wanted to do it the right way. Neo4j has chosen to build Aura on Kubernetes. Managing stateful workloads, a requirement for bringing any database to the cloud, has only recently become possible with Kubernetes.
Interestingly, a Kubernetes-based implementation means that, although Aura initially runs on Google Cloud, it can also run on AWS and Microsoft Azure. This is part of the reason Kubernetes was chosen, and AWS and Azure will be supported eventually, too. Which brings us to the second point: Where does Aura stand in relation to the Google Cloud partnership?
Aura is not the same thing as the Neo4j-Google partnership. According to Eifrem, Neo4j is working on deep integration with GCP, and there will be another announcement on that soon. The goal is to have two entry points for Aura. Google Cloud customers will eventually be able to use Aura just as they use Google products such as Spanner or Bigtable, and even integrate it with them.
Neo4j customers, on the other hand, will eventually be able to use Aura from one central point and choose which cloud provider to use for their workloads. Today, however, Aura itself is the only entry point, and GCP is the only cloud vendor supported. Speaking of workloads, this brought up another interesting point: zero administration.
In announcing Aura, Neo4j emphasized some key points, such as "always-on" availability (managing complex processes such as tuning, security patches, software updates and configuration changes with zero downtime), on-demand scaling (automatically resizing the database), security, ACID transactions, simple pricing, and zero administration.
Never worry about servers again. This was the phrase used in the announcement, and it made us wonder if Neo4j has gone down the "self-driving" database path. Eifrem distanced Neo4j from Oracle's definition. Discussing on-demand scaling, for example, he said that while Aura automatically resizes the database, the user is still in control, as this may impact resources and billing.
Graph database adoption in the enterprise and beyond
Are 50% of organizations really using graph databases today? Or is it more like 12.5%? It's hard to tell. First off, as Eifrem put it, it depends on what you're counting: does a proof of concept somewhere in the organization count as adoption in the same way a mission-critical project does? And then, which organizations are we talking about?
Eifrem talked about Neo4j's adoption in the enterprise, mentioning, for example, that 8 of the 10 biggest insurance companies and all 20 of the 20 biggest banks in the world are Neo4j clients. Beyond the enterprise, however, adoption is both harder to measure and lower. Metrics such as downloads are only proxies. This is where Aura comes in.
Eifrem said making Neo4j open source was their way of giving developers a database that was powerful, flexible, and accessible to all. The vast majority of Neo4j's paying customers started with individual developers who downloaded Neo4j, experimented with it, and realized graphs were an ideal way to model and traverse connected data.
"However, only a few of those developers had direct access to a budget to leap to our Enterprise Edition. Neo4j Aura bridges that gap for individuals, small teams and established startups. I believe this is the next logical step in Neo4j's vision to help the world make sense of data."
First off, Aura is not the only cloud graph database in town. Both AWS and Azure have their own offerings, Neptune and Cosmos DB, respectively. There are technical differences, of course, but perhaps more importantly, Neptune and Cosmos DB are restricted to running in their respective clouds, while Aura is designed not to be tied to a single cloud.
It makes sense for Google to partner with Neo4j, as it does not have an in-house graph database offering. It also makes sense for Neo4j to partner with Google, as it won't face in-house graph database competition there, and it gets the chance to offer deep integration with an up-and-coming cloud vendor. But cloud vendors are not the only ones offering a fully managed graph database.
TigerGraph announced such an offering in September 2019. Unlike Neo4j, however, TigerGraph is not open source. Hence, its way of onboarding users is to offer a free tier, which Neo4j does not do for Aura. While we think dates matter little beyond bragging rights, the point here is that we expect others to follow, too.
In a market where even offering an image that users could deploy themselves in the cloud was not a given until recently, fully managed cloud versions signify great progress. Another development we enthusiastically welcome is the move toward graph query language standardization. Let's briefly recap.
Second, we feel this could be bigger and better with a little extra effort. As mentioned, both RDF and property graph vendors, as well as independent experts, participate in that effort. A proposal called RDF* could bridge the technical differences between them and lead to common standards.
As is often the case, however, technical aspects may not be the most important ones in bridging this gap. RDF has been around longer, and there is a body of work there that does not exist for property graphs. Why reinvent the wheel, or come up with proprietary solutions, when bridging to or adopting existing work is possible and could bring benefits?
Doing it wrong
Data formats, a key part of any migration path for graph databases, are a good showcase. Currently, the way to import data into Neo4j is CSV. While CSV is universal, it is far from elaborate: it can't easily capture the specifics of graph data, and the import process is proprietary. Both Eifrem and Freytag admitted this is an area that needs work.
So, here's a crazy idea: how about using RDF as a standard for data exchange between databases, graph and otherwise? It's a great match for this, and it could work even today, with a little effort. Eifrem mentioned MongoDB a couple of times in our conversation, as a kind of yardstick to measure Neo4j against.
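To make the RDF-as-exchange-format idea a little more concrete, here is a minimal Python sketch. It assumes a toy in-memory property graph and a hypothetical `http://example.org/` namespace, and the helper names (`iri`, `literal`, `to_ntriples_star`) are made up for illustration; this is not Neo4j's export format or any vendor's API. Nodes, labels and node properties map directly to plain N-Triples, while the property sitting on the relationship is attached with the quoted-triple (`<< ... >>`) syntax from the RDF* proposal, the kind of graph-specific detail a flat CSV file does not capture naturally.

```python
# Sketch only: toy property graph serialized as N-Triples, with RDF*-style
# quoted triples for edge properties. Namespace and data are hypothetical.
BASE = "http://example.org/"  # assumed namespace, not a real vocabulary
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical data: two labeled nodes and one relationship with a property.
nodes = {
    "alice": {"label": "Person", "props": {"name": "Alice"}},
    "bob": {"label": "Person", "props": {"name": "Bob"}},
}
edges = [("alice", "KNOWS", "bob", {"since": "2019"})]


def iri(local: str) -> str:
    """Build an IRI in the assumed example namespace."""
    return f"<{BASE}{local}>"


def literal(value: str) -> str:
    """Plain string literal with minimal escaping (backslashes and quotes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples_star(nodes: dict, edges: list) -> list[str]:
    lines = []
    for node_id, node in nodes.items():
        # A node label becomes an rdf:type statement.
        lines.append(f"{iri(node_id)} {RDF_TYPE} {iri(node['label'])} .")
        # Each node property becomes an ordinary triple with a literal object.
        for key, value in node["props"].items():
            lines.append(f"{iri(node_id)} {iri(key)} {literal(value)} .")
    for src, rel, dst, props in edges:
        triple = f"{iri(src)} {iri(rel)} {iri(dst)}"
        lines.append(f"{triple} .")
        # Edge properties annotate the quoted edge triple (RDF* style).
        for key, value in props.items():
            lines.append(f"<< {triple} >> {iri(key)} {literal(value)} .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples_star(nodes, edges):
        print(line)
```

Without the RDF* extension, the `since` property would have to be expressed through reification or an intermediate node, which is one reason the RDF* proposal is seen as a possible bridge between the two worlds.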
RDF comes with a body of work, but also with some baggage. An academic attitude, in the bad sense, has been part of that. While we disapprove of the attitude (and have suffered from it, too), we see that changing. And we should also point out that things such as solid foundations for semantics would benefit property graphs, too.
Long story short: Property graphs started with adoption in mind, and things such as standardization came as an afterthought. For RDF, the opposite holds. But now seems like a good time to meet in the middle, and that would be a good thing for everyone, creating a bigger and better ecosystem. Attitudes can change, and Neo4j is a good example of this, too.