How Neo4j is taking graph databases into the mainstream

Q&A with Neo4j CEO Emil Eifrem on the development of the graph database, his biggest competition, and taking on the enterprise.
Written by Colin Barker, Contributor

Neo4j CEO Emil Eifrem: "Now we are clearly still the leader in [the graph database] space by any objective measure."

Image: Neo4j

Neo4j has done much to popularise the graph database, most famously by helping to analyse the Panama Papers.

With the company having just raised $36m to fund further expansion, ZDNet recently spoke to company CEO Emil Eifrem about the next steps.

ZDNet: $36m is a lot of money. How do you intend to spend it?

Eifrem: Yes, it is a lot of money. It's a private investment and we have been cautious and thoughtful about deploying this money. We have been very European, if you will, about how we run our company compared to all the other database companies, some of them raising $150 million to $300 million. We've always aimed to grow it in a little more careful and cash-efficient way.

You have not been a company that throws money around?

That's what's so crazy about Silicon Valley. In any other world, if you had raised $35m to $50m and now an additional $36m, which is what we've done to date, it would be a lot of money. It's a lot of money in the rest of the world, but [in Silicon Valley] it is not.

We think getting actual validation through paying customers is a much healthier way of doing things.

When you spend money, are you going to focus on expanding or on R&D?

We are going to be investing across the board. We are investing in product, in engineering, in sales, and in marketing, but the big narrative of this money is investment in product.

We think that ultimately companies should build great product that people want. That's what we want to do. Yes, we will spend on marketing because getting the word out in America is an important part of that, and ultimately we should benefit from that.

Now we are at a very interesting point in the graph space. We don't exactly own that space but we are the only company to benefit from graph in a big way.

For years and years, we were the solo voice in that choir. Now others are joining the choir, and some of the major players are announcing products in the graph space. Oracle now has a graph database, as do Amazon, Microsoft, and SAP. We are clearly still the leader in that space by any objective measure, such as the DB-Engines ranking and other external data points. We are as big in this space as everyone else combined.

We are the leader today, but we can't take anything for granted. If you look at Microsoft, they have a huge amount of money to throw at stuff, so we think that now is really the right time to start more investment in the product to ensure that we really remain the leader.

Surely one advantage for you must be that you have a lot of intellectual property in this market?

We have a lot of unique and proprietary intellectual property. It turns out that, even though we are building a database, and many databases have been built before, having a database that is centred around relationships, around the connections between the data points and not just the data points themselves, turns a lot of things, I would not say upside down, but by 90 degrees.

There are a number of things that are similar to building a regular database, but there are a number of things that are different, especially if you are building what's called a "native graph database". That is really where the gold is -- building a database that is fully optimised around graph and [shows] connections through data from the ground up.
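To make the idea of a database "centred around connections" more concrete, here is a minimal Python sketch of what is often called index-free adjacency: each node holds direct references to its neighbours, so a two-hop query simply follows pointers. It is contrasted with a relational-style approach that looks relationships up in a table of (source, target) rows. The `Node` class and both functions are hypothetical illustrations, not Neo4j's storage engine, and a real relational database would use an index rather than the full scans shown here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A graph node that stores direct references to the nodes it points at."""
    name: str
    out: list = field(default_factory=list)


def friends_of_friends_native(node):
    # Two hops by following stored references: the work depends on the
    # neighbours touched, not on the total size of the graph.
    return {fof.name for friend in node.out for fof in friend.out if fof is not node}


def friends_of_friends_tabular(edges, name):
    # The same query over a table of (src, dst) rows: each hop is a lookup
    # over the relationship table (a plain scan here, for simplicity).
    first_hop = {dst for src, dst in edges if src == name}
    return {dst for src, dst in edges if src in first_hop and dst != name}


edges = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin")]

# Build the pointer-based version of the same graph.
nodes = {}
for src, dst in edges:
    nodes.setdefault(src, Node(src))
    nodes.setdefault(dst, Node(dst))
    nodes[src].out.append(nodes[dst])

print(sorted(friends_of_friends_native(nodes["alice"])))   # ['carol', 'dave']
print(sorted(friends_of_friends_tabular(edges, "alice")))  # ['carol', 'dave']
```

Both functions return the same answer; the difference is where the cost goes. The native version never consults a global structure during traversal, which is the property a graph-first storage layout is designed around.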

Now when you do that, you have to break a lot of new ground.

How do you go about that?

It's about taking that fundamental perspective in your stack or your cloud. I am sure that throughout your career you have seen like a billion diagrams showing a three-tier architecture -- a database and then a middle layer and then a top layer. We have seen that so many times.

The thing is that a database is itself a stack with multiple layers, so you have these tiers inside the database. What you end up doing when you build a graph database is that you look at each and every one of them. They all have geeky names like the transaction subsystem or the caching layer, and so on, and it is all very hard-core, geeky stuff.

But fundamentally, you look at each and every one of them and you think, 'If I could just lay out our tabular data like this and could truly store each and every piece, how would I evaluate each piece when I lay it out like this?'

In London we have a big R&D centre, and that is actually one of the key things that attracts world-class engineers to work with us, that and the fact that we are doing completely new things. It's not just a web app based on some new, fashionable technology. We are doing completely novel things.

Can you take me through the new 3.1 enterprise version?

We are seeing things like unlimited-size storage engines, and now 3.1 is all about enterprise strength. That is a phrase everyone throws around, but what we have seen over the last six to nine months is a shift in how graph databases are adopted. It used to be, and it largely still is today, that you choose a graph database for a specific solution. You may be building a product database, a recommendation engine, or an identity management solution, and you have a lot of connections in your data, so you look at a graph database.

That still happens but recently what we have seen is a shift where enterprises have started to adopt Neo4j as an enterprise-wide standard. Now that's a pretty significant thing, right?

We now have 75 to 100 Global 2000 customers. We have 200 customers in total, so clearly we can scale. Four of the top 10 retailers in the world use Neo4j today. We are still a pretty small company, so I think that's pretty cool.

Now that is kind of the framework in which we see the 3.1 release. We see a number of features that, if we add them to the database, will accelerate enterprise cloud adoption.

In this new version, the two most important points are our next-generation clustering architecture and our new security foundation.

The clustering architecture is a huge piece of engineering that we have been working on for over two years. It has been built out of London, and our chief scientist is based here.

It basically re-architects the way we save our graph databases. There are a number of underlying features, but the key feature you get out of it is what is called "causal consistency". It is a geeky name, and it is why we call the architecture causal clustering. What that means is this: let's say you go to your bank account and update it to, say, a million dollars. Other clustering architectures are what's called "eventually consistent". They write a million dollars to one node, then you read back from another node and you first get the old value, which only eventually gets updated to the new value.

Now that architecture has upsides in terms of scalability, but we just think that it is a horrible way to write an enterprise application.

Now we have written it so that if you write a million to an account, it is guaranteed that within a millisecond you will read back that million. We have done that in a way that combines consistency with scalability.
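The difference between eventual and causal consistency can be illustrated with a toy, single-process simulation. The sketch assumes a leader that appends every write to a log and hands the client back a transaction ID (a "bookmark"); a causal read refuses to answer until the replica it reads from has applied at least that transaction. Neo4j's drivers use a similar bookmark concept for causal chaining, but the `Replica` and `Cluster` classes below are hypothetical and do not model Neo4j's actual replication protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One cluster member: a copy of the data plus the ID of the last transaction it applied."""
    name: str
    data: dict = field(default_factory=dict)
    last_tx: int = 0

    def apply(self, tx_id, key, value):
        self.data[key] = value
        self.last_tx = tx_id


class Cluster:
    """Toy cluster (hypothetical): writes go to the leader, followers catch up one log entry at a time."""

    def __init__(self, n_followers=2):
        self.leader = Replica("leader")
        self.followers = [Replica(f"follower-{i}") for i in range(n_followers)]
        self.log = []  # committed (tx_id, key, value) entries, tx_id starting at 1

    def write(self, key, value):
        tx_id = len(self.log) + 1
        self.log.append((tx_id, key, value))
        self.leader.apply(tx_id, key, value)
        return tx_id  # the bookmark handed back to the client

    def replicate_one(self, follower):
        # Ship the next missing log entry, simulating replication lag.
        if follower.last_tx < len(self.log):
            follower.apply(*self.log[follower.last_tx])

    def read_eventual(self, follower, key):
        # Return whatever the follower holds right now, possibly stale.
        return follower.data.get(key)

    def read_causal(self, follower, key, bookmark):
        if bookmark > len(self.log):
            raise ValueError("unknown bookmark")
        # Do not answer until the follower has applied the bookmarked transaction.
        # In this single-process toy, "waiting" means replicating the missing entries.
        while follower.last_tx < bookmark:
            self.replicate_one(follower)
        return follower.data.get(key)


cluster = Cluster()
cluster.write("balance", 0)
for f in cluster.followers:
    cluster.replicate_one(f)  # both followers now hold balance = 0

bookmark = cluster.write("balance", 1_000_000)  # only the leader has this so far
follower = cluster.followers[0]
print(cluster.read_eventual(follower, "balance"))          # 0 (stale read)
print(cluster.read_causal(follower, "balance", bookmark))  # 1000000
```

The eventual read happily returns the old balance; the causal read, given the bookmark from the client's own write, cannot return anything older than that write.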

The second key feature is security. This is one of those things that Silicon Valley customers, open source individuals, hackers, and so on don't really care about, but if you sell to banks, governments, and big enterprises, security becomes a central thing.

We have done a lot of work that enables us, for example, to allow only certain people to read from a database, or from particular parts of the database, and we can make this fine-grained.
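Fine-grained read permissions of this kind are commonly built on role-based access control: users are assigned roles, roles grant access to certain kinds of data, and every read is filtered against the union of the caller's grants. The sketch below assumes access is granted per node label and denies anything not explicitly granted. The role names, labels, and functions are hypothetical and are not Neo4j's security API.

```python
# Hypothetical roles, each granting read access to a set of node labels.
ROLE_PERMISSIONS = {
    "analyst": {"Customer", "Product"},
    "auditor": {"Customer", "Product", "Account"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"auditor"},
}


def readable_labels(user):
    """Union of labels granted by all of the user's roles (empty if none: default deny)."""
    labels = set()
    for role in USER_ROLES.get(user, set()):
        labels |= ROLE_PERMISSIONS.get(role, set())
    return labels


def read_nodes(user, nodes):
    """Return only the nodes whose label the user is allowed to read."""
    allowed = readable_labels(user)
    return [node for node in nodes if node["label"] in allowed]


nodes = [
    {"label": "Customer", "name": "Acme Ltd"},
    {"label": "Product", "name": "Widget"},
    {"label": "Account", "name": "GB-001"},
]

print([n["name"] for n in read_nodes("alice", nodes)])  # ['Acme Ltd', 'Widget']
print([n["name"] for n in read_nodes("bob", nodes)])    # ['Acme Ltd', 'Widget', 'GB-001']
print(read_nodes("mallory", nodes))                     # []
```

Defaulting to deny means an unknown user, or a role with no grants, sees nothing, which is usually the behaviour auditors expect.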

Then there is all the stuff that enables you to be compliant with all the relevant security mandates.

The most relevant thing we have done is this: you used to have to go through a lot of setting up and manual work to become compliant, and now we have put all of it into frameworks.

We hope that this will accelerate the enterprise-wide deployments.

Are there particular areas, whether it be banking, finance, or pharmaceuticals, where you are finding interest?

Yes. Over the last two years, what we have found is that when you are defining a new category, you don't really know where it is going to be adopted first. But we have the benefit of being open source, so you just sort of let it out.

But now that we have reached true scale, we can look at these and start seeing patterns. A couple of patterns we have seen are that people love to use graph databases for real-time recommendations. Customer A liked this and has similarities with customer B, so maybe we should offer this to customer B. That is a very popular use.
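The "customer A liked this, customer B is similar" pattern is a form of collaborative filtering, and it maps naturally onto a graph of customers and the products they liked. This minimal Python sketch assumes similarity is simply the number of liked products two customers share. Products liked by similar customers, and not yet liked by the target customer, are scored by that overlap. The function and data are illustrative assumptions rather than Neo4j's recommendation approach; in Neo4j the same two-hop walk would typically be expressed as a Cypher query.

```python
from collections import Counter


def recommend(likes, customer, top_n=3):
    """Score products liked by customers who share likes with `customer`.

    likes: dict mapping customer -> set of liked products (a hypothetical input shape).
    """
    mine = likes.get(customer, set())
    scores = Counter()
    for other, theirs in likes.items():
        if other == customer:
            continue
        overlap = len(mine & theirs)  # shared likes act as the similarity score
        if overlap == 0:
            continue
        for product in theirs - mine:  # only suggest things not already liked
            scores[product] += overlap
    return [product for product, _ in scores.most_common(top_n)]


likes = {
    "A": {"book", "lamp", "mug"},
    "B": {"book", "lamp"},
    "C": {"book", "tent"},
    "D": {"kayak"},
}

print(recommend(likes, "B"))  # ['mug', 'tent']
```

Customer A shares two likes with B, so A's "mug" outranks C's "tent"; customer D shares nothing with B and contributes nothing.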

Another one is fraud detection. Using graph databases, you can very easily find patterns in data, which is very much what we do, and fraud detection is a lot about finding patterns. Graphs can be used not just for detecting fraud but for preventing it too.

That means that when I swipe a credit card, I can get an immediate 'yes' or 'no' rather than hours later.
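One common graph pattern in fraud detection is the "fraud ring": a group of accounts that look unrelated but share identifiers such as phone numbers or addresses. The sketch below assumes accounts are linked whenever they share any identifier, finds connected groups with a breadth-first search, and flags groups above a size threshold. A new transaction or application can then be checked against the flagged groups immediately. The threshold, identifiers, and function names are hypothetical, and real systems combine many more signals.

```python
from collections import defaultdict, deque


def find_rings(accounts, min_size=3):
    """Group accounts that share any identifier; return groups of at least min_size."""
    # Index: identifier -> accounts that use it (the account-identifier graph).
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for ident in identifiers:
            by_identifier[ident].add(account)

    seen, rings = set(), []
    for start in accounts:
        if start in seen:
            continue
        # Breadth-first search over accounts linked through shared identifiers.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            account = queue.popleft()
            component.add(account)
            for ident in accounts[account]:
                for neighbour in by_identifier[ident]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        if len(component) >= min_size:
            rings.append(component)
    return rings


def touches_ring(identifiers, rings, accounts):
    """Quick check: does a new request reuse an identifier seen in a flagged ring?"""
    ring_identifiers = set()
    for ring in rings:
        for account in ring:
            ring_identifiers |= accounts[account]
    return bool(identifiers & ring_identifiers)


accounts = {
    "a1": {"phone:555-0100", "addr:1 High St"},
    "a2": {"phone:555-0100"},
    "a3": {"addr:1 High St", "email:x@example.com"},
    "a4": {"phone:555-0199"},
}
rings = find_rings(accounts)
print(rings == [{"a1", "a2", "a3"}])                          # True
print(touches_ring({"email:x@example.com"}, rings, accounts))  # True
print(touches_ring({"phone:555-0123"}, rings, accounts))       # False
```

In a production setting the set of ring identifiers would be precomputed and kept up to date, so that the per-request check is a cheap set lookup rather than a fresh traversal.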

