Apache Cassandra firm DataStax is building a new graph database using the expertise of the engineering team from Aurelius, whose acquisition it announced today.
All the engineers at Aurelius, the company behind the open-source distributed Titan graph database, are moving to DataStax to start work on the DataStax Enterprise (DSE) Graph project.
"We're not going to do an integration. The play here is we'll take everything that's been done on Titan as inspiration, and maybe some of the Titan project will make it into DSE Graph," DataStax engineering VP Martin Van Ryswyk said.
"But we're really going to build something new because we're going to be able now to take advantage of Cassandra specifically and DSE features specifically. It will be an engineering effort to build a new product. We will not be supporting or integrating Titan as a product into our portfolio."
Graph databases use nodes and the connections between them to describe networks and contexts. Last year, Forrester Research predicted that about 25 percent of enterprises would be using graph databases by 2017 for next-generation apps that need connected datasets.
"Graph databases simplify and speed up access to data that is complex and contains many connections. [They] use graph structures with nodes, edges, and properties to store and access connected information, and can traverse parts of the data without touching the whole graph," Forrester said.
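To make Forrester's description more concrete, here is a minimal sketch in plain Python. It does not use Titan's or DSE Graph's API. A hypothetical five-person graph is stored as an adjacency list, with properties on both vertices and edges, and a breadth-first traversal stops after a set number of hops, so it never looks at vertices outside that radius. All names, labels and the `traverse` helper are illustrative assumptions.

```python
from collections import deque

# Hypothetical vertices, each carrying a dict of properties.
vertices = {
    "alice": {"label": "person", "city": "London"},
    "bob": {"label": "person", "city": "Paris"},
    "carol": {"label": "person", "city": "Berlin"},
    "dave": {"label": "person", "city": "Madrid"},
    "erin": {"label": "person", "city": "Rome"},
}

# Adjacency list: vertex -> list of (neighbour, edge properties).
edges = {
    "alice": [("bob", {"label": "knows"})],
    "bob": [("carol", {"label": "knows"})],
    "carol": [("dave", {"label": "knows"})],
    "dave": [],
    "erin": [],
}


def traverse(start, max_hops, edge_label="knows"):
    """Breadth-first traversal from `start`, following only edges whose
    'label' property equals `edge_label`, up to `max_hops` hops away.

    Returns the set of vertices reached. Vertices beyond the hop limit,
    or not connected to `start`, are never examined."""
    if start not in vertices:
        raise KeyError(f"unknown vertex: {start}")
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbour, props in edges.get(vertex, []):
            if props.get("label") == edge_label and neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


if __name__ == "__main__":
    reached = traverse("alice", 2)
    print(sorted(reached))
    print([vertices[v]["city"] for v in sorted(reached)])
```

Here `traverse("alice", 2)` reaches only Alice, Bob and Carol. Dave, three hops away, and Erin, who has no connections, are never touched, which is the "without touching the whole graph" property in miniature.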
Aurelius managing partner and Titan lead developer Matthias Broecheler said the primary contributors to Titan are moving across to work on the new commercial project, which will also be available in an open-source community form.
"Two and a half years ago, when we started the Titan project, people really were not convinced that you could actually scale graph databases. Now we're at the point where people have accepted that, are using Titan and are coming to us saying, 'We're using Titan in mission-critical applications. We really need commercial support from you guys'," he said.
"That's why from our Aurelius perspective this move makes a lot of sense - because we can get to commercial readiness much quicker with the help of DataStax, which already has the distribution channels and sales force and support set up. We'll join them and build on top of that and then give the tech community a product that they can use, with all the support and infrastructure that they'd expect."
DataStax's Van Ryswyk said a number of DataStax customers have already been using Titan, which can run on top of Cassandra, and had undertaken integration work themselves to make it run with DataStax Enterprise, the commercial bundle of Cassandra tools and services.
"We were talking to a lot of the same customers with the same use cases and that to do it right, it made a lot of sense to take Titan and evolve it into something new," he said.
Creating DSE Graph, whose pricing and packaging have yet to be determined, is a major undertaking, but the design process has already begun, Van Ryswyk said.
"Before we did the acquisition, we had a lot of talks about what we thought would be the right way to do this and had agreement on that. But we haven't done the detailed designs or prototypes yet so it's a little early for us to figure out when it will be done. But it's not an immediate thing. I don't think it's multiple years," he said.
The goal for DataStax is to create a highly scalable, highly available distributed graph database.
"That's why it's such a good match. It's a very hard problem and Matthias [Broecheler] has spent a good part of his life thinking about this problem - to take a graph database and have it distributed across nodes," Van Ryswyk said.
"The fact that that's the problem they're trying to solve and it's also the problem we're trying to solve: to make things very efficient and scalable and distributed. That's why it's a good marriage between the two."
Broecheler said companies might want to opt for a graph database for applications that use highly interrelated data that needs to be analysed or queried in real time.
"We help people by saying, 'Look at your applications and try to think about how important the relationships are in your data and put it on a spectrum from not at all to very much'. The closer you get to very much, the more likely it is that you should be using a graph database for your applications," he said.
"They have huge usability advantages because the data model is very close to the actual application model. Secondly, the query language is very suitable to query such heterogeneous and highly connected data."
According to Van Ryswyk, many of the workloads firms already run on Cassandra contain a graph component.
"For example, recommendation engines. This can be for retailers, who are trying to offer you a product based on all this highly connected data that they've collected about you. Data about your patterns, buying habits and associations with other people, and I'm using all that to run an algorithm that's trying to decide what to offer you," he said.
Other areas where workloads lend themselves to graph databases include finance, social networks, fraud detection, identity and access management, and healthcare.