Fluree, the graph database with blockchain inside, goes open source

A hitherto under-the-radar graph database that uses blockchain to support data lineage and verification wants to take over the world, starting with the US Department of Defense

Fluree is not a very straightforward product to get. To some extent, that goes for all data management systems. More so for graph databases. Even more so for blockchain-based systems.

Fluree combines a graph database with blockchain. And it just switched to open source, after scoring a $1.5 million seed extension round as part of its DoD contract. ZDNet caught up with Brian Platz, Fluree co-founder and co-CEO, to try and unpack all that.

Meet Fluree

Fluree was founded in 2016 by Platz and veteran entrepreneur Flip Filipowski. Platz said he and Filipowski have been working together for about 20 years now. Managing data more effectively, not only for their products but also when working for customers and seeing the struggles that they had, has been top of mind for them, said Platz:

"We looked at the landscape. We saw some exciting momentum around blockchain technologies, which we thought could add value around data in securing the integrity of information, something that surprisingly we don't have a lot of today.

And then we've also been very excited about semantic web, semantic graph technologies and its ability to connect data like the internet-connected information. Those two, combined with some other things that we could look at, really have an opportunity to take data management to a whole new level."


Fluree has an ambitious vision: to support the switch to a data-centric view, as opposed to an application-centric one. Image: Fluree

The team spent years building, and in some cases rebuilding, the product. In 2018 a beta version was released, followed by a community edition, now counting 15,000 users. In 2019 a commercial version was released, adding extra features, services, and support to the community edition. Platz said about 60% of the company is R&D.

Fluree has managed to attract customers with its commercial version. Fluree's largest customer is the Department of Defense, US Air Force. Fluree touts itself as the Web3 Data Platform -- a semantic graph database that guarantees data integrity, facilitates secure data sharing, and powers connected data insights, all in one pluggable stack. What could the DoD do with that?

Platz said Fluree solves some pressing pain points that most organizations have, but they're more acute in certain scenarios where information is deemed highly secure or very important. In those scenarios data is used in an automated way, provenance becomes important, and there can be very serious consequences if mistakes or security incidents happen. This is where the blockchain aspect comes in, but more on that in a while.

Fluree was self-funded until about a year ago, when it raised a $5 million seed round. An interesting side effect of having a contract with the DoD is that it mandated that Fluree raise an additional $1.5 million. The DoD wants to make sure that there's commercial interest in the products its contracts involve. Since this is hard to prove for small companies, the DoD mandates raising capital.

Going open source

Platz said they have put a lot of effort into making it easy for people to start using Fluree. The promise is that you do not need to know anything about blockchain or cryptography; those are features that you can tap into as you start to care about them:

"People can download Fluree on their laptops, run it as a single node, form a consensus of one machine automatically. You can put some data in and you could start building a React app or something that's using GraphQL and you can literally do that in like 20 minutes."

Fluree wants to minimize obstacles for organizations thinking about adopting it. For this reason, today Fluree is going open source. Platz said they realize that adopting a product like Fluree comes with perceived risk, and they have tried to address it with two main strategies. Standards is the first one.


Fluree is an RDF graph database that brings a number of capabilities to the table, including data lineage and authenticity powered by blockchain

Releasing the product under the AGPL license today is the second. AGPL was chosen because it is known to be avoided by cloud vendors, who tend to offer open source products as a service and so compete with the vendors that build them.

The Fluree crew is not experienced in open source, and Platz acknowledged it took quite some work to get to a point where they felt confident enough to release. Releasing a product as open source comes with scrutiny, responsibility, and increased demand for documentation and community support.

Blockchain inside

Part of the reason for Fluree's complexity is its use of blockchain. Using blockchain was not a goal in itself, but it plays a role in the integrity of data and how data can be proven to external parties, said Platz. It's a side effect of the need to increasingly share data with other parties. Today this is typically done by building custom APIs, but data is also increasingly consumed by emerging technologies like AI, Platz went on to add:

"While we as humans can make judgment calls about the information we're receiving when we're making decisions, machines -- A.I. in particular -- has no good ability of making similar judgment calls about the data it is operating on, and being able to prove that the data hasn't been tampered with."

Platz thinks data lineage, aka how data originated, is going to be a very important issue as AI and machine learning make more decisions automatically -- especially when those decisions have implications. Auditing is needed, and auditing costs. Why not leverage the ability to record and prove data lineage and integrity that blockchains provide, the thinking goes.
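To make the tamper-evidence idea concrete, here is a minimal sketch of a hash-chained log in Python. It is a generic illustration of the principle that blockchains rely on, not Fluree's actual implementation; the function names, the block layout, and the sample sensor readings are all assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block that commits to everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )
    return chain


def verify_chain(chain):
    """Return True only if every stored hash and back-link still checks out."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical sample data, for illustration only.
ledger = []
append_block(ledger, {"sensor": "A1", "reading": 20.5})
append_block(ledger, {"sensor": "A1", "reading": 21.0})
print(verify_chain(ledger))  # True

ledger[0]["data"]["reading"] = 99.9  # quietly rewrite history
print(verify_chain(ledger))  # False
```

Because each block's hash covers the previous block's hash, changing any past record invalidates the chain from that point on. That is what lets an auditor, or an automated consumer of the data, check that nothing has been altered after the fact.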

Fluree uses what is known as a permissioned blockchain, which means it's a controlled network within an organization, not one that anyone can join. Part of the reason is that this is a fit for the use cases that Fluree is targeting. Permissioned blockchains are faster and more reliable, said Platz:

"When you're dealing with data and integrity of information, you have to be able to guarantee both. So to be able to do that in a very fast way, really a permissioned [blockchain] network is the only good way of doing that."

A graph database with a twist

Fluree is an RDF graph database. Platz referred to the beauty of graph, how it facilitates complex queries that relational databases can't easily handle, as well as the fact that it's a universal data format. It was an obvious choice for Fluree, with the downside being that it's a newer technology that not as many people are familiar with.

Although "GraphQL has nothing to do with really specifically a query language or even a graph database", the shift toward GraphQL as an API interface is getting people familiar with how graph works, whether they know it or not, Platz thinks. Fluree also supports GraphQL. When a schema is set up in Fluree, a GraphQL interface for it is automatically exposed.

That works for finding a specific piece of information and then maybe crawling relationships and getting data out in a tree. Oftentimes that's exactly what people need to do, especially if they're powering a front-end app. But to get to the real power of graph for queries, you need a real graph query language. Fluree has chosen SPARQL, the standard query language for RDF graph databases, but with a twist.


Fluree's technology stack. Image: Fluree

Fluree offers a SPARQL interface to be able to query, as well as its own JSON-based query language called FlureeQL, in addition to GraphQL. It's in the process of adding a SQL interface too, Platz said. This is all in line with the ongoing evolution of graph databases: From databases to platforms, and from one query language to many.

Under the hood, however, everything gets translated to FlureeQL. FlureeQL is SPARQL and JSON plus some other things, as Platz put it. Those other things include the ability to crawl graphs or query for specific points in time in the past. The latter is a capability specific to Fluree; the former is just not supported well enough in SPARQL, said Platz.
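One way to picture querying "a specific point in time" is an append-only log of assertions and retractions that can be replayed up to a chosen transaction. The Python sketch below illustrates that general idea under stated assumptions. It does not use FlureeQL syntax or Fluree's storage format, and the `Fact` fields, the transaction counter `t`, and the sample data are all hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    value: object
    t: int       # hypothetical transaction counter (higher = later)
    added: bool  # True = assertion, False = retraction


# Hypothetical history: person/1 moves from London to Paris in transaction 2.
LOG = [
    Fact("person/1", "name", "Ada", 1, True),
    Fact("person/1", "city", "London", 1, True),
    Fact("person/1", "city", "London", 2, False),
    Fact("person/1", "city", "Paris", 2, True),
]


def state_as_of(log, t):
    """Replay the log up to and including transaction t."""
    current = set()
    # sorted() is stable, so facts within one transaction keep their order.
    for fact in sorted(log, key=lambda f: f.t):
        if fact.t > t:
            break
        key = (fact.subject, fact.predicate, fact.value)
        if fact.added:
            current.add(key)
        else:
            current.discard(key)
    return current


print(sorted(state_as_of(LOG, 1)))
# [('person/1', 'city', 'London'), ('person/1', 'name', 'Ada')]
print(sorted(state_as_of(LOG, 2)))
# [('person/1', 'city', 'Paris'), ('person/1', 'name', 'Ada')]
```

Nothing is ever overwritten in this model, so any earlier state can be reconstructed. A production system would index the log rather than replay it on every query.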

RDF graph databases are also called triple stores. This is because the RDF graph data model is expressed in triples. Over time, some vendors have extended this to quadruples, to be able to store additional information such as metadata about the graph to which a triple belongs. Fluree has extended this to 6-tuples, enabling it, among other things, to support the property graph data model.
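As a rough illustration of the difference between triples and quadruples, the Python sketch below models both and filters them with a simple pattern match. The article does not list the fields of Fluree's 6-tuples, so the sketch stops at quads. The class names, the `match` helper, and the sample data are assumptions made for this example, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    graph: str  # names the graph the statement belongs to


def match(statements, subject=None, predicate=None, obj=None, graph=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    results = []
    for s in statements:
        if subject is not None and s.subject != subject:
            continue
        if predicate is not None and s.predicate != predicate:
            continue
        if obj is not None and s.object != obj:
            continue
        # Plain triples have no graph field, so they never match a graph filter.
        if graph is not None and getattr(s, "graph", None) != graph:
            continue
        results.append(s)
    return results


# Hypothetical sample data.
triples = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:knows", "ex:carol"),
]
quads = [
    Quad("ex:alice", "ex:worksFor", "ex:acme", "ex:hr-graph"),
    Quad("ex:alice", "ex:worksFor", "ex:globex", "ex:crm-graph"),
]

print(match(triples, predicate="ex:knows", obj="ex:carol"))
print(match(quads, subject="ex:alice", graph="ex:hr-graph"))
```

The fourth position lets the same subject and predicate carry different values depending on which graph a statement comes from. Adding further positions to the tuple extends the same idea to other kinds of per-statement information.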

Another option for open source fans

Fluree represents an interesting offering. It looked to us that the combination of elements it includes makes it particularly suitable for data integration and data virtualization scenarios with a twist. Those scenarios are something RDF graph databases are generally good at. The twist here is the data lineage and authenticity capabilities that Fluree brings to the table. It also looked like Fluree is a complex product, at least conceptually.

Platz concurred on both. He emphasized the unique aspects of Fluree, while mentioning that they have worked hard to make the product more approachable. As of today, developers are free to check for themselves, and those looking for an open source graph database have another option.