If you were an investor, would you give a mid-sized startup composed entirely of engineers and sporting an open source product $11.5 million to pursue a singular path amidst heavy competition? You might, if it were in the hottest area of data management, were founded by an ex-Googler with deep expertise on the topic, and already had a few Fortune 500 customers.
That's what Redpoint Ventures, with participation from previous investors Bain Capital Ventures, Blackbird Ventures, Grok Ventures and Airtree Ventures, just did for Dgraph, Manish Jain's open source graph database startup. During his time at Google, Jain led the knowledge graph serving system effort.
Though we don't usually cover funding rounds, this one was an opportunity to connect with Jain. Dgraph and Jain both have a sort of undercurrent reputation in the graph database world, and the conversation did not disappoint.
Graph databases and Dgraph
First off, even if you have not been following us (we've been covering the graph database space longer than most), you may have noticed when the Gartner oracles spoke graph earlier this year:
"The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science.
Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos, but the need for specialized skills has limited their adoption to date.
Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries".
Amen to that, and even if you did not take note, VCs apparently did. The conversation with Jain started off talking money. Before today, Dgraph had raised a total of $3M in seed and pre-seed rounds. Jain estimates this Series A funding should give them a runway until 2022, depending on how the team grows. That was a good way to pick up the conversation.
Currently, Dgraph has something like a dozen engineers spread over 2 locations - San Francisco and Bangalore. That's including Jain, who is a very hands-on CEO. The plan is to roughly double the team by the end of the year, but the big question there is "in what way".
Though we can relate to how all-engineer teams can get tons of work done in a no-fluff-just-stuff way, there is an obvious problem with that approach: how will people in the enterprise get to buy your toy without sales, marketing and the like?
Do they just stumble upon it in the wild, start using it, like it so much that they decide they need to pay for the enterprise version, and come knocking on your door? That's pretty much how it's been working for Dgraph, according to Jain.
Dgraph, Jain said, has been sustainable, and has a few enterprise customers, though names can't be disclosed. Jain did mention the goal is to ramp up the team and build a repeatable sales process, but the hard-core developer-centric approach won't change. So it's worth checking what it is that's been getting Dgraph sales without a sales team.
First off, the obvious: Dgraph is open source, so everyone can just download and start using it immediately. At this point, though there are Startup and Enterprise versions of the open source software, there's not much difference in terms of features, except (crucially) support and cluster size. That's about to change though, with Dgraph prioritizing development of more Enterprise features.
Another key choice made by Dgraph is to meet the bulk of developers building applications where they are. And that, according to Jain, is in JSON and GraphQL land. Though GraphQL is getting to be where JSON already is -- a de facto standard -- there is one problem: GraphQL, in what is probably one of the most confusing misnomers we've seen, is NOT a graph query language.
As a reminder, in the graph database world, there are a few camps and more query languages. RDF graph databases use SPARQL, which like RDF is a W3C standard. Property graph databases are not standardized, and come with a multitude of query languages. Most popular among them are Cypher, open sourced by Neo4j, and Gremlin, part of the open source Apache Tinkerpop project.
Though Jain said Dgraph may add support for Cypher and/or Gremlin at some point, he was adamant -- he's not very fond of them. So instead of adopting one of these, Dgraph extended GraphQL to actually make it work with graphs, naming it GraphQL+. Confusing, no doubt. And it gets worse.
Recently the W3C initiated a process to standardize the property graph data model and query languages, too. Though it's just the beginning, this could lead to 2 big graph database camps (RDF and property graph) with full interoperability among offerings in each camp, and potentially also between camps to some extent. Why would Dgraph want to be the odd one out? Would they not benefit from standardization, too?
The way to do this would be to have GraphQL+ extensions adopted by GraphQL. When asked, Jain mentioned he did discuss this with Lee Byron, who is credited with co-creating GraphQL at Facebook. Byron did not express any interest in this, and Jain acknowledged getting GraphQL+ standardized as part of GraphQL is not very likely. That did not seem to bother him much, however.
Though it's clear that having a standard in SQL has been instrumental in relational database adoption, Jain said he does not see much enthusiasm around graph database standards. He went one step further actually, mentioning he does not even really see Dgraph as a graph database, but rather as a general purpose database with a graph back-end.
Something like StackOverflow, Jain said, could very well be built on Dgraph. This is why Jain is placing his bet on GraphQL+, and is not really keen on pushing Dgraph as a graph database, but rather as a "powerful solution for application building".
A singular path to glory?
That's all well and good, but Dgraph is not the only database that supports GraphQL - not even the only graph database to do this. Fauna and Neo4j's GRAND stack come to mind, and the list of databases leveraging GraphQL seems set to grow. So, if you want a database with GraphQL support, why choose Dgraph?
Because, as per Jain, Dgraph is the fastest, most scalable option: "Dgraph is designed to execute equally well on graph style joins and traversals and SQL style selects, providing the only truly scalable general purpose graph database available today."
Products that emphasize performance and scalability in this way typically make sure they circulate a few benchmarks to back their claims. Though Jain was not shy about citing benchmarks done by clients, claiming they showed Dgraph to be at least 10 times faster than other options (naming names, which we won't), there is nothing public to show for this at this point. Jain said Dgraph may release some benchmarks in the future, with the usual caveat around benchmark complexity and reliability.
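For readers wondering what the difference between "SQL style selects" and "graph style joins and traversals" looks like in practice, here is a minimal sketch in plain Python. To be clear, this is not Dgraph's API or query language, and it says nothing about performance: the toy data and the function names `select_by_city` and `reachable_within` are made up purely for illustration.

```python
from collections import deque

# Toy data, invented for illustration only (not from Dgraph or any real dataset).
people = {
    "alice": {"name": "Alice", "city": "Sydney"},
    "bob": {"name": "Bob", "city": "Bangalore"},
    "carol": {"name": "Carol", "city": "Sydney"},
    "dave": {"name": "Dave", "city": "San Francisco"},
}
# Directed "follows" edges between people.
follows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
}


def select_by_city(city):
    """SQL-style select: filter records on an attribute; no edges involved."""
    return sorted(pid for pid, rec in people.items() if rec["city"] == city)


def reachable_within(start, max_depth):
    """Graph-style traversal: breadth-first walk along 'follows' edges,
    returning (person, hops) pairs for everyone within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the requested depth
        for nxt in follows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return found


print(select_by_city("Sydney"))
print(reachable_within("alice", 2))
```

The select only ever looks at one record at a time, while the traversal has to follow relationships hop by hop. In a relational database each extra hop typically means another join, which is the kind of workload Jain's claim is about.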
Dgraph boasts "low-latency arbitrary-depth joins and traversals, Jepsen tested distributed transactions, data sharding and synchronous replication support and a flexible schema". Jain said this is what people are looking for in a database, and the plan is to use the funding to complete Dgraph's enterprise features.
Jain also said they will be building a Dgraph managed service to run on all clouds, based on their existing support for Kubernetes. In terms of strategy, they prioritize greenfield projects, and they'd rather get a piece of the (much bigger) relational and NoSQL database market than go after the leaders in the graph database market.
Much of the above not only makes sense, but also sounds pretty impressive. We do, however, maintain a healthy skepticism. Dgraph makes some claims that users have to verify for themselves, and we don't know how many people can or will do this. Plus, executing on that singular path to glory may be a tall order for a startup at the stage where Dgraph is currently.
Dgraph is definitely worth keeping an eye on. Whether its singular path to graph glory will turn out to be the shortest one remains to be seen.