GraphQL for databases: A layer for universal database access?
GraphQL is a query language mostly used to streamline access to REST APIs. Now, a new breed of GraphQL implementations wants to build an abstraction layer for any database on top of GraphQL, and it seems to be catching on.
Airbnb, Coursera, Docker, GitHub, Twitter, Uber, and, of course, Facebook, where it was invented. These are some of the organizations where people use GraphQL solutions, as presented in last week's GraphQL Europe, and if you're one to be impressed by name-dropping, this should get your attention.
GraphQL seems to be spreading like wildfire, and there's a reason for that. As REST APIs are proliferating, the promise of accessing them all through a single query language and hub, which is what GraphQL and GraphQL server implementations bring, is alluring.
REST APIs expose application functionality, and all applications use some database in the back end. So, a big part of those APIs is wrapping database CRUD (Create, Read, Update and Delete) operations.
Furthermore, databases may also expose APIs of their own for those CRUD operations. So the idea of using GraphQL for database CRUD operations comes as a natural next step, and there are a few initiatives working on that.
GraphQL for databases
PostGraphile, Prisma, and HyperGraphQL are different approaches to implementing a GraphQL abstraction layer for databases. Let's see what they are out to achieve, and how each tackles the issue. All these GraphQL database access layers are open source, but have different philosophies and ambitions.
Benjie Gillam, PostGraphile's creator, said: "The intention for PostGraphile is to be used in any situation where you have (or want) a PostgreSQL database and you need a GraphQL API, and its core focus is security, performance, and flexibility."
Johannes Schickling, Prisma's co-founder, said: "The right way to think about Prisma is as a productized version of the data layer implemented at large companies such as Twitter, Airbnb, and Facebook. As an open-source project, Prisma allows companies to start out with advanced technology that grows with them as they start building out a more advanced infrastructure."
Szymon Klarman, HyperGraphQL's original architect, said: "The motivation for HyperGraphQL comes down primarily to three goals. First, to equip the Semantic Web stack with a GraphQL-based query interface. Second, to facilitate federation of and querying across distributed linked data sources and services within the GraphQL framework. And third, to equip GraphQL with easy means of querying linked data."
In a nutshell, PostGraphile works with Postgres, and is what Gillam's work is focused on, supported by Patreon and PostGraphile-related consulting work. Prisma works with MySQL, Postgres, and MongoDB, has more in its list, and just raised $4.5 million in a seed round. HyperGraphQL works with RDF graph databases, was developed as an open-source project during Klarman's stint with Semantic Integration Ltd, and is somewhat in flux at the moment.
There is another effort in this area, Hasura, but its people did not respond to requests for comment.
Is GraphQL for databases a glorified object-relational mapping layer?
For anyone familiar with object-relational mapping (ORM), the idea of adding a layer between the database and applications that use it should be familiar. ORMs let developers map tables in relational databases to domain objects, thus making it easier to work at a higher abstraction level, and offering a hub for database access that can add services such as caching to the mix. So, how similar are those GraphQL access layers to an ORM?
Schickling said, "Prisma is different from an ORM in that it is a dedicated infrastructure component. This allows Prisma to perform optimizations not normally possible for an ORM that is embedded in the application."
Gillam also said he does not think of PostGraphile, or GraphQL in general, as an ORM, but more as a "declarative data fetcher." GraphQL, he added, operates in terms of the "big picture" -- the client declares the full shape of the data/relations that it needs up front, then requests the full result set in one payload:
"This enables a sophisticated execution layer to understand all the requirements and resolve the request in the most efficient manner. PostGraphile does this by turning your GraphQL query into a single SQL query, leading to very efficient linear execution: Receive request, build SQL, execute SQL, send response.
By contrast, ORMs typically don't 'know' what data they need up front -- they discover their needs a bit at a time as the code reaches the relevant points during execution. This leads to frequent alternation between data fetching and code execution, causing increased latency and heavier load on various parts of the infrastructure, increasing the need for caching.
ORMs also tend to over-fetch data -- fetching columns or even whole rows that won't be needed. GraphQL, when used well, enables clients to easily eradicate these under-fetching and over-fetching woes."
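The difference Gillam describes can be illustrated with a minimal sketch. It is not PostGraphile's code: the tables, data, and function names below are hypothetical, and Python's built-in sqlite3 module stands in for Postgres. The first function fetches data the way a lazily loading ORM often does, one query for the parent rows and one more per parent, while the second knows the full shape up front and issues a single JOIN that selects only the needed columns.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 1, 'Notes'), (3, 2, 'Kernels');
""")

queries_run = 0


def run(sql, params=()):
    """Execute a statement and count it as one database round trip."""
    global queries_run
    queries_run += 1
    return conn.execute(sql, params).fetchall()


def orm_style():
    """Discover needs step by step: one query for authors, then one per author."""
    result = {}
    for author_id, name in run("SELECT id, name FROM authors ORDER BY id"):
        rows = run(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result[name] = [title for (title,) in rows]
    return result


def declarative_style():
    """Know the full shape up front: fetch authors and posts with one JOIN."""
    result = {}
    rows = run("""
        SELECT a.name, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result


queries_run = 0
orm_result = orm_style()
orm_queries = queries_run

queries_run = 0
declarative_result = declarative_style()
declarative_queries = queries_run

print(orm_queries, declarative_queries)  # prints "3 1"
```

Both functions return the same nested result, but the step-by-step version needs one extra round trip per author, so the gap widens as the data grows.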
So, Gillam and Schickling seem aligned here. And it does make sense that batching a lot of queries would be more efficient than handling each one separately. We have to note, however, that ORMs have been around far longer, so we can expect them to be more mature.
Klarman does not have much to add there, as ORMs for RDF graph databases do not exist (although other APIs to access them do). For HyperGraphQL, the emphasis is on providing an alternative to SPARQL that would be easier for developers to work with, while shifting the complexity of federated querying that SPARQL provides from the query level into the GraphQL server.
From the database to the application, or from the application to the database?
A significant recent addition to GraphQL was SDL, its schema definition language. SDL enables developers to define a schema governing interaction with the back-end that GraphQL servers can then implement and enforce. This also enables GraphQL for database solutions to work either from the database to the application, or the other way round.
Some people prefer creating a domain model and then generating a persistence layer from it, others prefer designing their database and then mapping it to their domain model. Both ways have their proponents and are supported in many ORMs. How do things work in the GraphQL world?
Schickling believes the GraphQL community is really coming together around the idea of schema-first development, and he sees GraphQL SDL as the foundation for all interfaces between systems:
"By having all important interfaces documented in a format that is easy to glance, it's much easier to talk about an entire system as a whole. We have also found that GraphQL SDL is a great format for domain experts and developers to experiment with the domain. Prisma specifically is focused on the data layer.
During development, Prisma automatically updates the underlying database to reflect changes in the SDL that represent your data storage model. We have generally found that it is most appropriate to work from the data storage first, up the stack to the final API. GraphQL bindings make it easy to compose a final API from one or more underlying storage APIs."
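To make the schema-first direction concrete, here is a minimal sketch of deriving a database table from an SDL type. It is not how Prisma works internally: the SDL snippet, the scalar-to-column mapping, and the sdl_to_ddl helper are assumptions made for illustration, the regular expressions only handle flat types with scalar fields, and SQLite stands in for a production database.

```python
import re
import sqlite3

# A hypothetical data model written in GraphQL SDL.
SDL = """
type Author {
  id: ID!
  name: String!
  age: Int
}
"""

# Assumed mapping from GraphQL scalar types to SQLite column types.
SCALAR_TO_SQL = {
    "ID": "INTEGER",
    "Int": "INTEGER",
    "Float": "REAL",
    "String": "TEXT",
    "Boolean": "INTEGER",
}


def sdl_to_ddl(sdl):
    """Turn each flat `type Name { field: Scalar }` block into CREATE TABLE."""
    statements = []
    for type_name, body in re.findall(r"type\s+(\w+)\s*\{([^}]*)\}", sdl):
        columns = []
        for field, scalar, required in re.findall(r"(\w+)\s*:\s*(\w+)(!?)", body):
            column = f"{field} {SCALAR_TO_SQL[scalar]}"
            if scalar == "ID":
                column += " PRIMARY KEY"
            elif required:
                column += " NOT NULL"
            columns.append(column)
        statements.append(f"CREATE TABLE {type_name} ({', '.join(columns)})")
    return statements


ddl = sdl_to_ddl(SDL)

# Apply the generated statements to an in-memory database.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)

print(ddl[0])
# prints: CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)
```

A real tool would also need to handle relations, lists, and migrations of existing data, which this sketch leaves out.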
Gillam emphasizes the role of the database as the single source of truth for data and business logic for rapidly building maintainable applications, so he says his focus on PostGraphile is "taking your carefully designed database and building a sensible and useful GraphQL API from it.
With PostGraphile's watch mode any changes to the database are immediately reflected in GraphQL so you can ensure your GraphQL API is taking the shape you intend in real time. I prefer the monolithic database-led approach, but people also use PostGraphile in a micro-services architecture, where each instance has its own database. They combine individual services into a larger public GraphQL API using schema stitching and similar techniques.
There are users who use PostGraphile on AWS Lambda; others who use just the GraphQL schema directly without needing the HTTP middleware. There are even a few people who just use PostGraphile to keep their GraphQL SDL in sync with the database, building their resolvers by hand. We support all these use cases."
Klarman notes that HyperGraphQL doesn't affect the persistence data layer but only allows creating virtual views of the data. However, he adds, in some scenarios it could be useful to treat the HyperGraphQL schema as coming before the underlying RDF storage, and use it to inform and drive the RDF persistence layer.
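The database-first direction can be sketched in the same spirit: read the table definitions and derive SDL from them. This is not PostGraphile's implementation. The book table, the type mapping, and the table_to_sdl helper are hypothetical, and SQLite's PRAGMA table_info stands in for Postgres introspection.

```python
import sqlite3

# Hypothetical table, standing in for a carefully designed database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        pages INTEGER,
        rating REAL
    )
""")

# Assumed mapping from SQLite declared types to GraphQL scalars.
SQL_TO_SCALAR = {"INTEGER": "Int", "TEXT": "String", "REAL": "Float"}


def table_to_sdl(conn, table):
    """Read column metadata with PRAGMA table_info and emit a GraphQL type."""
    lines = [f"type {table.capitalize()} {{"]
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    # `table` is interpolated directly, so it must come from trusted input.
    for _cid, name, sql_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        scalar = "ID" if pk else SQL_TO_SCALAR[sql_type.upper()]
        required = "!" if (notnull or pk) else ""
        lines.append(f"  {name}: {scalar}{required}")
    lines.append("}")
    return "\n".join(lines)


sdl = table_to_sdl(conn, "book")
print(sdl)
```

Running the function again after a schema change would produce the updated type, which is roughly the idea behind keeping SDL in sync with the database.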
What about performance, and architecture?
Abstractions are nice and all, but a GraphQL for databases architecture seems to consist of many layers, with lots of HTTP requests in the mix as well, which makes one wonder what performance would be like. Schickling said this is a concern he hears often, but he is quick to dismiss it.
He said, "When you look at the actual performance implications of a network hop, it's pretty clear that this is not a real concern. A network hop inside an AWS region is sub-ms and much lower still if you deploy to a VPC."
Schickling's main point is that companies like Airbnb have multiple layers in their application, and the added network overhead is more than outweighed by the performance gains from a better architecture. He does add, however, that there is currently an effort to bring gRPC to GraphQL to minimize the serialization overhead with GraphQL.
PostGraphile has a different architecture compared to Prisma. While Prisma sits between a GraphQL server and a database, a PostGraphile stack typically consists of just your website/app, PostGraphile and your DB, said Gillam. He added that this is one of the things users love about it, and it contributes to performance and low latency:
"Typically, resolving the single GraphQL query for a view in an app involves just one HTTP request and a small SQL transaction with one main query -- just two layers! The ability to compile the GraphQL request into a single SQL query sets it apart from many other solutions, and means that, for non-trivial queries (those with a couple of nested relations), it's incredibly fast compared to systems that use DataLoader or similar techniques."
Klarman said, "HyperGraphQL queries are initially rewritten into minimum possible number of queries to other services (SPARQL endpoints or other HGQL servers) that are required to fetch all the relevant data. This way the number of necessary HTTP round trips is as low as possible, which is not generally the case with the basic GraphQL framework."
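The planning step Klarman describes can be approximated with a small sketch. It is not HyperGraphQL's algorithm: the field names, endpoint URLs, and the plan_requests function are hypothetical. The idea shown is simply that fields answered by the same service are grouped, so each service receives one request instead of one request per field.

```python
from collections import defaultdict

# Hypothetical mapping of schema fields to the service that can answer them.
FIELD_TO_SERVICE = {
    "name": "http://example.org/people-sparql",
    "birthDate": "http://example.org/people-sparql",
    "employer": "http://example.org/companies-sparql",
    "population": "http://example.org/places-sparql",
}


def plan_requests(requested_fields):
    """Group requested fields so each service receives a single request."""
    plan = defaultdict(list)
    for field in requested_fields:
        plan[FIELD_TO_SERVICE[field]].append(field)
    return dict(plan)


plan = plan_requests(["name", "employer", "birthDate"])

# Three fields are requested, but only two services need to be contacted.
print(len(plan))  # prints 2
```

A naive resolver that fetched each field separately would make three HTTP round trips here; the grouped plan makes two.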
In terms of future directions, Klarman said he is considering extending HyperGraphQL to other graph databases and languages, like Cypher, but the primary focus remains on linked data. He cites extending the expressiveness of the query language with basic filters, introducing support for CRUD operations beyond basic GET queries, and developing techniques to exchange and semantically interpret schemas and delegate subqueries as key goals.
GraphQL as a layer for universal database access?
Recently, Gillam posted an ad-hoc comparison of PostGraphile to Prisma. Although not an apples to apples comparison, as he noted himself, it raises an interesting question. How do GraphQL database solutions compare, not necessarily in terms of performance, but more in terms of their overall positioning and roadmap?
Schickling said that supporting multiple databases is one of the core tenets of Prisma:
"As an application developer, you should be able to rely on a single coherent interface to your application data, no matter if you need fast key-value lookup, customizable full-text search or scale-out document storage.
You should be able to rely on the data layer seamlessly synchronizing data between the different models, so you can access the data in the most efficient way for your use case, without having to write and maintain a complex data transformation pipeline. It is this vision that makes Prisma uniquely different from a traditional ORM, and this is why it makes sense for Prisma to support a broad variety of data stores.
Over the last few months, we have refactored our query resolution engine to support connectors for multiple databases, and you should expect us to add support for many more databases throughout the rest of the year."
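One common way to offer a single interface over different stores is a connector (adapter) pattern, sketched below. This is a generic illustration, not Prisma's engine: the Connector interface, both adapter classes, and the fetch function are hypothetical, with a plain dict standing in for a key-value store and SQLite standing in for a relational database.

```python
import sqlite3
from typing import Optional, Protocol


class Connector(Protocol):
    """Hypothetical interface that every data store adapter implements."""

    def get(self, collection: str, key: str) -> Optional[dict]: ...


class KeyValueConnector:
    """Adapter over an in-memory dict, standing in for a key-value store."""

    def __init__(self, data: dict):
        self._data = data

    def get(self, collection, key):
        return self._data.get((collection, key))


class SqlConnector:
    """Adapter over SQLite, standing in for a relational database."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.row_factory = sqlite3.Row

    def get(self, collection, key):
        # Table names cannot be bound as parameters, so `collection`
        # is assumed to be trusted input in this sketch.
        row = self._conn.execute(
            f"SELECT * FROM {collection} WHERE id = ?", (key,)
        ).fetchone()
        return dict(row) if row else None


def fetch(connector: Connector, collection: str, key: str):
    """Application code depends only on the shared interface."""
    return connector.get(collection, key)


kv = KeyValueConnector({("users", "1"): {"id": "1", "name": "Ada"}})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES ('1', 'Ada')")
sql = SqlConnector(conn)

print(fetch(kv, "users", "1") == fetch(sql, "users", "1"))  # prints True
```

The harder problem Schickling alludes to, keeping data synchronized across stores with different models, is outside the scope of this sketch.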
Gillam noted that PostGraphile has a different focus:
"We don't offer a BaaS, so we're incentivized to make self-hosting PostGraphile the best option available to you. We have a focus on your database being the single source of truth for everything: Data, business logic and authorization.
This means you're trusting the well-established, industry-approved database Postgres for the security of your data, rather than a startup or rolling your own. I want PostGraphile to be the best solution for GraphQL APIs backed by a Postgres database.
Open source is very dear to me, and I have big plans for PostGraphile. Right now, I'm focussing on growing the project steadily and making work on it sustainable. Although it's exciting to think how fast the project would advance if I could work on it full time, I'm not interested in VC investment at the moment."
Asked to comment on the vision for GraphQL as a universal access layer for databases, Gillam notes that mapping different databases to GraphQL is viable, and stitching GraphQL APIs together is already a fairly widely used technique:
"I'm interested to see how well Prisma manages to pull off consistent interfaces for these significantly different databases, and what sacrifices users have to make to achieve these goals. But they are smart folk and have a fair amount of resources to throw at the problem, so yes, I think they probably can make it work."