Are open source databases dead?

Elasticsearch’s moving the core of its stack from Apache 2 to a more restricted license once again brought forth the question of whether open source databases have a future. But maybe we shouldn’t get so hung up on licensing.

Here we go again. Last week Elastic did yet another wave of relicensing, in this case, for the rest of the portfolio not covered by its previous changeover. The changes hit the family jewels: the Elasticsearch search engine and Kibana visualization, which changed from an open source Apache 2 license to the Server Side Public License (SSPL), a pseudo open source license introduced by MongoDB back in 2018. And not surprisingly, a few days later, AWS responded by announcing a fork. Want the gory details? ZDNet colleague Steven J. Vaughan-Nichols gave the rundown on the blowout and the blowback.

Our first response when hearing of the license change was that at least Elastic didn't invent yet another weird new license, and instead grabbed one off the shelf. Customer legal departments can rest easy that they won't have to vet yet another strange licensing offshoot. And secondly, after we started Googling all the details, cookie tracking triggered a wave of "save on the cost of Elasticsearch log analytics" ads from ChaosSearch.

It's the latest in a saga that refuses to end: can open source databases avoid becoming victims of their own success? It dates back to MongoDB's 2018 embrace of the SSPL, followed by Redis, CockroachDB, and Confluent announcing their own quasi-open source licenses du jour. Meanwhile, stalwart MariaDB retained the classic GPL license to keep the so-called cloud predators away for its core engine, but also used BSL for some other parts of its platform such as MaxScale. There's so much sturm und drang in this field. Like here and here. But don't cry for us, Argentina: most of these players have become unicorns or have successfully IPO'd.

We've revisited this saga several times in the past few years. One of our first ZDNet posts, almost five years ago, posed the question of whether open source was becoming the new default model for enterprise software. One could call it a simpler time, when one of Black Duck Software's annual Future of Open Source surveys indicated a doubling of open source use in the enterprise between 2010 and 2015 (Black Duck no longer conducts these surveys).

Open Core Model

Just after the outbreak of the mutant open source licenses, we wondered whether the open core model might be the answer to the industry's angst. Some, like Grafana, are continuing to use that model, where the core visualization engine is on an Apache 2 license, while plug-ins for adapters to enterprise data sources are not. Nonetheless, others term open core bad rubbish. As if.

Despite all the angst, open source remains alive and well in certain spheres such as commodity infrastructure. Today, no one aside from Red Hat, SuSE, Ubuntu, and maybe a few others, competes on the flavor of Linux. And, in spite of the fact that Google dominates contributions to Kubernetes, the container orchestrator has become the emerging de facto standard for developing new cloud-native services. Open source is also alive and well in rapidly innovating areas such as AI frameworks. Practitioners, like data scientists, are eager to place popular open source projects on their resumes to keep their skills in demand.

But for databases, it might be tempting to view defections towards restrictive licensing as the death knell for open source. A few months back, Matt Asay, whose career in open source long predates his current stint heading open source strategy for AWS, posed this same question just after the Snowflake IPO. And he proffered that, first of all, even proprietary offerings like Snowflake leverage a lot of open source in their products. And secondly, referring to an exchange he had with former Cloudera head Mike Olson, the cloud has become the new means for differentiation.

We couldn't agree more with Olson. Just look at MongoDB. As of the latest quarter, Q3 FY2021, it reported that over 90% of its customer base is on its Atlas cloud service. Elastic Q2 FY2021 results saw SaaS revenues increasing 81% year over year, now accounting for nearly 30% of its business. Both may be leery about AWS poaching their business, but both offer one thing that AWS can't: their managed database cloud services work across all three clouds, while Amazon Elasticsearch Service and Amazon DocumentDB don't.

While licenses might protect products, you'll only have something to protect if customers are buying what you're selling. And in an era where cloud adoption is accelerating, getting SaaS right is arguably more important than all the legalese. That means mastering the blocking and tackling of delivering the right SLAs and delivering the operational simplification that a well-designed cloud service can offer. And that often means getting things like Kubernetes, security, and the customer experience right. Cloud IP mostly doesn't come from licensable code, but instead is rooted in infrastructure and operational expertise, sound architectural design, and design thinking.

Eternal Class Struggle?

Drilling down more closely, we see contrasting pictures across the database landscape. On one hand, you have evergreen projects like PostgreSQL, Cassandra, and more recently, Spark, thriving; the common thread with these projects is that they are all community-based. Meanwhile, formerly proprietary players like Splice Machine are embracing open source, while as noted above, the unicorns are dialing back. So, what gives?

A simplistic explanation is that we're witnessing a form of class divide. The haves, who have something to protect, vs. those aspiring to have status, opening up their code in the hopes of going viral.

Arguably the most successful poster child for a community-based open source project in the database world, PostgreSQL has been around practically forever, which in this case is 25 years and counting. It is one of the brainchildren of Michael Stonebraker, yet no single vendor controls the agenda, contributions are remarkably dispersed among a wide community, and the license is extremely liberal: just about the only restriction is a safe harbor clause for the University of California, where the project originated.

As a result, the database ecosystem is practically littered with an incredibly diverse crop of PostgreSQL offshoots from EnterpriseDB to Amazon Redshift, Greenplum, Netezza, and a host of cloud services from the likes of AWS, Azure, and GCP. And the flexibility of the PostgreSQL open source license has given AWS and Azure the rope to further innovate with hyperscale. In the PostgreSQL community, forks are not a dirty word. This community begat the skills base that in turn begat the vendors who had the ability to take PostgreSQL where they pleased. No wonder, a couple years back, we posed the question on whether the time has finally come for PostgreSQL.

As for those aspiring to become viral, you'll find names like Splice Machine, Yugabyte, Couchbase and others that view open source as a means of drawing developer attention. It's also a response to the issue that for an enterprise, committing to implementation from a small proprietary vendor carries greater risk because the underlying technology lives or dies by that vendor. Nonetheless, an open source project that is only commercialized by a single vendor likely carries a similar risk.

So, are we doomed to eternal class struggle where the haves wall off access the moment they get successful to keep cloud giants (read: AWS) from making land office money off their IP, and the market winds up with new forks? Or is there a third way?

Some, such as Asay, are calling for a rethinking of licensing, and maybe even revisiting shared source. While there is no single silver bullet, the gold standard for open source has been the phenomenal success of Red Hat. They've protected their IP through a policy of keeping the source code open source but maintaining their own hold on the binaries.

Cloudera has been a student of the Red Hat approach and adopted it following its merger with Hortonworks. In Cloudera's case, the zoo animals have remained open source, but the way that Cloudera packages them into software products, such as Shared Data Experience (SDX), is proprietary. Since going this route, Cloudera has had six straight quarters of growth and is getting close to breaking into the black in a market that many wrote off as dying. Red Hat and Cloudera's strategies make the case that, regardless of whether the software is open source, it's still the product (or cloud service) that counts.