The present profusion of NoSQL databases can't last. But while it persists, it leaves businesses with two problems: finding skills for all the technologies they're trying out, and then managing the resulting complex vendor relationships, according to Dave McCrory, CTO of Basho, the company behind Riak.
At the moment companies often use specific databases - graph, in-memory, key-value and object-store - for specific jobs and specific data constructs, and that trend will continue for the next few years. But firms are already starting to bridle at the prospect of running so many technologies.
"Companies are trying a bunch of different things right now. They're seeing what seems to work better and what doesn't. But they're going to converge on a small number of more broadly usable solutions," McCrory said.
"That means that some databases will go away, some will be acquired and some will simply end up going out of business."
Of the 250-plus technologies listed in the DB-Engines rankings of databases by popularity, slightly more than half are not relational and so fall into the NoSQL camp.
Last month database firm Basho, which created Riak and continues to develop it as an open-source project, unveiled the latest release of the distributed NoSQL key-value data store, which it says improves write performance by between 50 and 150 percent.
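The key-value model that stores such as Riak expose is simple to sketch: opaque values addressed by a bucket and a key, with no query language in between. The following is an illustrative in-memory stand-in for that model only, not Basho's actual client API:

```python
# Minimal sketch of the key-value data model: values are opaque blobs
# addressed by (bucket, key). Illustrative stand-in, not Riak's client API.

class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, bucket, key, value):
        """Store a value under (bucket, key), overwriting any previous value."""
        self._data[(bucket, key)] = value

    def get(self, bucket, key, default=None):
        """Fetch the value for (bucket, key), or a default if absent."""
        return self._data.get((bucket, key), default)

    def delete(self, bucket, key):
        """Remove the entry for (bucket, key) if it exists."""
        self._data.pop((bucket, key), None)


store = KVStore()
store.put("users", "alice", {"plays": 42})
print(store.get("users", "alice"))   # → {'plays': 42}
store.delete("users", "alice")
print(store.get("users", "alice"))   # → None
```

In a distributed store like Riak, the same bucket/key interface is backed by replication across nodes, which is where the operational cost McCrory describes below comes in.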
McCrory sees his previous position at Warner Music Group, running Cassandra, Hazelcast, OrientDB, RabbitMQ and Elasticsearch, as epitomising the issues raised by an abundance of technology.
"You needed three people who understood each of those to be able to effectively manage and maintain them and troubleshoot them if a problem arose," he said.
"What I've seen in the past is that if you try to take on six of these [technologies], you need a staff of 18 people minimum just to operate the storage side - say, six storage technologies. That's not scalable and it's too expensive. So there has to be some type of standardisation or something of that sort."
Of course, more tools will spring up to automate that management, making it easier for the next-generation database admin to look at data, do some modelling, and also tune the systems to fit the data and get the data to fit the systems.
"There will be software that bubbles up that will reduce the level of skills required. That's what has to happen. Once you hit that early majority, the demand for tools rises, and I saw that back in my virtualisation days," he said.
"When I started out, the command line was all you really used. The graphical interface for VMware was a web page and it was terrible. Then they came out with the administrative VI Admin. It lacked features but it still became the primary interface that people would use. Eventually, as they got to version 3, 4 and 5, it got better and the skillset required to manage a system like that became less and less."
However, computer history suggests that such tools will not remove the need to reduce complexity itself.
"If you think about classic enterprises, you might have had the expert in Microsoft SQL Server and the expert in Oracle. They weren't the same person generally but you only needed one of them, maybe two. You didn't need an army of them and you didn't have literally dozens of different types," he said.
"If a company did have several systems, they would always have an effort, 'How do we combine many of these, so we're down to three or something like that'. That was always the pattern. We're going to see the same thing again."
The other force thinning out the abundance of database technologies comes into play with the shift from development into production.
"We're at the point where we're seeing the early users move critical production workloads onto NoSQL technologies, and the CTO and in some cases the CEO are getting involved, saying, 'If I'm going to bet my business on this, I'm not just going to leave it up to the developers to make this choice for me'," McCrory said.
"They're actually finally ready to move this thing into production. They throw a production-class workload at it and they start to try to see operationally what it will take to run this across three or five datacentres at massive scale. They find it's going to take quite a lot - and it's going to be very expensive.
"The software might not have been expensive. But to get everything running and keep it running is going to be incredibly expensive. That's when higher powers become involved."