SQL and NoSQL? Fine, but how does the hybrid database fit in?

The idea of running transactions and analytics on the same database is not new, but it has long been held back by technology. Things in the hybrid world have been changing, though.
Written by Toby Wolpe, Contributor

Earlier this year, analyst firm Gartner came up with a name for a category of hybrid processing that it believes will cause upheaval in established architectures.

The concept of HTAP — or hybrid transaction/analytical processing — has been around since the early days of computing. But only relatively recently have new architectures and in-memory technologies made it possible to run transactions and analytics on the same database.

Gartner reckons HTAP addresses four major drawbacks of traditional approaches. First, for analytics in HTAP, data doesn't have to move from operational databases to data warehouses.

Secondly, transactional data is readily available for analytics when created and, thirdly, drill-down from analytic aggregates always points to fresh HTAP application data. Finally, you eliminate or at least cut the need for multiple copies of the same data.

Gartner's coinage of the HTAP term marks an important moment for the hybrid approach, according to Clustrix CEO Robin Purohit, whose company produces the ClustrixDB SQL database, which can simultaneously run large transaction volumes and real-time analytics.

"While it's still early for the enterprise, the fact that Gartner is identifying this as a key workload — articulating that it is now possible where it wasn't — is really going to help," he said.

"Certainly for us as a company but also just to educate the market that this is an important new direction for the database landscape."

Purohit conceded that that landscape may appear confusing for companies, given the sheer choice of databases and approaches available.

"It is really hard right now but it's both hard as well as an opportunity," he said.

"Customers are realising that there's a sea change going on from the traditional scale-up relational database to a set of options that are scale-out and well matched to the commodity compute cloud model."

Scale-up entails using larger, costlier boxes while scale-out employs sophisticated software to cluster cheaper, commodity hardware.

"That's where everybody is spending their time, saying, 'OK, what are the options if I want to build for scale on commodity compute?'" Purohit said.

"The reality is there are a couple of NoSQL variants that handle different types of data and workloads really well. There is the Hadoop ecosystem, which handles these very large-scale analytics very well.

"Then there are folks like us who are tackling the problem in scaling the traditional OLTP database and adding some value, whether it's on disaster recovery or analytics capability on that same live data."

Purohit said he sees most customers building for scale picking elements of all three of these data management building blocks and laying them side by side.

"Every customer we have has a Hadoop cluster or something very like it and they all have a document database, whether it's Couchbase or MongoDB or something like that," he said.

"Sometimes they even have Cassandra, if they have more of an ingestion-oriented application. So we try to focus on what part of the problem we can solve uniquely well, and the customer is very comfortable putting two or three pieces down in the data management layer and then building the best application that they can.

"You're going to want to use the right data platform for the right job, and it will be a set of things that enable that rather than one particular database — even ours. We will not be able to solve every problem, nor will MongoDB, nor will Cloudera's version of Hadoop."

The hybrid approach adopted by Clustrix is based on a redesign of the relational database for horizontal cloud architecture, combining a parallel engine for distributing data with a parallel engine that can distribute simple and complex analytic queries to all the resources in a cluster, process them concurrently, and aggregate the results back up.

Purohit said the result is horizontal scale combined with a full relational database model that supports extremely high-concurrency OLTP applications and allows all that data to be analysed in place, using classic data-warehouse-style analytics, with no extract, transform and load step.
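The scatter-gather pattern behind that kind of distributed query can be sketched in a few lines. This is a simplified illustration of the general technique, not Clustrix's actual implementation: each node computes a partial aggregate over its slice of the data, and a coordinator merges the partials, the way a distributed SQL engine might answer `SELECT AVG(amount) FROM orders`. The shard data and function names here are invented for the example.

```python
# Scatter-gather sketch: shards compute partial aggregates in
# parallel; a coordinator merges them into the final answer.
from concurrent.futures import ThreadPoolExecutor

shards = [  # hypothetical row slices held by three nodes
    [120.0, 35.5, 60.0],
    [99.9, 10.0],
    [250.0, 42.0, 8.5, 13.0],
]

def partial_avg(rows):
    # Each shard returns (sum, count) rather than its local average:
    # averaging the averages would weight small shards incorrectly.
    return sum(rows), len(rows)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_avg, shards))

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
global_avg = total / count
print(round(global_avg, 2))  # 70.99
```

The key design point is that only small partial results, not raw rows, travel to the coordinator, which is what lets the analytics run in place on the nodes holding the live transactional data.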

"So some things were taken from the data warehouse world, and some things that are similar to the NoSQL camp of horizontal scale, while keeping the full ACID properties of the database," he said.

The company originally sold a database appliance but last year moved to a purely software business model with subscription and freemium options, including free developer versions for use on smaller clusters.

MassiveMedia's dating site Twoo.com is Clustrix's largest customer, with a cluster of 336 processor cores running a single database with a hybrid workload. Clustrix describes Twoo.com as the largest scale-out SQL deployment in the world.

"We've busted the myth that SQL cannot scale. Certainly, it's true that the traditional relational databases are trapped by their scale-up approach and they've never been able to crack it using multiple servers to get the new scale but that's the problem we cracked," Purohit said.

Other customers include AOL, Rakuten, Photobox and nomorerack.

Having always recommended a combination of flash storage, for price-performance, and memory to customers with terabyte or larger databases, Clustrix has been working on improving its database's in-memory capability.

The company is providing an option that will allow certain tables to be targeted to run only in RAM, with an expected 'several-fold' increase in speed.

"That's the problem today with in-memory solutions. Because there are some things you have to work around, the application has to be optimised and designed for an in-memory solution," Purohit said.

"You see Oracle doing that to get the most out of their traditional database by putting it in memory and then changing all their own applications to be optimised for that model. SAP is doing the same thing.

"But in-memory should be a technology everybody can use. You shouldn't have to think about it when you're designing for it. The database should be smart enough to put it in the right place. Then as a database architect you can say, 'Here are a few tables that are going to be targeted for, let's say, very high-speed ingestion' and you flag them as optimised for in-memory and then we'll handle the rest."
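The "flag it and let the engine handle the rest" idea Purohit describes can be sketched as a tiny catalog that routes writes based on a per-table storage hint. Everything here is hypothetical, an assumed toy API rather than ClustrixDB syntax, but it shows the division of labour: the architect declares intent once, and placement decisions stay inside the engine rather than in application code.

```python
# Toy sketch: the architect flags a table as in-memory at creation
# time; the engine, not the application, routes rows to RAM or disk.
class Catalog:
    def __init__(self):
        self.flags = {}   # table name -> storage hint
        self.ram = {}     # tables pinned in RAM
        self.disk = {}    # ordinary disk-backed tables

    def create_table(self, name, in_memory=False):
        self.flags[name] = "memory" if in_memory else "disk"
        (self.ram if in_memory else self.disk)[name] = []

    def insert(self, name, row):
        # The application just inserts; the engine picks the store
        # from the flag set at table-creation time.
        store = self.ram if self.flags[name] == "memory" else self.disk
        store[name].append(row)

cat = Catalog()
cat.create_table("events", in_memory=True)  # high-speed ingestion target
cat.create_table("orders")                  # ordinary disk table
cat.insert("events", {"id": 1})
print(cat.flags["events"])  # memory
```

The contrast with the Oracle and SAP approach he mentions is that no application rewrite is needed: the same `insert` call works whether or not the table was flagged.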

Purohit said the major changes underway in the database and analytics fields are leading to most companies experimenting with various approaches.

"They're all going into evaluation and tyre-kicking mode. It's a big deal when you move away from the relational, scale-up model that you've been successful with for 30 years," he said.

"Everybody can feel it breaking but traditionally it's been one of those things — like nobody gets fired for buying the leader.

"But the combination now of the new data volume and types and the sheer complexity and speed of interaction of that data is causing them to look for something new.

"There is definitely going to be a sea change from scale-up to scale-out, and that will be the dominant model even in the enterprise for any new application."
