Hadoop big data analytics: Can SQL support bring it to the masses?

For businesses that want to run analytics on low-cost Hadoop clusters, fully compliant SQL is now possible natively on the big data platform, according to Actian.
Written by Toby Wolpe, Contributor
Actian CEO Steve Shine: It's actually quite hard to get things done in MapReduce. Image: Actian

Actian is describing the newly announced Hadoop Edition of its analytics software as the first platform with full SQL support designed to run entirely natively on the distributed big-data framework.

The company, formerly Ingres, says the software combines high-performance SQL with its visual dataflow framework, all running natively in Hadoop via the YARN resource-management layer.

"What Actian seems to be saying is, 'We can put full SQL calls on top of Hadoop. We can deal with JSON- and BSON-style documents'," said Clive Longbottom, senior researcher at analyst firm Quocirca.

"'So we can add all the metadata that is very difficult to deal with in a SQL environment, because we have all the capabilities of a schema-less environment'.

"'But because of the way we're building around all of this, we can also deal with it as if it were a standard column-and-row relational database'."

As a result, existing applications, which are still likely to be using a SQL relational database, can bring data through as required into the Hadoop environment, using existing SQL tools to interrogate and report on that data.

"They also say it's bi-directional, so if you make any changes to the data, respecting all the security of the application and the existing relational database, it can change data within that database as well — so it's not a silo on its own," Longbottom said.

Actian's Hadoop Edition also means that separate, less relational, schema-less environments can be dispensed with, because those workloads can now run in Hadoop.

"As developers start to look at writing new applications, they'll go, 'Well, what's the point of writing to an Oracle, SQL Server or DB2 environment when we get the scale capabilities of an HDFS [Hadoop Distributed File System] platform?'," Longbottom said.

"'Let's write directly to that, being a SQL-type database, where we then put big-data analytics on top of the SQL capabilities of Actian Hadoop.' So I find it quite exciting, which is sad really."

According to Actian, the new software places the company's X100 vector processing engine on each of the nodes in a Hadoop cluster to provide end-to-end analytic processing and accelerate processes from data blending and enrichment to analytic computation and operational business intelligence.

According to Actian, the analytic platform offers built-in security and ACID compliance, together with full SQL support and libraries of analytics functions, and is designed for greater speed and easier management.

Actian CEO Steve Shine said a lot of companies are pursuing the idea of SQL on Hadoop, driven by demand from the user community.

"The key point is why? What's the point of chasing that? Because it's actually quite hard to get things done in MapReduce," Shine said.

"The moment you want to ask a specific question of the data, there are millions of SQL programmers and they are paid appropriately — they are affordable. Then you have a fraction of that, which is MapReduce skills and they come at an extremely high cost."

Quocirca's Clive Longbottom said Actian is saying companies will be able to put tools that talk SQL on top of Actian on Hadoop, and they will continue to run without any major work being done.

"Again, it's maintaining the skills because trying to get good Hadoop skills is difficult. There are very few of them around and they're tending to be picked up by the vendors rather than by commercial entities," Longbottom said.

"But there are a lot of SQL skills out there. So it gives you the chance to be able to carry on getting value out of them without having to invest in retraining and finding that you've turned them into a Hadoop expert and they've gone and worked for a vendor instead."

Actian CEO Steve Shine said his company's approach to putting SQL on top of Hadoop differs from that adopted by some other companies.

"If you look at the approaches of the Hadoop vendors, their approach is they're trying to build a database on top of the file system. They've got to start from scratch to build ACID compliance, SQL compliance, with security, with user controls and access capabilities. It's not cheap to build a database. It takes years and years and years," Shine said.

"We came the other way. We've spent the 95 percent of effort on getting a really high-performance fully SQL-compliant, ACID-compliant database and then what we've done is we've made it work natively within the Hadoop environment."

Quocirca's Clive Longbottom said previous attempts to put SQL on Hadoop have not been comprehensive.

"What we've seen in the past has been a subset of just a few SQL calls," he said.

"It hasn't been a case of, 'Here you go, here's the SQL library and you will get near enough all of this.' It's a case of, 'Here are the top 10 calls. Great, you've now got SQL access through to Hadoop.'"

Longbottom said the success of Actian's new software will hinge on how simple it is to set up and integrate.

"Actian has got a history in integration anyway, so it should be able to integrate into existing environments pretty simply," he said.

"But if you find that DBAs look at this and say, 'Hang on. This has no similarities between what we're doing already and what you now want us to do', they'll keep away from it. The proof is always in the pudding and in the detail."

