Thinking big data: Oracle embraces Hadoop and NoSQL to target unstructured data

Searching for those interesting nuggets...
Written by Jo Best, Contributor



Andy Mendelsohn discussed the Big Data Appliance at OpenWorld, saying "you have to sift through to find those interesting nuggets". Photo: Oracle PR

Oracle is making a play for the 'big data' market - capturing, storing and analysing the vast amounts of unstructured data that businesses gather every day.

At Oracle's OpenWorld event in San Francisco this week, the company took the wraps off an appliance aimed at tackling unstructured data - information gathered from sources such as social media, email, website content and logs, location information, video files and sensor data.

"Most of [such data] is not very interesting. A lot of it is low-value, low information-density data that you have to sift through to find those interesting nuggets," Andy Mendelsohn, Oracle's SVP of database development, told OpenWorld.

Oracle's Big Data Appliance, unveiled at the event, uses a software stack of both open-source and proprietary elements to analyse and organise big data in the hope of distilling it down into business intelligence.

The Big Data Appliance uses an Apache distribution of distributed computing platform Hadoop for the distillation.

Hadoop's MapReduce programming model and software framework in particular comes into play with the Big Data Appliance, by allowing the creation of applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
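The MapReduce model described above splits work into a map phase, which emits key-value pairs from each chunk of input, and a reduce phase, which aggregates the pairs by key. A minimal single-machine sketch in Python (purely illustrative; real Hadoop jobs run the phases in parallel across cluster nodes) shows the shape of the idea with a word count, MapReduce's canonical example:

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for each word in this input split
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce step: sum the counts emitted for each key
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Two "splits" standing in for chunks of data spread across a cluster
splits = ["big data big nuggets", "data data nuggets"]
mapped = [pair for doc in splits for pair in map_phase(doc)]
result = reduce_phase(mapped)
# result == {'big': 2, 'data': 3, 'nuggets': 2}
```

Because each split can be mapped independently and each key reduced independently, the same program scales from one laptop to thousands of nodes, which is what makes the framework attractive for sifting low-density data.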

"Hadoop, as it currently stands, is a niche tech. Lots of people are talking about it [but] there are a few very sophisticated customers [using it] - the usual suspects with sophisticated programmers," Mendelsohn said, such as banks using algorithmic trading.

"In order to get Hadoop out of that niche, we need to automate or mass-produce code," he added - which is where Oracle's Data Integrator (ODI) comes in. ODI's Oracle Loader for Hadoop allows Oracle 11g-friendly datasets to be created through Hadoop MapReduce, while the Application Adapter for Hadoop is used to simplify data integration between Hadoop and an Oracle database.

The database that features on the Big Data Appliance is Oracle's own take on NoSQL, based on the open-source Berkeley DB - which Mendelsohn described as "probably the most popular key-value store out there today". Oracle has turned it from a single index into a distributed key-value store for use in the Big Data Appliance.
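Distributing a key-value store typically means hashing each key to one of many storage nodes, so that puts and gets scale out horizontally. The toy class below sketches that partitioning idea in Python; the class name and methods are invented for illustration and are not Oracle NoSQL Database's actual API:

```python
import hashlib

class TinyKVStore:
    """Illustrative distributed key-value store: each key is hashed
    to one of several nodes (here, plain dicts standing in for
    Berkeley DB instances). A sketch, not a real client library."""

    def __init__(self, num_nodes=3):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key to deterministically pick the node that owns it
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = TinyKVStore()
store.put("user:42", "Andy")
print(store.get("user:42"))  # -> Andy
```

Because the hash alone determines a key's home node, any client can route a request without consulting a central index - the property that lets a single-node store like Berkeley DB become the building block of a distributed one.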

The appliance will also carry an Oracle version of the R statistical environment, called Oracle R Enterprise. Traditionally, R was used client-side, analysing data held on an individual laptop. Oracle R Enterprise lets users run traditional R programs while drawing their data directly from data warehouses.
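The shift described above is from pulling every row to the analyst's machine to pushing the computation to where the data lives and returning only the summary. A rough Python sketch, using an in-memory SQLite database as a stand-in "warehouse" (Oracle R Enterprise itself targets an Oracle database from R, not Python):

```python
import sqlite3

# Stand-in warehouse: an in-memory SQLite table of page clicks
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, n INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("home", 10), ("home", 5), ("about", 3)])

# Instead of fetching every row client-side (the traditional pattern),
# the aggregation runs in the database and only the summary comes back
rows = conn.execute(
    "SELECT page, SUM(n) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('about', 3), ('home', 15)]
```

For warehouse-scale data, shipping the query rather than the rows is what makes laptop-era tools like R usable against big-data volumes.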

As might be expected, Oracle is pushing the Big Data Appliance as working in conjunction with its other engineered systems and appliances, such as its database machine Exadata.

According to Angela Eager, research director at analyst house TechMarketView, the appliance market is heating up.

"While enterprise-class appliances are an untapped area due to their complexity... If Oracle's engineered approach can do something about this issue it will be on to a winner, but so far there are precious few details about either appliance in terms of business scenarios. Hardware sales have been declining over the past couple of quarters as Oracle gets to grips with the business and switches to a high-end focus. It will be interesting to see where the appliances fit in... Watch out for a wave of appliances and software/hardware vendor partnerships as others look to seize the appliance opportunity too," she wrote in a research note.

Providing companies with business intelligence gleaned from unstructured data sources is a nascent market and one attracting the attention of several of tech's biggest names, including Oracle and EMC, which is also targeting the same market with products from Greenplum, a company it acquired in 2010.

However, due to its novelty, much around big data is still up for grabs.

"This is a huge new market - neither one of [Oracle or EMC] could necessarily look at it and say, 'This is how it will work out over time'... It's way too presumptive to say we've got that all figured out," EMC's COO of information infrastructure products Pat Gelsinger said at OpenWorld.

According to Gelsinger, there's a period of turbulence ahead for big data.

"There's going to be a very chaotic phase figuring out how the stacks and pieces will fit together. There is so much activity in this space," he said.
