Oracle has introduced its Big Data Appliance, an integrated hardware and software system designed to help businesses work with unstructured data such as social-media posts and email.
The Exa-series device, announced at Oracle OpenWorld on Monday, comes with a new NoSQL database and tools for using the popular Hadoop data-processing framework with Oracle's own database technologies. Together, the hardware and software create a platform for processing big data: sprawling datasets made up of structured, semi-structured and unstructured data from a variety of sources.
The Oracle-created NoSQL database and the Hadoop tools add up to "a set of unique software products" that will help with the analysis of big data, Thomas Kurian, the company's head of product development, said in a keynote speech at the event in San Francisco.
The package is aimed at helping businesses deal with fast-growing amounts of data, including machine-generated and social-media data, according to Oracle. Kurian envisioned customers using the package to pull data from the web and offline databases, then analyse it and perhaps feed it into the just-announced Exalytics appliance for further visualisation.
"If you have a large dataset that you're processing, for example, taking weblogs off a high-performance web farm, you can take those logs off the farm and persist them in the [NoSQL] database as key value pairs," Kurian said. "Finally, Exalytics can be used to take data out of the machine and provide analytic dashboards and reports."
The Oracle Big Data Appliance is essentially an Exadata appliance retooled with the new software designed for processing and analysing big data, Kurian said.
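Kurian's weblog example, in which logs are taken off a web farm and persisted as key-value pairs, can be illustrated with a minimal Python sketch. It is not Oracle's API: a plain dictionary stands in for the NoSQL database, the log lines are assumed to be in Common Log Format, and the key scheme and the function names `log_line_to_pair` and `persist_logs` are hypothetical.

```python
import re

# Assumed input: Common Log Format, for example
# 203.0.113.7 - - [03/Oct/2011:10:15:32 -0700] "GET /index.html HTTP/1.1" 200 5120
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def log_line_to_pair(line, line_no):
    """Turn one weblog line into a (key, value) pair, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Hypothetical key scheme: prefix, client host, timestamp and line number.
    key = "/".join(["weblog", fields["host"], fields["time"], str(line_no)])
    return key, fields


def persist_logs(lines, store):
    """Write every parseable line into `store`, a dict standing in for the database."""
    written = 0
    for line_no, line in enumerate(lines, start=1):
        pair = log_line_to_pair(line, line_no)
        if pair is not None:
            key, value = pair
            store[key] = value
            written += 1
    return written
```

Each value here is simply the dictionary of parsed fields; a real deployment would choose its own key layout and serialisation.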
As part of Oracle's converged infrastructure strategy, the appliance gains performance advantages when paired with other Oracle hardware, Kurian said. For example, it can be hooked into an Exalytics appliance to provide visualisations of the data processed on it, or have data fed to it from an Exadata appliance.
In fact, Oracle expects the Big Data Appliance to be paired with an Exadata system in "99 percent of use cases", Andy Mendelsohn, head of Oracle's database and server technologies division, said in a press briefing on Monday.
The Big Data Appliance will launch alongside the new software frameworks for processing big data. These are the Oracle NoSQL Database, the Oracle Data Integrator Application Adapter for Hadoop, Oracle Tools for Hadoop, and the Oracle Loader for Hadoop. There will also be a version of the popular open-source statistical analysis framework R, called Oracle R Enterprise, Kurian said.
NoSQL databases, such as MongoDB or Cassandra, differ from standard relational databases, such as the Oracle-backed MySQL, in that they allow more flexible schemes for classifying data and depend less on the row-and-column structure of typical databases.
The Oracle NoSQL Database is based on Oracle's open-source BerkeleyDB database, Mendelsohn said. There are plans to distribute the NoSQL Database in open-source and closed-source versions, he added, but he would not give dates or prices.
Oracle describes the NoSQL software in a technical overview as a "distributed key-value database" that is "designed to provide highly reliable, scalable and available data storage across a configurable set of systems that function as storage nodes." It can be administered via a web-based portal as well as via a typical command-line interface.
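To give a sense of what a key-value store spread across a configurable set of storage nodes involves, here is a toy Python sketch. Oracle's overview, as quoted above, does not say how keys are assigned to nodes, so the hash-and-modulo placement used here is an assumption chosen for simplicity, and the class name `ShardedKeyValueStore` is hypothetical. Replication, which a reliable system would also need, is left out.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over in-memory 'storage nodes'."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one storage node is required")
        self.node_names = list(node_names)
        # Each node is just a dict here; a real node would be a separate server.
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        """Pick a node by hashing the key (assumed placement rule, not Oracle's)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.node_names[index]

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self.node_for(key)].get(key, default)
```

Because the same key always hashes to the same node, a client can find a value without consulting a central index. The drawback of plain modulo placement is that changing the number of nodes moves most keys, which is why production systems tend to use more elaborate schemes.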
However, Oracle has been conflicted about NoSQL databases and their worth: in May, it published a whitepaper called "Debunking the NoSQL hype", which said NoSQL databases incur greater hardware costs than Oracle databases because their distributed nature means they require more servers and storage arrays to do the same amount of work.
Oracle also created software to add greater levels of automation to the Hadoop data-processing framework. It did this to lower the technical overhead required to manipulate Hadoop, Mendelsohn explained.
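Hadoop jobs are written in the MapReduce style, in which a map step emits key-value pairs and a reduce step combines all values that share a key. The following single-process Python sketch shows the kind of hand-written logic involved, counting requests per URL path. It does not use Hadoop's actual API, and the function names and the record layout are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit (path, 1) for one request record."""
    yield record["path"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: add up the counts collected for one path."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Even for a simple count, the developer has to think in terms of keys, grouping and aggregation, which is the sort of overhead that code-generating tools aim to hide.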
"Hadoop as it currently stands is a niche technology," he said. "There are a very few sophisticated customers who have a lot of very sophisticated programmers, like the intelligence community or sophisticated banks... but in order to get Hadoop out of this niche, you need to automate the generation of [its] MapReduce code."
"Over the next three-to-five years, people will make it more accessible to less-sophisticated development organisations," he predicted.
As with the Exalytics appliance announced on Sunday, Oracle's product will launch into a market already crowded with hardware and software from competitors. EMC has a Hadoop play via its tie-up with Hadoop specialist MapR; IBM has a hefty analytics division bulked out by its Netezza appliances; and companies ranging from Facebook to Yahoo already develop extensions and contribute code to the open-source Hadoop framework.
EMC is not sure whether it will compete with Oracle in Hadoop-powered data analytics, even though Oracle's Big Data Appliance fulfils the same technical need as EMC's range of Greenplum analytics hardware.
"We're both pursuing this big new emerging market," EMC's chief operating officer Pat Gelsinger told ZDNet UK. "I don't know if we're gonna be co-operative or competitive yet on that level."
Few details were given on the precise nature of the NoSQL database. Launch details, dates and prices were not available for any of the announced software and hardware products. Though the software will be bundled with the Big Data Appliance, it will also be possible to run it on its own, Mendelsohn said.