Dell and Cloudera have created a Hadoop package of pre-integrated software and hardware that promises to cut deployment times for the open-source data analytics platform.
Introduced on Thursday, the Dell Cloudera solution for Apache Hadoop contains everything necessary to run Hadoop, along with support, management, networking and automation technologies.
"The combination simplifies the deployment and management of Apache Hadoop services, including the ability to install and configure Apache Hadoop in minutes from a central dashboard and continue to make configuration updates while the system is running," Dell said in its announcement.
The package combines Cloudera's open-source distribution of Hadoop, dubbed CDH, with Dell's Crowbar software and a PowerConnnect 6248 48-port Gigabit Ethernet Layer 3 switch. It also provides Cloudera Enterprise, which is a management and support product for Hadoop made by Cloudera, and a Dell PowerEdge C2100 server.
Hadoop 'chomps' data
Hadoop, a platform for analysing large datasets, is an Apache Foundation-administered open-source project. It evolved out of Yahoo-developed versions of Google's internal data analysis and storage tools, MapReduce and Google File System. Over time, it has gained further tools, such as Apache Hive for SQL-like queries into Hadoop datasets and Pig, a high-level language for programming in Hadoop.
Hadoop lets you chomp [through] mountains of data faster and get to insights that drive business advantage quicker.– Barton George, Dell
"Hadoop lets you chomp [through] mountains of data faster and get to insights that drive business advantage quicker. It can provide near 'real-time' data analytics for click-stream data, location data, logs, rich data, marketing analytics, image processing, social media association, text processing," Barton George, a director of marketing at Dell, wrote in a blog post on the announcement.
Crowbar, recently made open source by Dell, automates the deployment of Hadoop from the initial server boot-up to the final configuration. Dell said this means people can get mutli-node Hadoop clusters up and running "in a matter of hours, as opposed to days". Crowbar can also handle performance analysis, management and automation functions once Hadoop is running.
EMC has also produced an integrated Hadoop product with hardware from Greenplum data analytics appliances. Unlike the Dell-Cloudera package, the Hadoop distribution included in the bundle is predominantly proprietary, from startup MapR.
Rapidly growing amount of data
Noting that businesses are having to deal with a rapidly growing amount of data, Dell said the package is aimed at financial services, energy, utility and telecoms providers, as well as research institutions, retail businesses, and internet and media groups.
"Not only is the amount exploding, but so is the form data's taking, whether that's transactional, documents, IT/OT, images, audio, text, video, etc.," said George. "Additionally, much of this new data is unstructured/semi-structured, which traditional relational databases were not built to deal with."
The package will be released within the next 30 days, Dell said. A minimum configuration, which consists of six PowerConnect switches and six PowerEdge-C servers, costs between $118,000 (£72,000) and $124,000 depending on the number of support hours. It will launch in North America initially, but may expand to the UK if customers show interest, according to a spokesman for the company.
Get the latest technology news and analysis, blogs and reviews delivered directly to your inbox with ZDNet UK's newsletters.