Databases leverage MapReduce technology to radically juice data scale, performance, analytics

By Dana Gardner | August 27, 2008, 9:18am PDT

In what could best be termed a photo finish, Greenplum and Aster Data Systems have both announced that they have integrated MapReduce into their massively parallel processing (MPP) database engines.

MapReduce, pioneered by Google for analyzing the Web, now becomes available to enterprises and service providers, giving them more access and visibility into more data from more origins. Originally created to analyze massive amounts of unstructured data, the approach has been updated to analyze structured data as well.

Greenplum, San Mateo, Calif., says that MapReduce will be part of its Greenplum Database beginning in September. Aster Data, Redwood Shores, Calif., says that MapReduce will be included in its Aster nCluster. [Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.]

Curt Monash, president of Monash Research, editor of DBMS2, and a leading authority on MapReduce, sees this as a major leap forward. He reports that both companies had completed adding MapReduce to their existing products and had been racing to the finish line to get their news out first. As it turned out, both made their announcements within hours of each other.

Curt lists some points on his blog about what this new technology marriage means.

  • Google’s internal use of MapReduce is impressive. So is Hadoop’s success. Now commercial implementations of MapReduce are getting their shots too.
  • The hardest part of data analysis is often the recognition of entities or semantic equivalences. The rest is arithmetic, Boolean logic, sorting, and so forth. MapReduce is already proven in use cases encompassing all of those areas.
  • MapReduce isn’t needed for tabular data management. That’s been efficiently parallelized in other ways. But, if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help.
  • In principle, any alphanumeric data at all can be stuffed into tables. But in high-dimensional scenarios, those tables are super-sparse. That’s when MapReduce can offer big advantages by bypassing relational databases. Examples of such scenarios are found in CRM and relationship analytics.
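To make the text-index point concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce phases building an inverted index. The function names (map_doc, shuffle, reduce_postings, build_index) and the sample documents are illustrative assumptions, not code from Google, Hadoop, Greenplum, or Aster. A real deployment would spread the same phases across many nodes.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map phase: emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(word, doc_ids):
    """Reduce phase: collapse a word's doc ids into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_doc(doc_id, text))
    return dict(reduce_postings(w, ids) for w, ids in shuffle(pairs).items())


# Hypothetical sample documents, keyed by document id.
docs = {1: "data scale matters", 2: "scale the data warehouse"}
index = build_index(docs)
print(index["data"])       # [1, 2]
print(index["warehouse"])  # [2]
```

The result is a word-to-documents structure rather than a table of rows, which is the kind of non-tabular output the bullet above has in mind.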

Greenplum customers have been involved in an early-access program using Greenplum MapReduce for advanced analytics. For example, LinkedIn is using Greenplum Database for new, innovative social networking features such as “People You May Know” and sees it as a way to develop compelling analytics products faster. A primary benefit of the new capability is that customers can combine SQL queries and MapReduce programs into unified tasks that are executed in parallel across hundreds or thousands of cores.
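As a rough illustration of pairing a SQL query with a MapReduce step, the Python sketch below feeds the rows of a SQL SELECT into map and reduce functions that count mutual connections between members. Everything here is an assumption made for illustration: sqlite3 stands in for an MPP database, the connections table and its data are hypothetical, and the logic is neither Greenplum's API nor LinkedIn's actual algorithm.

```python
import sqlite3
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical schema and data; sqlite3 stands in for an MPP database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (member TEXT, contact TEXT)")
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"), ("cat", "dan")]
# Store each connection in both directions so every member lists all contacts.
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    edges + [(b, a) for a, b in edges],
)

# SQL step: select the rows that feed the MapReduce step.
rows = conn.execute("SELECT member, contact FROM connections").fetchall()


def map_contacts(member, contacts):
    """Map: any two contacts of the same member share that member."""
    for a, b in combinations(sorted(contacts), 2):
        yield (a, b), 1


def reduce_counts(pairs):
    """Reduce: sum the mutual-connection counts per candidate pair."""
    totals = Counter()
    for pair, count in pairs:
        totals[pair] += count
    return totals


# Shuffle: group the SQL rows by member before mapping.
by_member = defaultdict(set)
for member, contact in rows:
    by_member[member].add(contact)

emitted = (p for m, cs in by_member.items() for p in map_contacts(m, cs))
already_connected = {tuple(sorted(e)) for e in edges}
# Keep only pairs that are not yet directly connected.
suggestions = {
    pair: n for pair, n in reduce_counts(emitted).items()
    if pair not in already_connected
}
print(suggestions)
```

With this sample data, ann and dan, and likewise bob and cat, are not directly connected but each pair shares two mutual contacts, so both pairs surface as suggestions. In a parallel database the SQL step and the map and reduce steps would run across many cores instead of in one process.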

Part of the appeal of business intelligence and its huge ramp-up over the past five years is that IT assets play an ever larger role in providing unprecedented strategic guidance and insights to leaders of enterprises, governments, telcos and cloud providers. IT has gone from a role of automating business functions to an essential crystal-ball service of the highest order. By gaining access to larger data sets that, more than ever before, can be mined and analyzed for higher levels of process and business refinement, IT has become a member of the board.

With better data reach and inclusion come better results. BI allows leaders to establish early the trends that will determine their future success or failure. In a fast-paced, global, hypercompetitive business landscape, these insights are the currency of future success. The better you do BI, the better you do business: current, near-term and long-term. There’s no better way to know your customers, competitors, employees and the variables that buffet and stir markets than effective BI.

Now, by expanding the role and reach of MapReduce technologies and methods, a powerful new tool is added to the BI arsenal. More data, more data types, more data sources: all rolled into an analytical framework that can be directly targeted by developers, scripters, business analysts, executives, and investors.

These new MapReduce announcements mark a significant advance that moves IT another notch higher in its utility and indispensable nature to business. And they come at a time when more data, metadata, complex events, transactions and Internet-scale inferences demand tools that can do for enterprise BI what Google has done for Web search and indexing.

Comprehensive, deep analytics on massive data sets offers a new mantra: The database is dead, long live the data. Structured data and the containers that hold it are simply not enough to organize and access the intelligence lurking on modern networks, at Internet scale and in Internet time.

Dana Gardner is president and principal analyst at Interarbor Solutions, an enterprise IT analysis, market research, and consulting firm.

Disclosure

Dana Gardner

Dana Gardner is president and principal analyst at Interarbor Solutions, LLC, a New Hampshire-based IT analysis, new media content production, and consultancy firm that he founded in 2005. He produces a series of podcast, videocast, transcript, and blog content shows called BriefingsDirect[tm/sm], some of which are sponsored and which he blogs on. Sponsored shows are individually declared as such, along with the sponsoring organization or company. When Dana blogs on ZDNet about companies with which he has, or has had, consulting and/or sponsorship relationships, he declares that in each blog entry. There is no connection between the negotiation of such sponsorships and the opinions expressed by Dana here on ZDNet.

To date, the following organizations and companies have sponsored, or do sponsor, some BriefingsDirect content, or have consulting relationships with Dana: Active Endpoints; Akamai Technologies; Aster Data Systems; BP Logix; Business Technology Quarterly; CA; Compuware; Electric Cloud; Genuitec; Gerson Lehrman Group; Greenplum; Hewlett-Packard; iTKO; JustSystems North America, Inc.; Kapow Technologies; LogLogic; Nexaweb Technologies, Inc.; The Open Group; Paglo; Panda Security; Platform Computing; Progress Software; rPath; Sailpoint; Splunk; TIBCO Software; Weblayers; Workday; WSO2; ZDNet.

As a matter of CNET Networks and Interarbor Solutions policies, when Dana covers an organization that is also a sponsor of a BriefingsDirect-produced podcast, videocast or any other content, a disclosure will be included with the coverage.

Updated (1/4/2010): Instead of providing a disclosure on just those editorials (blog posts, etc.) that intersect the above listed companies, we have changed the policy to include a link to this full disclosure at the end of every one of Dana's blog posts. In the case of audio or video-based coverage, such disclosures will be provided within the editorial content itself.

Biography

Dana Gardner

Dana Gardner is president and principal analyst at Interarbor Solutions, an enterprise IT analysis, market research, and consulting firm. Gardner, a leading identifier of software and cloud productivity trends and new IT business growth opportunities, honed his skills and refined his insights as an industry analyst, pundit, and news editor covering the emerging software development and enterprise infrastructure arenas for the last 18 years.

Gardner tracks and analyzes a critical set of enterprise software technologies and business development issues: Cloud computing, SOA, business process management, business intelligence, next-generation data centers, and application lifecycle optimization. His specific interests include Enterprise 2.0 and social media, cloud standards and security, as well as integrated marketing technologies and techniques.

Gardner is a former senior analyst at Yankee Group and Aberdeen Group, and a former editor-at-large and founding online news editor at InfoWorld. He is a former news editor at IDG News Service, Digital News & Review, and Design News.

Comments

Aster In-Database MapReduce Available Now
swooledge Updated - 27th Aug 2008
Dana - great article, thanks for covering this. Many agree this is a game-changing innovation for the database industry. I'd just like to point out:

[1] Aster In-Database MapReduce is available now for evaluation within the Aster nCluster database

[2] We have several customers implementing this as we speak

[3] There is a live demo of Aster In-Database MapReduce on our site http://www.asterdata.com/product/mapreduce.html

"...Radically juice..." - love it!
Yep-- Dana, great stuff.

We have a series of videos of Greenplum MapReduce in action at http://www.greenplum.com/resources/mapreduce/.

This technology is already in use at a number of customers including LinkedIn and O'Reilly Media.
