CEP and MapReduce: Connected in complex ways

Summary:Complex Event Handling (CEP) is the category of technology focused on handling large, continuous streams of data that must be processed in real-time. CEP is distinct from Big Data in the eyes of some, and yet inextricably tied to it as well.

In a recent post, I discussed MPP (Massively Parallel Processing) data warehouse technology and how it too is a Big Data technology. I also explained some things MPP has in common, architecturally, with Hadoop's MapReduce parallel computation approach to handling large data sets.

There's another three-letter acronym that is at once its own category and yet also a legitimate Big Data player: CEP (Complex Event Processing). CEP engines process continuous data flows (like feeds, sensor readings and other event-driven data that is large in volume and very fast-flowing), in real-time. CEP engines are available from so-called mega-vendors like Oracle, Sybase/SAP and Microsoft, integration players like TIBCO and pure play companies like StreamBase and OneMarketData.

As with other data-related domains, there's a lot of cross-pollination and conflation happening between CEP and Hadoop/MapReduce. As an extreme example, I've seen one blog post that asserted Hadoop is a CEP Engine. But there are other, perhaps less-controversial examples as well.

In August of 2011, Twitter acquired a company called BackType, and a product of theirs called Storm. Storm processes streaming data in parallel, over a cluster of machines, using a processing architecture called topologies. Technology like this works well for a social media data hotbed like Twitter. Just think of the real-time processing necessary to determine trending topics, and you'll start to get the idea. The keepers of Storm point out that MapReduce processes "jobs" in batch while Storm processes streams in real-time. Storm and MapReduce have a divide-and-conquer parallelism in common, and Storm even uses the ZooKeeper distributed process coordinator that began as an Apache sub-project of Hadoop. Twitter made Storm open source in September, 2011. So look for mashups of Storm and other data-centric technologies on the horizon.

Another interesting CEP-MapReduce crossover comes from Microsoft. It has devised a way for its StreamInsight CEP engine (which ships as part of the SQL Server relational database) to work in concert with the Microsoft-Hortonworks Hadoop on Windows distribution. Interestingly, Microsoft has figured out a way to shim the StreamInsight engine into the reducer step of Hadoop's MapReduce jobs. This is explained, complete with examples, in a Microsoft blog post.

These are just a couple of examples of how CEP and Big Data technologies can be combined in ways that are logical and beneficial. Lots of other data technologies, like data visualization, predictive analytics and more, fit in with Big Data, and stretch its definition, in a likewise manner. We'll certainly look at more of them in future posts.

Topics: Big Data, Microsoft, Oracle

About

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology. He has been a developer magazine columnist and conference speaker since the mid-90s, and a technology book writer and blogger since 200... Full Bio

zdnet_core.socialButton.googleLabel Contact Disclosure

Kick off your day with ZDNet's daily email newsletter. It's the freshest tech news and opinion, served hot. Get it.

Related Stories

The best of ZDNet, delivered

You have been successfully signed up. To sign up for more newsletters or to manage your account, visit the Newsletter Subscription Center.
Subscription failed.