IBM is going gangbusters in the Big Data world. Its IBM Big Data Web site is a major resource, its TV commercials project authority and its products are entrenched in the space. But, for me, it took a discussion yesterday with Deepak Advani, IBM’s Vice President for Business Analytics Products and Solutions, to appreciate IBM’s Big Data play in all its depth.
Maybe old dogs do the best new tricks
IBM is a 101-year-old company, based on the East Coast. It once made typewriters. It still makes mainframes. Its products interoperate with open source technology, but most of its products are anything but free. And many of those products have come through an array of acquisitions the company has made over the course of its history. On top of all that, IBM is a services company, with teams of consultants working around the globe.
Most Big Data startups have a lean team, are just a few years old, are based on the West Coast, offer their core technology as open source software running on commodity hardware, and have built their IP organically. Yet IBM is still a major player in the Big Data and analytics world. How can this be when so many of its vital statistics appear counter to playing in the space? Talking to Advani reminded me of several reasons why, and he pointed out some others that were new to me. He helped me connect the dots and then added a few more.
For instance, I knew that IBM had made a number of acquisitions over the last decade in the business analytics space. One of these is especially easy to remember: IBM acquired Business Intelligence (BI) powerhouse Cognos in 2007 -- right after the latter had acquired Applix that same year. This deal gave IBM an end-to-end BI suite that includes conventional and in-memory Online Analytical Processing (OLAP), reporting, dashboards and data visualization. Again, I knew this.
But I had lost track of some of IBM’s other acquisitions, and it’s really the combination of all the deals that yields the compounded analytics value. Before the Cognos deal, IBM acquired Ascential Software in 2005, which among other assets gave it the Extract, Transform and Load (ETL) product DataStage. After the Cognos purchase, IBM swallowed up statistics and analytics heavyweight SPSS in 2009 and MPP (Massively Parallel Processing) Data Warehouse fixture Netezza in 2010.
So in addition to the full BI suite, IBM also has the high-performance data warehouse technology necessary to feed those BI systems and the tools necessary to run predictive analytics on the output. Speaking of analytics, it’s a field closely tied with Risk Management. With that in mind, IBM’s purchase of OpenPages in 2010 and Algorithmics in 2011 just primed the Big Data pump that much more.
These are just some of the products IBM has in the business analytics space. There are so many, in fact, that, as Advani explained to me, IBM has a whole “Signature Solutions” program meant to highlight the more interesting combinations of IBM’s products in the space, together with the intellectual property developed around them by its services organization.
Homegrown products, and external ones too
Can IBM work with open source technologies from outside its own stack, even when they compete with some of its own products? Sure. IBM works with R (effectively a competitor to SPSS); it has a partnership with Cloudera (whose Hadoop distribution competes with IBM’s own, which is also open source) and can use Mahout, the machine learning component that runs on top of Hadoop. And then, of course, there’s Linux, the open source operating system that IBM has used strategically for years.
IBM also has a number of organically developed products in the analytics space. InfoSphere Streams, its CEP (Complex Event Processing) offering, and InfoSphere BigInsights, its own Hadoop distribution, are two examples. One of the interesting points about the BigInsights Hadoop distribution is its integration with IBM’s DB2 relational database management system, one of the most important in the industry. And while relational data might not be Big Data, accomplishments in the former build a pedigree for success in the latter. Knowing how to handle data operationally builds a platform and competency for analyzing it later on.
This is about hardware too. IBM is almost synonymous with mainframe computers and the vast back office systems that have run on them, collecting data for decades. Handling systems with those workloads, for that long, makes Big Data a concrete concept for IBM. It doesn’t just build products and ask its customers to imagine the kinds of data they run through them. Imagination’s fine, but decades of experience make it less necessary.