Industrial Big Data

Big Data isn't just about clickstreams and status messages. It's about manufacturing turbines, jet engines and even consumer goods, too.

When Big Data comes up, many people immediately think of clickstream analysis on Web sites or sentiment analysis gleaned from social media status messages. But Big Data goes well beyond that. In fact, it goes beyond the purely digital realm.

Big Data for Big Manufacturing

Yesterday, I spoke with Brian Courtney, General Electric's General Manager for Operations Data Management at GE Intelligent Platforms, a unit of the company. I learned a lot from that call.

Perhaps talk of Big Data at GE tempts you to think of Alec Baldwin's Jack Donaghy character on NBC's 30 Rock, making humorous commentary about Big Data and microwaves. But you'd be better off thinking of GE's advertisements about the jet engines, turbines and locomotives it manufactures.

Billions of stars? Trillions of data readings.

You might even think of personal care products. Check out the graphic in this post. It illustrates how, in the world of consumer packaged goods (CPG), sensor readings can generate 152,000 samples per second, which translates to 4 trillion data readings per year.
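To see where a figure like 4 trillion comes from, here is a quick back-of-the-envelope calculation in Python. It assumes the equipment is sampled around the clock, every day of the year; that is our assumption, not a detail from the graphic. Under it, the total comes to roughly 4.8 trillion, so the 4 trillion figure looks, if anything, conservative.

```python
# Back-of-the-envelope check of the CPG sampling figure.
# Assumption (ours): the equipment is sampled continuously, 24 hours a day,
# 365 days a year.
SAMPLES_PER_SECOND = 152_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000

readings_per_year = SAMPLES_PER_SECOND * SECONDS_PER_YEAR
print(f"{readings_per_year:,} readings per year")   # 4,793,472,000,000 readings per year
print(f"~{readings_per_year / 1e12:.1f} trillion")  # ~4.8 trillion
```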

That graphic comes from a GE white paper called "The Rise of Industrial Big Data." Click the link to get a copy of your own (site registration is required). Read it and you'll see that the 4 trillion data points come from just one piece of equipment used to manufacture a single CPG product. Now do the math for all the equipment used to make that item, all the items in a product line and all the lines a manufacturer makes. There's no way to describe that data as anything but "Big."
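Here is that math as a Python sketch. Only the 4 trillion per-machine figure comes from the white paper; the equipment, item and product-line counts are hypothetical, chosen purely to show how quickly the multiplication grows. Because items in a line often share equipment, treat the result as a crude upper bound rather than an estimate.

```python
# "Now do the math": scale one machine's readings up to a whole manufacturer.
READINGS_PER_MACHINE_PER_YEAR = 4 * 10**12  # the white paper's one-machine figure

# HYPOTHETICAL counts, for illustration only.
machines_per_item = 20       # pieces of equipment involved in making one item
items_per_line = 50          # items in one product line
lines_per_manufacturer = 10  # product lines made by one manufacturer

total = (READINGS_PER_MACHINE_PER_YEAR * machines_per_item
         * items_per_line * lines_per_manufacturer)
print(f"{total:.1e} readings per year")  # 4.0e+16 readings per year
```

Even with these modest made-up counts, the total lands in the tens of quadrillions of readings per year.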

Historical expenses

In such a CPG scenario, huge numbers of product units are manufactured in each shift. Each piece of equipment is monitored closely by systems called "historians." Readings such as temperature and vibration are taken at intervals measured in milliseconds. That way, a failure can be detected immediately and addressed quickly.
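Historians do much more than this, but their core real-time job, flagging a reading the moment it crosses a limit, can be sketched in a few lines of Python. The `Reading` type, its field names and the alarm limits are hypothetical, invented for illustration; they aren't taken from Proficy or any other historian product.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One hypothetical sensor sample from a single machine."""
    timestamp_ms: int      # milliseconds since the start of the shift
    temperature_c: float   # degrees Celsius
    vibration_mm_s: float  # vibration velocity in mm/s


# Hypothetical alarm limits; real limits depend on the machine and its maker.
MAX_TEMPERATURE_C = 85.0
MAX_VIBRATION_MM_S = 7.1


def check(reading: Reading) -> list[str]:
    """Return an alarm message for each limit this reading exceeds."""
    alarms = []
    if reading.temperature_c > MAX_TEMPERATURE_C:
        alarms.append(f"t={reading.timestamp_ms} ms: temperature "
                      f"{reading.temperature_c} C over limit")
    if reading.vibration_mm_s > MAX_VIBRATION_MM_S:
        alarms.append(f"t={reading.timestamp_ms} ms: vibration "
                      f"{reading.vibration_mm_s} mm/s over limit")
    return alarms


# A few samples taken 5 ms apart; the last one breaches both limits.
stream = [Reading(0, 70.2, 2.1), Reading(5, 70.4, 2.3), Reading(10, 91.0, 8.4)]
for sample in stream:
    for alarm in check(sample):
        print(alarm)
```

Note that this check looks at each reading in isolation and keeps nothing; that is exactly why, historically, so much of this data was discarded once it had served its monitoring purpose.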

Because that data is collected to monitor equipment conditions in real time, keeping it hasn't been a priority, and until recently its sheer volume made analysis impractical. Storage can still be expensive. Courtney told me that for some of his customers, SAN-based storage costs $30,000 per managed terabyte per year once ancillary costs are included.
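To put the $30,000 figure in context, the Python sketch below estimates what one year of one machine's raw readings would cost to keep on such storage. The storage format is our assumption, not GE's: each reading stored uncompressed as an 8-byte timestamp plus an 8-byte floating-point value, with 1 TB taken as 10^12 bytes.

```python
# Rough annual SAN cost of keeping one machine's raw readings for a year.
# Assumptions (ours, not GE's): each reading is stored uncompressed as an
# 8-byte timestamp plus an 8-byte floating-point value; 1 TB = 10**12 bytes.
READINGS_PER_YEAR = 4 * 10**12   # the white paper's one-machine figure
BYTES_PER_READING = 8 + 8
COST_PER_TB_PER_YEAR = 30_000    # dollars per managed terabyte per year

terabytes = READINGS_PER_YEAR * BYTES_PER_READING / 10**12
annual_cost = terabytes * COST_PER_TB_PER_YEAR
print(f"{terabytes:.0f} TB -> ${annual_cost:,.0f} per year")  # 64 TB -> $1,920,000 per year
```

And that bill compounds: each additional year of retained data adds another 64 TB under these assumptions, which is why compression matters so much here.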

Avoiding problems, not just fixing them

But in the era of Big Data, we can keep more of this data. GE's historian system, known as Proficy, is optimized for storing and processing the time series data that Industrial Big Data involves. It relies heavily on compression to reduce storage requirements, yet it can decompress and retrieve data far faster than relational or common NoSQL databases. And, yes, it can interface with Hadoop.
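GE didn't detail how Proficy compresses data, so the Python sketch below illustrates the general idea with a generic, lossless technique instead: delta encoding (storing the difference between consecutive readings) followed by zlib compression. Sensor readings tend to change slowly, so most deltas are tiny and compress very well. This is not Proficy's method; historians also commonly use lossy schemes such as deadband or swinging-door compression.

```python
import struct
import zlib
from itertools import accumulate


def compress(samples: list[int]) -> bytes:
    """Delta-encode integer samples, then deflate the deltas with zlib."""
    # The first delta is the first sample itself (its difference from 0).
    deltas = [b - a for a, b in zip([0] + samples, samples)]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)  # little-endian 8-byte ints
    return zlib.compress(raw)


def decompress(blob: bytes) -> list[int]:
    """Inflate the blob, then undo the delta encoding with a running sum."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    return list(accumulate(deltas))


# A slowly drifting temperature, stored in hundredths of a degree.
samples = [7000 + i // 100 for i in range(100_000)]
blob = compress(samples)
assert decompress(blob) == samples  # lossless round trip
print(f"{len(samples) * 8:,} raw bytes -> {len(blob):,} compressed bytes")
```

Decompression here is just an inflate followed by a running sum, which hints at why a format built for time series can be compressed heavily and still be read back quickly.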

Maybe you're wondering whether analyzing this data is worth it. It is. In Industrial Big Data, such analysis isn't just about gaining some ethereal set of insights; it's about big bucks. Analyzing historian data reveals patterns, and those patterns mean GE can predict equipment failures instead of just detecting them. At-risk equipment can then be replaced gracefully before it breaks, saving GE's customers huge sums of money.
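The difference between detecting and predicting a failure can be illustrated with a deliberately simple Python sketch: fit a least-squares line to recent vibration readings and estimate when that trend will cross an alarm limit. This is not GE's method, and real predictive-maintenance models are typically far more sophisticated; the readings, the 7.1 mm/s limit and the `hours_until_limit` function are all hypothetical.

```python
from typing import Optional


def hours_until_limit(hours: list[float], vibration: list[float],
                      limit: float) -> Optional[float]:
    """Fit a least-squares line to the readings and estimate how many hours
    after the last reading the trend will reach `limit`.

    Needs at least two readings at distinct times. Returns None if the
    trend is flat or falling (no predicted crossing).
    """
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(vibration) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, vibration))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (limit - intercept) / slope  # hour at which the line hits the limit
    return max(0.0, crossing - hours[-1])


# Hypothetical daily vibration readings (mm/s) creeping upward.
hours = [0, 24, 48, 72, 96]
vibration = [2.0, 2.3, 2.6, 2.9, 3.2]
remaining = hours_until_limit(hours, vibration, limit=7.1)
print(f"Schedule maintenance within about {remaining:.0f} hours")
```

Every reading above is well under the limit, so a threshold alarm would stay silent; the trend line, though, reaches 7.1 mm/s about 312 hours after the last reading, giving time to schedule a replacement.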

I have always found the “old economy” side of technology to be the most intriguing and valuable, because it’s often where tech is most useful and impactful. Big Data is certainly no exception.