Big Data and analytics: The year ahead

As a follow-up to our year-end Big Data predictions roundup, let's kick off 2014 with an additional, broader set of predictions from industry players.

My recent post, Predicting Big Data's 2014, presented an analysis of 2014 Big Data predictions from four different vendors.  It provided enough food for thought, it seems, that several other industry players saw fit to get into the game.  In fact, after that post was published, I received more than double the number of prediction slates I had for that first go-round.

When a larger sampling comes into view, we end up having a better data set, and our analysis can be a bit stronger (very meta, eh?).  And with this broader set of predictions, we encounter a few themes that recur quite frequently.

Start with a Splunk
Splunk is a well-known company in the Big Data space.  Its IPO was a smashing success and its technology is well-executed and focused.  Perhaps it's appropriate then that the company sent me not one set of predictions, but two, specifically from Brian Gilmore (Solution Expert, Internet of Things and Industrial Data) and Brett Shepard (Big Data Director).

Gilmore focuses specifically on the so-called "Internet of Things" and its contribution to the world's ever-growing corpus of Big Data.  He thinks companies will have more connected, secured, data-producing devices than ever and that JavaScript will be the standard platform for building applications on and around them.

Shepard, meanwhile, offered his support for the idea of Big Data power-to-the-people this year, with the Data Scientist aristocracy falling out of vogue.  Specifically, Shepard says that we'll see the "emergence of an integrated, full-featured analytical platform combing real-time monitoring with ad-hoc search of raw unstructured big data that's surprisingly easy to use."

Shepard's allusion to search (versus, say, SQL or MapReduce) as the new lingua franca of Big Data, and to "raw, unstructured" data as its fodder, illuminates a very important theme that dominates the slates of predictions I reviewed.

Structured predictions
On the structuredness of data, opinions vary to an impressive degree.  MapR’s CEO and Co-founder, John Schroeder, forecasts that in 2014, "SQL simultaneously becomes the biggest promise and disappointment for Big Data" and that "Search emerges as the Unstructured Query Language."  

Kurt Dobbins, CEO of Deep, begs to differ, saying "Big Data becomes Structured" and "Unstructured Data is a Fallacy."  He forecasts that "In 2014, there will be a realization that all data needs to be structured to realize its maximum value. Structure brings context to all data."

Head spinning yet?  Turn it back around: the Active Archive Alliance, a consortium that includes Crossroads, Fujifilm, QStar, SGI, and Spectra Logic, sees 2014 bringing the "Continued Growth of Unstructured Data," and John Joseph, president and co-founder of DataGravity, says 2014 will be a year in which "Big data goes unstructured."

Storage is another hot topic of discussion.  Steve Lucas, President of Platform Solutions at SAP, says that "an increasing concern about the cost of storing massive data sets...will drive a new emphasis on the cloud friendly solutions."

Spectra Logic makes a rather different and unexpected prediction that "A shift will begin toward...enterprise tape drive be adopted in the broad set of markets dealing with the need for high capacity, long term data storage." In fact, Spectra Logic unifies these two predictions by saying that tape can serve as "a deep archive offered by cloud providers, in addition to traditional uses in backup, disaster recovery and compliance."

The Active Archive Alliance shares this enthusiasm for tape-based storage, saying "we'll see expanded software intelligence that makes tape easier to manage, streamlined within the storage environment. New appliances that front-end tape will make it easier for customers to use low-cost tape..."

I suppose everything old is new again.  If we think about some of the underpinnings of Hadoop, like file-based storage and batch mode operation, and the time sharing-like billing model of cloud computing, we shouldn't be shocked that another mainframe-era technology like tape might also make a comeback.

Ripping YARNs
Actian's CTO, Mike Hoskins, is bullish on YARN (an acronym for Yet Another Resource Negotiator), the key component in Hadoop 2.0, which decouples Hadoop's infrastructure from the MapReduce processing algorithm.  YARN makes it possible to use Hadoop as an interactive engine rather than a batch-driven one, and that, in turn, means Hadoop can start to be used for data discovery and even for processing streaming data.

Kurt Dobbins at Deep is bullish too, listing "Hadoop will become Real-time" as one of his four predictions.  He goes on to say that "Next year, Hadoop will transition from post-processing data with MapReduce to operating on real data in real-time."

Operational Analytics
A number of predictions focus on the notion of analytics becoming more ingrained in business processes and operations.  MapR's Schroeder speaks of the "Emergence of Operational Hadoop" and putting it to use "for measurable business advantage in applications such as customized retail recommendations, fraud detection and leveraging sensor data for prescriptive maintenance."

Actian's Hoskins feels similarly, commenting that "We’ll see an important evolution in analytics in 2014, as operational analytics become universal and business processes are optimized through analytics...As a result, lines of business will be able to take advantage of small windows of market opportunity, avoid risks ahead that might have not been anticipated, and deliver valuable, personalized information to their customers."

Gaurav Vohra, CEO of Jigsaw Academy, piles on too.  He believes that "Only those businesses that are able to set up a culture of data driven decision making can hope to stay ahead of the competition."  Vohra's not alone in using such dramatic language either: Mr. Schroeder of MapR predicts that "Every industry leader will deploy a new data centric application or they won’t be leading for long."

Alpine Data Labs CPO, Steven Hillion, takes things even further, saying that the applied use of analytics won't just become more pervasive in business processes, but will be used to automate them.  Hillion opines that "Machines will take over decision making based on analytics...This trend will accelerate as we teach machines to replicate logical thinking."

Looking Ahead
While more predictions were shared with me than I've covered here, the major themes of structured vs. unstructured data, storage challenges, interactive Hadoop and operational analytics are the ones where I saw consensus amongst the industry prognosticators who contacted me.

What binds all of these themes together is Big Data's growing maturity.  Technology that's more mature will be used more pervasively, impact core infrastructure more significantly, become relied upon more critically and serve as an interactive tool set by necessity.

And what of the apparent disagreement over structured vs. unstructured data?  I'll go out on a limb and say that the proponents of each position are more in agreement than not.  Yes, data is increasingly being stored in the form of files, rather than structured, relational tables.  But that unstructured data is now more frequently subjected to analysis-driven structuring so it can be processed and queried, rather than stored for passive safekeeping.  Products and companies that see this, and make it easier to do, will have a good play this year.