Big Data forces IT to press reset button on architecture

From data warehouse to 'data lake': immersing your company in new depths of information awareness.
Written by Joe McKendrick, Contributing Writer

After decades of struggling, organizations finally thought they had it all figured out: critical data could be maintained through a constellation of relational database management systems, abstracted and made available to enterprise users and BI/analytics applications through data warehouses and data marts.

Now, Big Data -- hundreds of terabytes' worth of information -- barges onto the scene and starts to mess things up. Time for a rethink on data architecture?

Yes, says Dan Woods, who explains what is needed in his latest Forbes post: "Big Data Needs a Big New Architecture." Woods makes a strong case for a new approach to data architecture:

"To take maximum advantage of big data, IT is going to have to press the re-start button on its architecture for acquiring and understanding information. IT will need to construct a new way of capturing, organizing and analyzing data, because big data stands no chance of being useful if people attempt to process it using the traditional mechanisms of business intelligence, such as a data warehouses and traditional data-analysis techniques."

Perhaps it's time to stop thinking in terms of "data warehouses," which evoke images of industrial-era, highly structured frameworks, and think of something more fluid and expansive, such as a "data lake." (Perhaps the term "data cloud" applies here?) Woods credits James Dixon, CTO of Pentaho, with coining the term "data lake."

The issue, Dixon says, is that data warehouse architecture pre-categorizes data at the point of entry. Big Data is too unpredictable for such structure. Users simply won't know in advance how the data will need to be interpreted and leveraged later on. Woods describes some new forms of repositories that have "data lake" flexibility, such as Pentaho's use of Apache Hadoop, Pervasive's DataRush, and complex event processing tools. Enterprise search technologies will also play a key role in the new data architecture. I even speculated on the possibility a couple of years ago, after hearing about ING's database-free deployment. Plus, the ability to analyze search data may open new possibilities for understanding future directions for the business.
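
To make Dixon's point a little more concrete, here is a minimal sketch of the "data lake" idea in Python: records are kept exactly as they arrive, and each question applies its own interpretation only when the data is read. Everything in it is an assumption made for illustration (the sample events, the field names, and the helper functions are hypothetical) and it is not drawn from Pentaho, Hadoop, or any other product mentioned above.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
# No schema or category was imposed at the point of entry.
RAW_EVENTS = [
    '{"type": "search", "query": "mortgage rates", "user": "u1"}',
    '{"type": "purchase", "sku": "A-100", "amount": 25.0, "user": "u2"}',
    '{"type": "search", "query": "savings account", "user": "u2"}',
]


def read_events(raw_lines):
    """Parse each stored line only at the moment it is read."""
    for line in raw_lines:
        yield json.loads(line)


def search_queries(raw_lines):
    """One interpretation of the raw data: what are people searching for?"""
    return [
        event["query"]
        for event in read_events(raw_lines)
        if event.get("type") == "search"
    ]


def spend_by_user(raw_lines):
    """A second, unrelated interpretation of the same raw data."""
    totals = {}
    for event in read_events(raw_lines):
        if event.get("type") == "purchase":
            user = event["user"]
            totals[user] = totals.get(user, 0.0) + event["amount"]
    return totals


if __name__ == "__main__":
    print(search_queries(RAW_EVENTS))
    print(spend_by_user(RAW_EVENTS))
```

Neither question had to be anticipated when the events were stored. A warehouse-style design would instead have fixed the tables and columns up front, which is the pre-categorization Dixon describes.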
