Big Data forces IT to press reset button on architecture
Summary: From data warehouse to 'data lake': immersing your company in new depths of information awareness.
After decades of struggling, organizations finally thought they had it all figured out: critical data could be maintained through a constellation of relational database management systems, abstracted and made available to enterprise users and BI/analytics applications through data warehouses and data marts.
Now, Big Data -- hundreds of terabytes' worth of information -- barges on the scene and starts to mess things up. Time for a rethink on data architecture?
Yes, says Dan Woods, who explains what is needed in his latest Forbes post: "Big Data Needs a Big New Architecture." Woods makes a strong case for a new approach to data architecture:
"To take maximum advantage of big data, IT is going to have to press the re-start button on its architecture for acquiring and understanding information. IT will need to construct a new way of capturing, organizing and analyzing data, because big data stands no chance of being useful if people attempt to process it using the traditional mechanisms of business intelligence, such as a data warehouses and traditional data-analysis techniques."
Perhaps it's time to stop thinking in terms of "data warehouses," which evoke images of an industrial-era, highly structured framework, and think of something more fluid and expansive, such as a "data lake." (Perhaps the term "data cloud" applies here?) Woods credits James Dixon, CTO of Pentaho, with coining the term "data lake."
The issue, Dixon says, is that data warehouse architecture pre-categorizes data at the point of entry. Big Data is too unpredictable for such structure. Users simply won't know how the data needs to be interpreted and leveraged later on. Woods describes some new forms of repositories that have "data lake" flexibility, such as Pentaho's use of Apache Hadoop, Pervasive’s DataRush, and complex event processing tools. Enterprise search technologies will also play a key role in the new data architecture. I even speculated on the possibility a couple of years ago, after hearing about ING's database-free deployment. Plus, the ability to analyze search data may open new possibilities for understanding future directions for the business.
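To make Dixon's distinction concrete, here is a minimal sketch in Python of the two approaches, which are often described as schema-on-write versus schema-on-read. It is an illustration only, not how Pentaho, Hadoop, or any product mentioned above works. The event records, the field names, and the functions `load_into_warehouse`, `load_into_lake`, and `query_lake` are all hypothetical.

```python
import json

# Hypothetical raw events; the field names are invented for illustration.
RAW_EVENTS = [
    '{"user": "alice", "action": "search", "query": "mortgage rates"}',
    '{"user": "bob", "action": "purchase", "amount": 42.5}',
    '{"user": "carol", "action": "search", "query": "savings account", "device": "mobile"}',
]

# Warehouse-style loading: the columns are decided before any data arrives.
WAREHOUSE_COLUMNS = ("user", "action")


def load_into_warehouse(lines):
    """Keep only the pre-defined columns; every other field is dropped at entry."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(record.get(col) for col in WAREHOUSE_COLUMNS))
    return rows


def load_into_lake(lines):
    """Lake-style loading: store the raw records unchanged."""
    return list(lines)


def query_lake(lake, field):
    """Interpret the data at read time: collect one field from the records that have it."""
    values = []
    for line in lake:
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


warehouse = load_into_warehouse(RAW_EVENTS)
lake = load_into_lake(RAW_EVENTS)

print(warehouse)                  # the "query" and "device" fields are gone
print(query_lake(lake, "query"))  # still recoverable from the raw records
```

In this toy setup, a question nobody anticipated at load time (what did users search for?) can still be answered from the lake, while the warehouse rows no longer carry that information.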
Talkback
Extremely doubtful
RE: Big Data forces IT to press hit button on architecture
Relational is the future
Scalability is an implementation issue, and one that can be solved. It makes no sense at all to abandon the enormous advantages of the relational model.
There is nothing about the relational model that makes it inherently unscalable.
If you don't like SQL, use an alternative relational language that is simpler and more faithful to the model; there are already concrete proposals for this. This should in itself bring significant performance advantages, as the optimizer would no longer have to deal with SQL's complexity.
I would say the relational model is the only approach that will meet the needs of the long term future.
Relational is excellent.
What's the logical difference between data and big data?
Every "post-relational" technology I have seen so far has proved on closer inspection to actually be pre-relational: hierarchical, pointer-based and lacking a coherent logical model.
RE: Big Data forces IT to press hit button on architecture
My opinion is that a smart engineer goes for relational first, then, if key/value works better, uses that.
RE: Big Data forces IT to press hit button on architecture
In the case of Microsoft's Azure, I'm sure the inner workings are likely more scalable than a standard SQL Server engine. Though applications interacting w/ Azure and SQL Server could be essentially the same, the 'engine' of these databases may be processing the requests to account for the different environment (cloud vs internal company db).
But relational is a purely logical model
In fact, relational will eliminate "big data" techniques as it becomes more scalable in the future.
Agreed
It doesn't make sense to say relational can't scale. Relational is a mathematical model of how to represent data logically.
To say relational doesn't scale would be a little like saying that square root doesn't scale because on pressing the square root key on my (somewhat ancient) HP calculator there is a noticeable delay before getting the answer.
The reason for the so-called "big data" solutions is that, with current hardware and RDBMS implementations, you may need to revive some obsolete (and inflexible and error-prone) methods to deal with very large data volumes.
As RDBMSs inevitably become more scalable, these half-baked stop-gap "big data" solutions will disappear and never be heard of again.