What's missing from all the conversations about big data is a focus on the infrastructure necessary to support it — and in particular its use in real time.
For many companies, big data means opening up access to the data warehouses they have always maintained. Data warehousing has been and continues to be a critical component of enterprise-class organisations.
Such systems aggregate data from across the organisation and enable it to be sliced and diced into consumable chunks, allowing business analysts to provide insights into business conditions.
It is this form of the data — parsed and processed into actionable information — that will be integrated back into the datacentre, into applications and infrastructure, to serve as input to the myriad systems and processes making near real-time decisions.
But data warehouses were not designed for the volume of integration and access required by such models — nor are the various business-intelligence systems that assist in processing the data.
The sheer volume of incoming data can at times be enough to overwhelm the supporting systems. Add to those volumes the number of systems trying to access the refined data, and it is unlikely such applications will withstand the onslaught.
If big data in the enterprise is to be a successful platform that business and operations can exploit, it will need to be treated as a more critical datacentre asset. That approach will require a long, hard look at infrastructure and architecture to ensure access to and from such systems can scale to meet the coming demand.
The same architectures we use to scale public-facing applications will almost certainly need to be applied to realise a model in which big data can continue to be exploited traditionally — daily or even weekly — as well as in near real time. This is where we are told the next-generation datacentre models are headed and where the most value will be derived.
Blockages to data retrieval
Reliability is of central importance, particularly where infrastructure is concerned. Integrating services with infrastructure and applications often introduces blocking: a system retrieving data from a service in real time must wait for a response, and it cannot continue processing until the request completes, successfully or otherwise.
When services are working well, blocking is not an issue. Data is retrieved almost immediately and processing continues. But when services are overwhelmed, dependent systems become bogged down waiting for responses.
This delay affects the entire chain — from the service itself to dependent systems and ultimately to the end-user, who usually has no idea why the system is unresponsive, because there is no way to notify them from a component buried several layers deep in the architecture.
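One common defence against this kind of cascading delay is to bound how long a caller will block. The sketch below is purely illustrative, assuming a hypothetical `fetch_refined_data` service call: it runs the blocking request on a worker thread and enforces a timeout so the caller can degrade gracefully instead of hanging the whole chain.

```python
import concurrent.futures
import time

def fetch_refined_data(query):
    # Hypothetical stand-in for a blocking call to a data-warehouse
    # or big-data service; the sleep simulates network/processing time.
    time.sleep(0.2)
    return {"query": query, "rows": 42}

def fetch_with_timeout(query, timeout=1.0):
    # Run the blocking call on a worker thread so the caller can bound
    # its wait, rather than blocking indefinitely on a slow service.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_refined_data, query)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # Degrade gracefully: return nothing (or cached/stale data)
            # so dependent systems and end-users are not left hanging.
            return None

result = fetch_with_timeout("daily_sales")
```

The key design choice is that the timeout lives with the caller, so each dependent system decides how long it can afford to wait; a production system would typically pair this with caching or a circuit breaker rather than simply returning nothing.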
Thus, the reliability and performance of big-data systems must be treated as imperatives. A properly designed architecture focused on scalability is paramount to enabling the inter-connectedness that will be the hallmark of the big data-driven organisation.
A focus today on putting into place the architectural framework necessary to achieve that scalability will certainly go a long way towards enabling more expansive use of big data throughout the datacentre.