Big data: Why it's really an architecture challenge

Summary: For big data to amount to a serious asset that business and operations can exploit, datacentres will have to take a long, hard look at infrastructure and architecture.

What's missing from all the conversations about big data is a focus on the infrastructure necessary to support it — and in particular its use in real time.

For many companies, big data means opening up access to the data warehouses they have always maintained. Data warehousing has been and continues to be a critical component of enterprise-class organisations.

Such systems aggregate data from across the organisation and enable it to be sliced and diced into consumable chunks, allowing business analysts to provide insights into business conditions.

It is this form of the data — parsed and processed into actionable information — that will be integrated back into the datacentre, into applications and infrastructure, to serve as input to the myriad systems and processes making near real-time decisions.

But data warehouses were not designed for the volume of integration and access required by such models — nor are the various business-intelligence systems that assist in processing the data.

The sheer volume of incoming data can at times be enough to overwhelm the supporting systems. Add to that volume the number of systems trying to access the refined data, and it is unlikely such applications will withstand the onslaught.

If big data in the enterprise is to be a successful platform that business and operations can exploit, it will need to be treated as a more critical datacentre asset. That approach will require a long, hard look at infrastructure and architecture to ensure access to and from such systems can scale to meet the coming demand.

The same architectures we use to scale public-facing applications will almost certainly need to be applied to realise a model in which big data can continue to be exploited traditionally — daily or even weekly — as well as in near real time. This is where we are told the next-generation datacentre models are headed and where the most value will be derived.
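As a rough illustration of that scale-out pattern, the sketch below shows, in Python, read requests being distributed across a pool of replicated data-service nodes rather than funnelled at a single warehouse endpoint. The node names and the /metrics path are hypothetical, not from the article, and in practice this distribution usually lives in a load-balancing or application-delivery tier rather than in the client.

```python
import itertools
import requests  # assumed HTTP client; any equivalent would do

# Hypothetical pool of replicated data-service nodes (illustrative names only).
DATA_NODES = [
    "http://bigdata-node1.example.local:8080",
    "http://bigdata-node2.example.local:8080",
    "http://bigdata-node3.example.local:8080",
]
_round_robin = itertools.cycle(DATA_NODES)


def fetch_metric(key: str) -> dict:
    """Spread read traffic across replicas instead of a single warehouse endpoint."""
    node = next(_round_robin)
    response = requests.get(f"{node}/metrics/{key}", timeout=2)
    response.raise_for_status()
    return response.json()
```

The point is not the round-robin itself but the shape of the architecture: adding nodes adds capacity for the many systems that will be pulling refined data in near real time.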

Blockages to data retrieval

Reliability is of central importance, particularly where infrastructure is concerned. The integration of services with infrastructure and applications is often a blocking operation: a system trying to retrieve data from a service in real time must wait for a response and cannot continue processing until that call completes, successfully or otherwise.

When services are working well, blocking is not an issue. Data is retrieved almost immediately and processing continues. But when services are overwhelmed, dependent systems become bogged down waiting for responses.

This delay affects the entire chain — from the service itself to dependent systems and ultimately to the end-user, who usually has no idea why the system is unresponsive, because a system buried several layers deep in the architecture has no way to notify them.
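One common mitigation is to bound how long a dependent system will wait. The sketch below is a minimal illustration in Python, assuming a hypothetical HTTP analytics endpoint that is not from the article: the caller sets an explicit timeout and falls back to a cached or default answer rather than blocking indefinitely on an overwhelmed service.

```python
import requests


def get_refined_data(query: str):
    """Retrieve pre-processed data from an analytics service without blocking indefinitely."""
    try:
        response = requests.get(
            "http://analytics.example.local/api/refined",  # hypothetical endpoint
            params={"q": query},
            timeout=(1, 3),  # (connect, read) seconds: bound the wait on a slow service
        )
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        # Fall back so dependent systems, and ultimately the end-user,
        # are not left waiting on an unresponsive service.
        return None
```

Timeouts and fallbacks do not fix an overwhelmed service, but they keep the blockage from propagating through every system that depends on it.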

Thus, the reliability and performance of big-data systems should be an imperative. A properly designed architecture focused on scalability is paramount to enabling the inter-connectedness that will be the hallmark of the big data-driven organisation.

A focus today on putting into place the architectural framework necessary to achieve that scalability will certainly go a long way towards enabling more expansive use of big data throughout the datacentre.

About Lori MacVittie

Lori MacVittie is responsible for application services education and evangelism at app delivery firm F5 Networks. Her role includes producing technical materials and participating in community-based forums and industry standards organisations. MacVittie has extensive programming experience as an application architect, as well as in network and systems development and administration.
