REIMAGINING THE ENTERPRISE | A ZDNet Multiplexer Blog

Flexing the muscles of big data

When the term 'big data' gets mentioned, what springs to mind? Massive databases? Hadoop clusters? Business analytics engines?

How about data aggregation systems, filters, metadata creation systems, indexers, results renderers and reporting systems? All of these should be part of a big data strategy and require different resources at different times.

For example, a user inputs a query. The data required to respond to this query has to be pulled together from formal and informal data sources. It all has to be normalised through processes similar to extraction, transformation and load (ETL) actions and positioned in a suitable environment for analysis to take place. The analysis has to run, and the visual output needs to be constructed and presented to the user.
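The stages just described can be sketched as a chain of small functions. This is only an illustration: the two in-memory data sources, the field names and the function names (gather, normalise, analyse, render, answer) are all hypothetical, and a real system would put a separate service behind each stage.

```python
# Hypothetical data sources: one "formal" (clean schema, numbers stored as
# strings) and one "informal" (inconsistent keys, stray whitespace).
FORMAL = [
    {"region": "EMEA", "sales": "1200"},
    {"region": "APAC", "sales": "800"},
]
INFORMAL = [
    {"Region": " emea ", "Sales": 300.0},
]


def gather(query):
    """Pull raw records together from every source (network-heavy in practice)."""
    # The query is ignored here; a real system would push it down to each source.
    return list(FORMAL) + list(INFORMAL)


def normalise(records):
    """ETL-like step: unify key names, trim text, convert values to numbers."""
    cleaned = []
    for record in records:
        lowered = {key.lower(): value for key, value in record.items()}
        cleaned.append({
            "region": str(lowered["region"]).strip().upper(),
            "sales": float(lowered["sales"]),
        })
    return cleaned


def analyse(records, region):
    """Run the analysis itself (compute-heavy in practice)."""
    values = [r["sales"] for r in records if r["region"] == region]
    return {"region": region, "count": len(values), "total": sum(values)}


def render(result):
    """Build the output that is presented to the user."""
    return (f"{result['region']}: {result['count']} records, "
            f"total sales {result['total']:.0f}")


def answer(region):
    """The whole path, from the user's query to the rendered result."""
    return render(analyse(normalise(gather(region)), region))


if __name__ == "__main__":
    print(answer("EMEA"))
```

Each function stands in for a stage with a different resource profile, which is why the stages are kept separate rather than merged into one routine.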

The process includes some disk-intensive I/O, some compute-intensive number crunching and some network-intensive data gathering and reporting. However, these activities are not all happening at the same time.

If a business architects a physical, in-house platform, the organisation will need to create separate areas that can handle each workload specifically (with the required headroom to deal with spikes), or create a private cloud so that resources can be shared. This private cloud still needs to be constructed carefully - should any task overload the available resources, all other workloads will suffer.
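A rough way to see the difference between the two in-house options is to compare the capacity each would need. The sketch below uses made-up demand figures and assumes, for simplicity, that capacity can be counted in one interchangeable unit, which real disk, CPU and network resources are not. The names DEMAND, dedicated_capacity and shared_capacity are illustrative only.

```python
# Hypothetical demand per workload at four points in time (arbitrary units).
# Each workload peaks at a different moment, as in the query example above.
DEMAND = {
    "gather":  [8, 2, 1, 1],
    "etl":     [1, 9, 2, 1],
    "analyse": [1, 2, 10, 2],
    "report":  [1, 1, 2, 6],
}


def dedicated_capacity(demand, headroom=0.2):
    """Separate areas: each one is sized for its own peak plus headroom."""
    return sum(max(series) * (1 + headroom) for series in demand.values())


def shared_capacity(demand, headroom=0.2):
    """Shared pool: sized for the peak of the combined load plus headroom."""
    combined = [sum(step) for step in zip(*demand.values())]
    return max(combined) * (1 + headroom)


if __name__ == "__main__":
    print(f"separate areas: {dedicated_capacity(DEMAND):.1f} units")
    print(f"shared pool:    {shared_capacity(DEMAND):.1f} units")
```

With these invented figures the shared pool needs less than half the capacity of the separate areas, because the peaks do not coincide. The same sum also shows the risk in a shared pool: if one workload exceeds its expected share, the others lose the capacity they were counting on.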

Instead, a public cloud platform can offer the flexibility and elasticity required for dealing with mixed big data workloads. Separate virtual areas can be set up that are optimised for specific workloads - but that can also be allocated spare capacity as necessary. The massive scale of some public clouds means that such sharing is easy and cost-effective: the likelihood of a global public cloud finding itself inadequate to meet the needs of all its workloads at any time is minuscule.

By using a public cloud, the business logic and data can be co-located with all the tools and systems that a big data environment needs. Therefore, the IT people get what they want - a fully centralised, manageable and flexible system. The users get what they want - a responsive and usable system. And finally the business gets what it wants - big data insights in the quickest time possible and with the minimum cost and effort.