BI data is useful: as the well-known maxim goes, what gets measured gets improved. Collecting such data was one of the drivers behind the rise of customer relationship management (CRM) systems, for example, which detail every touch point between a business and its customers, enabling better service through better information.
Yet, as data volume grows, data complexity also increases. This in turn directly impacts the business's ability to produce real-time intelligence that drives smart, timely decisions and helps seize new opportunities.
Typically, large companies employ a traditional centralised data warehouse, bringing in data through an extract-transform-load (ETL) process. Co-locating data in a massive, robust data warehouse delivers a performance benefit compared with extracting data manually, and the data can be heavily indexed for fast search results without risking a slowdown of transactional systems.
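The ETL process described above can be sketched in a few lines. This is a minimal, illustrative example only; the table names, connections, and cleaning rules are assumptions for the sketch, not any particular vendor's pipeline.

```python
import sqlite3

# Source "transactional" system (stand-in for an operational database).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_orders (customer TEXT, total TEXT)")
source.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                   [(" alice ", "10.50"), ("BOB", "3.25")])

# Central warehouse (stand-in for the consolidated reporting store).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer TEXT, total REAL)")

# Extract: read rows from the source system.
rows = source.execute("SELECT customer, total FROM raw_orders").fetchall()

# Transform: normalise names and cast totals to numbers.
clean = [(customer.strip().lower(), float(total)) for customer, total in rows]

# Load: write the cleaned rows into the warehouse, so reporting
# queries never touch the transactional system.
warehouse.executemany("INSERT INTO orders VALUES (?, ?)", clean)
```

Because the load step runs on a schedule (often nightly), the warehouse reflects the source only as of the last run, which is exactly the latency problem discussed next.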
Yet, co-locating data is not without cost. Consolidating data from various sources is laborious and time consuming, and often delays results; reports may, for example, show yesterday's information. This time delay directly affects the ability to make the right decision at the right time.
Further, the proven and tested relational database management system (RDBMS) approach is not necessarily appropriate in every circumstance. This is especially true when facing a voluminous mix of complex structured and unstructured data sets.
With greater amounts of information being retained, and with distinct specialised purposes that a "one size fits all" enterprise data warehouse no longer serves, a multi-container approach has emerged, creating a need for fast, cost-effective ways to access and process data across these containers.
In such an approach, a data warehouse handles the analysis of structured data; a separate environment, such as Apache Hadoop, handles analysis of raw, unstructured data; custom, independent data warehouses analyse structured, normalised data; in-memory solutions provide speed; and cloud-based systems offer extremely rapid creation times and the ability to integrate data sets external to the organisation.
The modern challenge with such large and diverse data collections is providing agile solutions that allow business groups to quickly solve problems, discover efficiencies, and improve results.
Seeking to resolve this challenge for its own internal applications, Intel IT researched the data virtualisation capabilities of a reporting tool and an ETL tool. Analysing the performance led to some interesting discoveries that enterprises should consider when evaluating their own BI agility.
Data virtualisation provides a middle layer between consumption (such as web and mobile apps and BI tools) and storage (where the data actually resides), presenting the data in a consistent, abstracted format.
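The idea of that middle layer can be sketched as follows. This is a toy illustration under assumed names (`VirtualLayer`, `register_source`, `query` are invented for the sketch, not a real product's API): consumers query one abstracted interface while each data set stays in its original container.

```python
import sqlite3

class VirtualLayer:
    """Toy virtualisation layer: maps logical source names to fetch
    functions and presents every backend as rows of dicts."""

    def __init__(self):
        self.sources = {}  # logical name -> callable returning rows as dicts

    def register_source(self, name, fetch_fn):
        self.sources[name] = fetch_fn

    def query(self, name, predicate=lambda row: True):
        # Consumers see a consistent format regardless of the backend.
        return [row for row in self.sources[name]() if predicate(row)]

# Backend 1: a relational store (standing in for the data warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100), ("APAC", 250)])

def fetch_sales():
    cur = conn.execute("SELECT region, amount FROM sales")
    return [{"region": region, "amount": amount} for region, amount in cur]

# Backend 2: a plain in-process list (standing in for a file or NoSQL container).
clicks = [{"region": "EMEA", "clicks": 7}, {"region": "APAC", "clicks": 3}]

layer = VirtualLayer()
layer.register_source("sales", fetch_sales)
layer.register_source("clicks", lambda: clicks)

# The consumer never needs to know which container holds the data.
emea_sales = layer.query("sales", lambda row: row["region"] == "EMEA")
```

The design point is that nothing is copied: each `query` call reaches into the source container at request time, which is what removes the lengthy consolidation step of a traditional ETL build.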
Three key discoveries were:
- Data virtualisation solutions required just one week to set up, compared with approximately eight weeks for a traditional collocation approach that copies all data sets into a single container.
- Optimisations that push filtering down to the source container resulted in the best performance.
- The more processing that can be pushed down to the source container, the higher the performance.
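The pushdown findings above can be illustrated with a minimal sketch (assumed names and a toy SQLite table, not any vendor's optimiser): pushing the filter predicate into the source container lets the database use its own indexes and return only matching rows, instead of shipping every row to the virtualisation layer and filtering there.

```python
import sqlite3

# A toy "source container" with an indexed column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount INTEGER)")
conn.execute("CREATE INDEX idx_region ON orders(region)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EMEA" if i % 2 else "APAC", i * 10) for i in range(1000)])

def filter_in_middle_layer(region):
    # Naive plan: fetch every row from the source, then filter
    # in the virtualisation layer (1000 rows cross the boundary).
    rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    return [row for row in rows if row[1] == region]

def filter_with_pushdown(region):
    # Optimised plan: the predicate travels to the source container,
    # so only matching rows cross the boundary and the index can be used.
    return conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?",
        (region,)).fetchall()
```

Both plans return identical results; the difference is where the work happens and how much data moves, which is why deeper pushdown translated into higher performance in the tests.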
Intel is continuing to research the application of data virtualisation, but the results already provide interesting lessons for enterprises to consider.