Few enterprise data warehousing (EDW) professionals regard the key rival approach, data federation, as a best practice. Usually their reasons for this disdain are valid: federated environments are not optimized for heavy-hitting data matching, merging, transformation, and cleansing, all of which are essential to delivering a "single version of the truth" for business intelligence (BI).
But data federation is refusing to die as an alternative to EDW--and is taking on new importance in organizations' data management strategies. Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.
Nevertheless, the two are complementary approaches, each with its own pros and cons. For example, data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations. In practice, data federation and EDW (also known as data consolidation) are not mutually exclusive. Many real-world data federation deployments are in fact hybrid approaches that involve EDWs to varying degrees. Federation environments can coexist with, extend, virtualize, and enrich EDWs to help users pull a wide range of disparate data into their reports, queries, dashboards, and analytic applications.
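To make the hybrid pattern concrete, here is a minimal Python sketch of a federated "virtual view" that combines EDW history with operational data at query time instead of waiting for a batch load. The two in-memory SQLite databases, the table names `sales_history` and `orders_today`, and all function names are hypothetical stand-ins, not any product's API; a production federation layer would typically also handle schema mapping, security, and query optimization.

```python
import sqlite3
from collections import defaultdict


def make_edw() -> sqlite3.Connection:
    """Hypothetical EDW: consolidated, batch-loaded sales history."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_history (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales_history VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0), ("east", 50.0)],
    )
    return conn


def make_operational() -> sqlite3.Connection:
    """Hypothetical operational system: today's orders, not yet loaded into the EDW."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_today (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_today VALUES (?, ?)",
        [("east", 20.0), ("north", 75.0)],
    )
    return conn


def federated_sales_by_region(
    edw: sqlite3.Connection, ops: sqlite3.Connection
) -> dict[str, float]:
    """Virtual view: query each source on demand and merge results at query time.

    Nothing is copied into the EDW; the combined answer exists only for this query.
    """
    totals: dict[str, float] = defaultdict(float)
    # Table names come from this fixed tuple, never from user input.
    for conn, table in ((edw, "sales_history"), (ops, "orders_today")):
        for region, amount in conn.execute(f"SELECT region, amount FROM {table}"):
            totals[region] += amount
    return dict(totals)


if __name__ == "__main__":
    print(federated_sales_by_region(make_edw(), make_operational()))
    # {'east': 170.0, 'west': 250.0, 'north': 75.0}
```

The EDW still supplies the cleansed history; the federation layer only adds the freshest operational rows on demand, which is one way the two approaches can coexist rather than compete.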
To determine whether an operational BI scenario requires a federated solution, either in lieu of or as a supplement to an EDW-hubbed topology, Information and Knowledge Management (I&KM) professionals should assess whether their data management environment fits any or all of the following criteria:
As decentralized service-oriented architectures (SOAs) gain traction in operational BI environments, enterprise requirements for data federation, with or without an EDW in the loop, will continue to grow. Also, as EDWs begin to manage petabyte-scale data sets, batch transfer of this data will prove ever more costly and cumbersome, and federated query of this "too massive to move" data will become the only viable option.
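The "too massive to move" argument rests on sending the query to the data rather than the data to the query. The sketch below, again assuming hypothetical in-memory SQLite sources with an `events` table, contrasts shipping every row to a central layer with pushing a `GROUP BY` aggregation down to each source so that only partial sums travel. Counting returned result rows is a simplified stand-in for network transfer.

```python
import sqlite3


def make_source(rows: list[tuple[str, float]]) -> sqlite3.Connection:
    """Hypothetical remote source holding an `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def ship_then_aggregate(sources):
    """Consolidation-style: move every row to the central layer, then sum."""
    totals: dict[str, float] = {}
    rows_moved = 0  # result rows returned to the central layer
    for conn in sources:
        for region, amount in conn.execute("SELECT region, amount FROM events"):
            rows_moved += 1
            totals[region] = totals.get(region, 0.0) + amount
    return totals, rows_moved


def pushdown_aggregate(sources):
    """Federated pushdown: each source sums locally; only partial sums move."""
    totals: dict[str, float] = {}
    rows_moved = 0
    query = "SELECT region, SUM(amount) FROM events GROUP BY region"
    for conn in sources:
        for region, partial in conn.execute(query):
            rows_moved += 1
            # Merge per-source partial sums into the global total.
            totals[region] = totals.get(region, 0.0) + partial
    return totals, rows_moved


if __name__ == "__main__":
    sources = [
        make_source([("east", 1.0)] * 500 + [("west", 2.0)] * 500),
        make_source([("east", 3.0)] * 1000),
    ]
    shipped, shipped_rows = ship_then_aggregate(sources)
    pushed, pushed_rows = pushdown_aggregate(sources)
    assert shipped == pushed  # same answer either way
    print("rows moved:", shipped_rows, "vs", pushed_rows)  # rows moved: 2000 vs 3
```

Pushdown works this simply here because SUM decomposes into partial sums that can be added together; an aggregate such as a median cannot be merged from per-source results in the same way, so real federated engines must choose per query what can safely be pushed to each source.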