Data services may help address a major SOA unknown -- data quality

SOA enables data to be pulled from multiple sources and quickly distributed across service-oriented systems and applications -- what if it's bad data?

A couple of months ago, as reported here, Neal Fishman released a book that warned of SOA-based infrastructures helping to spread "epidemics" of viral data across enterprises -- since data can be pulled from multiple, formerly siloed sources and quickly distributed across service-oriented systems and applications.

How much data pulled from multiple sources is bad data?

Informatica's Ash Parikh, a long-time advocate of the data services approach to SOA, has also been warning of this scenario. I have gotten to know Ash through our participation on the Informatica Perspectives site, and recently had a chance to talk to him and Informatica's Chris Boorman prior to the launch of Informatica 9, which embraces the SOA data services concept.

Ash proposes that organizations adopt a data services layer that provides "a model and standards-based reusable data abstraction layer that can make holistic, accurate and timely information available to an enterprise integration infrastructure, without all the typical complexity and maintenance costs." He defines a data service as "a modular, reusable, business-relevant service that enables the access, integration, and delivery of complex enterprise data throughout the enterprise and across corporate firewalls in batch, near real-time and real-time modes, including federation."

As companies move into the next level of SOA maturity, in which services start reaching across enterprise boundaries, many have been struggling to improve SOA's ability to deliver business value. One factor is that companies can't trust the data being pulled from all the stovepipes into enterprise services. SOA, as Chris puts it, "has lacked the data abstraction layer that enables organizations to basically define the data objects and the rules associated with data objects, that can then be permeated through -- whether it be Web services or SQL or batch or anything else -- to the applications that are using that data."
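To make the idea of "data objects and the rules associated with data objects" a little more concrete, here is a minimal sketch in Python. It is not Informatica's API; the `DataObject` class, the `deliver` function, and the two rules are hypothetical names invented for illustration. The point is only that the rules live with the data object, so every consumer gets the same checks no matter how the data reaches it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout; this is an illustration, not a vendor API.
Rule = Callable[[dict], bool]


@dataclass
class DataObject:
    """A business-level data object plus the rules that travel with it."""
    name: str
    rules: dict[str, Rule]

    def violations(self, record: dict) -> list[str]:
        """Return the names of the rules this record fails."""
        return [rule_name for rule_name, check in self.rules.items()
                if not check(record)]


# Assumed example rules for a "customer" object.
customer = DataObject(
    name="customer",
    rules={
        "has_id": lambda r: bool(r.get("id")),
        "email_has_at": lambda r: "@" in (r.get("email") or ""),
    },
)


def deliver(obj: DataObject, records: list[dict]):
    """Split records into clean ones and rejected ones with their failed rules."""
    clean, rejected = [], []
    for record in records:
        failed = obj.violations(record)
        if failed:
            rejected.append((record, failed))
        else:
            clean.append(record)
    return clean, rejected
```

Whether the caller is a Web service, a SQL query, or a batch job, it would go through `deliver` and see only records that passed the shared rules, while the rejected ones can be routed back to the data owners.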

Ash, who has been warning the industry about the quality of data -- or lack thereof -- surging through SOA-based infrastructures for some time now, says SOA data services open up many new avenues for connecting SOA with enterprise data management. "It's much more than just data access," he points out. "It's making sure the data that is delivered is of the greatest quality."

SOA data services also help create a more collaborative environment between IT, data managers, and business data owners. In the real world, Ash says, "when people talk about data, they never talk about 'data source X' or 'data source Y' that's sitting in a corner somewhere. They report the data as a business representation of data -- my customer data, my product data, things like that." This brings things in line with the perspective required of SOA architects, who need to assure more timely, accurate, and consistent views of that customer and product data.

Given this backdrop, I saw that Mike Kavis has also been doing work in this area, and just posted a business case for data services at his site. He describes the issues that can be rectified via an abstracted data layer: real-time failover among multiple virtual data centers; managing multi-channel partners with multiple data structures; regulations and laws affecting data management and movement; and data security against direct access to databases.

Maintaining a loosely coupled data services layer takes away the complexities and inconsistencies of attempting to manage multiple data sources. "By abstracting the data layer and creating configurable services as access points to the data, teams can quickly implement solutions in a controlled and standardized manner," Mike says. For example, they can move quickly "due to the simplicity of the data access and the fact that they don’t need extensive knowledge of the underlying data."
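As a rough illustration of "configurable services as access points to the data," the sketch below puts one service in front of two sources with different structures. Everything here is assumed for the example: the source functions, the field names, and `CUSTOMER_SERVICE_CONFIG` are hypothetical and not taken from Mike's post. A real layer would also handle security, failover, and data quality.

```python
# Hypothetical source adapters: each returns records in its own native shape.
def crm_source():
    return [{"cust_id": "C1", "full_name": "Ada Lovelace"}]


def billing_source():
    return [{"customerNumber": "C2", "name": "Alan Turing"}]


# Configuration, not code, decides which sources feed the service and how
# each source's fields map onto the shared business view of a customer.
CUSTOMER_SERVICE_CONFIG = [
    {"fetch": crm_source, "fields": {"cust_id": "id", "full_name": "name"}},
    {"fetch": billing_source, "fields": {"customerNumber": "id", "name": "name"}},
]


def get_customers(config=CUSTOMER_SERVICE_CONFIG):
    """Single access point: callers see one customer shape, never the sources."""
    customers = []
    for source in config:
        for raw in source["fetch"]():
            # Rename each source field to its business-level name.
            customers.append(
                {target: raw[src] for src, target in source["fields"].items()}
            )
    return customers
```

A team calling `get_customers()` needs no knowledge of the CRM or billing schemas, and adding or swapping a source means changing the configuration rather than every consumer.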

Ash Parikh also talks about the divide between data management and real-life business needs, something that SOA and data services can help address. For example, he observes, many companies have built great data models, but these models tend to be static. "It's great to have a model, but I also need a way to find all that information, and to make sure that information I'm finding across a multitude of these data sources -- which can be varied in structures and formats -- is relevant to me." Many of these issues can be resolved at the data services abstraction layer, he says.