IBM tool aims to bring data sources together

MineLink uses heuristic techniques to work out related information even where data fields are labelled differently
Written by Angus Kidman, Contributor

IBM is finishing work on database software designed to more effectively link related information from multiple data sources, a problem area for large data-warehousing projects.

The technology, code-named MineLink and developed here at the company's Almaden research centre, uses heuristic techniques to identify data fields that contain related information even though they may be labelled differently.

For instance, a field labelled "Surname" in one database may be labelled "Last Name" in another, which can cause problems when integrating the data. While that example is simplistic, matching fields often requires complex analysis of their contents, especially if businesses want to drill further into the collected data.
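To make the idea of matching fields by their contents concrete, here is a minimal sketch in Python. It is not IBM's method, whose details are not described here. It assumes a simple stand-in heuristic, the Jaccard overlap between the normalised values of two columns, and the function names, sample tables and 0.5 threshold are all hypothetical.

```python
def normalise(values):
    """Lower-case and trim values so superficial differences are ignored."""
    return {str(v).strip().lower() for v in values if v is not None and str(v).strip()}


def jaccard(a, b):
    """Share of distinct values the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_fields(table_a, table_b, threshold=0.5):
    """Pair each field in table_a with the table_b field whose contents overlap most.

    Tables are plain dicts mapping a field label to a list of values.
    The threshold is an arbitrary assumption, not a recommended setting.
    """
    matches = []
    for name_a, col_a in table_a.items():
        set_a = normalise(col_a)
        best_name, best_score = None, 0.0
        for name_b, col_b in table_b.items():
            score = jaccard(set_a, normalise(col_b))
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matches.append((name_a, best_name, best_score))
    return matches


# Hypothetical sample data: the labels differ, the contents largely agree.
crm = {
    "Surname": ["Smith", "Jones", "Nguyen"],
    "City": ["Sydney", "Perth", "Hobart"],
}
billing = {
    "Last Name": ["smith", "JONES", "Nguyen ", "Brown"],
    "Town": ["Perth", "Sydney", "Darwin"],
}

for field_a, field_b, score in match_fields(crm, billing):
    print(f"{field_a} <-> {field_b} (overlap {score:.2f})")
```

A value-overlap test like this only works when the two sources share actual records. Fields that hold the same kind of information about different people would need other signals, such as value formats or statistical profiles, which is where the more complex analysis comes in.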

A prototype of MineLink for use in the life sciences field was demonstrated by IBM researchers as far back as 2002. That project used existing DiscoveryLink analytic technologies in DB2 database software but added extra data-mining features in order to provide a unified view of complex information.

Although Big Blue hasn't been vocal in promoting the technology, plans for integration into its flagship DB2 database are already well advanced.

"It should be in the DB2 product in the next year or two," said Steve Cousins, senior manager for the user experience research group at Almaden.

That timetable would likely see MineLink elements incorporated in the successor release to Stinger, the next incarnation of DB2, which is currently in beta and expected to be released before the end of the year.

The enterprise database field is now a three-horse race among Oracle, IBM and Microsoft, which together account for three-quarters of relational database revenue worldwide. Oracle has 39.8 percent of the global market, compared with IBM's 31.3 percent, according to 2003 figures from market researcher IDC. Open-source products such as MySQL don't figure in those totals because many people download them at no cost.
