Machine learning and information architecture: Success factors

Quantity and quality of data are not enough to take full advantage of machine learning. The structures built around data -- and the way data is structured -- influence the value you can derive from machine learning.
Written by James Sanders, Contributor

Enterprises have long since moved past manually analyzing data, as doing so is both expensive and impractical considering the sheer amount of data organizations generate. For years, this work has been delegated to programmers, who often had to create custom scripts requiring frequent revision and fine-tuning.

Those days are quickly coming to an end, as both the quantity of data and the variety of sources from which it is collected have increased beyond what this strategy can practically handle. Now, organizations are rapidly adopting machine learning to generate insights from data. However, this transition is not completely seamless. Understanding how to use machine learning efficiently, knowing which data regulations apply to the information it processes, and contending with how computers inherit bias from human decision making are all vital to a successful adoption in your organization.

How to prepare unstructured data for processing with AI and ML

The preparations for unstructured data depend on what type of data it is and how you define unstructured. "'Unstructured' is often a misnomer, as lots of data types associated with 'big data,' such as JSON files (associated with mobile and social feeds), log files, text documents, email messages, and more have structure," Doug Henschen, principal analyst on data-driven decision making at Constellation Research, told ZDNet. "In the case of this semi-structured data, parsing, filtering and transformation steps can be applied in ETL and ETL-like processes. When this happens at scale, [Apache] Spark is often used instead of old-school commercial integration servers. This processing can bring more structure and consistency to the data."
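The parsing, filtering, and transformation steps Henschen describes can be sketched in a few lines of standard-library Python. This is a minimal illustration only: the sample records, field names, and function names (`parse`, `keep`, `transform`, `run_etl`) are hypothetical, and a production pipeline at scale would more likely run on an engine such as Spark.

```python
import json
from datetime import datetime, timezone

# Hypothetical semi-structured input: one JSON event per line, with
# inconsistent casing, optional fields, and one malformed line.
RAW_LINES = [
    '{"user": "A1", "ts": "2019-03-01T12:00:00Z", "event": "Login", "meta": {"device": "ios"}}',
    '{"user": "b2", "ts": "2019-03-01T12:05:00Z", "event": "purchase", "amount": "19.99"}',
    'not valid json',
    '{"user": "c3", "event": "login"}',
]

REQUIRED_FIELDS = ("user", "ts", "event")


def parse(line):
    """Parse step: return a dict, or None if the line is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def keep(record):
    """Filter step: drop unparseable lines and records missing required fields."""
    return record is not None and all(key in record for key in REQUIRED_FIELDS)


def transform(record):
    """Transform step: normalize casing, types, and nesting into flat rows."""
    ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    amount = record.get("amount")
    return {
        "user": record["user"].lower(),
        "event": record["event"].lower(),
        "timestamp": ts.isoformat(),
        "device": record.get("meta", {}).get("device"),
        "amount": float(amount) if amount is not None else None,
    }


def run_etl(lines):
    """Apply parse, filter, and transform in order, returning consistent rows."""
    parsed = (parse(line) for line in lines)
    return [transform(record) for record in parsed if keep(record)]


rows = run_etl(RAW_LINES)
for row in rows:
    print(row)
```

Of the four sample lines, only the two complete, well-formed records survive, and both come out with the same set of columns. That consistency is the "more structure" the processing is meant to provide.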

What is necessary to generate actionable intelligence from AI/ML-powered analysis?

Think back to an introductory statistics class you may have taken as a student: Without a sufficiently large sample size -- or in this case, data set -- no meaningful conclusions can be drawn. According to Henschen, some machine learning systems "require at least 10,000 rows of data before you can achieve adequate accuracy."

Repeatability and scale are key to success for utilizing machine learning effectively. "If you can find decisions that happen at scale and that can be made by humans in seconds, they're probably good candidates for automation," Henschen said. "If they're more complex, but still high-scale and consistent, then they may be good candidates for recommendations."

The role that machine learning plays in your organization is also worth reconsidering -- using it as a drop-in replacement for analytics scripts undercuts the benefits that machine learning offers. "AI has a job to do. You're defining the model to automate and scale your decisions and actions, to take care of a job," Forrester enterprise architecture analyst Michele Goetz told ZDNet. "What you're doing is training a system to be a co-bot with the rest of your organization, not to just say, 'Oh, look how great our performance was,' or, 'This is just where your forecast is going.'"

According to Goetz, "The way that you imagine, or envision, design, develop, roll out, and then run an AI capability, is not in a traditional IT technology fashion. It's not even in a traditional product fashion. It changes the organization, more than the organization changes its technology." These deployments require a mindfulness about how your organization operates, not how your organization uses a specific technology.

What's the difference between data architecture and information architecture?

Considering the quantity and quality of data is not quite enough to take full advantage of machine learning. The structures built around your data -- and the way your data is structured -- influence the extent to which you can effectively use machine learning. Data architecture applies "specifically to structured data," Goetz said. "Information architecture tends to look at things more holistically, regardless of the structure of the data. When thinking about information architecture, it's how do you bring together disparate data -- structured versus unstructured and semi-structured, and [harmonize] them… you want to take advantage of all possibilities that create appropriate and representative views of the world that AI is going to operate in."

Understanding data regulation and compliance requirements

Naturally, the extent to which data is regulated depends on the applicable jurisdictions. Most of these regulations are not unique to machine learning; likewise, machine learning is not a shield, workaround, or pass that lets organizations flout data regulations. "Applicable jurisdictions" is also intentionally broad -- regulations like GDPR apply to American firms under specific (though wide-ranging) circumstances.

Bias is a primary concern when using machine learning. "In regulated industries such as banking and insurance, they've long faced regulatory oversight to ensure that decisions on loans, claims, policy issuance, and so on, are explainable and unbiased," Henschen told ZDNet. "As the use of ML and AI spread, I think we'll see more general interest and demand for explainability and transparency. Bias is not just a matter of the models; it's also a matter of the data."
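Because bias can live in the data as well as in the model, one simple first check is to compare outcome rates across groups in the historical records a model would learn from. The sketch below is illustrative only: the loan records, the `group` and `approved` field names, and the helper functions are all hypothetical, and a lopsided ratio is a prompt for closer review, not a legal or regulatory test.

```python
from collections import defaultdict

# Hypothetical historical loan decisions; "group" stands in for any
# attribute an organization wants to audit.
DECISIONS = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]


def approval_rates(records, group_key="group", outcome_key="approved"):
    """Return the share of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        totals[record[group_key]] += 1
        if record[outcome_key]:
            positives[record[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the lowest group rate to the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


rates = approval_rates(DECISIONS)
ratio = rate_ratio(rates)
print(rates)
print(round(ratio, 2))
```

A model trained on records like these would likely reproduce the gap between the two groups, which is why the data deserves the same scrutiny as the model itself.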
