7 ways to make sure your data is ready for generative AI

Industry leaders are concerned about whether enterprises can handle the huge data influx that is required to make the most of generative AI.
Written by Joe McKendrick, Contributing Writer

Everyone wants to tap into the power of generative artificial intelligence (AI) and large language models, but there's a rub. Getting AI to meet its sky-high expectations takes viable, quality data -- and that's where many organizations are falling short. 

A recent McKinsey report, led by authors Joe Caserta and Kayvaun Rowshankish, points out that there is unrelenting pressure to "do something with generative AI." But the authors add a warning: "If your data isn't ready for generative AI, your business isn't ready for generative AI."

The report authors suggest IT and data managers "will need to develop a clear view of the data implications of generative AI." Data might be consumed through existing services via application programming interfaces (APIs) or through a business's own models. The latter will require "a sophisticated data labeling and tagging strategy, as well as more significant investments."

Perhaps the most challenging shift "is generative AI's ability to work with unstructured data, such as chats, videos, and code," according to Caserta and his team. "Data organizations have traditionally had capabilities to work with only structured data, such as data in tables."

This shift means organizations need to rethink the overall data architecture that supports generative AI initiatives. "While this might sound like old news, the cracks in the system a business could get away with before will become big problems with generative AI. Many of the advantages of generative AI will simply not be possible without a strong data foundation," they caution.

Across the industry, a growing number of leaders are expressing concern about whether enterprises can handle the huge data influx needed to meet emerging challenges such as generative AI. "Digital transformations, driven by relentless innovation and technological advancements, mean a shift in how organizations operate," says Jeff Heller, VP of technology and operations at Faction, Inc.

"In this swiftly evolving environment, virtually every department, from research and development to daily operational functions, is experiencing a remarkable expansion, with the proliferation of devices and cutting-edge technologies."  

What's more, AI isn't the only factor driving the need for more effective and responsive data architectures. "Customers will continue to expect tailored services and communications, which of course rely heavily on accurate data," says Bob Brauer, founder and CEO of Interzoid. 

"A burgeoning reliance on analytics and visualization tools, vital for strategic decisions, will require a heavy dependence on data. And as artificial intelligence becomes more prominent, data becomes essential as the foundation for training these AI models."  

The message, suggests Heller, is clear -- the time has come for businesses to develop strategies and adopt advanced technologies to "ensure that data remains an invaluable asset rather than an overwhelming liability."

The experts suggest considering the following elements to prepare data for the fast-emerging era of AI:

  1. Establish a data governance strategy: "With the right priorities, staff, governance, tools and an executive mandate, enterprises can transform their data quality challenges from a liability to significant competitive advantage," says Brauer. A step toward gaining organizational support for the data behind AI and other initiatives could be the creation of a "task force, or the appropriate equivalent for various sizes of organizations, to study how the emerging innovation of generative AI, large language models, and other new AI-driven technologies can be applied to gain a competitive advantage."
  2. Establish a data storage strategy: Finding a place to put all that data -- and making it discoverable and accessible -- is an essential piece of the puzzle. Recent industry surveys find that "over half of all stored data -- 60% -- is inactive, meaning it is rarely or never accessed again," says Brian Pawlowski, chief development officer at Quantum. "Even so, businesses don't want to part with it since they understand the data may offer valuable solutions and business value in the years to come, especially given the advent of widespread generative AI usage." This conundrum calls for a re-evaluation of existing capabilities to "establish modern, automated storage architectures that allow people to easily access and work with both active and inactive data throughout its entire lifecycle," Pawlowski adds.
  3. Ensure you have a data quality strategy: Preparing data architecture to handle new AI-powered demands needs to "start with making high levels of data quality a strategic priority," Brauer advises. "A good start would be the appointment of a chief data officer or equivalent role, with the budget and resources specifically for data quality initiatives."
  4. Ensure you measure progress: "Leadership priorities should include enterprise-wide data assessments, and establishing metrics and goals to measure success," Brauer says. 
  5. Ensure you can handle unstructured data: Data quality issues become more pronounced with generative AI models than classical machine-learning models "because there's so much more data and much of it is unstructured, making it difficult to use existing tracking tools," Caserta and the McKinsey team state. "Unstructured data represents about 90% of the data being created moving forward, and the worldwide capacity is growing 25% CAGR for the next five years," says Pawlowski. "This unstructured data is what's stored in files and objects: high resolution video and images, complex medical data, genome sequencing, the input to machine-learning models, captured scientific data about the natural world -- such as mapping oil and gas fields -- and reality simulation, including special effects, animation and augmented reality. It's critical that organizations deploy solutions that manage the lifecycle of data in a way that's automated and makes use of cutting-edge technologies, like AI, to help extract enhanced business value."
  6. Build capabilities into the data architecture to support broad use cases: "Build relevant capabilities (such as vector databases and data pre- and post-processing pipelines) into the existing data architecture, particularly in support of unstructured data," Caserta and his co-authors point out.
  7. Employ AI to help build AI: "Use generative AI to help you manage your own data," the McKinsey team suggests. "Generative AI can accelerate existing tasks and improve how they're done along the entire data value chain, from data engineering to data governance and data analysis."
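
Pawlowski's distinction between active and inactive data in the storage strategy (item 2) can be illustrated with a short Python sketch. It assumes each stored object carries a last-access timestamp and uses a hypothetical 180-day cutoff; the names StoredObject, assign_tier and plan_tiers are illustrative, not part of any storage product's API. In keeping with the point that businesses don't want to part with inactive data, the sketch only reassigns objects to an archive tier and never deletes anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cutoff: objects untouched for more than ~6 months count as inactive.
INACTIVE_AFTER = timedelta(days=180)


@dataclass
class StoredObject:
    path: str
    last_accessed: datetime


def assign_tier(obj: StoredObject, now: datetime) -> str:
    """Return 'active' or 'archive'. Inactive data is re-tiered, never deleted."""
    return "archive" if now - obj.last_accessed > INACTIVE_AFTER else "active"


def plan_tiers(objects: list[StoredObject], now: datetime) -> dict[str, str]:
    """Map each object's path to the tier it should live on."""
    return {obj.path: assign_tier(obj, now) for obj in objects}


def inactive_share(plan: dict[str, str]) -> float:
    """Fraction of objects that would move to the archive tier."""
    if not plan:
        return 0.0
    return sum(tier == "archive" for tier in plan.values()) / len(plan)


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    objects = [
        StoredObject("video/launch.mp4", datetime(2023, 12, 20)),
        StoredObject("logs/2022-05.csv", datetime(2022, 5, 1)),
        StoredObject("scans/site-a.tif", datetime(2023, 3, 1)),
    ]
    plan = plan_tiers(objects, now)
    print(plan)
    print(f"Inactive share: {inactive_share(plan):.0%}")
```

Keeping the archive tier in the same catalog, rather than deleting cold files, is what lets people "easily access and work with both active and inactive data," as Pawlowski puts it.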
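
Brauer's call for enterprise-wide data assessments with metrics and goals (items 3 and 4) could start as simply as the Python sketch below. Completeness and uniqueness are two common data quality measures chosen here for illustration; the article does not name specific metrics, and the target values, field names and sample records are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical goals a data team might set and track over time.
TARGETS = {"completeness": 0.98, "uniqueness": 0.99}


@dataclass
class QualityReport:
    completeness: float  # share of required fields that are populated
    uniqueness: float    # distinct key values divided by total records


def assess(records: list[dict], required: list[str], key: str) -> QualityReport:
    """Compute two simple data quality metrics over a batch of records."""
    if not records:
        return QualityReport(completeness=1.0, uniqueness=1.0)
    cells = len(records) * len(required)
    filled = sum(
        1 for r in records for field in required if r.get(field) not in (None, "")
    )
    completeness = filled / cells if cells else 1.0
    uniqueness = len({r.get(key) for r in records}) / len(records)
    return QualityReport(completeness, uniqueness)


def shortfalls(report: QualityReport) -> dict[str, float]:
    """Return how far each metric falls below its target (empty if all goals are met)."""
    return {
        name: round(target - getattr(report, name), 4)
        for name, target in TARGETS.items()
        if getattr(report, name) < target
    }


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "Acme", "email": "ops@acme.example"},
        {"id": 2, "name": "Globex", "email": ""},
        {"id": 2, "name": "Globex", "email": "it@globex.example"},
        {"id": 3, "name": None, "email": "hi@initech.example"},
    ]
    report = assess(customers, required=["name", "email"], key="id")
    print(report)
    print(shortfalls(report))
```

Run on a regular schedule, a check like this gives leadership a trend line rather than a one-off snapshot; a fuller program might add dimensions such as validity and timeliness.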
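
The vector databases and pre- and post-processing pipelines mentioned in item 6 can be pictured with a toy Python sketch. Everything here is an assumption for illustration: InMemoryVectorStore is a hypothetical stand-in for a real vector database, the word-count embed function stands in for an embedding model, and the chunk size and score threshold are arbitrary. Pre-processing splits unstructured text into chunks; post-processing drops weak matches and keeps the top results.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Pre-processing: split unstructured text into word-bounded chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: rows of (vector, chunk, source)."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str, str]] = []

    def add(self, source: str, text: str, max_words: int = 50) -> None:
        for piece in chunk(text, max_words):
            self.rows.append((embed(piece), piece, source))

    def search(self, query: str, k: int = 3, min_score: float = 0.1) -> list[tuple[float, str, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), text, src) for vec, text, src in self.rows]
        # Post-processing: drop weak matches, then keep the k best by score.
        hits = [hit for hit in scored if hit[0] >= min_score]
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return hits[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("support-chat-1041",
              "Customer reports the invoice export fails when the file is larger than 10 MB.")
    store.add("wiki/onboarding",
              "New hires should request VPN access and a laptop on their first day.")
    for score, text, source in store.search("invoice export error"):
        print(f"{score:.2f}  {source}: {text}")
```

A production pipeline would swap in a real embedding model and vector database, but the overall flow (chunk, embed, index, search, filter) stays roughly the same.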

AI promises to do amazing things, but it takes well-managed data to get to the right destination.