Stack Overflow joins Reddit and Twitter in charging AI companies for training data

The new changes could come as soon as the middle of the year.
Written by Maria Diaz, Staff Writer
Image: Robot counting money (Bing Image Creator/ZDNET)

The bill for companies specializing in artificial intelligence continues to grow: Stack Overflow joins Reddit and Twitter as another platform planning to start charging AI companies that want to use its data for training. 

AI models like those behind ChatGPT, Google Bard, and Bing Chat all require massive datasets for training. The companies behind them, like OpenAI and Google, gather data from all over the internet to train their large language models (LLMs), tuning the parameters that result in successful natural language processing (NLP).

This training data spans subjects from world history to software development, which build a model's "intelligence," as well as grammar, speech nuances, and styles derived from conversations, which help it generate human-like responses.

According to reporting from Wired, Stack Overflow could begin charging AI companies this summer for access to its forum's more than 50 million questions and answers for use in training AI projects.

Stack Overflow is a programming forum that offers a collaborative environment to its users, who are mostly developers. It's a popular place for programmers to ask about coding problems and programming languages, and it serves as a learning resource for its more than 20 million users.

In a recent post on the company's site, the Stack Overflow CEO, Prashanth Chandrasekar, explained that "allowing AI models to train on the data developers have created over the years, but not sharing the data and learnings from those models with the public in return, would lead to a tragedy of the commons."

The forum made headlines last fall for banning the use of ChatGPT-generated text to create posts, deeming the practice "harmful" to the site and its users. "Unless we all continue contributing knowledge back to a shared, public platform, we risk a world in which knowledge is centralized inside the black box of AI models that require users to pay in order to access their services," Chandrasekar added in a separate post.