What is 'AI drift' and why is it making ChatGPT dumber?

You may want to rethink using GPT-4, especially for math problems.
Written by Sabrina Ortiz, Editor

Whether you have experienced it yourself while using ChatGPT or only read about it, the rumors are true: ChatGPT is getting progressively dumber.

This phenomenon is especially perplexing because generative AI models use user input to continuously train themselves, which should make them more intelligent as they accumulate more user entries over time. 

The answer may lie in a concept called "drift."

"Drift" refers to large language models (LLMs) behaving in unexpected or unpredictable ways that stray from their original parameters. This may happen because attempts to improve some parts of a complicated AI model cause other parts to perform worse.

Researchers from the University of California at Berkeley and Stanford University conducted a study to evaluate drift and examine how OpenAI's popular large language models (LLMs), GPT-3.5 (the LLM behind ChatGPT) and GPT-4 (the LLM behind Bing Chat and ChatGPT Plus), changed over time.

The study compared the March and June versions of both LLMs on their ability to solve math problems, answer sensitive questions, answer opinion surveys, answer multi-hop knowledge-intensive questions, generate code, take US Medical License exams, and complete visual reasoning tasks.

Figure: Stanford University/UC Berkeley study results (Credit: Stanford University/UC Berkeley)

As the study results above show, GPT-4's March version outperformed the June version in many instances. The most glaring gap was in basic math prompts, where the March version of GPT-4 outperformed the June version in both examples (a) and (b).

GPT-4 also got worse at code generation, answering medical exam questions, and answering opinion surveys. All of these declines may be attributable to drift.

Regarding the drift, one of the researchers, James Zou, told The Wall Street Journal, "We had the suspicion it could happen here, but we were very surprised at how fast the drift is happening."

Despite the deteriorating intelligence, there were also some instances of improvement in both GPT-4 and GPT-3.5. 

As a result, the researchers encourage users to keep using LLMs, but to exercise caution and evaluate them continually.
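
For readers who want to act on that advice, one simple approach is to re-run the same fixed set of prompts against each model version and compare the scores. The Python sketch below is illustrative only and is not the researchers' method: `march_snapshot`, `june_snapshot`, the prompts, the canned answers, and the 5% tolerance are all hypothetical stand-ins, not data from the study. A real check would replace the two functions with calls to pinned model versions.

```python
from typing import Callable, Dict

# Hypothetical benchmark: prompt -> expected answer. Not taken from the study.
BENCHMARK: Dict[str, str] = {
    "Is 7 prime?": "yes",
    "Is 9 prime?": "no",
    "Is 11 prime?": "yes",
    "Is 15 prime?": "no",
}


def march_snapshot(prompt: str) -> str:
    """Stand-in for an older model version (canned, made-up answers)."""
    canned = {
        "Is 7 prime?": "yes",
        "Is 9 prime?": "no",
        "Is 11 prime?": "yes",
        "Is 15 prime?": "no",
    }
    return canned[prompt]


def june_snapshot(prompt: str) -> str:
    """Stand-in for a newer model version that always answers "no"."""
    return "no"


def accuracy(model: Callable[[str], str], benchmark: Dict[str, str]) -> float:
    """Fraction of benchmark prompts the model answers correctly."""
    correct = sum(
        1
        for prompt, expected in benchmark.items()
        if model(prompt).strip().lower() == expected
    )
    return correct / len(benchmark)


def detect_drift(
    old: Callable[[str], str],
    new: Callable[[str], str],
    benchmark: Dict[str, str],
    tolerance: float = 0.05,  # assumed threshold; tune for your own use
) -> dict:
    """Compare two versions on the same prompts and flag a regression."""
    old_acc = accuracy(old, benchmark)
    new_acc = accuracy(new, benchmark)
    return {
        "old": old_acc,
        "new": new_acc,
        "change": new_acc - old_acc,
        "regressed": (old_acc - new_acc) > tolerance,
    }


report = detect_drift(march_snapshot, june_snapshot, BENCHMARK)
print(report)
```

Keeping the prompt set fixed is the point of the exercise: if the questions change between runs, a drop in score could reflect harder questions rather than a change in the model.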
