ChatGPT answers more than half of software engineering questions incorrectly

You may want to stick to Stack Overflow for your software engineering assistance.
Written by Sabrina Ortiz, Editor
Person using ChatGPT on a laptop
June Wan/ZDNET

ChatGPT's ability to provide conversational answers to any question at any time makes the chatbot a handy resource for your information needs. Despite the convenience, a new study finds that you may not want to use ChatGPT for software engineering prompts.  

Before the rise of AI chatbots, Stack Overflow was the go-to resource for programmers who needed advice for their projects, with a question-and-answer model similar to ChatGPT's. 

However, with Stack Overflow, you have to wait for someone to answer your question, while with ChatGPT, you don't.

As a result, many software engineers and programmers have turned to ChatGPT with their questions. Because there was no data showing how reliably ChatGPT answers these types of prompts, a new Purdue University study set out to measure it.

To find out how well ChatGPT answers software engineering prompts, the researchers gave ChatGPT 517 Stack Overflow questions and examined the accuracy and quality of its answers.

The results showed that out of the 512 questions, 259 (52%) of ChatGPT's answers were incorrect and only 248 (48%) were correct. Moreover, a whopping 77% of the answers were verbose. 

Despite their frequent inaccuracy, the answers were comprehensive 65% of the time, addressing all aspects of the question.

To further analyze the quality of ChatGPT responses, the researchers asked 12 participants with different levels of programming expertise to give their insights on the answers. 

Although the participants preferred Stack Overflow's responses over ChatGPT's across various categories, as shown in the graph, they failed to spot incorrect ChatGPT-generated answers 39.34% of the time.

Study graph
Purdue University

According to the study, the well-articulated style of ChatGPT's responses led users to overlook incorrect information in the answers.

"Users overlook incorrect information in ChatGPT answers (39.34% of the time) due to the comprehensive, well-articulated, and humanoid insights in ChatGPT answers," the authors wrote. 

Generating plausible-sounding but incorrect answers is a significant issue across all chatbots because it enables the spread of misinformation. Beyond that risk, the low accuracy scores alone should be enough to make you reconsider using ChatGPT for these types of prompts.