Innovation

Generative AI fails in this very common ability of human thought

OpenAI's GPT-3.5 and GPT-4 both fell down when challenged with greater and greater levels of this basic computer task, pointing to a fundamental omission in their design.

Written by Tiernan Ray, Senior Contributing Writer Jan. 31, 2024 at 5:01 p.m. PT

matryoshka-smaller-size — The famous "matryoshka" dolls, where every doll has another doll nested inside of it, and another and another, is a classic example of the principle of recursion, where the thing contains itself. People instantly recognize such examples of recursion, but generative artificial intelligence has no concept of it.
Tiernan Ray for ZDNET

Among the sublime concepts that humans grasp intuitively is the concept of recursion, where something contains a kind of repetition of itself stretching out to infinity. Any "movie within a movie" could be an example of recursion. The famous Matryoshka "nesting dolls" from Russia, where each wooden doll opens to reveal a smaller doll, is an endearing example of recursion.

A picture of yourself holding a picture of yourself in which you are holding a picture of yourself, ad infinitum, is a classic recursive visual fantasy that creates delight in anyone.

Also: I'm taking AI image courses for free on Udemy with this little trick - and you can too

But artificial intelligence programs, including generative AI programs such as ChatGPT, cannot deal with recursion, at least, not consistently, according to scholars at the University of Illinois Urbana-Champaign. The inability to handle recursion hampers AI's performance in tasks. In particular, AI can fail in programming tasks where some bit of code repeats a smaller, atomic version of the same code.

"For example, models trained for programming tasks may misunderstand programs with nested blocks or produce imbalanced brackets," according to lead author Shizhuo Dylan Zhang of U of Illinois and colleagues in the paper, "Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion," posted on the arXiv pre-print server this month.

To examine how Gen AI gets stuck by recursion, the authors devised a simple test for large language models (LLMs). Given a piece of code -- such as adding numbers recursively -- can an LLM produce a description of the pattern of recursion in the code?

Zhang and team choose a classic programming task, called a tree traversal. A data type in computing is called a tree type if it has branching parts that lead to further branches -- kind of tree that contains more trees. To "traverse" the tree means to inspect each part of the tree in a certain order.

Also: 2024 may be the year AI learns in the palm of your hand

Into the language model -- in this case, GPT-3.5 "Turbo" and GPT-4 (OpenAI's most powerful models) -- they enter a prompt that has the problem statement and an example of the tree traversal and an instruction to perform tree traversal. The output from the GPTs should be the solution to the tree traversal and also an explanation of the rule used to perform that traversal.

What they find is the programs start to fail in solving the traversal as the tree gets bigger, or -- in computing terms -- has greater depth. "The performance of large language models exhibits a significant decline as the tree depth increases, correlating with a rise in the complexity and number of reduction steps required," they write.

A qualitative analysis, they write, reveals that the language models cannot carry out the right "reduction," meaning, replacing some element in the tree with the recursive element it contains, to successfully continue the traversal. "This type of error underscores the challenges LLMs face in maintaining algorithmic consistency over extended sequences, particularly when the task requires adherence to a precise order of operations," they write.

Also: How Apple's AI advances could make or break the iPhone 16

Not only do the LLMs fail to properly solve some traversals, but the programs fail to explain, given an example in the prompt, what the rule is that might be used for a successful traversal.

"LLMs struggle to find the correct rules from in-context examples," write Zhang and team, "and fail in interesting ways when performing in-context step-wise reduction."

What does it mean that the programs can't handle recursion?

Also: For the age of the AI PC, here comes a new test of speed

At one level, lacking the ability to work with recursion speaks to a lack of logic, write Zhang and team. "Performing each reduction step in a tree traversal requires more sophisticated logical reasoning," and that's just not in the programs.

However, even humans who don't have a very good grasp of formal logic can instantly appreciate the kind of recursion that occurs in the "movie within a movie" type and other examples.

On a much simpler level, the design of the GPTs needs to be reconsidered by their creators, the authors suggest.

"We hypothesize the fundamental cause [of failures] is that the model has not been optimized to represent recursive patterns well enough," they write. "Our in-context learning experiments revealed the lack of ability for LLMs to 'think recursively' as it tends to mine non-recursive rules from data.

"Also, the models lack accuracy when performing step-wise reduction, where they may get lost in the middle or fail to apply the stopping condition."

Also: How SuperDuperDB delivers an easy entry to AI apps

Left out of the study are all the ways that the inability to handle recursion may hamper the performance of Gen AI on various tasks, from essay writing to computer programming. It's true that, as Zhang and team note, the programs can compensate, up to a point, with little tricks they pick up.

And yet, recursion is such a fundamental phenomenon that at some point, its absence is bound to have consequences for the programs.

Artificial Intelligence

Editorial standards

Show Comments

Generative AI fails in this very common ability of human thought

Artificial Intelligence

Related

Google's DeepMind AI takes home silver medal in complex math competition

The best robot mops you can buy: Expert tested

The flagship Roborock S7 Mav Ultra robot vacuum mop is still $500 off after Prime Day