The bot was programmed to use evolution strategies (ES), a family of optimization algorithms used in machine learning (ML) that allows the AI to learn, adapt, and change tactics depending on the situation and the other players.
ES and reinforcement learning (RL), a technique rooted in behavioral psychology that revolves around a simple reward system, have already been used to beat human players at games including chess and Texas Hold 'em.
In poker, as the AI learned how its competitors played, it was able to reach levels of "superhuman performance."
When it comes to Q*bert, however, the AI didn't seem to mind cheating to rack up those points.
As reported by The Register, researchers from the University of Freiburg, Germany, implemented ES in a game-playing AI to compare its performance with that of RL.
In their paper, the researchers found that ES can beat RL on a number of games.
In Q*bert, players jump from cube to cube to change each cube's color while avoiding obstacles and enemies; changing every cube advances the player to the next round.
However, the AI was able to find and exploit a bug that caused the platforms to blink, allowing it to bounce around and rack up close to a million points.
Speaking to the publication, the paper's co-authors Patryk Chrabaszcz and Frank Hutter said that the ES system tuned roughly 1.5 million parameters and that, because high scores were the goal, exploiting the decades-old bug was the most natural path to take.
"To find the bug, the agent had to first learn to almost complete the first level -- this was not done at once but using many small improvements," the researchers said. "We suspect that at some point in the training one of the offspring solutions encountered the bug and got a much better score compared to its siblings, which in turn increased its contribution to the update -- its weight was the highest one in the weighted mean."
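The mechanism described in the quote (sample many offspring, rank them by score, and recombine them into a weighted mean so that the best-scoring offspring pulls the update hardest) can be illustrated with a minimal evolution strategy. This is a sketch under stated assumptions, not the authors' code: the toy scoring function, the fixed step size `sigma`, the population sizes `lam` and `mu`, and the log-rank weighting scheme are all illustrative choices. In a real Atari agent, each offspring would be a perturbed copy of the neural network's parameters and would be scored by playing the game.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def log_rank_weights(mu: int) -> List[float]:
    """Recombination weights for the top-mu offspring, best first.

    Assumed scheme: w_i proportional to log(mu + 0.5) - log(i), normalized
    to sum to 1, so the top-ranked offspring always gets the largest weight.
    """
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def es_step(
    mean: Vector,
    fitness: Callable[[Vector], float],
    sigma: float,
    lam: int,
    mu: int,
    rng: random.Random,
) -> Tuple[Vector, float]:
    """One generation: sample lam offspring, keep the best mu, recombine."""
    # Each offspring is the parent plus Gaussian noise of scale sigma.
    offspring = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
    # Rank by score, best first (the comparison against "siblings").
    ranked = sorted(offspring, key=fitness, reverse=True)
    weights = log_rank_weights(mu)
    # New parent = weighted mean of the top-mu offspring.
    new_mean = [
        sum(w * child[d] for w, child in zip(weights, ranked[:mu]))
        for d in range(len(mean))
    ]
    return new_mean, fitness(ranked[0])


def run_es(
    fitness: Callable[[Vector], float],
    start: Vector,
    generations: int = 200,
    sigma: float = 0.3,
    lam: int = 20,
    mu: int = 10,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Run the ES loop; return the final mean and the best score seen."""
    rng = random.Random(seed)
    mean = list(start)
    best = fitness(mean)
    for _ in range(generations):
        mean, gen_best = es_step(mean, fitness, sigma, lam, mu, rng)
        best = max(best, gen_best)
    return mean, best


if __name__ == "__main__":
    # Hypothetical toy objective whose maximum (score 0) sits at (2, 2).
    def toy_score(x: Vector) -> float:
        return -sum((xi - 2.0) ** 2 for xi in x)

    final_mean, best_score = run_es(toy_score, start=[0.0, 0.0])
    print(final_mean, best_score)
```

Because these weights depend only on rank, an offspring that stumbles on an unusually high score receives the top weight regardless of how large its lead is; repeated over many generations, that steady pull is what can move the whole population toward an exploit once one offspring finds it.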
The AI did not find and exploit the bug in every run; the exploit appeared in eight out of 30 tests. The RL systems used for comparison did not reach the high scores of their ES counterpart.
However, RL tended to outperform ES in racing and shooting games, where understanding context, rather than patterns, was crucial.