New research from OpenAI and Google Brain offers novel approaches to reinforcement learning, the technique for developing computer programs to win at video games. The researchers have some fascinating reflections on the nature of the worlds they're building and exploring.
Another version of the poet Yeats's famous question, in the context of artificial intelligence, might be: Are you analyzing the game or are you analyzing the player?
New research reports out Monday explore new aspects of reinforcement learning, the AI system by which "agents," the computer competitors in games, learn to win in goal-driven scenarios.
Both studies are learning something about intelligence but also something about the environment of games that shapes intelligence, and how the two are intertwined.
Researchers Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mordatch at the non-profit OpenAI foundation developed an "open-world" video game in which agents try to stay alive by fighting for scarce resources. Their system reflects the belief that complex online games are the closest to the "real world" in terms of producing complex behavior in populations.
And machine learning scientist Łukasz Kaiser and colleagues at Google's Brain unit developed a faster way to train agents to master the basics of classic 1980s Atari arcade games such as Pong, Freeway, and Battle Zone. Their intuition is that by creating a model of the games, a computer can predict some of the basics of the games, much as humans manage to predict game physics within minutes.
The former approach prizes complexity, moving beyond simple toy challenges, while the latter work values efficiency of insight.
In both cases, the choices are determined to an extent by the games they've selected, so that notions of AI are shaped by the choice of challenge.
The OpenAI research, Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents, posted on the arXiv pre-print server, offers a virtual world made up of a grid of tiles that have resources in them such as water to drink and vegetation to pluck.
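To make the tile-world idea concrete, here is a minimal sketch in Python of a grid whose tiles hold water and vegetation, with one agent that must keep eating and drinking to stay alive. It is a toy under stated assumptions, not the Neural MMO code or its API: the tile names, the `Agent` class, the starting levels of 10, and the rule that eaten vegetation disappears are all invented for illustration.

```python
import random
from dataclasses import dataclass

# Tile types for a toy map; the names are illustrative, not Neural MMO's.
GRASS, FOREST, WATER = "grass", "forest", "water"


@dataclass
class Agent:
    row: int
    col: int
    food: int = 10    # hypothetical starting levels
    water: int = 10

    @property
    def alive(self) -> bool:
        return self.food > 0 and self.water > 0


def make_grid(size: int, rng: random.Random) -> list[list[str]]:
    """Build a size x size map with randomly scattered resource tiles."""
    return [[rng.choice([GRASS, GRASS, FOREST, WATER]) for _ in range(size)]
            for _ in range(size)]


def step(agent: Agent, grid: list[list[str]], move: tuple[int, int]) -> None:
    """Move the agent one tile (clamped to the map), then update its resources."""
    size = len(grid)
    agent.row = min(max(agent.row + move[0], 0), size - 1)
    agent.col = min(max(agent.col + move[1], 0), size - 1)
    # Every tick costs one unit of food and one unit of water.
    agent.food -= 1
    agent.water -= 1
    tile = grid[agent.row][agent.col]
    if tile == FOREST:
        agent.food = 10                      # eat the vegetation ...
        grid[agent.row][agent.col] = GRASS   # ... which uses up the tile
    elif tile == WATER:
        agent.water = 10                     # drinking does not use up water here


if __name__ == "__main__":
    rng = random.Random(0)
    grid = make_grid(8, rng)
    agent = Agent(row=0, col=0)
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    ticks = 0
    while agent.alive and ticks < 1000:
        step(agent, grid, rng.choice(moves))
        ticks += 1
    print(f"random-walk agent survived {ticks} ticks")
```

Because vegetation vanishes once eaten, putting several agents on the same map would make food scarce, which is the pressure the researchers are interested in.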
Agents move through the virtual world, fighting one another to get the precious resources to survive. It's a familiar scenario going back to early projects such as the "artificial life" simulations of David Ackley and Michael Littman of Bellcore in 1991.
That study was at the dawn of reinforcement learning's application to artificial worlds. The OpenAI study goes much further, testing as many as 100 million agent "lifetimes," taking 100 GPU cores one week to compute.
The authors emphasize that they wanted to create something like the "massively multiplayer online role-playing games," or MMORPG, because, as they see it, "only MMOs contextualize [the learning curriculum] within persistent social and economic structures approaching the scale of the real world." It's all about scale, in other words.
They find some neat things, such as that fighters in this world do better than agriculturalists, given that "all of the populations trained with combat handily outperform all of the populations trained with only foraging." On a broader level, the more agents playing at once in the game, the more that each individual agent explores new areas of the game grid, looking for resources with less competition.
They also found that the agents start to develop individual skills, or "niches," as the authors put it, so diversity goes up as the number of agents grows. "The presence of other populations force agents to discover a single advantageous skill or trick," they write.
Perhaps the most interesting parts of this paper, however, are the self-reflective passages toward the end. The authors acknowledge the MMORPG may not be the only kind of simulation that can model real-world learning. But they insist it is the one that has proven to work in terms of encouraging development.
"While some may see our efforts as cherrypicking environment design," they write, "we believe this is precisely the objective: the primary goal of game development is to create complex and engaging play at the level of human intelligence."
The Google research paper, Model-Based Reinforcement Learning for Atari, also posted on arXiv, uses a combination of convolutional neural networks, long short-term memory, and fully connected neural networks to create a simulation of the Atari game that predicts future frames after every four frames.
While other researchers have used neural networks to predict frames in video, the authors write, none of that work ever led to ways to play the game competitively.
The trick, in the case of Kaiser and teammates, was not to train the neural network on images of the game from the "replay buffer," meaning an actual game history, but instead to construct an imagined "world" based on those frame-by-frame predictions of game play.
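The general shape of that idea, learning a model of the game from a small amount of real play and then training entirely inside the model, can be sketched in a few lines of Python. Everything here is a stand-in chosen for brevity and should be read as an assumption: a six-square corridor replaces Atari, a lookup table replaces the neural network that predicts frames, and tabular Q-learning replaces whatever policy training the paper uses.

```python
import random
from collections import defaultdict

N_STATES, ACTIONS, GOAL = 6, (0, 1), 5   # toy corridor; action 0 = left, 1 = right


def real_step(state, action):
    """The 'real game': a deterministic corridor with a reward at the far end."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def collect(n_steps, rng):
    """Gather a small budget of real transitions with a random policy."""
    data, state = [], 0
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        nxt, reward, done = real_step(state, action)
        data.append((state, action, nxt, reward, done))
        state = 0 if done else nxt
    return data


def fit_model(data):
    """'World model': a table of observed outcomes (stand-in for a neural net)."""
    model = {}
    for state, action, nxt, reward, done in data:
        model[(state, action)] = (nxt, reward, done)
    return model


def train_in_model(model, rng, updates=5000, alpha=0.5, gamma=0.9):
    """Q-learning in which every transition is imagined by the model."""
    q = defaultdict(float)
    known = list(model)
    for _ in range(updates):
        state, action = rng.choice(known)
        nxt, reward, done = model[(state, action)]   # no call to real_step here
        future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + future - q[(state, action)])
    return q


def greedy(q, state):
    """Pick the action with the highest learned value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_model(collect(200, rng))      # the only contact with the real game
    q = train_in_model(model, rng)
    state, steps, done = 0, 0, False
    while not done and steps < 50:            # evaluate back in the real game
        state, _, done = real_step(state, greedy(q, state))
        steps += 1
    print(f"reached goal: {done}, steps taken: {steps}")
```

The point of the design is that `train_in_model` never touches `real_step`: the real game is sampled only by `collect`, so the number of real samples stays fixed no matter how long the agent practices in its imagined world.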
This "model-based" approach, the authors write, "is more sample-efficient than a highly tuned Rainbow baseline on almost all games, requires less than half of the samples on more than half of the games and, on Freeway is more than 10x more sample-efficient." Specifically, the best scores, once the neural network was tested on a real game, were better on almost every game out of the total 26 Atari games, when training was restricted to just 100,000 "time steps" in the game, about two hours of game play, the authors estimate.
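The two-hour figure can be sanity-checked with simple arithmetic. The sketch below rests on two assumptions that the article does not state outright: that each agent "time step" covers four game frames, and that the game runs at 60 frames per second.

```python
# Assumptions (not stated in the article): one agent time step spans
# 4 game frames, and the game renders 60 frames per second.
FRAMES_PER_STEP = 4
FRAMES_PER_SECOND = 60


def steps_to_hours(time_steps: int) -> float:
    """Convert agent time steps to hours of real-time play."""
    frames = time_steps * FRAMES_PER_STEP
    return frames / FRAMES_PER_SECOND / 3600


print(f"{steps_to_hours(100_000):.2f} hours")  # prints "1.85 hours"
```

Under those assumptions, 100,000 time steps come to 400,000 frames, or a little under two hours of play.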
To the authors, the agent is learning that these Atari games have somewhat predictable physics, which the simulated world is capturing and which the older neural networks don't capture.
As they put it, it's a bit like how humans quickly figure out the basics and master such video games in minutes. "Human players can learn to play Atari games in minutes. Humans possess an intuitive understanding of the physical processes that are represented in the game: we know that planes can fly, balls can roll, and bullets can destroy aliens." (The paper also has a nice blog post.)
Both papers offer intriguing possibilities as the authors continue to explore the worlds they've either created, in the case of OpenAI, or simulated, in the case of Google. The OpenAI team notes that future research should reflect styles of combat for each agent that depend on how another agent is fighting. "We believe that the learned targeting setting is likely to [be] useful for investigating the effects of concurrent learning in large populations."
And in the case of the Google group, they haven't yet been able to turn that fast early learning of the games into game play that is competitive over long stretches of playing. They hypothesize that their simulated models of the world have more information to yield about the game that will enhance future results.