How can we know the dancer from the dance?
Another version of the poet Yeats's famous question, in the context of artificial intelligence, might be: Are you analyzing the game, or are you analyzing the player?
New research reports out Monday explore fresh aspects of reinforcement learning, the branch of AI in which "agents," the computer competitors in games, learn to win in goal-driven scenarios.
Both studies reveal something about intelligence, but also something about the game environments that shape intelligence, and about how the two are intertwined.
Interns Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mordatch at the non-profit OpenAI foundation developed an "open-world" video game in which agents try to stay alive by fighting for scarce resources. Their system reflects the belief that complex online games are the closest to the "real world" in terms of producing complex behavior in populations.
And machine learning scientists Łukasz Kaiser and colleagues at Google's Brain unit developed a faster way to train agents to master the basics of classic 1980s Atari arcade games such as Pong, Freeway, and Battle Zone. Their intuition is that by building a model of a game, a computer can learn to predict the game's basic physics, much as humans manage to do within minutes of starting to play.
The former approach prizes complexity, moving beyond simple toy challenges, while the latter values efficiency: learning as much as possible from as little gameplay as possible.
In both cases, the researchers' choices are determined to an extent by the games they've selected, so that notions of AI are shaped by the choice of challenge.
The OpenAI research, Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents, posted on the arXiv pre-print server, offers a virtual world made up of a grid of tiles containing resources such as water to drink and vegetation to pluck.
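To make that setup concrete, here is a minimal Python sketch of a tile world in which living costs food and water every step, and agents replenish them from scarce tiles. The tile names, stat caps, foraging amounts, and class names are all hypothetical illustrations, not Neural MMO's actual code or game rules.

```python
from dataclasses import dataclass, field

# Hypothetical tile types for illustration; Neural MMO's real tiles and rules differ.
GRASS, WATER, FOREST = "grass", "water", "forest"
MAX_STAT = 10         # assumed cap on an agent's food and water
FOREST_CAPACITY = 3   # assumed number of times a forest tile can be foraged

@dataclass
class Agent:
    row: int
    col: int
    food: int = MAX_STAT
    water: int = MAX_STAT

    def alive(self) -> bool:
        return self.food > 0 and self.water > 0

@dataclass
class TileWorld:
    grid: list[list[str]]
    forage_left: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each forest tile starts with a limited supply of vegetation.
        for r, row in enumerate(self.grid):
            for c, tile in enumerate(row):
                if tile == FOREST:
                    self.forage_left[(r, c)] = FOREST_CAPACITY

    def step(self, agent: Agent, dr: int, dc: int) -> None:
        # Move one tile, clamped to the edges of the map.
        agent.row = min(max(agent.row + dr, 0), len(self.grid) - 1)
        agent.col = min(max(agent.col + dc, 0), len(self.grid[0]) - 1)
        # Staying alive costs one unit of food and one of water per step.
        agent.food -= 1
        agent.water -= 1
        pos = (agent.row, agent.col)
        tile = self.grid[agent.row][agent.col]
        if tile == WATER:
            agent.water = MAX_STAT  # drinking refills water
        elif tile == FOREST and self.forage_left[pos] > 0:
            # Vegetation is scarce: every forage depletes the tile.
            self.forage_left[pos] -= 1
            agent.food = min(agent.food + 5, MAX_STAT)

# Usage: two agents forage the same forest tile and deplete it together.
world = TileWorld([[GRASS, WATER], [FOREST, GRASS]])
alice, bob = Agent(0, 0), Agent(1, 1)
world.step(alice, 1, 0)   # alice moves down onto the forest
world.step(bob, 0, -1)    # bob moves left onto the same forest
print(world.forage_left[(1, 0)])  # → 1
```

Because every agent draws on the same finite tiles, more agents means more competition for each resource, which is the pressure the Neural MMO paper studies at far larger scale.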
Agents move through the virtual world, fighting one another to get the precious resources to survive. It's a familiar scenario going back to early projects such as the "artificial life" simulations of David Ackley and Michael Littman of Bellcore in 1991.
That study was at the dawn of reinforcement learning's application to artificial worlds. The OpenAI study goes much further, testing as many as 100 million agent "lifetimes," taking 100 GPU cores one week to compute.
The authors emphasize that they wanted to create something like the "massively multiplayer online role-playing games," or MMORPG, because, as they see it, "only MMOs contextualize [the learning curriculum] within persistent social and economic structures approaching the scale of the real world." It's all about scale, in other words.
They find some neat things, such as that fighters in this world do better than foragers: "all of the populations trained with combat handily outperform all of the populations trained with only foraging." On a broader level, the more agents playing the game at once, the more each individual agent explores new areas of the game grid, looking for resources with less competition.
They also found that agents begin to develop individual skills, or "niches," as the authors put it, so that diversity increases as the number of agents grows. "The presence of other populations force agents to discover a single advantageous skill or trick," they write.
This is a bit reminiscent of some of the findings of Google's DeepMind unit as it developed the "AlphaStar" system to play the real-time strategy game StarCraft II, where the development of niche skills was noted. In fact, the authors note a connection between their work and earlier work by the DeepMind team on sampling different populations for superiority.
Perhaps the most interesting parts of this paper, however, are the self-reflective passages toward the end. The authors acknowledge that the MMORPG may not be the only kind of simulation that can model real-world learning. But they insist it is the kind that has proven to work at encouraging the development of complex behavior.
"While some may see our efforts as cherrypicking environment design," they write, "we believe this is precisely the objective: the primary goal of game development is to create complex and engaging play at the level of human intelligence.
"The player base then uses these design decisions to create strategies far beyond the imagination of the developers."
(OpenAI has a nice blog post on the research as well, with videos of gameplay.)
The Google research paper, Model-Based Reinforcement Learning for Atari, also posted on arXiv, uses a combination of convolutional neural networks, long short-term memory, and fully connected neural networks to create a simulation of each Atari game that predicts the next frame of play from the four frames before it.
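The frame-by-frame prediction loop can be sketched in Python as follows. The paper's neural network is replaced here by a hypothetical, hand-coded `toy_physics` function, and "frames" are tiny tuples of (ball position, paddle position) rather than pixel images. The point is only the interface: predict the next frame from the four most recent frames plus an action, then feed that prediction back in to imagine further ahead.

```python
from collections import deque
from typing import Callable, Sequence

# Stand-in for an image: (ball_x, paddle_x). Real frames are pixel arrays.
Frame = tuple[float, ...]

def rollout(
    predict_next: Callable[[Sequence[Frame], int], Frame],
    initial_frames: Sequence[Frame],
    actions: Sequence[int],
    context: int = 4,
) -> list[Frame]:
    """Imagine future frames, each predicted from the last `context` frames."""
    if len(initial_frames) < context:
        raise ValueError(f"need at least {context} initial frames")
    window = deque(initial_frames, maxlen=context)  # keeps only the newest frames
    imagined = []
    for action in actions:
        nxt = predict_next(tuple(window), action)
        imagined.append(nxt)
        window.append(nxt)  # feed the prediction back in; the oldest frame drops out
    return imagined

def toy_physics(window: Sequence[Frame], action: int) -> Frame:
    """Hypothetical hand-coded 'model': the ball keeps its velocity and the
    action nudges the paddle. A learned network would sit here instead."""
    (ball_prev, _), (ball_last, paddle_last) = window[-2], window[-1]
    return (ball_last + (ball_last - ball_prev), paddle_last + action)

start = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]
print(rollout(toy_physics, start, actions=[1, -1, 0]))
# → [(4.0, 6.0), (5.0, 5.0), (6.0, 5.0)]
```

The toy predictor happens to use only the last two frames of its four-frame window; a learned model is free to use all of them.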
While other researchers have used neural networks to predict frames in video, the authors write, none of that work ever led to ways to play the game competitively.
In this project, the Google team was able to beat the benchmark "Rainbow" agent developed by Google's DeepMind unit in 2017, as well as OpenAI's "PPO" (proximal policy optimization) approach, also from 2017, when training data was limited; both represent the state of the art in reinforcement learning.
The trick for Kaiser and teammates was not to train the neural network on images of the game from the "replay buffer," meaning an actual game history, but instead to construct an imagined "world" based on those frame-by-frame predictions of game play, and to let the agent practice inside it.
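The overall recipe (play the real game briefly, fit a model of it, then practice inside the model) can be illustrated with a deliberately tiny Python sketch. It is not the Google team's method: their model is a neural video predictor and their agent is trained with PPO, whereas this toy swaps in a lookup-table model of a six-square corridor, value iteration for training, and an optimistic guess for moves the model has never seen (an assumption that nudges the agent to explore). All names are hypothetical.

```python
# Toy model-based RL loop (hypothetical names; not the paper's code).
GOAL = 5                 # corridor squares 0..5; reward for reaching square 5
N_STATES = GOAL + 1
LEFT, RIGHT = 0, 1
OPTIMISM = 1.0           # assumed value of any move the model has never seen

def real_step(state: int, action: int) -> tuple[int, float]:
    """The 'real game': move along the corridor; reward 1 at the goal."""
    nxt = max(0, state - 1) if action == LEFT else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def collect(policy: list[int], steps: int) -> list[tuple[int, int, int, float]]:
    """Play the real game for a limited number of steps (the costly part)."""
    data, s = [], 0
    for _ in range(steps):
        a = policy[s]
        s2, r = real_step(s, a)
        data.append((s, a, s2, r))
        s = 0 if s2 == GOAL else s2  # restart the episode after the goal
    return data

def fit_model(data):
    """'World model': a lookup table of observed (state, action) outcomes."""
    return {(s, a): (s2, r) for s, a, s2, r in data}

def train_in_model(model, gamma: float = 0.9, sweeps: int = 50) -> list[int]:
    """Improve the policy using only imagined transitions from the model."""
    v = [0.0] * N_STATES

    def q(s: int, a: int) -> float:
        if (s, a) not in model:
            return OPTIMISM  # unknown move: optimistically assume it pays off
        s2, r = model[(s, a)]
        return r + (0.0 if s2 == GOAL else gamma * v[s2])

    for _ in range(sweeps):  # value iteration over the learned model
        for s in range(GOAL):
            v[s] = max(q(s, LEFT), q(s, RIGHT))
    return [max((LEFT, RIGHT), key=lambda a: q(s, a)) for s in range(N_STATES)]

def model_based_loop(iterations: int = 5, steps_per_iter: int = 20):
    policy, data = [LEFT] * N_STATES, []
    for _ in range(iterations):
        data += collect(policy, steps_per_iter)  # 1. brief real play
        model = fit_model(data)                  # 2. learn the "world"
        policy = train_in_model(model)           # 3. practice in imagination
    return policy, len(data)

policy, real_steps = model_based_loop()
print(policy[:GOAL], real_steps)  # → [1, 1, 1, 1, 1] 100
```

After 100 real steps the agent has learned to head right from every square, even though most of its "practice" happened inside the learned table rather than the real corridor.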
This "model-based" approach, the authors write, "is more sample-efficient than a highly tuned Rainbow baseline on almost all games, requires less than half of the samples on more than half of the games and, on Freeway is more than 10x more sample-efficient." Specifically, when training was restricted to just 100,000 "time steps" in the game, about two hours of game play by the authors' estimate, the agent's best scores when tested on the real games were better on almost every one of the 26 Atari games in the study.
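As a rough check on the "about two hours" figure, the conversion can be worked out in a few lines of Python. This assumes that each agent time step repeats its action for four emulator frames (a common Atari setup) and that the console runs at 60 frames per second; both are assumptions, not numbers taken from the text above.

```python
def steps_to_hours(agent_steps: int, frame_skip: int = 4, fps: int = 60) -> float:
    """Convert agent decisions into hours of real-time Atari play.

    Assumes each agent step repeats its action for `frame_skip` emulator
    frames and that the console renders `fps` frames per second.
    """
    frames = agent_steps * frame_skip
    return frames / fps / 3600

print(f"{steps_to_hours(100_000):.2f} hours")  # 400,000 frames → 1.85 hours
```

Under those assumptions, 100,000 steps come to just under two hours of play, consistent with the authors' estimate.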
In the authors' view, the agent is learning that these Atari games have somewhat predictable physics, which the simulated world captures and which older, "model-free" approaches such as Rainbow and PPO do not.
As they put it, it's a bit like the way humans quickly figure out the basics of such video games: "Human players can learn to play Atari games in minutes. Humans possess an intuitive understanding of the physical processes that are represented in the game: we know that planes can fly, balls can roll, and bullets can destroy aliens." (There's a nice blog post on this paper, too.)
Both papers offer intriguing possibilities as the authors continue to explore the worlds they've either created, in the case of OpenAI, or simulated, in the case of Google. The OpenAI team suggests future research in which each agent's style of combat depends on how other agents are fighting. "We believe that the learned targeting setting is likely to [be] useful for investigating the effects of concurrent learning in large populations."
And the Google group hasn't yet been able to turn that fast early learning of the games into play that stays competitive over long stretches. They hypothesize that their simulated models of the world still have more to yield about the games, which could improve future results.