More clues are emerging about AlphaStar, the Google machine learning system that competes in the video game StarCraft and that was first described two weeks ago.
A key element may be the mysterious "polytope."
What is a polytope? A Euclidean geometric figure of N dimensions, of which two-dimensional polygons and three-dimensional polyhedra are the familiar examples. The polytope is emerging as a way to think about the landscape of possible solutions in a game such as StarCraft.
There's no paper yet for AlphaStar, but following Google's blog post about the program on Jan. 24, clues began to emerge.
As mentioned in a separate post last week, AlphaStar builds upon work by Google's DeepMind group, specifically researcher David Balduzzi and colleagues, regarding something called "Nash averaging," where multiple computer agents that play the game against one another are surveyed by the neural network across multiple games. That survey finds different attributes that can be combined to create a kind of ideal player built from strengths of various agents in those multiple games. The exploration of players, what's referred to by Balduzzi and colleagues as the "gamescape," is expressed as a polytope.
Now, Google researchers have offered up another examination of the polytope, in two papers released simultaneously late last week, one building upon the other.
The first paper, The Value Function Polytope in Reinforcement Learning, is written by Google Brain's Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, and Marc G. Bellemare, with Taïga also serving at Montreal's MILA organization for machine learning, and Schuurmans having an appointment at the University of Alberta. The paper is posted on the arXiv pre-print server.
Here's how the polytope works in Dadashi & Co.'s study. Reinforcement learning systems such as AlphaStar often rely on computing what the future reward will be from taking a given action for a given state of affairs in the game. That state-action assessment is known as the value function. Finding the right function can be what lets the agent win the game.
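To make the idea of a value function concrete, here is a minimal sketch in Python. It is not taken from either paper: the two-state, two-action game, its transition probabilities, its rewards, and the discount factor are all made-up assumptions, and names such as `evaluate_policy` are hypothetical. It repeatedly applies the standard Bellman update until the estimated future reward of each state stops changing.

```python
# Hypothetical 2-state, 2-action game; every number here is an assumption.
GAMMA = 0.9          # discount applied to future reward
STATES = (0, 1)
ACTIONS = (0, 1)

# P[s][a] lists (next_state, probability) pairs for taking action a in state s.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.6), (0, 0.4)]},
}
# R[s][a] is the immediate reward for taking action a in state s.
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 2.0}}


def action_value(v, s, a):
    """Expected reward of action a in state s, given state values v."""
    return R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a])


def evaluate_policy(policy, tol=1e-10):
    """policy[s][a] is the probability of picking a in s; returns state values."""
    v = [0.0 for _ in STATES]
    while True:
        new_v = [
            sum(policy[s][a] * action_value(v, s, a) for a in ACTIONS)
            for s in STATES
        ]
        if max(abs(x - y) for x, y in zip(v, new_v)) < tol:
            return new_v
        v = new_v


# A policy that always picks action 1, in both states.
always_action_1 = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}
values = evaluate_policy(always_action_1)
```

Each policy yields one such list of values, one number per state; it is these lists that the paper treats as points in a geometric space.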
Dadashi and colleagues show in the paper that all the value functions that can result from the different policies an agent may use form a polytope. That's important because then one can see how different policies "move" through the polytope, until they land on an "optimal" value function that wins the game. The optimal value function is located at a certain corner of the polytope, so winning a game in a sense becomes a matter of navigating the polytope to the right corner, the way you might walk through a room looking for something hidden in one corner.
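A small experiment can illustrate one consequence of that picture, under loud assumptions: the toy game below is invented, and the code does not construct the polytope itself. It computes the value function of every deterministic policy (one fixed action per state), picks the one that is best in every state, and then checks that randomly sampled mixed policies never beat it anywhere. That is the standard result for finite games of this kind, and it fits the notion that the optimal value function sits at an extreme point.

```python
import itertools
import random

# Hypothetical 2-state, 2-action game; every number here is an assumption.
GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.6), (0, 0.4)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 2.0}}


def value_of(policy, sweeps=500):
    """Approximate state values of a policy with repeated Bellman updates."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            sum(
                policy[s][a]
                * (R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a]))
                for a in ACTIONS
            )
            for s in STATES
        ]
    return v


def deterministic(actions):
    """Policy that always plays actions[s] in state s."""
    return {s: {a: 1.0 if a == actions[s] else 0.0 for a in ACTIONS}
            for s in STATES}


def dominates(u, v, eps=1e-9):
    """True if u is at least as good as v in every state."""
    return all(x >= y - eps for x, y in zip(u, v))


# Value function of each of the four deterministic policies.
deterministic_values = {
    acts: value_of(deterministic(acts))
    for acts in itertools.product(ACTIONS, repeat=len(STATES))
}
# The deterministic policy whose values are best in every state.
best = next(
    acts for acts, v in deterministic_values.items()
    if all(dominates(v, other) for other in deterministic_values.values())
)

# Sample mixed policies and check none of them beats the best one anywhere.
rng = random.Random(0)
sampled = []
for _ in range(200):
    mix = {}
    for s in STATES:
        p0 = rng.random()
        mix[s] = {0: p0, 1: 1.0 - p0}
    sampled.append(value_of(mix))
never_beaten = all(dominates(deterministic_values[best], v) for v in sampled)
```

Plotting the `sampled` points with state 0's value on one axis and state 1's on the other would give a rough picture of the region the paper describes, with the best deterministic policy at its upper-right extreme.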
It's easy to see how this work could inform Balduzzi & Co.'s Nash averaging: navigating the polytope for value functions could be replaced by navigating the polytope for ideal players of StarCraft.
The second Google paper takes the polytope of value functions and uses it to plumb something that may be more profound: the problem of "representations."
A key theme in AI from the beginning has been whether a machine can "represent" its world. It's one thing for a machine learning system to solve a problem; it's another for there to be "intelligence" in what it does. The ability of a neural network to not just do tasks, but to depict aspects of the world around it in a way that leads to sophisticated abstractions about the world, is what in theory distinguishes AI from a mere mechanical system.
In the second paper, A Geometric Perspective on Optimal Representations for Reinforcement Learning, Dadashi and the other authors are joined by another Google Brain researcher, Pablo Samuel Castro; two researchers from DeepMind, Will Dabney and Tor Lattimore; and Oxford U.'s Clare Lyle.
This time, Dadashi and colleagues say that the value functions that are at the corners of that polytope are "adversarial value functions," which just means they are the ones that are going to lead to a deterministic set of actions to win the game. Finding the adversarial value functions requires making a representation that "approximates" a given value function. A representation in this case is a combination of a "feature vector," a vector representing a given state in the game, and a weight vector that is adjustable through the familiar back-propagation technique. Getting closer to the corner where the value function is involves moving through the polytope in a way that minimizes the error rate between the approximation and the adversarial value function.
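The approximation step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' method: the four feature vectors and the target values are invented, the model is purely linear (each approximate value is the dot product of a state's feature vector and the weight vector), and plain gradient descent stands in for back-propagation, which reduces to the same thing when there is a single linear layer.

```python
# Hypothetical feature vectors phi(s) for four states (assumed values).
FEATURES = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
# Hypothetical target value function to approximate, one value per state.
TARGET = [0.5, 1.4, 2.6, 3.5]


def predict(w, phi):
    """Approximate value of a state: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def mse(w):
    """Mean squared error between the approximation and the target."""
    errors = [predict(w, phi) - t for phi, t in zip(FEATURES, TARGET)]
    return sum(e * e for e in errors) / len(errors)


def fit(lr=0.05, steps=2000):
    """Adjust the weight vector by gradient descent on the squared error."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for phi, t in zip(FEATURES, TARGET):
            err = predict(w, phi) - t
            for i, x in enumerate(phi):
                grad[i] += 2.0 * err * x / len(FEATURES)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


weights = fit()
```

Here the features are fixed and only the weights move. In the paper's setting the features themselves are what is being learned, which is why the choice of target value functions matters for the quality of the representation.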
This has two important results. One, it makes reinforcement learning stronger by setting up multiple "auxiliary tasks" that direct the agent during the course of the game, rather than a single big reward function at the end.
And more important, solving those tasks makes the representation better and better. As the authors put it, "an agent that predicts AVFs, by themselves or concurrently with some primary objective, should develop a better state representation."
The authors tested their work on a common AI task, the "four-room domain," where an agent has to navigate a two-dimensional grid world made up of four rooms, moving around walls and going in and out of entryways, until it arrives at a corner designated as victory. They compare representations found with the adversarial value function to representations chosen at random. The authors write that the randomly chosen representations "capture the general distance to the goal but little else." In contrast, "the representation by AVF [adversarial value function] … exhibits beautiful structure," including showing things such as "focal points," and a "bias toward the goal-room."
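For readers who have not seen a four-room grid, here is a rough Python sketch of one. The exact layout, the doorway positions, and the goal corner are assumptions rather than the configuration used in the paper. The code computes the shortest-path distance from every open cell to the goal, which is the kind of "general distance to the goal" signal mentioned in the quote above.

```python
from collections import deque

# Assumed four-room layout: '#' is a wall, '.' is an open cell.
# Four doorways connect neighbouring rooms.
LAYOUT = [
    "###########",
    "#....#....#",
    "#.........#",
    "#....#....#",
    "#....#....#",
    "##.#####.##",
    "#....#....#",
    "#....#....#",
    "#.........#",
    "#....#....#",
    "###########",
]
GOAL = (9, 9)  # bottom-right corner of the bottom-right room (assumed)


def distances_to_goal(layout, goal):
    """Breadth-first search: number of moves from each open cell to the goal."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            # The outer border is all wall, so nxt is always inside the grid.
            if layout[nxt[0]][nxt[1]] == "." and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


dist = distances_to_goal(LAYOUT, GOAL)
```

An agent starting in the opposite corner, `(1, 1)`, has to pass through two doorways to reach the goal; `dist[(1, 1)]` gives the length of the shortest such route.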
"All in all, our results demonstrate that the AVF method can learn surprisingly rich representations," they write.
None of this immediately pertains to AlphaStar, of course. But it suggests a new, higher level of abstraction in searching for policies to solve a game by first thinking hard about how the computer represents what its choices are.
Expect, therefore, to see the polytope popping up more and more in research from Google and others. It may be mysterious in some senses, but it seems to work in practice, at least on some tasks, and it opens up a new avenue for understanding representations. It also expands the debate over what those representations mean as far as actual intelligence.