Google's StarCraft II victory shows AI improves via diversity, invention, not reflexes
Google's added some new tricks to its machine learning system to explore goal-directed behavior, specifically in the video game StarCraft II. It added what's called "meta-games," where a neural network learns to select out the best attributes of multiple competitors as they are developed across numerous training games.
What determines how well machines do against humans in competitive situations may not be what you'd expect, such as response time, but rather the ability to maximize good choices through long experience.
That's one of the takeaways from the Dec. 19 match-up in the real-time strategy computer game StarCraft II between a computer, AlphaStar, developed by Google, and a human champion, Poland's Grzegorz Komincz, known by his gamer handle MaNa.
AlphaStar came back from many losses in 2017 to roundly trounce MaNa by five games to zero in the December match. "The first system to beat a top [human] pro," as AlphaStar's creators tweeted on Thursday.
The critical difference may be a strategy of training AlphaStar that employed new "meta-game" techniques for cultivating a master player.
The machine is not faster than humans at taking actions. In fact, its average number of actions in StarCraft II is 280 per minute, "significantly lower than the professional [human] players."
Instead, its strength seems to be coming up with novel strategies or unusual twists on existing strategies by amassing knowledge over many games. Google's DeepMind team used a novel "meta-game" approach to train their network, building up a league of players over thousands and thousands of simultaneous training matches, and then selecting the optimal player from the results of each.
StarCraft II, the latest in the StarCraft franchise from Santa Monica-based video game maker Activision Blizzard, requires players to marshal workers who move through a two-dimensional terrain, gathering resources such as minerals, constructing buildings, and assembling armies, to achieve dominance against other players. The original StarCraft came out in 1998, and the franchise has been a tournament fixture ever since.
It's been a hotbed of AI innovation, because Google and others see in the game several factors that make it much more challenging than other video games and classic strategy games such as chess or Go. These include the fact that StarCraft has a "fog of war" aspect, in that each player, including the AI "agents" being developed, has limited information because it cannot see aspects of the terrain where its opponents may have made progress.
In 2017, when Google's DeepMind unit and programmers at Blizzard published their initial work, they wrote that they were able to get their algorithms to play the game "close to expert human play" but that they couldn't even teach it to beat the built-in AI that ships with StarCraft.
At its core, AlphaStar, like the 2017 version, is still based on a deep learning approach built from what are known as recurrent neural networks, or RNNs. These maintain a sort of memory of previous inputs, which lets the network build on what it has already seen as a sequence of events unfolds.
The authors, however, augmented the typical "long short-term memory," or LSTM, neural network with something called a "transformer," developed by Google's Ashish Vaswani and colleagues in 2017. It uses a mechanism called attention, which works somewhat like moving a "read head" over different parts of a neural network, to retrieve prior data selectively. It is one of several such additions to the system.
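The selective retrieval described above can be illustrated with scaled dot-product attention, the core operation of the transformer. What follows is a minimal sketch in plain Python, not AlphaStar's architecture: the function names, the toy keys and values, and the single-query setup are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention.

    Each key is compared with the query; the resulting weights decide
    how much of each stored value is read back.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    width = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(width)
    ]
    return weights, output

# Toy "memory" of three earlier steps (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [0.0, 4.0]  # most similar to the second key

weights, output = attend(query, keys, values)
print(weights, output)
```

Because the query lines up with the second key, most of the weight lands on the second stored value. That is the sense in which the network "reads" one part of its past more than the others.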
But one of the most provocative ways the game plan has changed is incorporating an approach to culling the best players, called "Nash averaging," introduced last year by David Balduzzi and colleagues at DeepMind. The authors observed that neural networks have a lot of "redundancy," meaning, "different agents, networks, algorithms, environments and tasks that do basically the same job." Because of that, the Nash average is able to selectively rule out, or "ablate," the redundancies to reveal the fundamental underlying advantages of a particular AI "agent" that plays a video game (or does any task).
As Balduzzi and colleagues wrote in their paper, "Nash evaluation computes a distribution on players (agents, or agents and tasks) that automatically adjusts to redundant data. It thus provides an invariant approach to measuring agent-agent and agent-environment interactions."
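To see why redundancy matters, consider a toy example: rock-paper-scissors with a duplicated "rock" agent. The sketch below is an illustration under stated assumptions, not the paper's procedure. The payoff matrix is invented, and the Nash mixture is approximated with multiplicative-weights self-play (a swapped-in solver chosen because it needs no external libraries), whereas Balduzzi and colleagues define the Nash average using a maximum-entropy Nash equilibrium.

```python
import math

# Toy antisymmetric payoff matrix (an assumption for illustration):
# PAYOFF[i][j] is +1 if agent i beats agent j, -1 if it loses, 0 for a draw.
# "rock_copy" is a deliberately redundant agent.
AGENTS = ["rock", "rock_copy", "paper", "scissors"]
PAYOFF = [
    [0, 0, -1, 1],
    [0, 0, -1, 1],
    [1, 1, 0, -1],
    [-1, -1, 1, 0],
]

def expected_payoff(payoff, mix):
    """Average payoff of each agent against opponents drawn from `mix`."""
    return [sum(a * q for a, q in zip(row, mix)) for row in payoff]

def approximate_nash(payoff, steps=20000, eta=0.01):
    """Approximate a Nash mixture by multiplicative-weights self-play.

    The time-averaged mixture, not the last iterate, is what approaches
    an equilibrium in a zero-sum game.
    """
    n = len(payoff)
    p = [1.0 / n] * n
    average = [0.0] * n
    for _ in range(steps):
        average = [a + q / steps for a, q in zip(average, p)]
        gains = expected_payoff(payoff, p)
        weights = [q * math.exp(eta * g) for q, g in zip(p, gains)]
        total = sum(weights)
        p = [w / total for w in weights]
    return average

uniform_mix = [1.0 / len(AGENTS)] * len(AGENTS)
uniform_skill = expected_payoff(PAYOFF, uniform_mix)  # plain averaging
nash_mix = approximate_nash(PAYOFF)
nash_skill = expected_payoff(PAYOFF, nash_mix)        # Nash averaging

for name, u, q, s in zip(AGENTS, uniform_skill, nash_mix, nash_skill):
    print(f"{name:10s} uniform avg {u:+.2f}  Nash weight {q:.2f}  Nash avg {s:+.2f}")
```

With plain averaging, "paper" looks strong and "scissors" looks weak simply because rock appears twice. Under the approximate Nash mixture, the two rocks share roughly one third of the weight between them, and all agents score close to zero, which is what one would expect in a game with no genuinely superior strategy.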
Nash averaging was used to pick out the best of AlphaStar's players over the span of many games. As the AlphaStar team write, "A continuous league was created, with the agents of the league - competitors - playing games against each other […] While some new competitors execute a strategy that is merely a refinement of a previous strategy, others discover drastically new strategies."
But it's not just a matter of electing one player who shines. The Nash process is effectively crafting a single player that fuses all the learning and insight of the others. "The final AlphaStar agent consists of the components of the Nash distribution -- in other words, the most effective mixture of strategies that have been discovered."
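One way to picture an agent that "consists of the components of the Nash distribution" is as a mixture: before each game, one league member is drawn at random according to its Nash weight. The sketch below assumes that reading; the agent names and weights are hypothetical, and whether AlphaStar combines its components in exactly this way is not stated in the passage.

```python
import random

# Hypothetical league members and Nash weights, invented for illustration.
NASH_DISTRIBUTION = {"agent_a": 0.5, "agent_b": 0.3, "agent_c": 0.2}

def pick_agent(distribution, rng):
    """Draw one agent name with probability equal to its Nash weight."""
    names = list(distribution)
    weights = [distribution[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
counts = {name: 0 for name in NASH_DISTRIBUTION}
for _ in range(10000):
    counts[pick_agent(NASH_DISTRIBUTION, rng)] += 1

print(counts)
```

Over many games the share of appearances for each member approaches its weight, so an opponent faces the whole mixture of strategies rather than any single one.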
Key is that the training of all these competitors affords each AI agent unique goals and objectives, so that the number of possible solutions to the game explored expands steadily. It's a kind of survival of the fittest of video games, with the players that go up against humans benefitting from rapid evolution in the months of game play.
In echoes of what happened with Go, where DeepMind's AlphaGo was able to invent totally novel strategies, champ MaNa is quoted as saying, "I was impressed to see AlphaStar pull off advanced moves and different strategies across almost every game, using a very human style of gameplay I wouldn't have expected."
It will be interesting to see, when the paper comes out, whether, as DeepMind's Demis Hassabis and colleagues promise, this mash-up of various machine learning techniques pays dividends in other fields of research. As they write in the post, "We believe that this advanced model will help with many other challenges in machine learning research that involve long-term sequence modelling and large output spaces such as translation, language modelling and visual representations."