After mastering dozens of 2D Atari games, and whopping humans at Go, Google's DeepMind artificial intelligence (AI) is now taking on new 3D navigation and puzzle-solving games.
One of these new games that DeepMind's AI agents are tackling is ant soccer, in which it's learned how to chase down a ball, dribble, and then score a goal.
What's impressive, DeepMind's David Silver explained in a blogpost, is that its AI is capable of solving the ant soccer challenge "without any prior knowledge of the dynamics", reflecting recent advances it's made in 'reinforcement learning' (RL), or learning through trial and error.
To get these results, DeepMind has combined RL with deep learning of neural networks and its Deep Q-Network (DQN), an algorithm that stores a bot's experiences and estimates rewards it can expect after taking a particular action.
It was this algorithm that allowed it to master dozens of 2D games on an Atari 2600, but Silver says it's now developed a far better version of the algorithm.
For example, it can now train a single neural network to learn multiple Atari games. The technique is also being used to power recommendations in Google.
"We have also built a massively distributed deep RL system, known as Gorila, that utilises the Google Cloud platform to speed up training time by an order of magnitude; this system has been applied to recommender systems within Google," Silver noted.
However, the ability to learn how to play soccer comes from DeepMind's newly developed 'asynchronous actor-critic algorithm, A3C', which the company set out in a revised paper last week, demonstrating that it could use standard multi-core CPUs to surpass GPU-based algorithms at solving motor-control problems and navigating random 3D mazes using visual input.
"It achieves state-of-the-art results, using a fraction of the training time of DQN and a fraction of the resource consumption of Gorila," Silver noted.
DeepMind is testing this against Labyrinth, "a challenging suite of 3D navigation and puzzle-solving environments". The agents only use visual cues to figure out the map to "discover and exploit" rewards, Silver said.
"Amazingly, the A3C algorithm achieves human-level performance, out of the box, on many Labyrinth tasks," he wrote.