Forget Go, Google helps AI learn to book flights on the Web
Researchers at Google's AI labs have created a pair of novel neural networks that can navigate Web forms, such as those on an online flight-booking site. Although these are baby steps at the moment, the program performs as well as, or better than, some models trained using human demonstrations of pointing and clicking.
That's the intriguing prospect raised by the latest research from Google AI investigators.
In a new paper, the team trained a neural network to understand the structure of Web pages and the choices available to it when filling out forms in an airline ticket booker, or when interacting with a social media site.
The work broadly employs the same category of machine learning as Google's Go-winning AlphaZero software: what is known as "reinforcement learning," or RL. In RL, a neural network develops a strategy for which step to take at each stage of solving a problem, guided by the rewards it receives for good choices.
The researchers figured out a way to train a neural network without being given human examples of how to navigate an online booking form. The approach makes the task of learning to navigate webpages and social media sites more "scalable," they write, in settings where the possible combinations of states and actions can reach into the tens of millions.
The point is not necessarily to actually book a flight; it's more an exercise in how a neural network can find solutions to a problem with numerous variables, where human guidance, or "supervision," in training is infeasible.
This is about more than bots that crawl the Web. The authors describe the problem as intractable when "learning from large set of instructions" that can include the fields of a Web form that must be filled out, and the long lists of items in the kind of drop-down menu a person would encounter on a flight-booking site.
"As an example, in the flight-booking environment the number of possible instructions/tasks can grow to more than 14 millions, with more than 1700 vocabulary words and approximately 100 Web elements at each episode."
The work picks up where another left off, last year's "World of Bits," by Tianlin Shi and colleagues at Stanford University. That paper tested the ability of a computer to learn to carry out mouse clicks and keyboard strokes to complete tasks on the Web, based on demonstrations provided by people.
Like the authors of that paper, the Google folks employ reinforcement learning, in this case the "Deep Q-Network" approach, where the neural network adjusts its estimation of future rewards as it steps through problem tasks, making choices.
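That "estimation of future rewards" is the Q-value at the heart of a Deep Q-Network. The sketch below is a minimal tabular version of the same update rule; in a real DQN the table is replaced by a neural network, and the toy two-field-then-submit "environment" here is hypothetical, not the paper's setup.

```python
# A minimal sketch of the Q-learning update at the heart of Deep Q-Networks.
# In a real DQN the table below is replaced by a neural network; the toy
# episode (fill two form fields, then submit) is hypothetical.
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # learning rate, discount factor

def q_update(q, state, action, reward, next_state):
    """Nudge Q(s, a) toward the reward plus the best estimated future value."""
    best_next = max(q[next_state].values(), default=0.0)
    q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])

q = defaultdict(lambda: defaultdict(float))
# One imagined episode: fill origin, fill date, then submit for a final reward.
episode = [("start", "fill_origin", 0.0, "origin_done"),
           ("origin_done", "fill_date", 0.0, "date_done"),
           ("date_done", "click_submit", 1.0, "terminal")]
for _ in range(50):                      # replay the episode to propagate value
    for s, a, r, s2 in episode:
        q_update(q, s, a, r, s2)

print(round(q["start"]["fill_origin"], 2))  # early steps acquire value via gamma
```

Note how the reward given only at the final submit step gradually flows backward, via the discount factor, to the earliest choices in the episode.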
But the Google researchers couldn't rely on human demonstrations, as in the World of Bits case, so they came up with what they assert are two "novel neural network architectures."
The first, "QWeb," is a Deep Q-Network enhanced by breaking the overall task into rewards for each step of a travel-booking exercise, such as entering the date of a flight. That denser feedback increases the rewards the neural net receives as it goes along.
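The idea of per-step rewards can be illustrated with a toy comparison. This is a hypothetical sketch, not the paper's reward function: a sparse reward pays out only for a fully correct booking, while a shaped reward gives partial credit for each field that already matches the goal.

```python
# Hypothetical sketch of the reward-shaping idea: score each sub-step of the
# form against the goal instruction, rather than only the final booking.
GOAL = {"origin": "SFO", "destination": "JFK", "date": "2018-12-19"}

def sparse_reward(form):
    """Reward only a fully correct form -- hard to learn from by trial and error."""
    return 1.0 if form == GOAL else 0.0

def shaped_reward(form):
    """Give partial credit for every field that already matches the goal."""
    return sum(form.get(k) == v for k, v in GOAL.items()) / len(GOAL)

partial = {"origin": "SFO", "destination": "JFK", "date": None}
print(sparse_reward(partial), round(shaped_reward(partial), 2))
```

With only the sparse reward, an agent that fills two of three fields correctly looks no better than one that does nothing, which is exactly the signal-starvation problem the shaped version addresses.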
The second, called "INET," is another Deep Q-Network that earns rewards as it properly generates instructions for QWeb to follow. It is INET's job to digest the Web page, in the form of a "document object model," or DOM, and come up with the steps QWeb should take when making choices in the Web form, such as picking an airport code from a drop-down list of "destinations."
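The division of labor can be caricatured in a few lines. In this toy sketch (the DOM structure, names, and functions are all hypothetical stand-ins, not the paper's architecture), an "instructor" pairs goal values with elements from a DOM-like structure, and a separate executor applies those instructions to a form.

```python
# Toy illustration of the two-network division of labor: an "instructor"
# reads a DOM-like structure and emits (element, value) instructions for a
# form-filling policy to execute. All structures and names are hypothetical.
dom = [
    {"id": "origin", "tag": "select", "options": ["SFO", "LAX", "JFK"]},
    {"id": "date", "tag": "input", "options": None},
]

def generate_instructions(dom, goal):
    """Pair each goal value with the DOM element whose id it targets."""
    by_id = {el["id"]: el for el in dom}
    return [(key, value) for key, value in goal.items() if key in by_id]

def execute(instructions):
    """Stand-in for the executor: apply each instruction to an empty form."""
    form = {}
    for element_id, value in instructions:
        form[element_id] = value
    return form

steps = generate_instructions(dom, {"origin": "SFO", "date": "2018-12-19"})
print(execute(steps))
```

In the paper, of course, both roles are learned networks rather than hand-written lookups; the point is only the pipeline shape: DOM in, instructions out, actions taken.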
There are numerous other places where the authors tried things a little differently from previous approaches. For example, they used a technique called "curriculum learning" to break big tasks into smaller ones, helping the neural net get through the multiple steps of a Web form.
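A curriculum of this sort might look like the following hypothetical sketch: the agent starts by filling a single field, with the rest treated as pre-filled, and harder tasks are unlocked only once a success threshold is met. The schedule, thresholds, and field names here are illustrative assumptions, not the paper's procedure.

```python
# Hypothetical curriculum: start by asking the agent to fill just one field
# (the rest pre-filled), and add fields only once the success rate is high.
import random

random.seed(0)

def make_task(n_free_fields, all_fields=("origin", "destination", "date")):
    """Leave n_free_fields for the agent; pretend the rest are pre-filled."""
    free = random.sample(all_fields, n_free_fields)
    return {f: (f in free) for f in all_fields}

def curriculum(success_rate_fn, threshold=0.8, max_level=3):
    """Advance to harder tasks only when the current level is mastered."""
    level, schedule = 1, []
    while level <= max_level:
        schedule.append(make_task(level))
        if success_rate_fn(level) >= threshold:
            level += 1
    return schedule

# Pretend the agent masters each level after seeing it once.
schedule = curriculum(lambda level: 1.0)
print([sum(task.values()) for task in schedule])  # fields freed per level
```

Each entry in the printed list counts how many fields the agent must handle itself at that stage, growing from one toward the full form.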
They also used what are known as "shallow encodings" to enhance the neural net's understanding of the webpage. That way, it doesn't just see a vast list of airport names; it also acquires some sense of the structure of the page it's on.
The authors report that when they compared their results against those of the Stanford group, they matched the human-demonstration models on simple tasks, such as clicking on a dialogue box or logging a user in via a form, without using any demonstrations at all.
On more complex tasks from the benchmark developed by the Stanford group, such as one referred to as "social-media-all," the computer must do things like block a given user on Twitter. The Google researchers relate that their enhanced neural network was able to succeed "where previous approaches failed to generate any successful episodes."
In the challenge of booking a flight, they report, the little tricks such as shallow encoding helped the neural network succeed every time. Without those tricks, the network behaved in a fashion that sounds like a bored Web surfer: "QWeb starts clicking submit button at first time step to get the least negative reward." Sounds just like the actual human experience of booking a flight online.
The authors write that they plan in future work to test their network in more complex environments with still more steps.
Perhaps they can teach it to figure out how to solve captchas, as most humans often seem flummoxed by them.