Most of the time, artificial intelligence seems to live in two mutually exclusive realms: the academic world, where striking intellectual breakthroughs occur, and the industrial world, where the focus is simply keeping everything running on time.
However, there are hints that cross-pollination can occur between the two worlds: a problem in the industrial setting of machine learning can spark intriguing theoretical questions.
"This is a problem we stumbled upon that we would never have thought of in our academic offices," says Stefano Soatto. He is vice president of AI applied science at Amazon's AWS cloud computing service.
See also: Ethics of AI: Benefits and risks of artificial intelligence.
Soatto straddles the two environments of AI. At the same time that he runs applied AI at Amazon AWS, he is also a professor of computer science at UCLA, giving him a privileged position in which to participate in fundamental academic research into AI. Soatto's office is physically on the Caltech campus, where he got his PhD.
The problem Soatto was describing for ZDNet, in an interview via Amazon Chime, would sound familiar to any CIO or dev who is not an AI expert and who has to handle production hiccups.
"This actually arose out of an escalation with a customer," recalled Soatto. When an artificial intelligence program is put into production, it is one part in a whole chain of computer processes. With deep learning forms of AI, the program's behavior can change as new and improved versions are produced. That can break things in the chain of processes, causing headaches for the customer.
"In reality, this is only a very small part of a production system," explained Soatto, referring to an AI program such as an image classifier. In the customer complaint, he related, the new technology was breaking stuff the customer was using that was much older. "Customers came back and said, 'we had problems with your model,' and we realized they're using a model from four years ago!"
Thus began a quest by Soatto and colleagues into a new realm of exploration: how to make newer AI programs compatible with their predecessors.
That customer complaint led to a paper this past June, presented at the CVPR conference, an academic gathering studying computer vision. Soatto and his team approached the issue of compatibility as a constraint-satisfaction problem, taking a neural net and asking it to have guarantees beyond just being accurate in making predictions.
Specifically, they asked whether a new version of a neural net could get more accurate without introducing new errors. Think of a classifier of cats and dogs: if the new neural net gets better overall but suddenly mis-categorizes some pictures of cats or dogs that the old one got right, that's not a good trade-off. Making a mistake where a previous program was fine is called a "negative flip."
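As a rough illustration (not code from the paper), negative flips can be counted directly by comparing the two models' predictions against the true labels:

```python
def count_negative_flips(labels, old_preds, new_preds):
    """Count samples the old model got right but the new model gets wrong."""
    return sum(
        1
        for y, old, new in zip(labels, old_preds, new_preds)
        if old == y and new != y
    )

# Toy example: 5 images of cats (0) and dogs (1)
labels    = [0, 1, 0, 1, 0]
old_preds = [0, 1, 1, 1, 0]   # old model: 4/5 correct
new_preds = [0, 1, 0, 1, 1]   # new model: also 4/5 correct,
                              # but it flips the last cat

print(count_negative_flips(labels, old_preds, new_preds))  # 1
```

Note that both models above have the same overall accuracy, yet the update still produces one negative flip, which is exactly the kind of regression a downstream customer would notice.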
In the paper, "Positive-Congruent Training: Towards Regression-Free Model Updates," Soatto and colleagues solve the constraint-satisfaction problem of reducing such negative flips by setting the goal in a novel way. They took the traditional objective function, the so-called cross-entropy loss, which governs how well the neural net predicts the cat or dog, and they added to it a second objective function, requiring the neural net to make sure to do well on the same predictions the old neural net got right.
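That combined objective can be sketched in NumPy. This is a simplified stand-in for the paper's focal-distillation loss, not the authors' code: the usual cross-entropy is augmented with a distillation-style penalty that is applied only on samples the old model already classifies correctly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pc_loss(new_logits, old_logits, labels, beta=1.0):
    """Cross-entropy on the new model, plus a penalty tying the new model's
    predictions to the old model's -- but only on samples the old model
    already gets right.  A toy version of the paper's focal distillation."""
    p_new = softmax(new_logits)
    p_old = softmax(old_logits)
    n = len(labels)
    cross_entropy = -np.log(p_new[np.arange(n), labels]).mean()
    # Mask of samples where the old model is correct
    old_correct = (p_old.argmax(axis=1) == labels)
    # KL divergence from old to new, counted only on those samples
    kl = (p_old * (np.log(p_old) - np.log(p_new))).sum(axis=1)
    distill = (kl * old_correct).mean()
    return cross_entropy + beta * distill
```

When the new model agrees with the old one, the penalty vanishes, so the second term pushes training away from negative flips without punishing genuine improvements on samples the old model got wrong.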
By moving beyond a single objective function and framing the matter as a constraint-satisfaction problem, Soatto and his team invented a new area of applied deep learning, which they have christened "Graceful AI."
The name is an umbrella term that encapsulates the principle that there are multiple goals in a problem.
"The models we develop have to play nicely with everything around them, not just train the best model you can," said Soatto. Another way to look at it is being "respectful for criteria beyond just optimizing for performance."
The paper demonstrates that the academic focus on producing the most highly performing AI program is not the only way to arrive at really interesting problems, Soatto told ZDNet.
See also: AI in sixty seconds.
"As an academic, you spend most of your time trying to invent problems that don't exist," observed Soatto. "Very rarely you get lucky, and you end up with something that's useful to the world."
Being at AWS, by contrast, "you get constantly exposed and bombarded with real problems that are fascinating that don't have a solution."
The positive-congruent work began in 2018, and the first solution that was arrived at went into service in AWS in 2020 and is currently running in the AWS cloud. Aspects of the research are employed across AWS products such as Amazon Comprehend, Amazon Lex, Amazon Rekognition, and Amazon Textract, said Soatto.
The practical result is that "any [AWS] customer who employs one of these models knows that from that point on, they will be able to ingest any subsequent improvement without having to change any of their post-processing."
In Soatto's view, every customer complaint is an opportunity to uncover intriguing questions. When a customer complaint comes up, "There is something to be understood, something is not working the way we thought."
In the case of the positive-congruent work, he said, "we stopped and asked, Why is it that we don't train models that are compatible with whatever is surrounding them?"
There is a payoff here for pure research. The practical question opened the door to deeper matters that touch upon theoretical issues, such as why machine learning is or is not able to generalize beyond the training data.
"The elephant in the room for machine learning is you really don't care how you do in the training set because you will never, ever see it again," said Soatto. "What you care about are a small number of errors in the test set, which is sequestered, and you don't see it."
The "schism" between the two is all about what are called inductive biases, the thing that "connects the test data, which you don't have access to, and the training data."
That, in turn, leads to the whole area of research in the AI field of what's called representation learning, something Amazon has been doing work on for many years, said Soatto.
"This is a problem that has been open and obsessing us for decades," he said. It goes back to the grandfathers of AI, Alan Turing and Norbert Wiener. The mystery of AI programs is a conundrum: "You cannot create information by torturing the data, but everything we do to data is torturing the data -- we do stuff to it."
Theoretical questions of learning representations get to the heart of what scientist Claude Shannon theorized as the very nature of information, said Soatto, to wit: what representations are optimal for a task, in terms of being maximally informative?
On that score, "there are some very strange and fascinating phenomena," said Soatto.
The positive-congruent work and other research that Soatto and his team have produced share certain themes. In particular, there is a current running through the works of equivalence, the question of what makes two neural nets the same or different for a given task.
The AI team's projects sometimes come down to searching for neural nets that have an overlap, despite differences. The exact nature of the overlap may not always be clear but is tantalizing nonetheless.
For example, in the positive-congruent paper, Soatto and the team tested what happens when old and new networks are developed as ensembles, groups of similar neural nets with varying hyper-parameters. They found they could "future-proof" old neural nets, in a way, because the ensemble of old neural nets collectively had less divergence from the new neural nets in terms of examples that went wrong.
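By way of illustration, the simplest way to combine such an ensemble at prediction time is to average the members' outputs (the paper evaluates more elaborate combinations, but averaging conveys the idea):

```python
import numpy as np

def ensemble_predict(member_logits):
    """Average the logits of several similarly trained models and take the
    argmax.  Individual quirks tend to cancel out, so the ensemble's set of
    errors drifts less from one model generation to the next."""
    return np.mean(member_logits, axis=0).argmax(axis=1)

# Three hypothetical ensemble members scoring two images on cat (0) vs dog (1)
members = np.array([
    [[2.0, 1.0], [0.5, 1.5]],
    [[1.8, 0.2], [1.2, 1.0]],
    [[0.9, 1.1], [0.3, 2.2]],
])
print(ensemble_predict(members))  # [0 1]
```

Note that the third member alone would have called the first image a dog; averaging smooths out that idiosyncrasy, which is the intuition behind the "future-proofing" effect.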
"Ensembles are very interesting," said Soatto. "We have not solved it completely."
In another piece, "Compatibility-Aware Heterogeneous Visual Search," Soatto and colleagues ask whether it's possible to develop neural nets that are more computationally efficient while giving up the least amount of accuracy. They use a popular approach to automatically designing neural nets, called "neural architecture search," and they set another constraint-satisfaction problem: the new network must be "compatible" with an existing neural net. A smaller neural net may save on processing as long as it produces a representation that is compatible with a larger neural network -- larger in terms of the number of parameters.
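One way to picture that kind of embedding compatibility (an illustrative sketch, not the paper's method): a search index built with the large model can answer queries embedded by the small model only if the two models map similar images to nearby vectors.

```python
import numpy as np

def cross_model_search(query_emb, gallery_emb):
    """For each query vector (say, from the small model), return the index of
    the most cosine-similar gallery vector (indexed with the large model).
    Retrieval only works if the two embedding spaces are aligned."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    return (q @ g.T).argmax(axis=1)
```

The practical payoff of such compatibility is that queries can run on a cheap model while the expensive model's index is reused, instead of re-embedding the entire gallery every time a model changes.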
Soatto and colleagues have also inverted the question of similarity between neural nets by, for example, asking what happens to a neural net if a given sample of data is left out. In the paper "Estimating Informativeness Of Samples With Smooth Unique Information," they define the information value of a single sample in a data set by asking how the weight values of a neural net are different with and without that individual data point.
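The flavor of that definition can be sketched with a toy model. Here, closed-form ridge regression stands in for the stochastic deep-network training the paper actually analyzes: a sample's informativeness proxy is how far the fitted weights move when that sample is left out.

```python
import numpy as np

def loo_weight_shift(X, y, i, reg=1e-3):
    """Toy informativeness proxy for sample i: the distance the fitted
    ridge-regression weights move when that sample is left out.
    (A stand-in for the paper's smooth unique information, which is defined
    for deep networks, not closed-form regression.)"""
    def fit(Xs, ys):
        d = Xs.shape[1]
        return np.linalg.solve(Xs.T @ Xs + reg * np.eye(d), Xs.T @ ys)
    w_full = fit(X, y)
    mask = np.arange(len(y)) != i
    w_loo = fit(X[mask], y[mask])
    return np.linalg.norm(w_full - w_loo)

# The last point breaks the otherwise clean pattern y = 2x,
# so leaving it out should move the weights much more.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 100.0])
```

On this data, `loo_weight_shift(X, y, 3)` is far larger than `loo_weight_shift(X, y, 0)`: the outlier carries much more information about the fitted model than a point that merely confirms the trend.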
Again, the work has multiple theoretical implications. It points the way to possible bounds on the generalization capability of a neural net, meaning how well it can be applied to new examples beyond training data when making predictions in the real world. And the work can provide insight into how much information may "leak out" of a neural net or be disclosed about a given example. That theoretical question is also a crucial practical issue in terms of privacy because leaked information can potentially de-anonymize entities in a data set.
Much of the Graceful AI work has been done with computer vision types of problems, though Soatto notes "the framework is general, so the considerations apply to other forms of architectures."
All of these explorations into things such as backward compatibility touch upon a broad area of continued exploration, known as "continual learning." That challenge is "still an open problem," said Soatto.
The steady pace of publication by Soatto and colleagues is a change for Amazon, which wasn't always into publishing science. "When I joined, Amazon was not visible as a contributor to the open science community," he said. "That's changed."
Nowadays, every scientist working at Amazon is expected to publish, to present, to be vetted, and to contribute.
"We want to get an opportunity to work on new problems that are impactful and meaningful and end up in the hands of thousands of developers," he said. "We hire people so that we can get them exposed to real problems that don't have a solution. Here we have people who join our team, and in six months, their work is in the hands of tens of thousands of people," something that is unheard of for most scientists.
The cross-pollination makes sense for Amazon, which funds programs at Caltech and numerous other academic institutions as part of cultivating talent in AI.
"Amazon realizes the importance of forming the talent of the next generation," he said. "If tech hires all the professors, who's going to form the next students?" he asked rhetorically.
Although positive-congruent training has been implemented in AWS, Soatto and the team acknowledge that intriguing questions remain not fully answered.
At the end of the paper, Soatto and colleagues write that the new training regimen still involves tricky trade-offs. The best solution, ensembles, is impractical when making live predictions. But the simpler approach, known as focal distillation, brings with it some increase in error rate, which is undesirable.
The paper ends with the caveat that the authors have "only scratched the surface" of PC training. More work remains to be done.
As intriguing as the theoretical implications might be, Soatto is quick to emphasize the practical. The goal is what Amazon terms "customer-obsessed research," he said.
"These are not hypothetical academic questions," said Soatto, "These are questions that, if we are able to successfully address, they will really simplify the life of customers and developers who need to insert these models into their pipeline."