In the State of AI Report 2020, Benaich and Hogarth outdid themselves. While the structure and themes of the report remain mostly intact, its size has grown by nearly 30%. This is a lot, especially considering their 2019 AI report was already a 136-slide journey through all things AI.
The State of AI Report 2020 is 177 slides long. It covers technology breakthroughs and their capabilities; the supply, demand, and concentration of talent working in the field; large platforms; financing; areas of application for AI-driven innovation today and tomorrow; special sections on the politics of AI; and predictions for AI.
ZDNet caught up with Benaich and Hogarth to discuss their findings.
AI democratization and industrialization: Open code and MLOps
We began by discussing the rationale for such a substantial undertaking, which Benaich and Hogarth admitted had taken up a great deal of their time. They feel that their combined backgrounds in industry, research, investment, and policy, along with the positions they currently hold, give them a unique vantage point. Producing this report is their way of connecting the dots and giving something of value back to the AI ecosystem at large.
Coincidentally, Gartner's 2020 Hype Cycle for AI was released just a couple of days earlier. Gartner identifies what it calls two megatrends that dominate the AI landscape in 2020: democratization and industrialization. Some of Benaich and Hogarth's findings concern the massive cost of training AI models and the limited openness of AI research. This seems to contradict Gartner's position, or at least to imply a different definition of democratization.
Benaich noted that there are different ways to look at democratization. One of them is the degree to which AI research is open and reproducible. As the duo's findings show, it is not: only 15% of AI research papers publish their code, and that has not changed much since 2016.
Hogarth added that traditionally AI as an academic field has had an open ethos, but the ongoing industry adoption is changing that. Companies are recruiting more and more researchers (another theme the report covers), and there is a clash of cultures going on as companies want to retain their IP. Notable organizations criticized for not publishing code include OpenAI and DeepMind:
"There's only so closed you can get without a sort of major backlash. But at the same time, I think that data clearly indicates that they're certainly finding ways to be closed when it's convenient," said Hogarth.
As far as industrialization goes, Benaich and Hogarth pointed towards their findings in terms of MLOps. MLOps, short for machine learning operations, is the equivalent of DevOps for ML models: Taking them from development to production, and managing their lifecycle in terms of improvements, fixes, redeployments, and so on.
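To make the lifecycle idea concrete, here is a minimal sketch of the bookkeeping that MLOps tooling automates: registering trained model versions, promoting one to production, and rolling back after a bad redeployment. The `ModelRegistry` class and its methods are hypothetical, written only for illustration; they are not the API of any particular MLOps product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; not the API of any real MLOps tool."""
    versions: Dict[int, Any] = field(default_factory=dict)  # version number -> model artifact
    history: List[int] = field(default_factory=list)        # production versions, oldest first

    def register(self, model: Any) -> int:
        """Store a newly trained model and return its version number."""
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    def promote(self, version: int) -> None:
        """Deploy a registered version to production."""
        if version not in self.versions:
            raise KeyError(f"unknown model version {version}")
        self.history.append(version)

    def rollback(self) -> int:
        """Revert production to the previously deployed version, e.g. after a faulty fix."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def production(self) -> Optional[int]:
        """Version currently serving production traffic, if any."""
        return self.history[-1] if self.history else None


registry = ModelRegistry()
v1 = registry.register("intent-model-v1")  # placeholder artifacts; any object works
registry.promote(v1)
v2 = registry.register("intent-model-v2")  # an "improvement" that turns out badly
registry.promote(v2)
print(registry.production)  # 2
print(registry.rollback())  # 1
```

Keeping the deployment history as a list makes a rollback a single pop, which is the kind of routine redeployment step that MLOps tools aim to make boring and repeatable.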
Some of the more popular and fastest-growing GitHub projects in 2020 are related to MLOps, the duo pointed out. Hogarth also added that for startup founders, for example, it's probably easier to get started with AI today than it was a few years ago, in terms of tool availability and infrastructure maturity. But there is a difference when it comes to training models like GPT3:
"If you wanted to start a sort of AGI research company today, the bar is probably higher in terms of the compute requirements. Particularly if you believe in the scale hypothesis, the idea of taking approaches like GPT3 and continuing to scale them up. That's going to be more and more expensive and less and less accessible to new entrants without large amounts of capital.
The other thing that organizations with very large amounts of capital can do is run lots of experiments and iterate on large experiments without having to worry too much about the cost of training. So there's a degree to which you can be more experimental with these large models if you have more capital.
Obviously, that slightly biases you towards these almost brute force approaches of just applying more scale, capital and data to the problem. But I think that if you buy the scaling hypothesis, then that's a fertile area of progress that shouldn't be dismissed just because it doesn't have deep intellectual insights at the heart of it."
How to compete in AI
This is another key finding of the report: huge models, large companies, and massive training costs dominate the hottest area of AI today, natural language processing (NLP). Based on variables released by Google et al., researchers have estimated the cost of training NLP models at about $1 per 1,000 parameters.
By that measure, a model such as OpenAI's GPT3, which has been hailed as the latest and greatest achievement in AI, could have cost tens of millions of dollars to train. Experts suggest the likely budget was $10 million. Clearly, not everyone can aspire to produce something like GPT3. The question is: Is there another way? Benaich and Hogarth think so, and they have an example to showcase.
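The rule of thumb reduces to simple arithmetic. The sketch below applies it to two well-known model sizes, BERT-base (about 110 million parameters) and BERT-large (about 340 million). The rate constant is the report's rough figure and the function name is ours, so treat the output as a back-of-envelope estimate, not a pricing model.

```python
# Rough rate quoted in the report: about $1 per 1,000 parameters (a rule of thumb, not a vendor price).
USD_PER_1000_PARAMS = 1.0


def estimated_training_cost_usd(num_parameters: int) -> float:
    """Back-of-envelope NLP training cost under the $1-per-1,000-parameters rule."""
    return num_parameters / 1000 * USD_PER_1000_PARAMS


for name, params in [("BERT-base", 110_000_000), ("BERT-large", 340_000_000)]:
    print(f"{name}: {params:,} parameters -> ~${estimated_training_cost_usd(params):,.0f}")
# BERT-base: 110,000,000 parameters -> ~$110,000
# BERT-large: 340,000,000 parameters -> ~$340,000
```

Under this rule, cost grows linearly with parameter count, so a model with a fraction of the parameters costs a fraction as much to train. Real costs also depend on data, hardware, and how many training runs are needed.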
PolyAI is a London-based company active in voice assistants. They produced and open-sourced a conversational AI model (technically, a pre-trained contextual re-ranker based on transformers) that outperforms Google's BERT model in conversational applications. PolyAI's model not only performs much better than Google's, but it also required a fraction of the parameters, and therefore a fraction of the cost, to train.
The obvious question is: How did PolyAI do it? This could be an inspiration for others, too. Benaich noted that the task of detecting intent, that is, understanding what somebody on the phone is trying to accomplish by calling, is solved much better by treating it as a contextual re-ranking problem:
"That is, given a kind of menu of potential options that a caller is trying to possibly accomplish based on our understanding of that domain, we can design a more appropriate model that can better learn customer intent from data than just trying to take a general-purpose model -- in this case BERT.
BERT can do OK in various conversational applications, but just doesn't have the kind of engineering guardrails or engineering nuances that can make it robust in a real world domain. To get models to work in production, you actually have to do more engineering than you have to do research. And almost by definition, engineering is not interesting to the majority of researchers."
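The re-ranking framing can be sketched in a few lines: take a fixed menu of intents a caller might have, score each one against what the caller said, and sort. In the toy below, the intents, their descriptions, and the scorer (cosine similarity over bag-of-words counts) are all hypothetical stand-ins chosen for illustration; PolyAI's model is a pre-trained, transformer-based re-ranker, not a word counter.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts: a crude stand-in for a learned sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(utterance: str, menu: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score every candidate intent against the utterance and return them best-first."""
    context = bag_of_words(utterance)
    scored = [(intent, cosine(context, bag_of_words(desc))) for intent, desc in menu.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical intent menu for a restaurant's phone line.
MENU = {
    "book_table": "book a table reserve a table for dinner",
    "cancel_booking": "cancel my booking cancel reservation",
    "opening_hours": "what time do you open opening hours",
}

ranking = rerank("I want to cancel my reservation for tonight", MENU)
print(ranking[0][0])  # cancel_booking
```

Because the candidate set is fixed by someone who knows the domain, the model only has to choose among sensible options rather than generate an answer from scratch, which is one way to read the point about domain knowledge and engineering guardrails.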
Long story short: You know your domain better than anyone else. If you can document and make use of this knowledge, and have the engineering rigor required, you can do more with less. This once more pointed to the topic of using domain knowledge in AI, which is what critics of the brute-force approach, and of the "scaling hypothesis" behind it, point to.
Put simplistically, proponents of the scaling hypothesis seem to think that intelligence is an emergent phenomenon of scale. By extension, if models like GPT3 become large and complex enough, the holy grail of AI, and perhaps of science and engineering at large, could be achieved: artificial general intelligence (AGI).
On the way to general AI?
How to make progress in AI, and the topic of AGI, is at least as much about philosophy as it is about science and engineering. Benaich and Hogarth approach it holistically, prompted by the critique of models such as GPT3. The most prominent critic of approaches such as GPT3 is Gary Marcus. Marcus has been consistent in his critique, which also targeted models predating GPT3, as the "brute force" approach does not seem to change regardless of scale.
Benaich referred to Marcus' critique and summed it up. GPT3 is an amazing language model that can take a prompt and output a sequence of text that is legible, comprehensible, and in many cases relevant to the prompt. What's more, we should add, GPT3 can even be applied to other domains, such as writing software code, which is a topic in its own right.
However, there are numerous examples where GPT3 goes off course, either by expressing bias or by producing irrelevant results. An interesting point is how we can measure the performance of models like GPT3. Benaich and Hogarth note in their report that existing benchmarks for NLP, such as GLUE and SuperGLUE, are now being aced by language models.
These benchmarks are meant to compare the performance of AI language models against humans at a range of tasks spanning logic, common sense understanding, and lexical semantics. A year ago, the human baseline in GLUE was beaten by one point. Today, GLUE is reliably beaten, and its more challenging sibling SuperGLUE is almost beaten, too.
This can be interpreted in a number of ways. One way would be to say that AI language models are now just as good as humans. However, the kind of deficiencies that Marcus points out show this is not the case. Perhaps, then, what this means is that we need a new benchmark. Researchers from Berkeley have published a new benchmark that tries to capture some of these issues across various tasks.
Benaich noted that an interesting extension of what GPT3 could do relates to the discussion around PolyAI: adding some kind of toggles to the model that give it guardrails, or at least let you tune what kind of outputs it can create from a given input. There are different ways you might be able to do this, he added.
Previously, the use of knowledge bases and knowledge graphs was discussed. Benaich also mentioned some kind of learned intent variable that could be used to inject this kind of control into a more general-purpose sequence generator. Benaich thinks the critical view is certainly valid to some degree, and that it points to what models like GPT3 could use in order to become useful in production environments.
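One way to picture such a control knob is a generator whose every step is conditioned on an intent label as well as on the previous word. The toy below does this with a count-based bigram table keyed by (intent, previous word). The intents, training sentences, and function names are hypothetical, and unlike the learned variable Benaich mentions, the intent here is supplied by hand rather than learned from data.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

Key = Tuple[str, str]  # (intent, previous token)


def train(examples: List[Tuple[str, str]]) -> DefaultDict[Key, Counter]:
    """Count next-token frequencies conditioned on both the intent and the previous token."""
    counts: DefaultDict[Key, Counter] = defaultdict(Counter)
    for intent, sentence in examples:
        tokens = ["<start>"] + sentence.split() + ["<end>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[(intent, prev)][nxt] += 1
    return counts


def generate(counts: DefaultDict[Key, Counter], intent: str, max_len: int = 20) -> str:
    """Greedy decoding; the intent acts as a toggle restricting which continuations are reachable."""
    prev, words = "<start>", []
    for _ in range(max_len):
        options = counts.get((intent, prev))  # .get avoids inserting empty entries
        if not options:  # unknown intent or dead end: stop rather than improvise
            break
        prev = options.most_common(1)[0][0]
        if prev == "<end>":
            break
        words.append(prev)
    return " ".join(words)


model = train([
    ("refund", "your refund has been issued"),
    ("delivery", "your parcel will arrive tomorrow"),
])
print(generate(model, "refund"))    # your refund has been issued
print(generate(model, "delivery"))  # your parcel will arrive tomorrow
```

Both outputs start with the same word, yet the intent keeps them apart; a plain bigram table keyed only on the previous word would send both down the same path after "your".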
Causality, the next frontier in AI
Hogarth, for his part, noted that Marcus is "almost a professional critic of organizations like DeepMind and OpenAI." While it's very healthy to have those critical perspectives when there is a reckless hype cycle around some of this work, he went on to add, OpenAI has one of the more thoughtful approaches to policy around this.
Hogarth emphasized the underlying difference in philosophy between proponents and critics of the scaling hypothesis. However, he went on to add, if the critics are wrong, then we might end up with a very smart but not very well-adjusted AGI on our hands, as some early instances of bias in these scaled-up models suggest:
"So I think it's incumbent on organizations like OpenAI, if they are going to pursue this approach, to tell us all how they're going to do it safely, because it's not obvious yet from their research agenda. How do you marry AI safety with this kind of 'throw more data and compute to the problem and AGI will emerge' approach?"
This discussion touched on another part of the State of AI Report 2020. Some researchers, Benaich and Hogarth noted, feel that progress in mature areas of machine learning is stagnant. Others call for advancing causal reasoning and claim that adding this element to machine learning approaches could overcome barriers.
Causality, Hogarth said, is arguably at the heart of much of human progress. From an epistemological perspective, causal reasoning has given us the scientific method, and it's at the heart of all of our best world models. So the work that people like Judea Pearl have pioneered to bring causality to machine learning is exciting. It feels like the biggest potential disruption to the general trend of larger and larger correlation-driven models:
"I think if you can crack causality, you can start to build a pretty powerful scaffolding of knowledge upon knowledge and have machines start to really contribute to our own knowledge bases and scientific processes. So I think it's very exciting. There's a reason that some of the smartest people in machine learning are spending weekends and evenings working on it.
But I think it's still in its infancy as an area of attention for the commercial community. We really only found one or two examples of it being used in the wild, one by Faculty, a London-based machine learning company, and one by Babylon Health in our report this year."
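The gap between correlation-driven models and causal reasoning can be shown with a small simulation, assuming a toy structural causal model of our own choosing: a hidden common cause Z drives both X and Y, while X has no effect on Y at all. In observational data X and Y are strongly correlated; setting X by intervention, in the spirit of Pearl's do-operator, makes the correlation vanish. The variable names, noise levels, and sample size are illustrative assumptions.

```python
import random
from math import sqrt
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n: int, intervene: bool, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Toy causal model: Z -> X and Z -> Y, with no arrow from X to Y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)        # do(X): X is set independently of Z
        else:
            x = z + rng.gauss(0, 0.5)  # observational regime: X follows Z
        y = z + rng.gauss(0, 0.5)      # Y depends only on Z
        xs.append(x)
        ys.append(y)
    return xs, ys


observed = pearson(*simulate(10_000, intervene=False))
intervened = pearson(*simulate(10_000, intervene=True))
print(f"observational corr:  {observed:.2f}")    # theoretical value 0.8
print(f"interventional corr: {intervened:.2f}")  # theoretical value 0
```

A purely correlational model fitted to the observational data would predict that changing X changes Y; a causal model gets this right because it represents where the dependence actually comes from.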
If you thought that's enough cutting edge AI research and applications for one report, you'd be wrong. The State of AI Report 2020 is a trove of references, and we'll revisit it soon, with more insights from Benaich and Hogarth.