François Chollet, a scientist in Google's artificial intelligence unit, is a member of a new generation of pioneers in machine learning. In 2015, he introduced the world to Keras, an application programming interface that has become wildly popular for implementing deep learning networks. It is most commonly used as an interface to Google's TensorFlow framework, and it vastly simplifies the assembly of neural networks of various sorts.
In that way, Chollet has helped in very concrete fashion to advance the development and testing of deep learning. It may seem surprising, then, that one of Chollet's foci at the moment is the very big picture of how to advance artificial intelligence beyond merely getting better on benchmarks.
Chollet is not entirely satisfied with where AI is at the moment. "A lot of well-funded, large-scale gradient-descent projects get carried out as a way to generate bombastic press articles that misleadingly suggest that human-level AI is perhaps a few years away," wrote Chollet in an email exchange with ZDNet. "Many people have staked a lot on this illusion. But it's still an illusion."
ZDNet reached out to Chollet after he published a paper three weeks ago offering a remarkable critique of deep learning's strengths and weaknesses. The paper, titled "On the Measure of Intelligence," proposes a new definition of intelligence, along with materials to help scientists develop systems that may achieve it, called the "Abstraction and Reasoning Corpus," or ARC. ARC, a new benchmark, is a collection of challenges for intelligent systems. The idea is to guide AI toward "more intelligent and more human-like artificial systems."
Chollet identifies the field's obsession with incremental improvements on narrow skills tests as one of the chief limits of today's deep learning.
Instead of drills on tests, ARC would lead to evaluating systems based on how efficiently they acquire skills. A solution to ARC, he hypothesizes, would be a system that has developed some "core knowledge priors" -- broad information about the world, such as object permanence, but different from what people casually call "common sense." The goal would be greater "generalization," meaning an ability for a system to succeed on held-out, hidden tasks that have been designed to be solvable with those priors.
All this is "highly speculative," writes Chollet in the paper, and currently, "to the best of our knowledge, ARC does not appear to be approachable by any existing machine learning technique."
ZDNet asked Chollet several questions about the effort, which he answered in written form. The questions and the answers are printed below in their entirety.
In his written responses, Chollet describes ARC as the product of fifteen years of trying to "understand the mind." He was lately triggered, he writes, by the "narrow-mindedness" of pronouncements he's heard made in the AI field, and an ahistoricity he observes in much recent work in reinforcement learning and related areas.
Such systems have made amazing progress and are valuable, but they are not the "end-all-be-all," he writes. Deep learning looks up past data and performs interpolation, he observes. "But intelligence as I formally define it in the paper needs to feature extrapolation rather than mere interpolation."
Chollet's goal, he writes, is to "nudge researchers into looking at questions they're not currently asking, into trying ideas they would not normally pursue."
Chollet writes that he's made some progress toward solutions to ARC, and expresses hope others will too. It's so far led him into some "interesting and quite unique research directions."
ZDNet: Please describe briefly how you came to the train of thought that brought you to building ARC and writing the paper. What was your intellectual path to this point, however that question makes sense to you?
François Chollet: This paper is my attempt to write down and formalize things I've been saying for many years, in talks, in blog posts or on Twitter, in personal conversations. I mean it to be actionable, useful to others, not merely a set of opinions -- a formal framework for rigorously expressing certain ideas about generalization and intelligence, and a concrete challenge for others to take on.
I've been trying to "understand" the mind (in a broad sense) as my primary area of focus for a long time, for the past 15 years or so. Initially I was coming at it from the perspective of neuropsychology and developmental psychology. I then moved towards AI, in particular "cognitive developmental robotics," which is the AI subfield I identified with as a university student -- building computational models of human cognitive development, sometimes physically embodied into robots or at least simulations. In 2009, I started working on a fairly ambitious general AI architecture I called ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System), which I worked on for a few years before gradually moving on to other things. I am now using several of the ideas from that project as a basis for building an ARC solver. General AI research wasn't very popular back then, so at some point I had to pick up marketable skills and get a job. It has been a distraction, but I've never really stopped thinking about it.
Something that has been a trigger for me to write these ideas down has been the renewed interest in general AI and reinforcement learning over the past few years, and what I perceive as a certain narrow-mindedness and ahistoricity in the sweeping pronouncements I've been hearing about it. A lot of this paper is about bringing much-needed context and grounding to the discussion, and framing things in a historical perspective. Before you start coming up with sweeping answers, you need to know what the right questions are, and where these questions are coming from.
ZDNet: How do you hope the international community of researchers will receive ARC? What are your goals for it, especially given the mention of having AI competitions? Is the "hypothetical ARC solver" an immediate goal?
FC: The goal is to nudge researchers into looking at questions they're not currently asking, into trying ideas they would not normally pursue. I want people to look at ARC and ask, what would it take to solve these tasks? The performance of existing techniques on ARC is basically zero, whereas humans can solve it without any prior training or explanations, so that's a big red neon sign saying that there's something going on here and that we're in need of novel ideas. ARC should serve both as a benchmark of progress and as a source of inspiration.
Personally, ARC has already led me towards interesting and quite unique research directions, and I have made decent progress on starting to solve it, reusing old ideas I've been playing with for a long time. I hope this will soon be true of other people as well. Fully solving ARC is probably not within immediate reach, but ARC as an AI challenge is at a level of conceptual difficulty where meaningful progress can be made right away. That was one of the goals: ARC would be pointless if it were impossible to approach it. The ideal challenge is something for which our performance starts at 0 -- which makes it intriguing and highlights the need for fresh ideas -- but very quickly becomes non-zero -- which is a sign that it is triggering substantial conceptual progress.
ZDNet: When will we know if ARC is having constructive effects? Meaning, is there a measure of its impact on the research community you expect or hope to see in the near- to intermediate-term?
FC: I don't know how much interest it will generate in the first place. But I am reasonably hopeful. I don't care much for academic tokens of impact such as citations, so my personal metric of success will be the rate at which ARC gets solved. If it gets solved within a couple of years, it was probably flawed and not sufficiently challenging. If our performance is still near-zero in 10 years, ARC will have been a valid challenge, but not one conducive to much progress. It will have been successful if we see a steady rate of meaningful progress over a span of several years. And of course, if the ideas and techniques that lead to this progress actually generalize, that is to say, if they eventually find useful applications in real-world systems. Pragmatically, the measure of success is your eventual impact on the world, not how much you capture the attention of AI researchers or the general public.
ZDNet: What is the value of existing work on deep learning across the spectrum of efforts from DeepMind's work on AlphaZero and AlphaStar to the many adaptations of Transformer (e.g., BERT, GPT-2, XLNet, etc.), especially given your point on page 52 that no existing deep learning system appears able to solve ARC, and your comment on page 55 about the potential to "adapt" existing games or new tests? Are these deep learning systems valuable? Are they misguided? Are they squandering resources that should be spent in a different way?
FC: I say this a lot, but deep learning is immensely valuable. What deep learning does is to map an input space X to a target space Y, via a geometric morphing, learned using large amounts of human-annotated data (or sometimes, data with automatically-generated annotations). So deep learning is pattern recognition, input-to-output mapping given a dense sampling of a data manifold. But it's very good at pattern recognition.
Being good at this is a game-changer in just about any industry. You can understand it as a way to encode and operationalize existing human abstractions -- to automate known solutions to known problems when we're in a position to collect a vast number of examples. This opens the door to a whole new world of automation. So to be clear, I'm not trying to downplay the profound significance of being good at this kind of thing. I've spent years of my life working on deep learning. I've seen it lead to solving countless problems that we thought impossible to solve just a few years ago. Always using the exact same basic recipe.
However, it would be a mistake to believe that existing deep learning techniques represent the end-all-be-all of AI. By construction, by training, what deep learning does is looking up past data and performing interpolation. This can implement local generalization -- at best, systems that can robustly do what they're trained to do, that can make sense of what they've seen before, that can handle the kind of uncertainty that their creators have planned for. But intelligence as I formally define it in the paper needs to feature extrapolation rather than mere interpolation -- it needs to implement broad or even extreme generalization, to adapt to unknown unknowns across previously unknown tasks. Intelligence is not curve-fitting.
This is something that deep learning is fundamentally not adapted for, and the practical results of the past few years give this view a resounding empirical confirmation. Deep learning models are brittle, extremely data-hungry, and do not generalize beyond their training data distribution. This is an inescapable consequence of what they are and how we train them. They can at best encode the abstractions we explicitly train them to encode; they cannot autonomously produce new abstractions. They simply don't have the machinery for it -- it's like expecting a car to start flying if only its wheels would turn fast enough. Cars can be very useful, but if you think they can go anywhere and are the only vehicle we're ever going to need, you're mistaken.
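The interpolation-versus-extrapolation distinction Chollet draws can be illustrated with a toy sketch (ours, not from the paper): a model that predicts by looking up its closest training example does well inside the region its data densely samples, and fails badly outside it.

```python
# Toy illustration (not from the paper): a 1-nearest-neighbor "model"
# interpolates well inside its training range but cannot extrapolate.

def make_1nn(xs, ys):
    """Return a predictor that looks up the closest training point."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

# "Train" on a dense sampling of y = 2x over [0, 10].
xs = [i / 10 for i in range(101)]
ys = [2 * x for x in xs]
model = make_1nn(xs, ys)

# Inside the sampled range, lookup-plus-interpolation is nearly exact.
inside_error = abs(model(4.32) - 2 * 4.32)

# Outside it, the prediction is stuck at the nearest memorized point (20.0),
# while the true value keeps growing -- no machinery for extrapolation.
outside_error = abs(model(50.0) - 2 * 50.0)
```

Deep networks are far more sophisticated interpolators than a nearest-neighbor lookup, but on Chollet's account they share the same basic limitation: performance is anchored to the neighborhood of the training distribution.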
In my opinion, it is absolutely true that it is a waste of resources to be building single-use, special-purpose, multi-million dollar AI systems that play popular video games at superhuman level. It's a gimmick. The purpose of scientific research should be to answer open questions, to produce new technology -- in a word, to generate new knowledge that is relevant to the real world, knowledge that generalizes. The purpose of research should not be to generate splashy headlines to impress the public. These are multi-million dollar efforts that, in my opinion, do not teach us anything, and do not produce reusable artifacts that we can use to solve new problems. The state of our knowledge is the same at project completion as it was when the project started.
I know this is a very heretical thing to say in the current climate, where a lot of well-funded large-scale gradient-descent projects get carried out as a way to generate bombastic press articles that misleadingly suggest that human-level AI is perhaps a few years away. Many people have staked a lot on this illusion. But it's still an illusion.
ZDNet: How should we reconcile your discussion of "priors" in this paper with past discussion of priors in deep learning, such as, for example, the notion that convolutions are a sort of "broad prior" underlying convolutional neural networks? Is your notion of priors contiguous/compatible with those notions of priors as described in the writings of, for example, Yann LeCun and Yoshua Bengio? (Cf. LeCun & Bengio, 2007, "Scaling learning algorithms towards AI," page 5: "The flat prior assumption must be rejected: some wiring must be simpler to specify (or more likely) than others. In what seems like an incredibly fortunate coincidence, a particularly good (if not 'correct') wiring pattern happens to be one that preserves topology.")
FC: I'm actually talking about the exact same kind of knowledge priors. Convolution in deep learning represents the double assumption that, if you have a 2D grid of variables encoding visual data, first, spatially close variables are more likely to be correlated than spatially distant variables, and second, spatial correlation patterns are independent from location (translation invariance). These assumptions are actually a subset of the objectness prior from Spelke's Core Knowledge theory.
In general, wiring topology in deep learning encodes assumptions about the structure of correlations in the input-cross-output space -- about the shape of the space of information. A good topology can dramatically reduce the size of the search space and can improve the feasibility of finding good input-output mappings via gradient descent (the big question in deep learning isn't so much whether your search space includes configurations that would solve your problem, but whether these configurations are learnable using gradient descent and the data you have available). This renders tractable problems that would be impossible to solve if you didn't make sufficient assumptions. To learn from data, you need to make assumptions about it. Such assumptions represent "prior knowledge about the external world" that belongs in the same category of priors as Core Knowledge.
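The two assumptions Chollet names can be made concrete with a hand-rolled convolution (a sketch of ours, not code from the paper): each output depends only on a small spatial neighborhood, and the same kernel weights are reused at every location, so the same pattern is detected wherever it appears.

```python
# Sketch (not from the paper): a minimal 2D convolution illustrating the
# convolutional prior -- locality (each output sums a small window) and
# weight sharing (the same kernel is applied at every position).

def conv2d_valid(image, kernel):
    """'Valid'-mode cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 "bright spot" detector applied to a 6x6 image with a spot at (1, 1).
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
image = [[0] * 6 for _ in range(6)]
image[1][1] = 1
response = conv2d_valid(image, kernel)

# Moving the spot to (3, 3) moves the response by the same offset: because
# the weights are shared across locations, the detector fires identically
# at the new position (translation equivariance).
shifted = [[0] * 6 for _ in range(6)]
shifted[3][3] = 1
response_shifted = conv2d_valid(shifted, kernel)
```

In Chollet's framing, this hard-wired assumption about spatial structure is prior knowledge about the world baked into the network's wiring, of the same general kind as the Core Knowledge priors ARC relies on.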
ZDNet: What is the significance of stochasticity to intelligence? You have noted a process can be stochastic in several areas of the intelligent system you describe. Is stochasticity essential to the principles you've outlined, is it of marginal importance/disposable? Why or why not?
FC: The real world and real intelligent agents (like animals or humans) have many factors of uncertainty, so a model of their interaction should account for this uncertainty by involving randomness and probability. But that is really a detail. The model of intelligence I proposed in the paper could be reformulated with deterministic tasks and deterministic intelligent systems without substantially changing the nature of the model and its conclusions, although that would be quite a bit less realistic and quite a bit less general.
ZDNet: Is there any value to a pursuit of intelligence that doesn't follow an "anthropocentric focus" as you put it on page 24? Some non-human intelligence?
FC: Oh, absolutely. I do believe that intelligence that greatly differs from our own could exist and would have intrinsic value. But we probably won't call it "intelligence" if it isn't relatable. It is a fact that we only make sense of other minds, or value their cognitive abilities, relative to our own. We only recently started noticing that animals were intelligent because of the ways their behavior resembles human behavior, because they seem to be able to solve problems that we value. If they weren't human-like in at least some ways, we wouldn't even *notice* -- much less value -- the richness or complexity of their information-processing abilities and their adaptation faculties. We don't perceive companies, markets, or science to be intelligent -- yet they may be modelled as intelligent systems, and they often feature greater-than-human intelligence in a certain sense.
A good definition of intelligence should stay close to what people mean when they talk about intelligence. And humans have a fundamentally anthropocentric view of intelligence. So I believe it is necessary to explicitly acknowledge this fact, instead of using definitions of intelligence that ostensibly aspire to universality but that are implicitly describing human cognition and operating within a human value system.
ZDNet: Would a system like Keras take a different form if you had built it starting from what you've outlined here? Or, asked differently, is there a technology artifact similar to Keras that would be the output of the principles you've outlined here?
FC: Given what I've learned from my ongoing attempts to solve ARC, I do believe that there will eventually be software frameworks that will package these principles in an easy-to-use way for developers to leverage them in their own intelligent applications. There will be a Keras for neuro-symbolic program synthesis. An operating system for intelligence. However, this is still quite far away.
ZDNet: With the train and evaluation test files in JSON form posted on GitHub, can you be sure that the tests in ARC cannot be "gamed" as you put it? (Is this the private test set of which you write?)
FC: At this time, it is impossible to tell with certainty whether ARC can be "gamed" or not. My first step to answer this question will be to organize a public competition around ARC, with a monetary incentive, and see what happens. If there is a non-intelligent shortcut to solve ARC, chances are such a competition would quickly bring it to light.
The competition will leverage the private test set -- a completely unknown set of ARC tasks. This guarantees that the algorithms used in the competition will have to be able to autonomously handle new tasks, rather than being mere records of past human-generated solutions.
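For readers who want to inspect the public tasks, each ARC task is a small JSON document; the layout sketched below (a "train" list of demonstration pairs and a "test" list of pairs to solve, each grid a list of rows of integer color codes) is our reading of the public GitHub repository, and the tiny task itself is invented for illustration.

```python
# Sketch of parsing one ARC task. The JSON layout is assumed from the
# public GitHub repository; this miniature task is invented for illustration
# (here, the transformation flips the grid vertically).
import json

task_json = """
{
  "train": [
    {"input": [[0, 1], [2, 0]], "output": [[2, 0], [0, 1]]},
    {"input": [[1, 1], [0, 3]], "output": [[0, 3], [1, 1]]}
  ],
  "test": [
    {"input": [[0, 0], [4, 4]], "output": [[4, 4], [0, 0]]}
  ]
}
"""

task = json.loads(task_json)

# A solver sees only the "train" demonstration pairs plus the "test" inputs,
# and must infer the transformation well enough to produce the test outputs.
n_demos = len(task["train"])
test_input = task["test"][0]["input"]
```

The private test set Chollet describes follows the same structure but contains tasks no solver has ever seen, which is what forces competition entries to infer each transformation on the fly rather than replay memorized solutions.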
We'll see what happens! We're only just getting started.