We don't know why deep learning forms of neural networks achieve great success on many tasks; the discipline has a paucity of theory to explain its empirical successes. As Facebook's Yann LeCun has said, deep learning is like the steam engine, which preceded the underlying theory of thermodynamics by many years.
But some deep thinkers have been plugging away at the matter of theory for several years now.
On Wednesday, the group presented a proof of deep learning's superior ability to simulate the computations involved in quantum computing. According to these thinkers, the redundancy of information that happens in two of the most successful neural network types, convolutional neural nets, or CNNs, and recurrent neural networks, or RNNs, makes all the difference.
Amnon Shashua, the president and chief executive of Mobileye, the autonomous driving technology company bought by chip giant Intel last year for $14.1 billion, presented the findings on Wednesday at the Science of Deep Learning Conference, hosted by the National Academy of Sciences in Washington, D.C.
In addition to being a senior vice president at Intel, Shashua is a professor of computer science at the Hebrew University in Jerusalem. The paper is co-authored with his colleagues there, lead author Yoav Levine and Or Sharir, and with Nadav Cohen of the Institute for Advanced Study in Princeton, New Jersey.
The report, "Quantum Entanglement in Deep Learning Architectures," was published this week in the prestigious journal Physical Review Letters.
The work amounts to both a proof that deep learning can excel at certain problems and a proposal for a promising way forward in quantum computing.
In quantum computing, the problem is somewhat the reverse of deep learning: lots of compelling theory, but as yet few working examples of the real thing. For many years, Shashua and his colleagues, and others, have pondered how to simulate quantum computing of the so-called many-body problem.
Physicist Richard Mattuck has defined the many-body problem as "the study of the effects of interaction between bodies on the behaviour of a many-body system," where bodies have to do with electrons, atoms, molecules, or various other entities.
What Shashua and team found, and what they say they've proven, is that CNNs and RNNs are better than traditional machine learning approaches such as the "Restricted Boltzmann Machine," or RBM, a neural network approach developed in the 1980s that has been a mainstay of physics research, especially quantum theory simulation.
"Deep learning architectures," they write, "in the form of deep convolutional and recurrent networks, can efficiently represent highly entangled quantum systems."
Entanglements are correlations between those interactions of bodies that occur in quantum systems. Actual quantum computing has the great advantage of being able to compute entanglements with terrific efficiency. To simulate that through conventional electronic computing can be extremely difficult, even intractable.
"Our work quantifies the power of deep learning for highly entangled wave function representations," they write, "theoretically motivating a shift towards the employment of state-of-the-art deep learning architectures in many-body physics research."
The authors pursued the matter by taking CNNs and RNNs and applying to them "extensions" they have devised. They refer to this as a "simple 'trick'," and it involves the redundancy mentioned earlier. It turns out, according to Shashua and colleagues, that the structure of CNNs and RNNs involves an essential "reuse" of information.
In the case of CNNs, the "kernel," the sliding window that is run across an image, overlaps its earlier positions at each step, so that parts of the image are ingested by the CNN multiple times. In the case of RNNs, the recurrent use of information at each layer of the network is a similar kind of reuse, in that case for sequential data points.
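To make the overlap concrete, here is a minimal Python sketch that counts how many times each pixel of a small image is read by a sliding window. The image size, kernel size, strides, and the function name `coverage_counts` are illustrative assumptions, not details taken from the paper.

```python
def coverage_counts(height, width, kernel, stride):
    """Count how many kernel windows read each pixel of a height x width image."""
    counts = [[0] * width for _ in range(height)]
    # Slide the window over every valid top-left position.
    for top in range(0, height - kernel + 1, stride):
        for left in range(0, width - kernel + 1, stride):
            for r in range(top, top + kernel):
                for c in range(left, left + kernel):
                    counts[r][c] += 1
    return counts

# Stride 1: neighbouring windows overlap, so interior pixels are reused.
overlapping = coverage_counts(6, 6, kernel=3, stride=1)
# Stride equal to the kernel size: windows tile the image with no reuse.
non_overlapping = coverage_counts(6, 6, kernel=3, stride=3)

for row in overlapping:
    print(row)
```

With a stride of 1, a pixel near the center is read by nine different windows, while with a stride equal to the kernel size every pixel is read exactly once. That repeated reading is the kind of reuse the authors point to.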
In both cases, "this architectural trait […] was shown to yield an exponential enhancement in network expressivity despite admitting a mere linear growth in the amount of parameters and in computational cost." In other words, CNNs and RNNs, by virtue of redundancy, achieved via stacking many layers, have a more efficient "representation" of things in computing terms.
For example, a traditional "fully-connected" neural network, what the authors term a "veteran" neural network, requires computing time that scales as the square of the number of bodies being represented. An RBM, they write, is better, with compute time that scales linearly in terms of the number of bodies. But CNNs and RNNs can be even better, with their required compute time scaling as the square root of the number of bodies.
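For a rough feel of what those three growth rates mean, the following Python sketch tabulates them side by side. It assumes constant factors of 1, which the article does not give, so the numbers are relative only; the function name `relative_cost` is made up for illustration. The 10,000-body case corresponds to a 100 × 100 grid.

```python
import math

def relative_cost(num_bodies):
    """Relative compute cost under the growth rates described in the text.

    Constant factors are assumed to be 1 and are not from the paper.
    """
    return {
        "fully_connected": num_bodies ** 2,   # square of the number of bodies
        "rbm": num_bodies,                    # linear in the number of bodies
        "cnn_rnn": math.sqrt(num_bodies),     # square root of the number of bodies
    }

for n in (100, 10_000):
    costs = relative_cost(n)
    print(n, costs["fully_connected"], costs["rbm"], costs["cnn_rnn"])
```

Under these assumptions, going from 100 to 10,000 bodies multiplies the fully-connected cost by 10,000 and the RBM cost by 100, but the CNN/RNN cost only by 10.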
Those properties "indicate a significant advantage in modeling volume-law entanglement scaling of deep-convolutional networks relative to competing veteran neural-network based approaches," they write. "Practically, overlapping-convolutional networks […] can support the entanglement of any 2D system of interest up to sizes 100 × 100, which are unattainable by competing intractable approaches."
To make that work, the authors had to use their "trick": The traditional way of representing quantum computation, a "Tensor Network," or TN, doesn't support the reuse of information. So, the authors created modified versions of the CNN and the RNN. The first is called a "convolutional arithmetic circuit," or CAC. It's an approach they've been developing in recent years, here brought to greater fruition. The trick is "duplication of the input data itself" in the CAC, which effectively replicates the reuse seen in the overlapping of the CNN. In the case of the RNN, they created a "recurrent arithmetic circuit," or RAC, in which they likewise duplicate the input information.
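The idea of trading reuse for duplication can be illustrated with a loose, one-dimensional analogy in Python. This is not the authors' tensor-network construction; the helper names and the toy signal are assumptions made purely for illustration. The sketch shows that a structure which reads each input only once can still see the same windows as an overlapping scan, provided the input is duplicated first.

```python
def overlapping_windows(values, size):
    """Windows of a stride-1 scan: neighbouring windows share elements."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]

def duplicate_inputs(values, size):
    """Copy each element once per window that reads it, laid out window by window."""
    duplicated = []
    for window in overlapping_windows(values, size):
        duplicated.extend(window)
    return duplicated

def disjoint_windows(values, size):
    """Windows that never share an element: each input is read exactly once."""
    return [values[i:i + size] for i in range(0, len(values), size)]

signal = [1, 2, 3, 4, 5]
duplicated = duplicate_inputs(signal, size=3)

print(duplicated)
print(disjoint_windows(duplicated, 3) == overlapping_windows(signal, 3))
```

The price of the duplication is a longer input: five values become nine here, because the interior values are copied once for each window that would have reused them.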
"Importantly, since the output vector of each layer of the deep RAC at every time step is used twice (as an input of the next layer up, but also as a hidden vector for the next time-step), there is an inherent reuse of data during network computation," they write. "Therefore, we duplicate the inputs as in the overlapping-convolutional network case, and obtain the TN of the deep RAC."
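The "used twice" pattern in that passage can be counted directly. The Python sketch below is a bookkeeping toy, assuming a plain stack of recurrent layers unrolled over a few time steps; the function name `count_output_uses` and the layer and step counts are hypothetical, and no actual network arithmetic is performed.

```python
def count_output_uses(num_layers, num_steps):
    """Count the consumers of each (layer, time step) output in a stacked recurrent net."""
    uses = {}
    for layer in range(num_layers):
        for step in range(num_steps):
            # Consumed as the input of the next layer up, if there is one.
            feeds_next_layer = layer + 1 < num_layers
            # Consumed as the hidden vector for the next time step, if there is one.
            feeds_next_step = step + 1 < num_steps
            uses[(layer, step)] = int(feeds_next_layer) + int(feeds_next_step)
    return uses

uses = count_output_uses(num_layers=2, num_steps=3)
for key in sorted(uses):
    print(key, uses[key])
```

In this toy, outputs of the lower layer at every step but the last have two consumers, which is the reuse the quoted passage describes; outputs at the top layer or the final step have fewer.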
The results of all this are two-fold: proofs for deep learning, and a way forward for quantum simulations.
The formal proofs of the efficiency of CACs and RACs, included in supplementary material, amount to a demonstration that deep learning approaches can tackle quantum entanglement more efficiently.
The authors end on the hopeful note that their findings "can help bring quantum many-body physics and state-of-the-art machine learning approaches one step closer together."
Both quantum computing and deep learning may never again be the same.