Boon has turned to bane, as the explosive growth in the size of neural networks presents the AI community with ungainly computer workloads that tax existing resources.
Happily, the new year has kicked off with a plethora of solutions to make neural networks more manageable, both during training and during "inference," the phase of machine learning when a trained network runs on a device to answer questions.
A bevy of papers posted at the end of December and this week propose a variety of such solutions. They include, in no particular order:
- Compressing the math needed to compute the weights of a neural network, in some cases reducing them from 32-bit floating-point to 8-bit fixed-point (integer) numbers, "binarizing" them by reducing them to either 1 or 0, or exploiting "symmetry" in the weight matrices to reduce the amount of storage needed to represent them;
- "Pruning" the parameters, meaning removing some of the weights and some "activations," the outputs of the computation that makes a given neuron in the neural net respond to the data;
- Reducing the amount of data shared over a network when running on many distributed computer systems, such as by selectively deciding which neurons exchange information about weights, or by selectively partitioning the computation across different processors;
- New kinds of neural network algorithms, and ways to partition them, to make more efficient use of new kinds of hardware.
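The first of those techniques, quantization, can be sketched in a few lines. The following is a minimal illustration of the general idea (symmetric linear quantization of float weights down to 8-bit integers), not the scheme from any particular paper cited here; the function names and the use of numpy are my own assumptions.

```python
import numpy as np

def quantize_int8(weights):
    """Map 32-bit float weights onto 8-bit signed integers.

    Simple symmetric linear quantization: the largest absolute
    weight is mapped to 127, everything else is scaled
    proportionally and rounded.  (Illustrative only.)
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops 4x (int8 vs. float32); rounding error stays below one scale step.
print(q.dtype, w.nbytes // q.nbytes)           # int8 4
print(float(np.abs(w - w_hat).max()) < scale)  # True
```

The appeal for mobile devices is that integer multiplies are far cheaper in silicon and energy than floating-point ones, at the cost of a small, bounded rounding error.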
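Pruning is similarly easy to sketch. This is a generic magnitude-based version, again my own illustration rather than any cited paper's method: weights whose absolute value falls below a chosen percentile are assumed to contribute least and are zeroed out.

```python
import numpy as np

def magnitude_prune(weights, fraction=0.5):
    """Zero out the smallest-magnitude weights.

    A crude form of pruning: any weight whose absolute value
    falls below the given percentile is set to zero, so it no
    longer needs to be stored or multiplied.
    """
    threshold = np.quantile(np.abs(weights), fraction)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
pruned, mask = magnitude_prune(w, fraction=0.9)

# Roughly 90 percent of the weights are now zero.
print(round(1.0 - float(mask.mean()), 2))
```

In practice, pruned networks are usually retrained briefly afterward to recover any lost accuracy, which is exactly the trade-off the papers below wrestle with.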
As one paper, by Charbel Sakr and Naresh Shanbhag of the University of Illinois at Urbana-Champaign, puts it, "Deep neural networks' enormous computational and parameter complexity leads to high energy consumption, makes their training via the stochastic gradient descent algorithm very slow often requiring hours and days, and inhibits their deployment on energy and resource-constrained platforms such as mobile devices and autonomous agents." Many of the papers voice similar concerns.
The search for ways to make neural nets more efficient to process goes back decades. It was an issue when pioneers David Rumelhart and James McClelland wrote their massive treatise Parallel Distributed Processing in 1986, and when Yann LeCun, now at Facebook, proposed "optimal brain damage" in 1990 as a way to reduce the number of parameters that need to be calculated. Work by Google Brain researchers Jeff Dean and others grappled with the problem again as deep learning took off in 2012.
The search for improvements has been constant, but the problem seems only to be getting worse, as the success of deep learning makes networks bigger and bigger in the areas where machine learning has really taken off, such as image recognition and natural language processing.
The authors of the various reports all claim to reduce in some way the computing intensity, including how long it takes to train a neural net, or to use it to infer things, or both. They are all burdened by the daunting trade-off that neural networks labor under: strip away too much of a network's complexity and you may cut its training time by hours or days, but you may also lose accuracy. All the authors claim to have successfully navigated that fearsome trade-off.
Here, in no particular order, are the optimizations:
- Yoshua Bengio and team at Montreal's MILA explore what happens if they quantize parts of a neural net to make processing of inference tasks more efficient, especially for mobile devices. They also suggest a new approach for neural net hardware that takes "a few watts" of power to perform the operations.
- Xu Shell Hu and colleagues of Paris's Université Paris-Est propose making the weights of a neural network "symmetric," in a selective fashion, which they insist "can significantly reduce the number of parameters and computational complexity by sacrificing little or no accuracy, both at train and test time."
- Sakr and Shanbhag at the University of Illinois show how to quantize all the weights in a network for the training phase, by determining the minimum acceptable precision for each part of the network, claiming such measures noticeably reduce the "representational, computational, and communication costs of training" compared to floating-point computations.
- Xiaorui Wu and colleagues at the City University of Hong Kong, along with Yongqiang Xiong of Microsoft, improve on the typical design of a distributed machine learning system, drastically reducing the communications bandwidth consumed between compute nodes, with the result that their system "improves training time substantially over parameter server systems with latest datacenter GPUs and 10G bandwidth."
- Hajar Falahati and colleagues from the Iran University of Science and Technology created a scheme for selectively splitting up the atomic functions of neural nets, such as computing non-linearities or updating network weights. The team used that partitioning approach to make the most of a novel chip they built, called "Origami," which is a combination of an application-specific integrated circuit (ASIC) and multiple layers of stacked DRAM, what's known in the chip world as a "Hybrid Memory Cube."
- Zhiri Tang and colleagues at China's Wuhan University and Chinese Academy of Sciences come up with a novel way to reduce the number of weights in a network by modeling them after an electrical circuit called a "memristor," whose circuit properties are affected by a history of current flow, akin to a spiking neuron.
- Mohammad Mohammadi Amiri and Deniz Gündüz of Imperial College London distribute the training function to multiple wirelessly connected computers, then have each computer transmit its computation back simultaneously over the noisy wireless link; the superposition of those transmissions automatically produces the sum of the multiple gradients, so a global gradient arrives at the central server, saving communications bandwidth.
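That last idea, sometimes called over-the-air computation, can be simulated in a few lines. The sketch below is my own toy illustration of the general principle under stated assumptions (10 workers, a simple additive-noise channel), not Amiri and Gündüz's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 10 worker devices each hold a local gradient vector.
num_workers, dim = 10, 5
local_grads = rng.standard_normal((num_workers, dim))

# When all workers transmit analog signals at the same time, a wireless
# multiple-access channel naturally adds them: the server receives the
# sum of the gradients plus channel noise, never the individual gradients.
noise = 0.01 * rng.standard_normal(dim)
received = local_grads.sum(axis=0) + noise

# Dividing by the worker count yields an estimate of the average gradient.
global_grad = received / num_workers

exact = local_grads.mean(axis=0)
print(np.allclose(global_grad, exact, atol=0.01))  # True: noise is tiny
```

The bandwidth saving comes from the channel itself doing the summation: each worker's transmission occupies the same slot rather than a separate one.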
The over-arching impression given by all these reports is that today's neural networks have not changed from a decade ago in many fundamental respects; the same problems of scale that were confronted by Google and others back then still apply. And many of the designs of these neural networks -- their "architecture" -- are inefficient, in the sense of carrying a lot of redundant information.
Another conclusion left lingering is whether all this work will lead to better neural networks. The notion of "representing" the world in a neural network has always rested on the idea of in some way "constraining" that neural network to force it to find higher levels of abstraction.
The limits of neural networks are explored in another paper out this week, sponsored by the Defense Advanced Research Projects Agency and written by Pierre Baldi and Roman Vershynin, professors at UC Irvine. In their paper, "The Capacity of Feedforward Neural Networks," they endeavor to describe just how much information about the world a neural network can realize. As they put it, "The capacity of a network can be viewed as an upper bound on the total number of bits that can be stored in a network, or the number of bits that can be 'communicated' from the outside world to the network by the learning process."
While these research papers deal with managing complexity in a practical sense, it's conceivable their findings may eventually bear on the theoretical problem Baldi and Vershynin are contemplating: whether the representations formed in machine learning can be made better, that is, more sophisticated, closer to some high level of understanding about the world by computers.