Most computers in the world tend to do one thing and then move on to the next thing, a series of sequential tasks. For decades, computer scientists have struggled to get machines to do multiple things in parallel.
With the boom in artificial intelligence in recent years, an ideal workload has arrived, a kind of software programming that naturally gets better as its mathematical operations are spread either across many chips or across circuits inside a chip that work in parallel.
For upstart chip technology vendors, the surge in popularity of AI means, they are convinced, that their time has come: a chance to sell new kinds of parallel-processing computers.
"It's fundamental," Nigel Toon, co-founder and chief executive of computer startup Graphcore, told ZDNet in a video interview last week from his home in England.
"We've got a very different approach and a very different architecture" from conventional computer chips, said Toon. "The conversations we have with customers are, here's a new tool for your toolbox that allows you to do different things, and to solve different problems."
Graphcore, founded in 2016 and based in the quaint medieval town of Bristol, a couple of hours west of London, has spent the last several years amassing an amazing war chest of venture money in a bid to be one of the companies that can make the dream of parallel computing a reality.
Last week, Toon had a nice proof of concept to offer of where things might be going.
Microsoft machine learning scientist Sujeeth Bharadwaj gave a demonstration of work he's done on the Graphcore chip to recognize COVID-19 in chest X-rays, during a virtual conference about AI in healthcare. Bharadwaj's work showed, he said, that the Graphcore chip could do in 30 minutes what it would take five hours to do on a conventional chip from Nvidia, the Silicon Valley company that dominates the running of neural networks.
Why should that be? Bharadwaj made the case that his program, called SONIC, is constructed in a way that needs a different kind of machine, one where more things can run in parallel.
"There's a very strong synergy," Bharadwaj asserted, between the SONIC program and the Graphcore chip.
If Bharadwaj's point is broadly right, it means tomorrow's top-performing neural networks, generally referred to as state of the art, would open a big market opportunity for Graphcore, and for competitors who have novel computers of various sorts, presenting a big threat to Nvidia.
Graphcore has raised over $450 million, including a $150 million D round in February. "Timing turned out to be absolutely perfect" for raising new money, Toon said. The latest infusion gives Graphcore a post-money valuation "just shy of two billion dollars." The company had $300 million in the bank as of February, he noted.
Investors include "some of the biggest public-market investors in tech," such as U.K. investment manager Baillie Gifford. Other giant backers include Microsoft, Bosch, BMW, and the CEO and co-founder of Google's DeepMind AI unit, Demis Hassabis.
Large public equity investors such as Baillie Gifford are "investing here in a private company obviously anticipating that we might at some point in the future go public," Toon remarked.
As for when Graphcore might go public, "I've no idea," he said with a laugh.
A big part of why SONIC, and programs like it, are able to achieve the parallel carrying out of tasks is computer memory. Memory may be the single most important aspect that's changing in chip design as a result of AI. In order for many tasks to work in parallel, the need for memory capacity to store data rises rapidly.
Memory on chips such as Nvidia's, or Intel's, is traditionally limited to tens of millions of bytes. Newer chips such as Graphcore's intelligence processing unit, or IPU, beef up the memory count, with 300 million bytes. The IPU, like other modern chips, spreads that memory throughout the silicon die, so that memory is close to each of the over 1,000 individual computing units.
The result is that memory can be accessed much more quickly than by going off the chip to a computer's main memory, which is still the approach of Nvidia's latest GPUs. Nvidia has ameliorated the situation by widening the conduit that leads from the GPU to that external memory, in part through the acquisition of communications technology vendor Mellanox, last year.
But the movement from GPU to main memory is still no match for the speed of on-chip memory, which can deliver up to 45 billion bytes per second. That access to memory is a big reason why Bharadwaj's SONIC neural network was able to see a dramatic speed-up in training compared to how long it took to run on an Nvidia GPU.
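To put the reported training times in perspective, the speed-up is a simple ratio. The sketch below is illustrative only: it uses the 30-minute and five-hour figures quoted from Bharadwaj's talk, and the function name `speedup` is a hypothetical helper, not anything from Graphcore or Microsoft.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    if new_minutes <= 0:
        raise ValueError("new_minutes must be positive")
    return baseline_minutes / new_minutes


# Figures as reported in the talk; nothing here is measured independently.
gpu_minutes = 5 * 60  # five hours on the Nvidia chip
ipu_minutes = 30      # 30 minutes on the Graphcore chip

factor = speedup(gpu_minutes, ipu_minutes)
print(f"Reported speed-up: {factor:.0f}x")
```

By these numbers, the reported gain is roughly an order of magnitude for this one model, which says nothing by itself about other workloads.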
SONIC is an example to Toon of the new kinds of emerging neural nets that he argues will increasingly make the IPU a must for doing cutting-edge AI development.
"I think one of the things that the IPU is able to help innovators do is to create these next generation image perception models, make them much more accurate, much more efficiently implemented," said Toon.
An important question is whether SONIC's results are a fluke, or whether the IPU can speed up many different kinds of AI programs by doing things in parallel.
To hear Bharadwaj describe it, the union of his program and the Graphcore chip is somewhat specific. "SONIC was designed to leverage the IPU's capabilities," said Bharadwaj in his talk.
Toon, however, downplayed the custom aspect of the program. "There was no tweaking backwards and forwards in this case," he said of SONIC's development. "This was just an amazing output that they found from using the technology and the standard tools."
The work happened independent of Graphcore, Toon said. "The way this came about was, Microsoft called us up one day and they said, Wow, look what we were able to do."
Although the IPU was "designed so that it will support these types of more complex algorithms," said Toon, it is built to be much broader than a single model, he indicated. "Equally it will apply in other kinds of models." He cited, for example, natural language processing systems, "where you want to use sparse processing in those networks."
The market for chips for both training and, especially, inference has become a very crowded one. Nvidia is the dominant force in training, while Intel commands the most market share in inference. Along with Graphcore, Cerebras Systems of Los Altos, in Silicon Valley, is shipping systems and getting work from major research labs such as Argonne National Laboratory in the U.S. Department of Energy. Other major names have gotten funding and are in the development stage, such as SambaNova Systems, with a Stanford University pedigree.
Toon nevertheless depicted the market as a two-horse race. "Every time we go and talk to customers it's kind of us and Nvidia," he said. The competition has made little progress, he told ZDNet. In the case of Cerebras, the company "have shipped a few systems to a few customers," he said, adding, "I don't know what traction they're getting."
In the case of Intel, which last year acquired the Israeli startup Habana, "They still have a lot to prove," said Toon. "They haven't really delivered a huge amount, they've got some inference products out there, but nothing for training that customers can use," he said.
Some industry observers see the burden of proof as lying more heavily on Graphcore's shoulders.
"Intel's acquisition of Habana makes it the top challenger to Nvidia in both AI inference and training," Linley Gwennap, editor of the prestigious chip newsletter Microprocessor Report, told ZDNet. Habana's benchmark results for its chips are better than the numbers for either Nvidia's V100, its current best chip, or Graphcore's part, contended Gwennap. "Once Intel ports its extensive AI software stack to the Habana hardware, the combination will be well ahead of any startup's platform."
Nvidia two weeks ago announced its newest chip for AI, called the "A100." Graphcore expects to leapfrog the A100 when Graphcore ships its second-generation processor, sometime later this year, said Toon. "When our next generation products come, we should continue to stay ahead."
Gwennap is skeptical. The Nvidia part, he said, "raises the performance bar well above every existing product," and that, he said, leaves all competitors "in the same position: claiming that their unannounced next-generation chip will leapfrog the A100's performance while trying to meet customers' software needs with a far smaller team than either Intel or Nvidia can deploy."
Technology executives tend to over-use the tale of David and Goliath as a metaphor for their challenge to an incumbent in a given market. With a viral pandemic spreading around the world, Toon chose a different image, that of Graphcore's technology spreading like a contagion.
"We've all learned about R0 and exponential growth," he said, referring to the basic reproduction number of COVID-19, known as R-naught. "What we've got to do is to keep our R0 above 1."
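Toon's metaphor rests on a simple piece of arithmetic: if each case produces R0 new cases per generation, the count after n generations scales as R0 to the power n. The sketch below is a deliberately simplified model under that assumption; it ignores everything a real epidemiological (or market-adoption) model would include, and the function name `cases_after` is hypothetical.

```python
def cases_after(generations: int, r0: float, initial: float = 1.0) -> float:
    """Simplified growth model: each generation multiplies the count by r0."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return initial * r0 ** generations


# Compare a shrinking, a flat, and a growing reproduction number.
for r0 in (0.9, 1.0, 1.5):
    trajectory = [round(cases_after(g, r0), 2) for g in (0, 5, 10)]
    print(f"R0={r0}: {trajectory}")
```

In this toy model, any R0 above 1 grows without bound and any value below 1 decays toward zero, which is why the threshold of 1 is the figure Toon singles out.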