Few people, Nvidia's competitors included, would dispute that Nvidia is calling the shots in the AI chip game today. The announcement of the new Ampere AI chip at Nvidia's main event, GTC, stole the spotlight last week.
The gist of Ray's analysis is capturing Nvidia's intention with the new generation of chips: To provide one chip family that can serve both for "training" of neural networks, where the neural network's operation is first developed on a set of examples, and for inference, the phase where predictions are made based on new incoming data.
Ray notes this is a departure from today's situation where different Nvidia chips turn up in different computer systems for either training or inference. He goes on to add that Nvidia is hoping to make an economic argument to AI shops that it's best to buy an Nvidia-based system that can do both tasks.
"You get all of the overhead of additional memory, CPUs, and power supplies of 56 servers ... collapsed into one," said Nvidia CEO Jensen Huang. "The economic value proposition is really off the charts, and that's the thing that is really exciting."
Jonah Alben, Nvidia's senior VP of GPU Engineering, told analysts that Nvidia had already pushed Volta, Nvidia's previous-generation chip, as far as it could without catching fire. It went even further with Ampere, which features 54 billion transistors, and can deliver 5 petaflops of performance, or about 20 times more than Volta.
So, Nvidia is after a double bottom line: Better performance and better economics. Let us recall that recently Nvidia also added support for Arm CPUs. Although Arm processor performance may not be on par with Intel at this point, their frugal power needs make Arm processors an attractive option for the data center, too, according to analysts.
On the software front, besides Apache Spark support, Nvidia also unveiled Jarvis, a new application framework for building conversational AI services. To offer interactive, personalized experiences, Nvidia notes, companies need to train their language-based applications on data that is specific to their own product offerings and customer requirements.
However, building a service from scratch requires deep AI expertise, large amounts of data, compute resources to train the models, and software to regularly update models with new data. Jarvis aims to address these challenges by offering an end-to-end deep learning pipeline for conversational AI.
Jarvis includes state-of-the-art deep learning models, which can be further fine-tuned using Nvidia NeMo, optimized for inference using TensorRT, and deployed in the cloud and at the edge using Helm charts available on NGC, Nvidia's catalog of GPU-optimized software.
Working backward, this is something we have noted time and again for Nvidia: Its lead does not just lie in hardware. In fact, Nvidia's software and partner ecosystem may be the hardest part for the competition to match. The competition is making moves too, however. Some competitors may challenge Nvidia on economics, others on performance. Let's see what the challengers are up to.
Freund also highlights the importance of the software stack. He notes that Intel's AI software stack is second only to Nvidia's, layered to provide support (through abstraction) of a wide variety of chips, including Xeon, Nervana, Movidius, and even Nvidia GPUs. Habana Labs features two separate AI chips, Gaudi for training, and Goya for inference.
Intel is betting that Gaudi and Goya can match Nvidia's chips. The MLPerf inference benchmark results published last year were positive for Goya. However, we'll have to wait and see how it fares against Nvidia's Ampere and Nvidia's ever-evolving software stack.
If Intel has a lot of catching up to do, that certainly also applies to GraphCore. Both vendors seem to be on a similar trajectory, however: They aim to innovate at the hardware level, hoping to challenge Nvidia with a new and radically different approach, custom-built for AI workloads. At the same time, they are working on their software stacks and building their market presence.
Fractionalizing AI hardware with a software solution by Run:AI
Last but not least, there are a few challengers who are less high-profile and have a different approach. Startup Run:AI recently exited stealth mode, with the announcement of $13 million in funding for what sounds like an unorthodox solution: Rather than offering another AI chip, Run:AI offers a software layer to speed up machine learning workload execution, on-premises and in the cloud.
The company works closely with AWS and is a VMware technology partner. Its core value proposition is to act as a management platform to bridge the gap between the different AI workloads and the various hardware chips and run a really efficient and fast AI computing platform.
Run:AI recently unveiled its fractional GPU sharing for Kubernetes deep learning workloads. Aimed at lightweight AI tasks at scale such as inference, the fractional GPU system gives data science and AI engineering teams the ability to run multiple workloads simultaneously on a single GPU, thus lowering costs.
Omri Geller, Run:AI co-founder and CEO, told ZDNet that Nvidia's announcement about "fractionalizing" the GPU, or running separate jobs within a single GPU, is revolutionary for GPU hardware. Geller said Run:AI has seen many customers with this need, especially for inference workloads: Why utilize a full GPU for a job that does not require the full compute and memory of a GPU?
"We believe, however, that this is more easily managed in the software stack than at the hardware level, and the reason is flexibility. While hardware slicing creates 'smaller GPUs' with a static amount of memory and compute cores, software solutions allow for the division of GPUs into any number of smaller GPUs, each with a chosen memory footprint and compute power.
"In addition, fractionalizing with a software solution is possible with any GPU or AI accelerator, not just Ampere servers, thus improving TCO for all of a company's compute resources, not just the latest ones. This is, in fact, what Run:AI's fractional GPU feature enables."
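Geller's flexibility argument can be illustrated with a minimal sketch in Python. This is not Run:AI's or Nvidia's implementation: the 40 GB capacity, the 5 GB slice size, the job list, and the function names are all hypothetical, chosen only to contrast fixed-size slices with fractions sized to each job.

```python
import math

GPU_MEMORY_GB = 40  # assumed capacity of one GPU (illustrative only)


def gpus_needed_fixed(requests_gb, slice_gb):
    """Static slicing: each job occupies a whole number of fixed-size slices.

    Returns an optimistic estimate that ignores fragmentation across GPUs.
    """
    slices_per_gpu = GPU_MEMORY_GB // slice_gb
    slices_used = sum(math.ceil(r / slice_gb) for r in requests_gb)
    return math.ceil(slices_used / slices_per_gpu)


def gpus_needed_flexible(requests_gb):
    """Flexible fractions: each job gets exactly the memory it requests.

    Packs jobs with first-fit-decreasing, a simple bin-packing heuristic.
    """
    free = []  # remaining memory (GB) on each GPU already in use
    for r in sorted(requests_gb, reverse=True):
        for i, f in enumerate(free):
            if r <= f:
                free[i] = f - r
                break
        else:
            # No GPU in use has room, so start a new one.
            free.append(GPU_MEMORY_GB - r)
    return len(free)


# Hypothetical inference jobs, memory requests in GB.
jobs = [6, 6, 6, 6, 6, 3, 3, 3]
print(gpus_needed_fixed(jobs, slice_gb=5))  # 2
print(gpus_needed_flexible(jobs))           # 1
```

With these made-up numbers, every 6 GB job has to occupy two 5 GB slices, so fixed slicing spills onto a second GPU, while fractions sized to each request fit all eight jobs on one. Real schedulers also have to account for compute sharing and isolation, which this sketch leaves out.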
An accessibility layer for FPGAs with InAccel
InAccel is a Greek startup, built around the premise of providing an FPGA manager that allows the distributed acceleration of large data sets across clusters of FPGA resources using simple programming models. Founder and CEO Chris Kachris told ZDNet there are several arguments regarding the advantages of FPGAs vs GPUs, especially for AI workloads.
Kachris noted FPGAs can provide better energy efficiency (performance/watt) in some cases, and they can also achieve lower latency compared to GPUs for deep neural networks (DNNs). For DNNs, Kachris went on to add, FPGAs can achieve high throughput using a low batch size, resulting in much lower latency. In applications where latency and energy efficiency are critical, FPGAs can prevail.
However, scalable deployment of FPGA clusters remains challenging, and this is the problem InAccel is out to solve. Its solutions aim to provide scalable deployment of FPGA clusters, providing the missing abstraction: An OS-like layer for the FPGA world. InAccel's orchestrator allows easy deployment, instant scaling, and automated resource management of FPGA clusters.
Kachris likened InAccel to VMware / Kubernetes, or Run:AI / Bitfusion, for the FPGA world. He also claimed InAccel makes FPGAs easier for software developers, and noted that FPGA vendors like Intel and Xilinx have recognized the importance of a strong ecosystem and formed strong alliances that help expand it:
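The link between batch size and latency can be sketched with simple arithmetic. The model and the numbers below are hypothetical, not measurements of any FPGA or GPU: assuming requests arrive at a steady rate, a device that waits to fill a batch before computing adds that waiting time to the latency of the first request in the batch.

```python
def worst_case_latency_ms(batch_size, arrival_rate_per_s, compute_ms_per_batch):
    """Latency seen by the first request in a batch.

    It waits for the remaining (batch_size - 1) requests to arrive,
    then for the whole batch to be computed.
    """
    fill_wait_ms = (batch_size - 1) * 1000.0 / arrival_rate_per_s
    return fill_wait_ms + compute_ms_per_batch


# Hypothetical workload: 1,000 requests per second.
print(worst_case_latency_ms(1, 1000, 2.0))    # batch of 1, assumed 2 ms compute
print(worst_case_latency_ms(64, 1000, 10.0))  # batch of 64, assumed 10 ms compute
```

Under these assumptions, the batch-of-1 device answers in 2 ms, while the batch-of-64 device takes up to 73 ms, most of it spent waiting for the batch to fill. This is why an accelerator that reaches high throughput at small batch sizes can offer much lower latency.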
"It seems that cloud vendors will have to provide a diverse and heterogeneous infrastructure as different platforms have pros and cons. Most of these vendors provide fully heterogeneous resources (CPUS, GPUS, FPGAs, and dedicated accelerators), letting users select the optimum resource.
"Several cloud vendors, such as AWS and Alibaba, have started deploying FPGAs because they see the potential benefits. However, FPGA deployment is still challenging as users need to be familiar with the FPGA tool flow. We enable software developers to get all the benefits of FPGAs using familiar PaaS and SaaS models and high-level frameworks (Spark, Scikit-learn, Keras), making FPGA deployment in the cloud much easier."
Hedge your bets
It takes more than fast chips to be the leader in this field. Economics is one aspect potential users need to consider; ecosystem and software are another. Taking everything into account, it seems like Nvidia is still ahead of the competition.
It's also interesting to note, however, that this is starting to look less and less like a monoculture. Innovation is coming from different places, and in different shapes and forms. This is something Nvidia's Alben acknowledged too. And it's certainly something cloud vendors, server vendors, and application builders seem to be taking note of.
Hedging one's bets in the AI chip market may be the wise thing to do.