As data generation and accumulation accelerate, we've reached a tipping point where using machine learning just works. Training models that find patterns in data and make predictions based on them is now applied to pretty much everything. But data and models are just one part of the story.
Another part, equally important, is compute. Machine learning consists of two phases: Training and inference. In the training phase patterns are extracted, and machine learning models that capture them are created. In the inference phase, trained models are deployed and fed with new data in order to generate results.
Both of these phases require compute power. Not just any compute will do, in fact: As it turns out, CPUs are not really geared towards the specialized type of computation that machine learning workloads require. GPUs are currently the weapon of choice for these workloads, but that may be about to change.
AI chips just got more interesting
GPU vendor Nvidia has reinvented itself as an AI chip company, coming up with new processors geared specifically towards machine learning workloads and dominating this market. But the boom in machine learning workloads has whetted the appetite of other players as well.
Cloud vendors such as Google and AWS are working on their own AI chips. Intel is working on getting FPGA chips in shape to support machine learning. And upstarts are having a go at entering this market as well. GraphCore is the most high profile among them, with recent funding having catapulted it into unicorn territory, but it's not the only one: Enter Habana.
Habana has been working on its own processor for AI since 2015. But as Eitan Medina, its CBO, told us in a recent discussion, it has been doing so in stealth until recently: "Our motto is AI performance, not stories. We have been working under cover until September 2018." David Dahan, Habana CEO, said that "among all AI semiconductor startups, Habana Labs is the first, and still the only one, which introduced a production-ready AI processor."
As Medina explained, Habana was founded by CEO David Dahan and VP R&D Ran Halutz. Both Dahan and Halutz are semiconductor industry veterans, and they have worked together for years at semiconductor companies CEVA and PrimeSense. The management team also includes CTO Shlomo Raikin, a former Intel project architect.
Medina himself also has an engineering background: "Our team has a deep background in machine learning. If you Google topics such as quantization, you will find our names," Medina said. And there's no lack of funding or staff either.
Habana just closed a Series B financing round of $75 million, led by Intel Capital no less, which brings its total funding to $120 million. Habana has a headcount of 120 and is based in Tel Aviv, Israel, but also has offices and R&D in San Jose, US, Gdansk, Poland, and Beijing, China.
This looks solid. All these people, funds, and know-how have been set in motion by identifying the opportunity. Much like his counterparts at GraphCore, Medina thinks that the AI chip race is far from over: GPUs may be dominating for the time being, but that's about to change. Habana brings two key innovations to the table: Specialized processors for training and inference, and power efficiency.
Separating training and inference to deliver superior performance
Medina noted that starting with a clean sheet to design their processor, one of the key decisions made early on was to address training and inference separately. As these workloads have different needs, Medina said that treating them separately has enabled them to optimize performance for each setting: "For years, GPU vendors have offered new versions of GPUs. Now Nvidia seems to have realized they need to differentiate. We got this from the start."
Habana offers two different processors: Goya, addressing inference; and Gaudi, addressing training. Medina said that Goya is used in production today, while Gaudi will be released in Q2 2019. We wondered why inference was addressed first. Was it because the architecture and requirements for inference are simpler?
Medina said that it was a strategic decision based on market signals. Medina noted that the lion's share of inference workloads in the cloud still runs on CPUs. Therefore, he explained, Habana's primary goal at this stage is to address these workloads as a drop-in replacement. Indeed, according to Medina, Habana's clients at this point are to a large extent data center owners and cloud providers, as well as autonomous car ventures.
The value proposition in both cases is primarily performance. According to benchmarks published by Habana, Goya is significantly faster than both Intel's CPUs and Nvidia's GPUs. Habana used the well-known ResNet-50 benchmark, and Medina explained the rationale was that ResNet-50 is the easiest to measure and compare, as it has fewer variables.
Medina said other architectures must make compromises:
"Even when asked to give up latency, throughput is below where we are. With GPUs / CPUs, if you want better performance, you need to group data input in big groups of batches to feed the processor. Then you need to wait till entire group is finished to get the results. These architectures need this, otherwise throughput will not be good. But big batches are not usable. We have super high efficiency even with small batch sizes."
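The trade-off Medina describes can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the fixed per-batch overhead and per-item cost below are hypothetical numbers chosen for illustration, not measurements of any CPU, GPU, or Goya.

```python
def batch_stats(batch_size, overhead_ms=2.0, per_item_ms=0.05):
    """Toy linear model: one batch costs a fixed overhead plus a per-item cost.

    overhead_ms and per_item_ms are hypothetical values, not vendor figures.
    Returns (latency in ms for the whole batch, throughput in items/second).
    """
    latency_ms = overhead_ms + per_item_ms * batch_size
    throughput = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput


for b in (1, 8, 64, 256):
    latency, tput = batch_stats(b)
    print(f"batch={b:4d}  latency={latency:6.2f} ms  throughput={tput:9.0f} items/s")
```

In this model, larger batches amortize the fixed overhead, so throughput climbs, but every result in the batch has to wait for the whole batch to finish, so latency climbs too. An architecture with little fixed overhead would not need large batches to reach high throughput.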
There are some notable points about these benchmarks. The first, Medina pointed out, is that their scale is logarithmic, which is needed to be able to accommodate Goya and the competition in the same charts. Hence the claim that "Habana smokes inference incumbents." The second is that results become even more interesting if power efficiency is factored in.
Power efficiency and the software stack
Power efficiency is a metric used to measure how much power is needed per calculation in benchmarks. This is a very important parameter. It's not enough to deliver superior performance alone; the cost of delivering it is just as important. A standard metric to measure processor performance is IPS, Instructions Per Second. But IPS/W, or IPS per Watt, is probably a better one, as it takes into account the cost of delivering performance.
Higher power efficiency is better in every possible way. For data centers and autonomous vehicles, minimizing the cost of electricity and increasing autonomy are key requirements. And in the bigger picture, lowering carbon footprint is a major concern for the planet. As Medina put it, "You should care about the environment, and you should care about your pocket."
Goya's value proposition for data centers is focused on this, also factoring in latency requirements. As Medina said, for a scenario of processing 45K images/second, three Goya cards can get results with a latency of 1.3 msec, replacing 169 CPU servers with a latency of 70 msec plus 16 Nvidia Tesla V100 cards with a latency of 2.5 msec, with a total cost of around $400,000. The message is clear: You can do more with less.
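To make the per-watt idea concrete, here is a minimal sketch that ranks accelerators by throughput per watt instead of raw throughput. The chip names and all figures are hypothetical, invented purely for illustration.

```python
def perf_per_watt(items_per_second, watts):
    """Throughput delivered per watt of power drawn."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return items_per_second / watts


# Hypothetical accelerators: (name, throughput in inferences/s, power draw in W).
candidates = [("chip_a", 10_000, 100), ("chip_b", 12_000, 300)]

# Rank by efficiency rather than by raw speed.
ranked = sorted(candidates, key=lambda c: perf_per_watt(c[1], c[2]), reverse=True)
for name, tput, watts in ranked:
    print(f"{name}: {tput} inferences/s at {watts} W = {perf_per_watt(tput, watts):.1f} per watt")
```

In this made-up example, chip_b is faster in absolute terms, yet chip_a comes out ahead once power draw is taken into account.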
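The figures Medina quoted lend themselves to some quick arithmetic. The sketch below uses only the numbers cited above; the function and variable names are ours, and the derived ratios are simple divisions, not additional benchmark results.

```python
# Figures quoted by Medina for the 45K images/second scenario.
TARGET_IMAGES_PER_SEC = 45_000
GOYA_CARDS = 3
GOYA_LATENCY_MS = 1.3
CPU_LATENCY_MS = 70.0
V100_LATENCY_MS = 2.5


def per_device_throughput(total_per_sec, devices):
    """Average throughput each device must sustain to reach the total."""
    return total_per_sec / devices


def latency_ratio(other_ms, goya_ms=GOYA_LATENCY_MS):
    """How many times longer the other latency is compared to Goya's."""
    return other_ms / goya_ms


print(f"Per Goya card: {per_device_throughput(TARGET_IMAGES_PER_SEC, GOYA_CARDS):.0f} images/s")
print(f"CPU latency vs Goya: {latency_ratio(CPU_LATENCY_MS):.1f}x")
print(f"V100 latency vs Goya: {latency_ratio(V100_LATENCY_MS):.2f}x")
```

Taken at face value, the quoted scenario implies about 15,000 images per second per Goya card, with the quoted CPU latency roughly 54 times higher and the quoted V100 latency roughly 1.9 times higher than Goya's.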
TPC, Habana's Tensor Processor Core at the heart of Goya, supports different form factors, memory configurations, and PCIe cards, as well as mixed-precision numerics. It is also programmable in C, and accessible via what Habana calls the GEMM engine (General Matrix Multiplication). This touches upon another key aspect of AI chips: The software stack, and integrations with existing machine learning frameworks.
As there is a slew of machine learning frameworks people use to build their models, supporting as many of them as seamlessly as possible is a key requirement. Goya supports models trained on any processor via an API called SynapseAI. At this point, SynapseAI supports TensorFlow, MXNet, and ONNX, an emerging exchange format for deep learning models, and Habana is working on adding support for PyTorch and more.
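For readers unfamiliar with the term, GEMM conventionally refers to the operation C = alpha * A * B + beta * C, which dominates the arithmetic in most neural networks. The sketch below is a plain-Python illustration of that textbook definition only. It says nothing about how Habana's GEMM engine is implemented or programmed, and the function name is ours.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    n, k = len(a), len(b)
    m = len(b[0])
    shapes_ok = (
        all(len(row) == k for row in a)
        and all(len(row) == m for row in b)
        and len(c) == n
        and all(len(row) == m for row in c)
    )
    if not shapes_ok:
        raise ValueError("incompatible matrix shapes")
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(m)]
        for i in range(n)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
zeros = [[0, 0], [0, 0]]
print(gemm(1.0, a, b, 0.0, zeros))
```

Dedicated hardware speeds up exactly this pattern of multiply-and-accumulate loops, which is why matrix multiplication units feature so prominently in AI chip designs.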
Users should be able to deploy their models on Goya without having to fiddle with SynapseAI. For those who wish to tweak their models to include customizations, however, the option to do so is there, as well as IDE tools to support them. Medina said this low-level programming has been requested by clients who have developed custom ways of maximizing performance on their current setting and would like to replicate this on Goya.
The bigger picture
So, who are these clients, and how does one actually become a client? Medina said Habana has a sort of screening process for clients, as they are not yet at the point where they can ship massive quantities of Goya. Habana is sampling Goya to selected companies only at this time. That's what's written on the form you'll have to fill in if you're interested.
Not that Goya is a half-baked product, as it is used in production according to Medina. Specific names were not discussed, but yes, these include cloud vendors, so you can let your imagination run wild. Medina also emphasized that R&D on the hardware level for Goya is mostly done.
However, there is ongoing work to take things to the next level with 7 nanometer chips, plus work on the Gaudi processor for training, which promises linear scalability. In addition, development of the software stack never ceases in order to optimize, add new features and support for more frameworks. Recently, Habana also published open source Linux drivers for Goya, which should help a lot considering Linux is what powers most data centers and embedded systems.
Habana, just like GraphCore, seems to have the potential to bring about a major disruption in the AI chip market and the world at large. Many of its premises are similar: A new architecture, an experienced team, solid funding, and an eye on seizing the opportunity. One obvious difference is in how they approach their public image, as GraphCore has been quite open about its work, while Habana was a relative unknown up to now.
As for the obvious questions -- which one is faster/better, which one will succeed, can they dethrone Nvidia -- we simply don't know. GraphCore has not published any benchmarks. Judging from an organizational maturity point of view, Habana seems to be lagging at this point, but that does not necessarily mean much. One thing we can say is that this space is booming, and we can expect AI chip innovation to catalyze AI even further soon.
The takeaway from this, however, should be to make power efficiency a key aspect of the AI narrative going forward. Performance comes at a price, and this should be factored in.