The empire strikes back: How Intel is stepping up its AI game

The old approach of throwing out silicon building blocks and expecting developers to get on board is over as Intel looks to claw back developer mindshare in the machine-learning training space.
Written by Chris Duckett, Contributor

When you ask Intel how it is doing in the realm of artificial intelligence, you will likely hear that 97 percent of all AI workloads run on Xeon hardware. But that number is far from the full story.

In the realm of neural networks, the chip giant has a real competitor in Nvidia, in an area where Intel has, by its own admission, been "absent".

"It's really not too late ... we're in the first half of the first innings," Barry Davis, Intel general manager of Accelerated Workloads Group, told ZDNet last week.

Intel is changing the way it approaches AI, after its old playbook didn't have the best results.

"[In the past] we would have said: 'If you build it, they will come. Here's our silicon, here's our software, and the BIOS and low-level code, have at it.' It's not enough anymore," Davis said.

The general manager conceded that Intel didn't understand the market well enough, and has moved from focusing purely on silicon to looking at solutions.

"We would talk a lot about building blocks, we would talk a lot about technology," he said. "Intel is one of the largest software developers in the world, [but] nobody knows it.

"Our partners, our customers are looking to us to not just give them this silicon, a driver, and maybe a little piece of application on top as a reference code and have them do all the work themselves.

"We've got to prevalidate, which is a huge task ... we have to build the silicon, build the software or work with parties on software, validate all that together -- which is the new thing making sure that validation is done -- make sure there is an ecosystem of support out there, and then hand that to someone to sell."

One facet of how Intel is redoing its AI approach is to make sure frameworks run on its hardware. An example of this is the partnership struck this week between the chip giant and Preferred Networks, the Japanese maker of deep-learning framework Chainer. Under the deal, Chainer will gain Xeon support alongside its existing CUDA implementation.

Davis said that in the future, the chip maker will be making further such deals around frameworks, as the battle for developer mindshare moves up the stack into applications.

"If you go to [Preferred Networks'] website, they're doing a lot of work with Nvidia. Frankly, we were absent for a while, we weren't there, now we are -- Chainer is a good example, TensorFlow is a good example with our buddies at Google ... the work we've done in Caffe that has all been upstreamed ... working on MXNet for Amazon.

"If I go back to the parallel in the HPC market, years ago it was all about tuning the application, getting those applications written. Now in AI, with the way this market has developed with the frameworks as that intermediate layer, now I think more general-purpose AI applications will start coming out, and they will come out at Intel, and GPUs as well and that's okay, because it's all going to be a kind of level playing field."

According to Davis, part of the reason it is still early in the AI game is that AI is far from standard enterprise functionality at the moment.

"Every enterprise will have some datacentre with some level of workload -- I cannot say that about AI in those datacentres," he said.

"Lots of AI today is happening at the Googles, Microsofts, Amazons of the world -- the big CSPs [cloud service providers] -- and they do all the work themselves, and that's true for all workloads. They can do whatever they want, whenever they want, it's not deployed into the general market yet -- not really."

When those CSPs begin to launch AI-as-a-service, Davis expects both Intel and Nvidia to be supported.

"When the cloud guys put out a service, typically they want to put out everything. If Intel is there and Nvidia is there, they are going to want to have both, because they don't want to have to presuppose what their customer base is going to want to use."

With the market moving fast -- Intel's new Lake Crest architecture for AI training, Google boosting inference on its custom-built Tensor Processing Unit, and IBM and Nvidia joining forces to offer Tesla P100s within IBM Cloud -- Davis said anyone who says they know where the market is heading is lying.

"It's changing so fast," he said. "You can't draw a line anymore, because it is all over the place, which means it is exciting and scary at the same time."

Releasing entirely new architectures is not something that Intel is commonly known for, but that is exactly what it is doing with Lake Crest and its upcoming Crest lineup.

Chief technology officer of Intel Artificial Intelligence Product Group Amir Khosrowshahi told ZDNet last week that Lake Crest came about after Nervana -- the company he co-founded, which was subsequently purchased by Intel -- had been looking at optimising parts of Nvidia's AI ecosystem.

"There is so much circuitry in a GPU that is not necessary for machine learning ... this is crud that has accumulated over time, and it's quite a lot of stuff," Khosrowshahi said. "You don't need the circuitry which is quite a large proportion of the chip and also high cost in energy utilisation.

"Neural networks are quite simple, they are little matrix multiplications and non-linearities, you can directly build silicon to do that. You can build silicon that is very faithful to the architecture of neural networks, which GPUs are not."
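Khosrowshahi's description -- matrix multiplications followed by non-linearities -- can be sketched in a few lines of plain Python. This is an illustrative toy, not Nervana or Intel code; all names and values here are made up.

```python
def dense_layer(inputs, weights, bias):
    """One neural-network layer: a matrix multiplication (here, a
    vector-matrix product), a bias add, and a ReLU non-linearity."""
    outputs = []
    for j in range(len(bias)):
        # Dot product of the input vector with column j of the weight matrix.
        total = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + bias[j]
        # ReLU non-linearity: clamp negatives to zero.
        outputs.append(max(0.0, total))
    return outputs

# With an identity weight matrix and zero bias, the layer simply
# passes positive inputs through and zeroes out negative ones.
print(dense_layer([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [1.0, 0.0]
```

Silicon built "faithful to the architecture of neural networks" is, in effect, hardware specialised to run those two operations at scale and nothing else.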

To Davis, the timing is right to introduce developers to a new architecture.

"You do have to take a leap of faith at some point, but you don't want to do it every generation," Davis said. "We want to establish Lake Crest, Spring Crest as an architectural capability, but two years from now, three years from now, I don't want to have to start all over again -- developers will kill us.

"You take the leap when you need to, but you don't keep thrashing them."

Lake Crest will be an accelerator card, but as the architecture matures, Intel will bring it closer to the CPU.

"You have to start that somewhere, you have to start the ecosystem development, you have to establish the architecture, and you do that in a discrete solution and then you migrate it towards the CPU over time. Now you only do that if there is a good reason to -- we believe there is a chance there's a good reason to," Davis stated.

"If I just said to myself: 'Nvidia is good enough' ... that might be good. Three or four years from now, when deep learning is tightly integrated into the workflow of a user in the datacentre, what's going to happen at that point in time?

"Now all of a sudden, this thing is odd man out. It's not going to be integrated, it's going to be an attached co-processor; it's going to have execution flows that stop, go up here, do something up here, come back down; latency is going to be increased; performance is going to be reduced."

Bringing Lake Crest closer to the CPU, Davis said, does not necessarily mean shared silicon; it could mean instructions and architecture features.

"Lake Crest as a capability today, it's plus 300W, that's not going to show up in a car any time soon, or in an edge device.

"It's the algorithms and the IP in the architecture which will show up. It's the capability, it's the processing elements. It's Lake Crest the architecture, not the chip, that will impact our microcode and the way we develop instruction sets.

"There's already some instructions we are looking at for CPUs that came out of the Nervana guys."

As devices at the edge of networks become more sensitive to power-consumption constraints, Davis said he doesn't believe there will be a great deal of training going on, but rather inference. And for power-efficient inference, Intel has an ace up its sleeve in the form of field-programmable gate arrays (FPGAs), thanks to its Altera purchase in 2015.

"A year ago, it was all about 32-bit training, overnight it went to 16-bit -- now we went from single precision to half-precision -- now they are talking about 8-bit, 4-bit, 3-bit, and 2-bit. Yes, a GPGPU or a CPU for that matter can do 8-bit training, but not with any power savings against 16-bit," Davis said. "That's where an FPGA comes in.

"If overnight, all of a sudden, 8-bit inference becomes important, we can do that in an FPGA. You want to put that into a GPGPU, you need two years ... you need that hardware development cycle to get that out.

"I'll admit I don't think we bought Altera for AI -- but we did get a unique solution that will support our AI strategy, and we got a little lucky."
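The precision shrinkage Davis describes -- 32-bit floats down to 8-bit integers -- can be illustrated with a simple linear quantisation scheme, sketched below in plain Python. This is a generic textbook scheme, not Intel's or any framework's actual method, and the values are invented for the example.

```python
def quantize_int8(values):
    """Map floats symmetrically onto the 8-bit integer range [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]

original = [0.5, -1.25, 2.0, -0.01]
q, scale = quantize_int8(original)
restored = dequantize(q, scale)

# Each value survives within one quantisation step of its original.
print(q)  # [32, -79, 127, -1]
print(max(abs(a - b) for a, b in zip(original, restored)) < scale)  # True
```

The appeal of 8-bit inference is that each value needs a quarter of the memory bandwidth of 32-bit -- and the catch, as the example shows, is the precision given up in exchange. Because the "right" bit width keeps changing, reprogrammable FPGAs can adopt a new one without a multi-year hardware cycle.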

Wherever the market chooses to go, Intel is more than likely to be there with a silicon offering. The chip giant's AI portfolio now contains a quartet of processors: the standard Xeon, the higher-performance Xeon Phi, a Xeon coupled with Lake Crest, and a Xeon paired with an Arria 10 FPGA. Despite criticism that it is taking a shotgun approach, Davis retorted that it is a best-tool-for-the-job approach.

"Everyone is going to have a little bit of a different solution, and we are not exactly sure where everything is going to go."

Disclosure: Chris Duckett attended Intel AI Day as a guest of Intel.