Amazon Web Services on Tuesday announced a new EC2 instance powered by its custom-built Inferentia chip. Inferentia, first announced at last year's re:Invent conference, is a high-performance machine learning inference chip. It offers very high throughput, low latency and sustained performance -- at a cost-effective price, AWS says.
"If you do a lot of machine learning at scale and in production… you know the majority of your costs are in predictions," AWS CEO Andy Jassy said at the AWS re:Invent conference in Las Vegas.
The Inf1 instances will offer low latency, 3X higher throughput and up to 40 percent lower cost per inference compared to AWS's Nvidia-powered G4 instances, he said.
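Cost per inference is simply the hourly instance price divided by the number of predictions served in an hour, so a cheaper instance with higher throughput compounds the savings. The sketch below walks through that arithmetic with hypothetical figures chosen only to illustrate the claimed 40 percent gap -- they are not actual AWS prices or benchmark numbers.

```python
def cost_per_inference(hourly_price_usd, inferences_per_hour):
    """Cost of a single prediction for an instance running at a given throughput."""
    return hourly_price_usd / inferences_per_hour

# Hypothetical figures for illustration only -- not real AWS pricing or benchmarks.
g4_cost = cost_per_inference(1.00, 1_000_000)    # $1.00/hr at 1M inferences/hr
inf1_cost = cost_per_inference(0.90, 1_500_000)  # slightly cheaper, higher throughput

savings = 1 - inf1_cost / g4_cost  # fraction saved on each prediction
print(f"{savings:.0%} lower cost per inference")  # prints "40% lower cost per inference"
```

The point of the sketch: even a modest price cut combined with a throughput gain multiplies into a large per-prediction saving, which is why inference-heavy workloads are so price-sensitive.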
Machine learning -- which comprises two stages, training and inference -- is quickly becoming an integral part of every application, but it comes with some unique demands. Inference can be costly, and it demands low latency and high throughput.
Inference -- during which a trained machine learning model is actually put to work -- can easily account for the vast majority of the cost associated with a machine learning system. For instance, every time Alexa interprets a command from a user, it's performing inference. Every time a machine learning model trained to perform object recognition for a self-driving car spots an object in the road, it's performing inference.
In these scenarios, latency is of clear importance, to varying degrees. The faster Alexa interprets your command, the faster it can respond. The faster a self-driving car identifies an object in the road, the faster it can avoid a collision.
The custom chip puts new competitive pressure on Amazon's suppliers -- namely, Intel and Nvidia. AWS's investments in custom chips send the clear message that AWS isn't going to let its tech supply chain constrain its pace of innovation.