Microsoft makes performance, speed optimizations to ONNX machine-learning runtime available to developers

Microsoft is making new additions to the open-sourced ONNX Runtime to provide developers with access to advances it has made to deep-learning models used for natural-language processing.

[Image: onnxoptimization.jpg. Credit: Microsoft]

Microsoft is open sourcing some updates it has made to deep-learning models used for natural-language processing. On January 21, the company announced it is making these optimizations available to developers by integrating them into the ONNX Runtime.

Microsoft and Facebook announced the ONNX (Open Neural Network Exchange) format in 2017 in the name of enabling developers to move deep-learning models between different AI frameworks. Microsoft open sourced the ONNX Runtime, which is the inference engine for models in the ONNX format, in 2018.

The new ONNX optimizations come from work the Bing team has done around BERT (Bidirectional Encoder Representations from Transformers). Unlike previous deep-neural-network architectures that process words individually, BERT uses a model type called a transformer, which relates each word to the other words around it.

Microsoft execs have said that deep learning is widely used across the Bing search stack to run a number of its "intelligent" features. Natural-language models are used to improve Bing's understanding of search intent and related web pages.

Microsoft officials say that running BERT inference at high scale can be extremely costly, and sometimes is not possible at all, because of strict latency constraints. But in November, the Bing team announced it had delivered "its largest improvement in search experience" by using Azure graphics processing units (GPUs). (Google also has used BERT to make big advances in its own search work.)

Microsoft's Bing and ONNX Runtime teams have been working together to build the fastest, cheapest, and easiest way to run large transformer networks in production. Microsoft officials say the resulting technology delivers a more than 10-fold improvement in latency and a more than 800-fold improvement in throughput.

Microsoft is incorporating these updates in the ONNX Runtime. Officials say that developers will be able to use the updated ONNX Runtime on any cloud or on premises with a choice of CPU or GPU.
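For developers wondering what the CPU-or-GPU choice looks like in practice, here is a minimal sketch using the ONNX Runtime Python package. It is an illustration only, not Microsoft's own code: the helper names `choose_providers` and `load_session` are hypothetical, and the sketch assumes the `onnxruntime` package is installed and that a model has already been exported to an ONNX file.

```python
from typing import List, Sequence


def choose_providers(available: Sequence[str], prefer_gpu: bool = True) -> List[str]:
    """Return execution providers in priority order, always ending with CPU.

    Hypothetical helper for illustration; only the provider name strings
    come from ONNX Runtime itself.
    """
    providers: List[str] = []
    # Use the CUDA (GPU) provider only if this build of ONNX Runtime offers it.
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # The CPU provider is the fallback in every case.
    providers.append("CPUExecutionProvider")
    return providers


def load_session(model_path: str, prefer_gpu: bool = True):
    """Create an inference session for an ONNX model file (hypothetical helper)."""
    # Imported here so choose_providers works even without onnxruntime installed.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers(), prefer_gpu)
    return ort.InferenceSession(model_path, providers=providers)
```

A caller would then do something like `session = load_session("bert.onnx")` (the file name is a placeholder) and pass inputs to `session.run(None, feeds)`, where `feeds` maps input names to arrays. The input names depend on how the model was exported, so they are best read from `session.get_inputs()` rather than assumed.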

Microsoft increasingly is using the ONNX Runtime to run advanced AI models across the company's various products and services, including Bing, Office, Azure computer vision and more. Many of the deep-learning models from Microsoft's Project Turing -- machine reading comprehension for the semantic search portion of Microsoft Search -- run on the ONNX Runtime, as well.