Apple builds a slimmed-down AI model using Stanford, Google innovations

The phone giant's open-source large language model beats previous models by melding the insights of many researchers.
Written by Tiernan Ray, Senior Contributing Writer

The world is watching to see what Apple will do to counter the dominance of Microsoft and Google in generative AI. Most assume the tech giant's innovations will take the form of neural nets on the iPhone and other Apple devices. Small clues are popping up here and there hinting at what Apple is working on.

Apple last week introduced OpenELM, an "embedded" large language model (LLM) that runs on mobile devices and essentially mashes together the breakthroughs of several research institutions, including Google's deep learning scholars and academics at Stanford and elsewhere.

All of OpenELM's code is posted on GitHub, along with various documentation for its training approach. Apple has also detailed its work in a paper by Sachin Mehta and team, "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework", posted on the arXiv pre-print server.

Apple's researchers used a neural net with just 1.3 billion neural weights, or parameters, suggesting the company is focusing on mobile devices. That number is far below the hundreds of billions of parameters used by models such as OpenAI's GPT-4 and Google's Gemini. More parameters directly increase the amount of memory required, so a smaller neural net could fit into a mobile device more easily.
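
To make the memory point concrete, here is a rough back-of-the-envelope sketch. It counts only the storage for the weights themselves, and the numeric precisions (32, 16, and 4 bits per parameter) are assumptions for illustration; the article does not say what precision Apple uses. The 100-billion figure is a hypothetical stand-in for a model at the low end of "hundreds of billions."

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    Ignores activations, caches, and runtime overhead, so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 1e9


SMALL_MODEL_PARAMS = 1.3e9   # figure cited in the article
LARGE_MODEL_PARAMS = 100e9   # hypothetical large model, for comparison only

for bits in (32, 16, 4):     # assumed precisions, not stated in the source
    small = weight_memory_gb(SMALL_MODEL_PARAMS, bits)
    large = weight_memory_gb(LARGE_MODEL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: small model ~{small:.2f} GB, large model ~{large:.0f} GB")
```

Under these assumptions, the smaller model's weights fit in a few gigabytes at most, while the larger model's weights run to tens or hundreds of gigabytes, well beyond the memory of a phone.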

OpenELM would be rather unremarkable without a key contribution: efficiency. The researchers adjust the layers of the deep neural network so that the AI model is more efficient than earlier models in how much data needs to be computed when training the neural network. Specifically, they can meet or beat the results of a slew of neural nets for mobile computing "while requiring 2× fewer pre-training tokens", where tokens are the individual characters, words, or sentence fragments in the training data.
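
As a toy illustration of what a token is, the snippet below splits the same text at two granularities. This is only a sketch: production language models typically use a learned sub-word vocabulary that falls somewhere between these two extremes, and the sample sentence is made up for the example.

```python
text = "Apple introduced OpenELM"  # made-up sample text

# Word-level tokens: split on whitespace.
word_tokens = text.split()

# Character-level tokens: every character, including spaces, is one token.
char_tokens = list(text)

print(len(word_tokens), "word tokens:", word_tokens)
print(len(char_tokens), "character tokens")
```

The same text yields very different token counts depending on the granularity, which is why token budgets are only comparable between models that tokenize in similar ways.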

Apple starts from the same approach as many LLMs: a transformer. The transformer is the signature neural net in language understanding, introduced by Google scientists in 2017. Every major language model since, including Google's BERT and OpenAI's GPT family of models, has adopted the transformer.

Apple achieves high efficiency by melding the transformer with a technique called DeLighT, introduced in 2021 by researchers at the University of Washington, Facebook AI Research, and the Allen Institute for AI. That work broke away from the conventional approach in which every "layer" of the network, the successive mathematical computations through which the data passes, has the same number of neural weights.

Instead, the researchers selectively adjusted each layer to have a different number of parameters. Because some layers have relatively few parameters, they called their approach a "deep and light-weight transformer," hence the name, DeLighT.

The DeLighT researchers say that "DeLighT matches or improves the performance of baseline Transformers with 2 to 3 times fewer parameters on average." Using DeLighT, Apple created OpenELM, in which each layer of the neural net has a distinct number of parameters, a non-uniform approach to allocating them.

"Existing LLMs use the same configuration for each transformer layer in the model, resulting in a uniform allocation of parameters across layers," Mehta and his team wrote. "Unlike these models, each transformer layer in OpenELM has a different configuration (e.g., number of heads and feed forward network dimension), resulting in variable number of parameters in each layer of the model."

The non-uniform approach, they write, "lets OpenELM better utilize the available parameter budget for achieving higher accuracies."
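
The idea of giving each layer its own configuration can be sketched in a few lines. The code below is a minimal illustration, not Apple's implementation: it assumes the number of attention heads and the feed-forward width grow linearly from the first layer to the last, and every name and value in it (`layer_configs`, the scaling ranges, the model width of 512) is hypothetical. The parameter count is a rough estimate that ignores biases, normalization, and embeddings.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    num_heads: int  # attention heads in this layer
    ffn_dim: int    # hidden width of this layer's feed-forward network


def layer_configs(num_layers, d_model, head_dim,
                  head_scale=(0.5, 1.0), ffn_scale=(0.5, 4.0)):
    """Build one config per layer, interpolating linearly from first to last.

    The scaling ranges are illustrative assumptions, not values from the paper.
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = head_scale[0] + (head_scale[1] - head_scale[0]) * t
        b = ffn_scale[0] + (ffn_scale[1] - ffn_scale[0]) * t
        num_heads = max(1, int(a * d_model / head_dim))
        configs.append(LayerConfig(num_heads=num_heads, ffn_dim=int(b * d_model)))
    return configs


def approx_layer_params(cfg, d_model, head_dim):
    """Rough weight count: four attention projections plus a two-matrix FFN."""
    attention = 4 * d_model * cfg.num_heads * head_dim
    feed_forward = 2 * d_model * cfg.ffn_dim
    return attention + feed_forward


D_MODEL, HEAD_DIM = 512, 64  # hypothetical sizes
configs = layer_configs(num_layers=4, d_model=D_MODEL, head_dim=HEAD_DIM)
for index, cfg in enumerate(configs):
    print(index, cfg, approx_layer_params(cfg, D_MODEL, HEAD_DIM))
```

In this sketch the early layers are narrow and the later layers are wide, so the same overall parameter budget is spread unevenly rather than split into identical layers.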

The competition Apple measures itself against uses similarly small neural nets, such as MobiLlama from Mohamed bin Zayed University of AI and collaborating institutions, and OLMo, introduced in February 2024 by researchers at the Allen Institute for Artificial Intelligence and scholars from the University of Washington, Yale University, New York University, and Carnegie Mellon University.

Apple's experiments are not carried out on a mobile device. Instead, the company uses an Intel-based Ubuntu Linux workstation with a single Nvidia GPU.

On numerous benchmark tests, OpenELM achieves better scores than these rivals, despite being smaller and/or using fewer training tokens. For example, on six out of seven tests, OpenELM beats OLMo despite having fewer parameters -- 1.08 billion versus 1.18 billion -- and only 1.5 trillion training tokens versus 3 trillion for OLMo.
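
The size of those gaps is easy to check with simple arithmetic. The snippet below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the article.
openelm = {"params_billion": 1.08, "tokens_trillion": 1.5}
olmo = {"params_billion": 1.18, "tokens_trillion": 3.0}

# How many times fewer training tokens OpenELM used than OLMo.
token_factor = olmo["tokens_trillion"] / openelm["tokens_trillion"]

# Percentage by which OpenELM's parameter count is smaller than OLMo's.
param_saving_pct = (1 - openelm["params_billion"] / olmo["params_billion"]) * 100

print(f"Training tokens: {token_factor:.1f}x fewer")
print(f"Parameters: {param_saving_pct:.1f}% fewer")
```

The token ratio matches the "2× fewer pre-training tokens" claim, while the parameter gap is comparatively modest, so most of the efficiency gain in this comparison comes from training data rather than model size.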

Although OpenELM can produce more accurate results more efficiently, the authors noted areas for further research, including the fact that OpenELM is slower in some cases at producing its predictions.

Reports have suggested that Apple may license AI tech for iOS 18 integration from Google, OpenAI, or another leading AI company. Apple's investment in open-source software raises the intriguing possibility that the company might be trying to reinforce an open ecosystem from which its own devices can benefit.
