Apple claims its on-device AI system ReALM 'substantially outperforms' GPT-4

Apple's new AI models could let Siri remember your conversation history, understand what's on your iPhone screen, and be aware of surrounding activities, such as recognizing the music playing in the background.
Written by Maria Diaz, Staff Writer

A smarter Siri could be coming to the iPhone. 

Maria Diaz/ZDNET

We know Apple is working on a series of AI announcements for WWDC 2024 in June, but we don't yet know exactly what these will entail. Enhancing Siri is one of Apple's main priorities, as iPhone users regularly complain about the assistant. Apple's AI researchers this week published a research paper that may shed new light on the company's AI plans for Siri, maybe even in time for WWDC.

The paper introduces Reference Resolution As Language Modeling (ReALM), a conversational AI system with a novel approach to improving reference resolution. The hope is that ReALM could improve Siri's ability to understand context in a conversation, process onscreen content, and detect background activities. 


Treating reference resolution as a language modeling problem breaks from traditional methods focused on conversational context. ReALM converts conversational, onscreen, and background entities into a text format that can then be processed by large language models (LLMs), leveraging their semantic understanding capabilities.
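To illustrate the core idea, here is a minimal sketch of how onscreen entities might be flattened into plain text so a language model can resolve a reference like "call that number" to a specific item. The entity list, numbering scheme, and prompt wording below are illustrative assumptions, not the actual format from Apple's paper.

```python
# Hypothetical sketch: encode onscreen entities as numbered lines of text
# so an LLM can answer a reference-resolution question by picking an index.
# Entity format and prompt wording are assumptions for illustration only.

def encode_screen(entities):
    """Render onscreen entities (listed top-to-bottom) as numbered text lines."""
    return "\n".join(
        f"{i}. {kind}: {value}" for i, (kind, value) in enumerate(entities, 1)
    )

def build_prompt(entities, user_request):
    """Combine the textualized screen with the user's request into one LLM prompt."""
    return (
        "Onscreen entities:\n"
        f"{encode_screen(entities)}\n"
        f"User: {user_request}\n"
        "Which entity number does the user mean?"
    )

# Example screen state: a business listing with a phone number and address.
entities = [
    ("business", "Joe's Pizza"),
    ("phone number", "555-0123"),
    ("address", "12 Main St"),
]
print(build_prompt(entities, "call that number"))
```

The key design point is that once everything visible on screen is serialized to text, reference resolution reduces to a question an off-the-shelf LLM can answer, with no vision model required.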

The researchers benchmarked ReALM models against GPT-3.5 and GPT-4, OpenAI's LLMs that currently power the free ChatGPT and the paid ChatGPT Plus. In the paper, the researchers said their smallest model performed comparably to GPT-4, while their largest models did even better.

"We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for onscreen references," the researchers explained in the paper. "We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it."


The paper lists four sizes of the ReALM model: ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B. The "M" and "B" indicate the number of parameters in millions and billions, respectively. By comparison, GPT-3.5 reportedly has 175 billion parameters, while GPT-4 is rumored to have about 1.5 trillion parameters.

"We show that ReaLM outperforms previous approaches, and performs roughly as well as the state of the art LLM today, GPT-4, despite consisting of far fewer parameters," the paper states.

Apple has yet to confirm whether this research will play a role in iOS 18 or its latest devices.
