Claude 3 overtakes GPT-4 in the duel of the AI bots. Here's how to get in on the action

Anthropic's Claude 3 Opus AI model snagged first place from OpenAI's GPT-4 at Chatbot Arena, a site that pits one chatbot model against another.
Written by Lance Whitney, Contributor
Anthropic's Claude 3 AI
Screenshot by Lance Whitney/ZDNET

Move over, GPT-4. Another AI model has taken over your territory, and his name is Claude.

This week, Anthropic's Claude 3 Opus AI LLM took first place among the rankings at Chatbot Arena, a website that tests and compares the effectiveness of different AI models. With one of the GPT-4 variants pushed down to second place in the site's Leaderboard, this marked the first time that Claude surpassed an AI model from OpenAI.

The Chatbot Arena Leaderboard
Chatbot Arena

Available at the Claude 3 website and as an API for developers, Claude 3 Opus is one of three LLMs recently developed by Anthropic, with Sonnet and Haiku completing the trio. Comparing Opus and Sonnet, Anthropic touts Sonnet as two times faster than the previous Claude 2 and Claude 2.1 models. Opus offers speeds similar to that of the prior models, according to the company, but with much higher levels of intelligence.

Also: The best AI chatbots: ChatGPT and alternatives

Launched last May, Chatbot Arena is the creation of the Large Model Systems Organization (LMYSY Org), an open research organization founded by students and faculty from the University of California, Berkeley. The goal of the arena is to help AI researchers and professionals see how two different AI LLMs fare against each other when challenged with the same prompts.

The Chatbot Arena uses a crowdsourced approach, which means that anyone is able to take it for a spin. The arena's chat page presents screens for two out of a possible 32 different AI models, including Claude, GPT-3.5, GPT-4, Google's Gemini, and Meta's Llama 2. Here, you're asked to type a question in the prompt at the bottom. But you don't know which LLM is randomly and anonymously picked to tackle your request. They're simply labeled Model A and Model B.

Also: What does GPT stand for? Understanding GPT 3.5, GPT 4, and more

After reading both responses from the two LLMs, you're asked to rate which answer you prefer. You can give the nod to A or B, rate them both equally, or select a thumbs down to signal that you don't like either one. After you submit your rating, only then are the names of the two LLMs revealed.

Choose your favorite response
Chatbot Arena

Counting the votes submitted by users of the site, the LMYSY Org compiles the totals on the leaderboard showing how each LLM performed. With the latest rankings, Claude 3 Opus received 33,250 votes with second-place GPT-4-1106-preview garnering 54,141 votes.

To rate the AI models, the leaderboard turns to the Elo ranking system, a method commonly used in games such as chess to measure the effectiveness of different players. Using the Elo system, the latest leaderboard gave Claude 3 Opus a ranking of 1253 and GPT-4-1106-preview a ranking of 1251.

Other LLM variants that fared well in the latest duel include GPT-4-0125-preview, Google's Gemini Pro, Claude 3 Sonnet, GPT-4-0314, and Claude 3 Haiku. With GPT-4 no longer in first place and all three of the latest Claude 3 models among the top 10, Anthropic is definitely making more of a splash in the overall AI arena.

Editorial standards