Scale AI launches Seal Showdown, an alternative to LMArena leaderboard

By Editor | September 22, 2025

In the years since OpenAI launched ChatGPT to the world, kicking off the generative AI boom, developers have relied on LMArena (previously Chatbot Arena) as the default AI leaderboard. Now, Scale AI is bringing some much-needed competition to the AI benchmarking space with its new Seal Showdown benchmarking tool.

Like LMArena, Seal Showdown allows users to test various AI models head-to-head and vote on which one performs better. However, Scale AI says that unlike LMArena, Seal Showdown will more closely reflect how everyday users feel about various models. In an X post, Scale CEO Jason Droege said that Seal Showdown “actually captures real preferences, powered by a platform used by real people.”

“Most benchmarks rely on synthetic tests (coding puzzles, math problems) or feedback from a small slice of people,” said Scale AI’s head of product, Janie Gu, in a blog post. “They miss the full spectrum of how real people actually use models in their daily lives. By treating diverse users as a monolith and lumping all feedback into one generalized score, critical nuance is lost.”

Scale AI launched its Safety, Evaluations, and Alignment Lab (SEAL) leaderboards last year, but these leaderboards relied on expert evaluations. Now, Scale AI will offer leaderboards based on user testing, providing an alternative to LMArena.

The startup says its new benchmarking tool is based on real-world use and feedback from “users spanning over 100 countries, 70 languages, and 200 professional domains.” (The company also provided the precise methodology for Seal Showdown.)

“Showdown introduces something never before seen in public leaderboards: rich user segmentation,” Gu wrote in the blog post announcing the project. “Because rankings are derived from conversations that contributors have on Scale’s Outlier platform, Scale is able to verify each user’s country, education level, profession, language, and age — enabling anyone to see how models perform for people like them.”

Because of this demographic information, Scale AI will be able to show which models are most popular according to specific regions, languages, ages, or use cases.
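
To make the idea of segmented rankings concrete, here is a minimal sketch of how head-to-head votes could be tallied per user segment. It is an illustration only, not Scale AI's published methodology: the function name `win_rates_by_segment`, the vote tuple layout, and the use of a plain win rate (rather than whatever rating model Seal Showdown actually applies) are all assumptions made for this example.

```python
from collections import defaultdict


def win_rates_by_segment(votes):
    """Tally head-to-head votes into per-segment win rates.

    `votes` is an iterable of (segment, winner, loser) tuples. This layout
    is a hypothetical one chosen for illustration, not a real data format.
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    for segment, winner, loser in votes:
        wins[segment][winner] += 1
        games[segment][winner] += 1
        games[segment][loser] += 1
    # Win rate = votes won / comparisons the model appeared in, per segment.
    return {
        segment: {model: wins[segment][model] / n for model, n in models.items()}
        for segment, models in games.items()
    }


# Made-up votes with placeholder model names, grouped by country.
example_votes = [
    ("US", "model_a", "model_b"),
    ("US", "model_a", "model_b"),
    ("US", "model_b", "model_a"),
    ("JP", "model_b", "model_a"),
]
rates = win_rates_by_segment(example_votes)
```

With these made-up votes, the same two models rank differently in each segment, which is the kind of nuance a single pooled score would hide.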

Scale AI's criticism of existing leaderboards is that they “rely heavily on hobbyist participation” and that current rankings are “based on a narrow group of users and their interests,” which the company says misrepresents how those LLMs perform in general use.

LMArena has also been criticized for bias against open models. Critics say that LMArena’s system favors frontier models from big AI companies like Google, xAI, and OpenAI. However, Scale AI’s solution may not be ideal, either. The initial leaderboard results overwhelmingly rank GPT-5 the highest, which may merely reflect user preference rather than objective performance.

The updated SEAL leaderboards are live now. Currently, GPT-5 tops all of the benchmark categories, a stark contrast to LMArena, where Google’s Gemini 2.5 Pro, 2.5 Flash, and Veo 3 lead most of the leaderboard categories. 


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.



