Gossips Today
Technology & Innovation

Meta’s benchmarks for its new AI models are a bit misleading

By gossipstoday | April 7, 2025

One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test that has human raters compare the outputs of models and choose which they prefer. But it seems the version of Maverick that Meta deployed to LM Arena differs from the version that’s widely available to developers.

As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”

As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. But AI companies generally haven’t customized or otherwise fine-tuned their models to score better on LM Arena — or haven’t admitted to doing so, at least.

The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of that same model is that it makes it challenging for developers to predict exactly how well the model will perform in particular contexts. It’s also misleading. Ideally, benchmarks — woefully inadequate as they are — provide a snapshot of a single model’s strengths and weaknesses across a range of tasks.

Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version seems to use a lot of emojis and give incredibly long-winded answers.

Okay Llama 4 is def a littled cooked lol, what is this yap city pic.twitter.com/y3GvhbVz65

— Nathan Lambert (@natolambert) April 6, 2025

for some reason, the Llama 4 model in Arena uses a lot more Emojis

on together . ai, it seems better: pic.twitter.com/f74ODX4zTt

— Tech Dev Notes (@techdevnotes) April 6, 2025

We’ve reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.
