Times Catalog

A new, challenging AGI test stumps most AI models

Usama
Last updated: March 25, 2025 5:26 pm

The Arc Prize Foundation, a nonprofit dedicated to advancing artificial intelligence research, has introduced a new test designed to measure how well AI models reason about unfamiliar problems. Named ARC-AGI-2, the benchmark evaluates an AI’s ability to think abstractly, adapt to new problems, and solve complex puzzles without relying on brute-force computation.

A sample question from ARC-AGI-2 (credit: Arc Prize).

The results so far? A stark reality check for AI development. Even the most advanced models have struggled to achieve meaningful scores, suggesting that true artificial general intelligence (AGI) is still a distant goal.

AI Models Stumble on ARC-AGI-2

Despite the rapid progress in AI, leading models have performed dismally on ARC-AGI-2. According to the Arc Prize Foundation’s leaderboard, OpenAI’s o1-pro and DeepSeek’s R1—both designed with reasoning capabilities—scored between 1% and 1.3%. Other cutting-edge models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, hovered around just 1%.

In contrast, human participants fared significantly better. The foundation gathered data from over 400 people, forming test panels that achieved an average score of 60%—vastly outperforming AI competitors.

What Makes ARC-AGI-2 Different?

ARC-AGI-2 builds on its predecessor, ARC-AGI-1, by introducing a more refined and rigorous evaluation of intelligence. The test consists of puzzle-like problems that require AI models to recognize intricate visual patterns from colored square grids and generate correct responses. Unlike traditional benchmarks, these problems are designed to be unfamiliar, preventing models from relying on memorization or pre-existing data.
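To make the puzzle format concrete, here is a minimal Python sketch. It assumes each grid is a list of rows whose integers stand for cell colors, and that an answer counts only when the predicted grid matches the expected grid exactly. The grids, the `flip_horizontal` rule, and the helper names are hypothetical illustrations, not actual ARC-AGI-2 tasks or the foundation's scoring code.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # assumption: each integer stands for one cell color


def flip_horizontal(grid: Grid) -> Grid:
    """Toy candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def rule_fits(rule: Callable[[Grid], Grid],
              examples: List[Tuple[Grid, Grid]]) -> bool:
    """Check that a candidate rule reproduces every demonstration pair."""
    return all(rule(inp) == out for inp, out in examples)


def is_solved(predicted: Grid, expected: Grid) -> bool:
    """Assumed scoring: the whole grid must match, cell for cell."""
    return predicted == expected


# Hypothetical demonstration pairs (input grid, output grid).
examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]

# Hypothetical held-out test pair for the same hidden rule.
test_input = [[0, 5], [4, 0]]
test_expected = [[5, 0], [0, 4]]

# Apply the rule only if it explains all the demonstrations.
if rule_fits(flip_horizontal, examples):
    prediction = flip_horizontal(test_input)
else:
    prediction = test_input

print(is_solved(prediction, test_expected))
```

A real solver would have to discover the rule from the demonstrations rather than receive it ready-made, which is the part the benchmark is designed to probe.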

François Chollet, a prominent AI researcher and co-founder of the Arc Prize Foundation, explained that ARC-AGI-2 addresses a major flaw of its predecessor: on ARC-AGI-1, models could lean on brute-force computation, using massive processing power to derive answers rather than exhibiting true intelligence. To counter this, the new version of the test introduces an additional metric: efficiency.

“In intelligence, the ability to acquire and deploy capabilities efficiently is just as important as problem-solving itself,” Arc Prize Foundation co-founder Greg Kamradt emphasized in a blog post. “The key question is not just whether AI can solve a task, but how efficiently it can learn and apply new skills.”

Efficiency as the New Benchmark

One of the most revealing aspects of ARC-AGI-2 is its efficiency requirement. Unlike ARC-AGI-1, where OpenAI’s o3 model achieved a breakthrough score of 75.7%—albeit at significant computational expense—the new test demands that AI models achieve high accuracy with minimal processing cost.

Comparison of Frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (credit: Arc Prize).

To put this in perspective, the first AI model to excel at ARC-AGI-1, OpenAI’s o3 (low), needed approximately $200 worth of computing power per task. When applied to ARC-AGI-2, that same model managed only a 4% success rate.

The shift in focus from raw computational power to cost-effective intelligence aligns with growing concerns in the AI industry. As models become increasingly expensive to train and deploy, researchers and developers are seeking new ways to measure progress without unsustainable resource consumption.

The Race to 85% Accuracy: A New Challenge for AI Developers

With ARC-AGI-2 establishing itself as a formidable test, the Arc Prize Foundation has launched a new competition for 2025. The challenge? Achieve at least 85% accuracy on ARC-AGI-2 while spending no more than $0.42 per task.
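The two contest conditions can be expressed as a simple check. The Python sketch below uses the 85% accuracy and $0.42-per-task figures quoted above; the `Submission` class, the entry names, and their numbers are hypothetical, and averaging cost over all attempted tasks is an assumption rather than the foundation's published accounting method.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.85        # contest accuracy target quoted in the article
MAX_COST_PER_TASK_USD = 0.42  # contest cost ceiling quoted in the article


@dataclass
class Submission:
    """Hypothetical summary of one contest entry."""
    name: str
    tasks_solved: int
    tasks_total: int
    total_cost_usd: float

    @property
    def accuracy(self) -> float:
        return self.tasks_solved / self.tasks_total

    @property
    def cost_per_task(self) -> float:
        # Assumption: cost is averaged over every attempted task.
        return self.total_cost_usd / self.tasks_total


def meets_goal(sub: Submission) -> bool:
    """True only when both the accuracy and the cost condition hold."""
    return (sub.accuracy >= TARGET_ACCURACY
            and sub.cost_per_task <= MAX_COST_PER_TASK_USD)


# Made-up entries for illustration only.
entries = [
    Submission("efficient-solver", tasks_solved=86, tasks_total=100, total_cost_usd=40.0),
    Submission("brute-force-solver", tasks_solved=90, tasks_total=100, total_cost_usd=20000.0),
    Submission("cheap-but-weak", tasks_solved=4, tasks_total=100, total_cost_usd=10.0),
]

for entry in entries:
    print(entry.name, meets_goal(entry))
```

In this sketch the brute-force entry fails despite its higher accuracy, because its $200 per task is far above the ceiling. That is the trade-off the contest is meant to penalize.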

This ambitious goal presents a unique opportunity for AI developers to showcase true innovation. Rather than relying on massive datasets or overwhelming computing power, the contest encourages researchers to build more efficient, adaptable, and truly intelligent systems.

As the AI industry continues its quest for artificial general intelligence, ARC-AGI-2 serves as a stark reminder: intelligence isn’t just about solving problems—it’s about how efficiently those solutions are learned and applied. With AI models struggling against this new benchmark, the road to AGI remains an ongoing challenge—one that demands not just greater power, but smarter, more adaptable thinking.
