When a new AI video generator hits the market, it doesn’t take long for someone to put it to the ultimate test: creating a video of Will Smith eating spaghetti. What began as a quirky internet challenge has morphed into something of a cultural benchmark, with developers and enthusiasts eagerly testing the realism of AI-generated videos by seeing how convincingly they can depict the Hollywood actor slurping down noodles.
The meme’s popularity has exploded to the point that even Will Smith himself got in on the joke. Back in February, the actor parodied the trend in an Instagram post, playfully acknowledging the internet’s obsession with the oddly specific visual.
In late 2024, Google’s Veo 2, the company’s latest video-generation model, finally delivered a highly realistic rendition of Smith eating spaghetti. One viral tweet summed up the sentiment perfectly:
> “We are now eating spaghetti at last.” — @jerrod_lew (December 17, 2024)
But Will Smith and pasta are just the tip of the iceberg. In 2024, a range of unconventional and downright bizarre AI benchmarks captivated both developers and casual observers. From AI-powered Minecraft builders to bots playing Pictionary against each other, these quirky tests brought a playful yet revealing dimension to the world of artificial intelligence.
![Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024](https://timescatalog.com/wp-content/uploads/2025/01/ezgif-4-eec25d8995.webp)
Why Weird AI Benchmarks Went Viral
It’s not like the AI industry lacks serious, academic benchmarks. Companies routinely showcase their models’ abilities by touting impressive feats, such as solving PhD-level math problems or excelling in coding competitions. Yet, for the average person, these achievements often feel abstract and disconnected from everyday life. Most of us use AI for far simpler tasks: replying to emails, conducting basic research, or generating shopping lists.
That’s where these unconventional benchmarks come in. They’re accessible, entertaining, and—most importantly—relatable. Watching an AI try (and sometimes hilariously fail) to build a Minecraft castle or depict Will Smith twirling spaghetti on a fork resonates in a way that academic metrics never could.
Ethan Mollick, a management professor at Wharton, recently critiqued the industry’s fixation on benchmarks that fail to address practical, real-world use cases. In a post on X (formerly Twitter), Mollick argued for more comparisons between AI performance and human performance in domains like medicine, legal advice, and everyday problem-solving.
“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.
He’s not wrong. But until those benchmarks arrive, the internet seems perfectly content with its own makeshift tests, no matter how absurd they may be.
The Rise of Crowdsourced AI Challenges
Another factor driving the popularity of these unconventional benchmarks is their participatory nature. Platforms like Chatbot Arena, LMSYS’s public benchmarking tool, let anyone on the web pit two anonymous models against each other and vote for the better response on tasks ranging from coding a web app to creating a digital painting. While these votes aren’t exactly scientific—users often choose based on personal preferences or subjective impressions—they provide a sense of communal involvement in the AI evaluation process.
![Chatbot Arena, LMSYS’s public benchmarking tool](https://timescatalog.com/wp-content/uploads/2025/01/Screenshot-2024-09-04-at-2.07.55PM-1024x706.webp)
Image Credits: LMSYS
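Under the hood, arena-style leaderboards typically distill those pairwise votes into an Elo-style rating, the same scheme chess uses to rank players. Here’s a minimal Python sketch of that idea; the model names, starting rating, and K-factor are illustrative assumptions, not LMSYS’s actual implementation:

```python
# Minimal Elo-style rating from pairwise votes, in the spirit of
# arena-style leaderboards. Starting rating and K-factor are assumed.
from collections import defaultdict

K = 32  # update step size (assumed)
ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000

def record_vote(winner: str, loser: str) -> None:
    """Update both models' ratings after one human vote."""
    expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1.0 - expected)  # winner gains
    ratings[loser] -= K * (1.0 - expected)   # loser drops by the same amount

# A handful of hypothetical votes: (winner, loser)
for w, l in [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]:
    record_vote(w, l)

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.0f}")
```

The appeal of this setup is that no single vote matters much, but thousands of them converge on a surprisingly stable ranking.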
And then there are the independent creators who’ve taken AI testing into their own hands. In 2024, a 16-year-old developer made headlines by building an app that lets AI control Minecraft, evaluating its ability to design intricate structures. Meanwhile, a British programmer launched a platform where AI models compete in games like Pictionary and Connect 4, revealing unexpected strengths (and hilarious weaknesses) in the process.
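For a sense of how simple such a games harness can be, here’s a bare-bones Python sketch of a Connect 4 match between two chatbots. The ask_model() stub stands in for a real API call, and the whole thing is an illustrative guess at the setup, not either platform’s actual code:

```python
# A bare-bones harness for pitting two chatbots against each other at
# Connect 4. Model names and ask_model() are illustrative stand-ins.
import random

ROWS, COLS = 6, 7

def ask_model(name: str, board: list[list[str]]) -> int:
    """Stub: a real harness would prompt an LLM with the board state
    and parse a column number out of its reply."""
    legal = [c for c in range(COLS) if board[0][c] == "."]
    return random.choice(legal)  # placeholder move

def drop(board: list[list[str]], col: int, piece: str) -> None:
    """Drop a piece into the lowest empty cell of a column."""
    for row in reversed(range(ROWS)):
        if board[row][col] == ".":
            board[row][col] = piece
            return

board = [["."] * COLS for _ in range(ROWS)]
players = [("model-a", "X"), ("model-b", "O")]
for turn in range(ROWS * COLS):
    name, piece = players[turn % 2]
    drop(board, ask_model(name, board), piece)
    # A real harness would check for four in a row here and end the game.

print("\n".join(" ".join(row) for row in board))
```

Most of the entertainment value, of course, comes from the part the stub hides: watching a frontier model confidently try to drop a piece into a full column.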
These grassroots experiments highlight a key truth: AI benchmarks don’t have to be perfect or empirical to be valuable. They’re a way for the public to engage with technology that often feels opaque and intimidating. Plus, let’s be honest: they’re fun.
The Limits of Quirkiness
Of course, not all quirky benchmarks are created equal. Just because an AI can flawlessly generate a video of Will Smith eating spaghetti doesn’t mean it’s ready to revolutionize video editing or content creation. Similarly, an AI’s ability to build a stunning Minecraft castle doesn’t necessarily translate to practical skills like architectural design or urban planning.
![Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024](https://timescatalog.com/wp-content/uploads/2025/01/Screenshot-2024-11-04-at-6.15.53PM.webp)
But these tests serve a different purpose. They humanize AI by presenting its capabilities (and shortcomings) in ways that are easy to understand. They’re also a reminder that AI isn’t just a tool for solving complex academic problems—it’s a technology that’s becoming deeply embedded in our culture and daily lives.
The Future of AI Benchmarks
So, what’s next for AI benchmarking? One expert I spoke to suggested shifting the focus from narrow, domain-specific tasks to evaluating the broader societal impacts of AI. How does an AI system affect employment? What are its implications for privacy, security, and ethics? These are the kinds of questions that matter most as AI becomes increasingly powerful and pervasive.
Still, it’s hard to imagine a world where weird benchmarks don’t have a place. They’re entertaining, approachable, and—for better or worse—extremely shareable. As my colleague Max Zeff recently pointed out, the AI industry is still figuring out how to distill an incredibly complex technology into digestible narratives for the public. Quirky benchmarks like Minecraft castles and spaghetti-slurping celebrities do that job remarkably well.
The only question now is: what will 2025 bring? AI-generated dogs playing poker? A bot that can freestyle rap better than your favorite artist? Whatever it is, one thing’s for sure: the world of AI testing is only going to get weirder—and more wonderful.