What AI experts say about Elon Musk’s new AI model.
Elon Musk’s AI venture, xAI, has officially launched Grok 3, its latest artificial intelligence model. As AI competition heats up, how does Grok 3 compare to the industry’s leading chatbots, including OpenAI’s ChatGPT, Google’s Gemini, and DeepSeek’s R1?
Musk unveiled Grok 3 in a livestream on X, introducing not just the base model but also two reasoning versions: Grok 3 Reasoning (in beta) and Grok 3 Mini Reasoning. Unlike standard generative AI models, which answer right away, these reasoning versions work through a problem step by step before responding, an approach intended to reduce hallucinations and improve accuracy.
A Bold Claim: Grok 3 Surpasses Competitors
xAI claims Grok 3 is now the best model on the market, outperforming top AI models from OpenAI, Google, Anthropic, and DeepSeek in key benchmark tests. Before launch, under the codename “Chocolate,” Grok 3 also performed impressively in Chatbot Arena, an evaluation platform where users compare the answers of two anonymous models side by side and vote for the better one.
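To make the idea of blind comparisons concrete, here is a minimal sketch of how pairwise votes can be turned into a ranking with an Elo-style update. This is an illustration only, not Chatbot Arena's actual scoring code; the model names, the starting rating of 1000, and the K-factor of 32 are all assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from the loser to the winner after one blind vote."""
    # An upset (a low-rated model beating a high-rated one) moves more points.
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models and votes; each vote is (winner, loser).
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]

for winner, loser in votes:
    update(ratings, winner, loser)

# Print the resulting leaderboard, highest rating first.
for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

Because voters do not know which model wrote which answer, brand loyalty cannot sway a vote, and a model climbs the ranking only by winning head-to-head comparisons.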
While it’s remarkable that Grok 3 has caught up to its more established competitors in such a short time, does it truly deliver a game-changing experience? Let’s dive into expert insights, performance benchmarks, and user expectations.
Grok 3 Is Impressive, but Not a ChatGPT Killer—Yet
Andrej Karpathy, a founding member of OpenAI and a former AI director at Tesla, got early access to Grok 3 and shared his first impressions. Based on his own hands-on tests, he found that Grok 3 with its Thinking (reasoning) mode enabled is roughly on par with OpenAI’s strongest models, such as o1-pro (available on the $200/month subscription plan). He also rated it slightly ahead of DeepSeek-R1 and Google’s Gemini 2.0 Flash Thinking.
However, for users already subscribed to ChatGPT Plus or other premium AI services, Grok 3 may not offer enough of an advantage to justify switching. While Musk’s fans are excited about Grok 3’s rapid progress, everyday users seeking the absolute best AI experience might not find enough incentive to make the leap just yet.
AI researcher and Wharton professor Ethan Mollick echoed this sentiment:
“Grok 3 landed right at expectations. There’s no drastic disruption to the AI landscape—just continued acceleration in development. Talent, compute, and scale remain the key differentiators.”
The Missing Comparison: OpenAI’s o3 Mini vs. Grok 3
xAI showcased benchmark charts demonstrating that Grok 3 Reasoning models outperformed OpenAI’s o3 Mini and o1, DeepSeek R1, and Google Gemini 2.0 Flash Thinking across multiple reasoning tests. However, OpenAI quickly responded.
Shortly after the livestream, OpenAI product engineer Rex Asabor shared an “updated” benchmark that included the yet-to-be-released o3 model, which outperformed Grok 3 in math and science tests. This suggests that while Grok 3 is a significant leap forward, OpenAI still holds a competitive edge in critical domains.
To be fair, since OpenAI’s o3 isn’t publicly available yet, xAI might not have had access to those results. But the updated comparison highlights that Grok 3 isn’t necessarily the dominant AI model across all key areas.
Astonishing Speed of Development: xAI’s Biggest Advantage?
Despite these limitations, AI experts agree that one of Grok 3’s most impressive achievements is how quickly it reached the frontier. Google has been developing cutting-edge AI for well over a decade and OpenAI for nearly as long, while xAI only launched in 2023. Even so, Grok 3 has already become a formidable competitor.
According to Musk, Grok 3 was trained with ten times the computing power of Grok 2, using 200,000 GPUs. This is consistent with a well-established pattern in AI research: more computing power tends to produce better model performance. However, some experts remain skeptical about whether scaling alone will continue to drive meaningful improvements.
Gary Marcus, an AI researcher and professor emeritus of psychology at NYU, questioned whether xAI’s aggressive scaling will result in significantly higher intelligence. In his view, simply throwing more GPUs at a problem doesn’t guarantee long-term breakthroughs beyond current AI capabilities.
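Both sides of the scaling debate can be illustrated with a toy power law, in which a model's error falls as a fixed power of training compute. The sketch below is purely illustrative: the constants a and b are made-up values, not measurements from Grok or any other model, and real scaling curves depend on data, architecture, and training choices.

```python
def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Hypothetical power law: loss = a * compute ** -b (lower is better)."""
    return a * compute ** (-b)


def gains_per_tenfold(steps: int, a: float = 10.0, b: float = 0.05) -> list:
    """Absolute loss reduction from each successive 10x increase in compute."""
    gains = []
    for i in range(steps):
        before = loss(10.0 ** i, a, b)
        after = loss(10.0 ** (i + 1), a, b)
        gains.append(before - after)
    return gains


# Each 10x step cuts the loss by the same fraction, so the absolute
# improvement shrinks every time even though the compute bill grows tenfold.
for step, gain in enumerate(gains_per_tenfold(4), start=1):
    print(f"10x step {step}: loss drops by {gain:.3f}")
```

Under this assumed curve, more compute always helps, which matches the scaling argument, but each tenfold increase buys a smaller improvement than the last, which is the kind of diminishing return skeptics point to.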
Grok 3 Still Faces Common AI Struggles
Despite its impressive advancements, Grok 3 isn’t free from the typical limitations seen in other AI models. Here are some areas where it still struggles:
1. Humor Falls Flat
Karpathy noted that Grok 3’s humor generation remains mediocre, often defaulting to corny dad jokes. This is a known limitation of large language models (LLMs), which tend to suffer from “mode collapse,” where their creative output becomes repetitive and predictable.
2. SVG Image Generation Is Hit-or-Miss
Grok 3 fared better than its competitors when asked to generate an SVG image of a pelican riding a bicycle, but the result still wasn’t perfect. LLMs struggle with complex, multi-element images because they write SVG as text markup without seeing the rendered picture, which leads to awkward or incorrect visual compositions.
3. Political and Ethical Sensitivity
Musk has positioned Grok as the “anti-woke” alternative to AI models he believes are too politically correct. However, Karpathy tested Grok 3’s stance on controversial topics and found that it was hesitant to answer politically charged questions, such as whether misgendering someone could be ethically justified if it meant saving a million lives.
Grok 3’s cautious approach suggests that, despite Musk’s promises, xAI’s model is still influenced by the public data it’s trained on, which tends to lean towards more neutral or progressive responses. Musk has vowed to make future versions of Grok more politically balanced.
Final Verdict: Should You Switch to Grok 3?
For now, Grok 3 is an impressive step forward, especially considering how quickly it has caught up to AI giants. However, unless you’re an X Premium+ subscriber ($50/month) eager to try out the latest tech, there might not be a strong enough reason to abandon ChatGPT, Gemini, or DeepSeek just yet.
As AI development accelerates, the real question isn’t just who has the best model today—it’s who can consistently push the boundaries and maintain their lead in the months and years to come.
Musk and xAI have proven they can compete at the highest level. Now, they have to prove they can lead the AI revolution.