The Misleading Nature of IQ as an AI Benchmark
Artificial intelligence is advancing at an astonishing pace, and many observers are eager to quantify its progress. OpenAI CEO Sam Altman recently claimed that AI is improving so quickly that its “IQ” is increasing by roughly one standard deviation every year. While this makes for a compelling soundbite, the notion of measuring AI intelligence using IQ is fundamentally flawed.
Altman is not alone in this comparison. AI enthusiasts and influencers have frequently attempted to test and rank AI models based on IQ scores. But many experts argue that this approach is misleading and fails to capture the true nature of AI’s capabilities.
IQ: A Test Designed for Humans, Not Machines
IQ tests were developed to measure specific human cognitive abilities, particularly logical reasoning and abstract problem-solving. However, these tests do not capture the full spectrum of human intelligence, such as creativity, emotional intelligence, or practical problem-solving skills. As Oxford researcher Sandra Wachter points out, equating AI performance on an IQ test with human intelligence is an apples-to-oranges comparison.
“IQ is a tool to measure human capabilities — a contested one, no less — based on what scientists believe human intelligence looks like,” Wachter explains. “But you can’t use the same measure to describe AI capabilities. A car is faster than a human, and a submarine dives better. But that doesn’t mean they surpass human intelligence.”
IQ tests are inherently relative, not absolute, measures of intelligence. They compare an individual’s abilities to those of a broader population, not to an objective standard of intelligence. Since AI is fundamentally different from human cognition, using IQ as a benchmark for AI progress is both inaccurate and misleading.
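To make the relative-scoring point concrete, here is a minimal sketch (Python, standard library only) of the conventional norming of IQ scores to a bell curve with mean 100 and standard deviation 15. A score only locates an individual within a reference population; it carries no absolute meaning on its own, which is exactly why applying it to a system outside that population tells us little.

```python
from statistics import NormalDist

# IQ scores are conventionally normed to a bell curve:
# mean 100, standard deviation 15.
iq_distribution = NormalDist(mu=100, sigma=15)

def iq_to_percentile(score: float) -> float:
    """Fraction of the norming population scoring at or below `score`."""
    return iq_distribution.cdf(score)

# A score of 115 (one standard deviation above the mean) places you
# above roughly 84% of the norming population; 130 above roughly 98%.
# Without that human reference population, the number is undefined.
print(round(iq_to_percentile(115), 3))  # 0.841
print(round(iq_to_percentile(130), 3))  # 0.977
```

This is also why a claim like “one standard deviation of IQ gain per year” is slippery: it borrows a population-relative human scale for a system that was never part of the population being measured.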
The Biases and Limitations of IQ Tests
Beyond their limited scope, IQ tests have a controversial history, with some historians tracing their origins to eugenics—the discredited belief in improving populations through selective breeding. This problematic legacy highlights the test’s inherent biases, particularly in favoring individuals familiar with Western cultural norms and traditional educational systems.
Psychologist Os Keyes, a researcher on ethical AI, argues that IQ tests are not only ideologically flawed but also easily gamed by AI models.
“[These] tests are pretty easy to game if you have a practically infinite amount of memory and patience,” Keyes says. “IQ tests are a highly limited way of measuring cognition, sentience, and intelligence, something we’ve known since before the invention of the digital computer itself.”
Unlike humans, AI models have virtually unlimited memory and can rapidly process vast amounts of information. Many AI models are trained on publicly available web data, which often includes IQ test questions and answers. As a result, their strong performance on these tests is more a reflection of their training than of any innate intelligence.
AI researcher Mike Cook from King’s College London underscores this point:
“Tests tend to repeat very similar patterns — a pretty foolproof way to raise your IQ is to practice taking IQ tests, which is essentially what every [model] has done,” Cook notes. “When I learn something, I don’t get it piped into my brain with perfect clarity 1 million times, unlike AI, and I can’t process it with no noise or signal loss, either.”
In essence, AI models have a built-in advantage on IQ tests. Their ability to memorize and instantly retrieve patterns from vast datasets allows them to excel in ways that humans simply cannot. But does that mean they are more intelligent? Hardly.
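In benchmark terms, the memorization advantage Cook and Keyes describe is a data-contamination problem: if test items appear verbatim in the training corpus, a high score measures recall, not reasoning. As an illustration (the question texts and corpus below are invented, not drawn from any real model or test), a naive contamination check might look like this:

```python
def find_contaminated(test_items: list[str], training_corpus: str) -> list[str]:
    """Flag test items whose text appears verbatim in the training data.

    Real contamination audits use fuzzier matching (n-gram overlap,
    normalization of punctuation), but even this naive substring check
    shows the idea: a model that has already seen the questions is
    being graded on memory, not intelligence.
    """
    def normalize(text: str) -> str:
        # Collapse case and whitespace so trivial formatting
        # differences do not hide a verbatim match.
        return " ".join(text.lower().split())

    corpus = normalize(training_corpus)
    return [item for item in test_items if normalize(item) in corpus]

# Hypothetical example data.
corpus = "Q: Which number completes the series 2, 4, 8, 16, ...? A: 32."
items = [
    "Which number completes the series 2, 4, 8, 16, ...?",
    "Which shape does not belong: circle, square, triangle, banana?",
]
print(find_contaminated(items, corpus))  # only the first item is flagged
```

The first question would be flagged as contaminated; the second would not. Scaled up to web-sized training corpora that demonstrably contain published IQ test items, this is the mechanism behind inflated scores.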
AI’s Unique Problem-Solving Approach
Unlike humans, AI systems approach problem-solving in a narrowly focused, computational manner. While a person solving a math problem might be distracted by external factors—such as hunger, fatigue, or background noise—AI operates in a controlled environment, free from these distractions. This makes direct comparisons between human and AI intelligence inherently flawed.
Cook illustrates this difference with an analogy:
“A crow might be able to use a tool to recover a treat from a box, but that doesn’t mean it can enroll at Harvard,” he says. “When I solve a mathematics problem, my brain is also contending with its ability to read the words on the page correctly, to not think about the shopping I need to do on the way home, or if it’s too cold in the room right now. In other words, human brains contend with a lot more things when they solve a problem — any problem at all, IQ tests or otherwise — and they do it with a lot less help [than AI].”
This highlights a crucial point: AI does not “think” in the way that humans do. Instead, it follows statistical patterns and computational processes that allow it to generate responses based on learned data. While this makes AI incredibly powerful in certain applications, it does not equate to human intelligence.
The Need for Better AI Benchmarks
Given these fundamental differences, it is clear that IQ tests are not a meaningful measure of AI capability. Instead, researchers should focus on developing more appropriate benchmarks tailored to AI’s unique strengths and limitations.
Heidy Khlaaf, chief AI scientist at the AI Now Institute, emphasizes the need for a paradigm shift in how we evaluate AI progress:
“In the history of computation, we haven’t compared computing abilities to that of humans’ precisely because the nature of computation means systems have always been able to complete tasks already beyond human ability,” Khlaaf states. “This idea that we directly compare systems’ performance against human abilities is a recent phenomenon that is highly contested, and what surrounds the controversy of the ever-expanding — and moving — benchmarks being created to evaluate AI systems.”
Rather than measuring AI’s “IQ,” experts suggest designing tests that assess AI’s real-world problem-solving abilities, adaptability, and ethical decision-making. These benchmarks should reflect AI’s true capabilities while avoiding misleading comparisons to human intelligence.
Conclusion: Rethinking AI Intelligence Metrics
As AI continues to evolve, it is crucial to adopt more precise and relevant ways of evaluating its progress. IQ tests, designed for humans and laden with historical and cultural biases, are an inappropriate measure for AI capabilities. AI does not think, reason, or problem-solve like humans—it follows an entirely different set of rules and methodologies.
Instead of forcing AI into human-centric intelligence frameworks, we should develop new benchmarks that truly capture its unique strengths and weaknesses. Only then can we accurately measure AI’s impact on society and ensure that its development aligns with ethical and beneficial goals.