A new study has shed light on the actual dangers of generative AI, showing that while the technology may not be apocalyptic, it is far from harmless.
A paper submitted to the Association for Computational Linguistics’ annual conference by researchers from the University of Bath and the Technical University of Darmstadt challenges the notion that models like those in Meta’s Llama family are capable of learning independently or acquiring new skills without direct human intervention. Through thousands of experiments, the researchers tested the ability of several AI models to perform tasks outside the scope of their training data, such as answering questions on unfamiliar topics. The findings revealed that while these models could superficially follow instructions, they lacked the ability to truly master new skills on their own.
“Our study demonstrates that the fear of AI models going rogue, doing something completely unexpected, innovative, and potentially dangerous is not grounded in reality,” said Harish Tayyar Madabushi, a computer scientist at the University of Bath and a co-author of the study. “The prevailing narrative that this type of AI poses a threat to humanity hinders the widespread adoption and development of these technologies and diverts attention from the genuine issues that require our focus.”
However, it’s important to note the study’s limitations. The researchers did not test the latest and most advanced models from vendors like OpenAI and Anthropic, and benchmarking models remains an imprecise science. Yet, this research is not the first to conclude that today’s generative AI technology isn’t a threat to humanity’s survival. Assuming otherwise could lead to misguided policymaking.
In a 2023 op-ed in Scientific American, AI ethicist Alex Hanna and linguistics professor Emily Bender argued that corporate AI labs are intentionally diverting regulatory attention to far-fetched, world-ending scenarios as a bureaucratic ploy. They cited OpenAI CEO Sam Altman’s appearance at a May 2023 congressional hearing, where he suggested — without substantial evidence — that generative AI tools could lead to catastrophic outcomes.
“The broader public and regulatory agencies must not fall for this tactic,” Hanna and Bender wrote. “Instead, we should heed the insights of scholars and activists who practice peer review and have consistently challenged AI hype, aiming to understand its detrimental effects in the present.”
These perspectives from Madabushi, Hanna, and Bender are crucial as investors continue pouring billions into generative AI, driving the hype cycle to its peak. There is a lot at stake for the companies backing this technology, and what benefits them — and their investors — may not align with the greater good.
Generative AI might not lead to humanity’s extinction, but it’s already causing harm in various ways. Consider the proliferation of nonconsensual deepfake pornography, wrongful arrests due to faulty facial recognition technology, and the exploitation of underpaid data annotators. Policymakers must recognize these realities and respond accordingly — or risk exposing humanity to significant, albeit non-apocalyptic, dangers.
In the News
Google Gemini and AI Galore: This week, Google’s annual Made By Google hardware event unveiled a slew of updates to its Gemini assistant, along with new phones, earbuds, and smartwatches. For comprehensive coverage, check out TechCrunch’s roundup of all the announcements.
AI Copyright Lawsuit Moves Forward: A class action lawsuit filed by artists alleging that Stability AI, Runway AI, and DeviantArt illegally trained their AIs on copyrighted works received a mixed ruling from the presiding judge. While several of the plaintiffs’ claims were dismissed, others survived, meaning the case could eventually go to trial.
Trouble for X and Grok: X, the social media platform owned by Elon Musk, is facing a series of privacy complaints after it was found to be using the data of European Union users to train AI models without their consent. In response, X has temporarily halted data processing for training Grok in the EU.
YouTube Tests Gemini Brainstorming: YouTube is experimenting with a new feature, Brainstorm with Gemini, to assist creators in generating video ideas, titles, and thumbnails. Currently, this feature is available only to select creators as part of a limited experiment.
OpenAI’s GPT-4o Does Strange Things: OpenAI’s GPT-4o is the company’s first model trained on voice, text, and image data. This combination sometimes leads to unexpected behavior, such as mimicking the voice of the person speaking to it or suddenly shouting during a conversation.
Research Paper of the Week
Many companies claim to offer tools that can reliably detect text generated by AI models, which could be invaluable in combating misinformation and plagiarism. However, when tested, these tools rarely deliver as promised. A new study indicates that the situation hasn’t improved much.
Researchers at the University of Pennsylvania developed a dataset and leaderboard, the Robust AI Detector (RAID), consisting of over 10 million AI-generated and human-written pieces, including recipes, news articles, and blog posts, to assess the performance of AI text detectors. The results were disappointing, with the detectors being described as “mostly useless” by the researchers. They found that these tools only worked when applied to specific, narrowly defined use cases and text similar to what they were trained on.
“If universities or schools rely on a narrowly trained detector to identify students using [generative AI] to complete assignments, they could falsely accuse innocent students of cheating,” warned Chris Callison-Burch, a professor in computer and information science and co-author of the study. “Conversely, they could miss students who were actually cheating using other [generative AI] tools.”
The conclusion is clear: there’s no silver bullet for AI text detection. The problem remains a challenging one.
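To make the evaluation idea concrete, here is a minimal sketch of how a detector might be scored against a labeled mix of human-written and machine-generated text, loosely in the spirit of a benchmark like RAID. The `detect_ai_probability` function, the sample texts, and the threshold are all hypothetical placeholders, not the RAID codebase or any real detector.

```python
# Toy benchmark for an AI-text detector, loosely in the spirit of RAID.
# `detect_ai_probability` is a hypothetical stand-in for whatever detector is under test;
# the texts and labels are made-up examples, not the actual RAID corpus.

def detect_ai_probability(text: str) -> float:
    """Placeholder detector: returns a score in [0, 1], higher = more likely AI-generated."""
    return 0.5  # a real detector would run a trained classifier or call an API here

samples = [
    ("A human-written blog post about sourdough baking.", 0),    # 0 = human
    ("An AI-generated news summary of quarterly earnings.", 1),  # 1 = machine
]

threshold = 0.5
false_positives = misses = correct = 0

for text, label in samples:
    predicted = 1 if detect_ai_probability(text) >= threshold else 0
    if predicted == label:
        correct += 1
    elif predicted == 1:  # human text flagged as AI: the "falsely accused student" case
        false_positives += 1
    else:                 # machine text that slips through undetected
        misses += 1

print(f"correct: {correct}, false positives: {false_positives}, misses: {misses}")
```

In this framing, the false-positive count is the number that matters most for schools and universities: a detector that only works on text resembling its training data will rack up exactly the kind of wrongful accusations the researchers warn about.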
Reportedly, OpenAI has developed a new AI text detection tool for its models — an improvement over its previous attempt. However, the company has declined to release it, citing concerns that it might disproportionately impact non-English users and could be easily circumvented by slight text modifications. (Less altruistically, OpenAI is also reportedly worried about how an AI text detector might affect the perception and usage of its products.)
Model of the Week
Generative AI has applications beyond just internet memes. MIT researchers are now applying it to identify issues in complex systems, such as wind turbines.
A team at MIT’s Computer Science and Artificial Intelligence Lab has developed a framework called SigLLM, which includes a component that converts time-series data — measurements taken repeatedly over time — into text inputs that a generative AI model can process. Users can feed the prepared data into the model and ask it to identify anomalies, or have the model forecast future time-series data points as part of an anomaly detection pipeline.
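The MIT group’s code isn’t reproduced here, but a rough sketch of the forecast-then-compare idea might look like the following. `llm_forecast` is a stand-in for prompting an actual generative model and parsing its reply, and the sensor readings are invented for illustration.

```python
# Rough sketch of a forecast-based anomaly check in the spirit of SigLLM.
# `llm_forecast` stands in for prompting a generative model and parsing its numeric reply;
# the sensor readings below are invented, not data from the MIT experiments.

def series_to_text(values: list[float]) -> str:
    """Render a numeric time series as comma-separated text a language model can ingest."""
    return ",".join(f"{v:.2f}" for v in values)

def llm_forecast(prompt: str, horizon: int) -> list[float]:
    """Placeholder forecaster: a real pipeline would prompt an LLM here.
    This stand-in just repeats the last value in the prompt (naive persistence)."""
    last_value = float(prompt.split(",")[-1])
    return [last_value] * horizon

def flag_anomalies(history: list[float], observed: list[float], threshold: float) -> list[int]:
    """Flag indices where observed values deviate from the forecast by more than `threshold`."""
    prompt = "Continue this sensor reading sequence: " + series_to_text(history)
    forecast = llm_forecast(prompt, horizon=len(observed))
    return [i for i, (obs, pred) in enumerate(zip(observed, forecast)) if abs(obs - pred) > threshold]

# Example: a hypothetical turbine vibration signal with one obvious spike.
history = [1.01, 0.99, 1.02, 1.00, 0.98]
observed = [1.01, 3.50, 1.00]
print(flag_anomalies(history, observed, threshold=0.5))  # -> [1]
```

The appeal of this kind of pipeline is that the heavy lifting is done by an off-the-shelf generative model rather than a detector trained from scratch for each machine.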
While the framework didn’t achieve exceptional results in initial experiments, the researchers are optimistic about its potential. If performance can be improved, SigLLM could help technicians identify potential problems in heavy machinery and other equipment before they occur.
“This is just the first iteration, so we didn’t expect to achieve perfection from the start,” said Sarah Alnegheimish, an electrical engineering and computer science graduate student and lead author on a paper about SigLLM. “But these results show that there’s real potential to leverage [generative AI models] for complex anomaly detection tasks.”
Grab Bag
OpenAI recently upgraded ChatGPT to a new base model, GPT-4o, but released minimal information about the changes.
there’s a new GPT-4o model out in ChatGPT since last week. hope you all are enjoying it and check it out if you haven’t! we think you’ll like it 😃
— ChatGPT (@ChatGPTapp) August 12, 2024
What should we make of this? With no changelog to consult, the answer is unclear. Anecdotal evidence from user tests offers some clues, but the lack of transparency is troubling.
Ethan Mollick, a professor at Wharton who studies AI, innovation, and startups, may have captured the sentiment best. He noted that it’s difficult to write release notes for generative AI models because they can “feel” different from one interaction to the next; impressions of their performance are largely subjective. Yet people use — and pay for — ChatGPT. Don’t they deserve to know what they’re getting?
Perhaps the improvements are incremental, and OpenAI prefers not to tip its hand for competitive reasons. A more intriguing possibility is that the new model ties into OpenAI’s rumored reasoning breakthroughs. Regardless, when it comes to AI, transparency should be a priority. Trust is hard to build and easy to lose — and OpenAI has already lost plenty.