OpenAI has recently unveiled the GPT-4o System Card, a research document that details the safety measures and risk evaluations the company carried out before launching its latest AI model. The document, made public this week, provides an in-depth look at the steps OpenAI has taken to ensure that GPT-4o operates within acceptable safety margins.
Launched in May this year, GPT-4o is the latest in OpenAI’s series of generative AI models. But before the model made its public debut, OpenAI employed a specialized team of external security experts, known as red teamers, to rigorously probe it for vulnerabilities. These experts sought to uncover potential risks such as the model’s ability to generate unauthorized voice clones, produce inappropriate content, or reproduce copyrighted audio. The findings of these evaluations are now being shared publicly, providing a rare glimpse into the process behind AI model safety testing.
According to OpenAI’s internal risk assessment framework, the GPT-4o model has been classified as having a “medium” risk level. This overall risk rating was derived from evaluations across four key categories: cybersecurity, biological threats, persuasion, and model autonomy. While three of these categories were deemed low risk, the persuasion category raised some concerns. Researchers discovered that certain text samples generated by GPT-4o were more persuasive than human-written content in specific instances, though the model’s overall persuasiveness was not deemed superior to that of human writers.
Lindsay McCallum Rémy, an OpenAI spokesperson, explained to The Verge that the System Card not only includes internal evaluations but also incorporates assessments from external testers. These external evaluations were conducted by teams listed on OpenAI’s website, such as Model Evaluation and Threat Research (METR) and Apollo Research, both of which specialize in building risk assessments for AI systems.
This isn’t the first time OpenAI has published a system card; previous models like GPT-4, GPT-4 with vision, and DALL-E 3 underwent similar evaluations before their release. However, the timing of this release is particularly noteworthy. OpenAI has been under intense scrutiny over its safety practices, with criticism coming from various quarters, including its own employees and state legislators. Just minutes before the GPT-4o System Card was released, The Verge reported on an open letter from Sen. Elizabeth Warren (D-MA) and Rep. Lori Trahan (D-MA) that questioned OpenAI’s handling of whistleblower complaints and safety reviews. The letter points to a series of safety issues, including the brief ousting of CEO Sam Altman in 2023 over the board’s concerns and the resignation of a key safety executive who claimed that “safety culture and processes have taken a backseat to shiny products.”
The release of the GPT-4o System Card also comes at a critical juncture, with the 2024 U.S. presidential election on the horizon. As OpenAI rolls out this highly capable multimodal model, there are concerns that it could inadvertently spread misinformation or be exploited by malicious actors. OpenAI emphasizes that it is actively testing real-world scenarios to prevent such misuse, but the risks remain very much in the spotlight.
There has been growing pressure on OpenAI to be more transparent, not just about the model’s training data—questions like “Is it trained on YouTube?” persist—but also about its safety testing protocols. In California, where OpenAI and many other leading AI labs are headquartered, state Sen. Scott Wiener is pushing for legislation to regulate large language models. The proposed bill would hold companies legally accountable if their AI is used in harmful ways. If passed, this legislation would require OpenAI’s frontier models to undergo state-mandated risk assessments before being released to the public.
The key takeaway from the GPT-4o System Card is that, despite involving external red teamers and testers, a significant portion of the evaluation process relies on OpenAI’s self-assessment. As the AI landscape continues to evolve rapidly, the balance between innovation and safety remains a critical concern.