OpenAI trained o1 and o3 to ‘think’ about its safety policy

By Usama | Last updated: December 23, 2024, 2:53 pm

On Friday, OpenAI announced o3, the newest member of its family of AI reasoning models, which outperforms its predecessor o1 and which the company calls its most advanced model to date. Part of that leap comes from scaling test-time compute, a concept we explored recently, but that’s not all: OpenAI also introduced a new safety paradigm for training its o-series models, one intended to make them safer and more aligned with human values.

At the heart of this new approach is “deliberative alignment,” a method that helps AI models “think” about OpenAI’s safety policy while generating responses. This technique trains models like o1 and o3 to recall and apply the company’s safety principles during inference—the critical phase after a user submits a prompt.

Graph measuring o1’s improved alignment compared to Claude, Gemini, and GPT-4o (Image credit: OpenAI)

What Is Deliberative Alignment?

Deliberative alignment marks a notable shift in AI safety research. Safety work has traditionally focused on the pre-training and post-training phases, but OpenAI’s method integrates safety into the inference phase as well: the model reasons through OpenAI’s safety policy in real time while generating a response, rather than relying solely on behaviors baked in during training.

This method has already yielded notable results. According to OpenAI’s research, deliberative alignment reduced the frequency of unsafe responses from o1 while simultaneously enhancing its ability to tackle benign queries. For instance, in a test scenario where a user requested guidance on forging a disabled person’s parking placard, the model identified the unethical nature of the request, cited OpenAI’s safety policy, and refused to comply.

OpenAI’s researchers describe this process as akin to how humans deliberate before answering complex questions. The o-series models don’t actually “think” as humans do, however. Instead, they use chain-of-thought reasoning, breaking a problem into smaller steps and working through them to arrive at an answer. Deliberative alignment hooks into this process so that the models effectively “remind themselves” of OpenAI’s safety guidelines while formulating responses.

How Deliberative Alignment Works

Here’s how the o-series models operate: After a user submits a prompt, the model generates follow-up questions internally to dissect the problem, a process that can take several seconds to a few minutes depending on the complexity of the query. During this “chain-of-thought” phase, deliberative alignment kicks in. The model re-prompts itself with text from OpenAI’s safety policy, allowing it to deliberate on how best to provide a safe and appropriate answer.
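
OpenAI has not published the exact mechanics, but the idea can be pictured in a few lines of Python. Everything in the sketch below is an assumption for illustration only: generate() stands in for a call to a reasoning model, and SAFETY_POLICY_EXCERPT is placeholder text, not OpenAI’s actual policy.

# Illustrative sketch only: `generate` is a stand-in for a call to a
# reasoning model; SAFETY_POLICY_EXCERPT is placeholder policy text.

SAFETY_POLICY_EXCERPT = (
    "Refuse requests that facilitate serious harm (weapons, fraud, etc.). "
    "Answer benign requests helpfully, even if they sound sensitive."
)

def generate(prompt: str) -> str:
    """Stand-in for a model call; swap in a real inference API here."""
    raise NotImplementedError

def deliberate_and_answer(user_prompt: str) -> str:
    # Step 1: a hidden chain of thought that quotes the safety policy and
    # reasons about whether, and how, the request should be answered.
    deliberation = generate(
        f"Safety policy:\n{SAFETY_POLICY_EXCERPT}\n\n"
        f"User request:\n{user_prompt}\n\n"
        "Think step by step: does the policy allow a direct answer?"
    )
    # Step 2: the visible answer is conditioned on that deliberation, so
    # refusals can cite the policy and benign requests still get answered.
    return generate(
        f"Deliberation:\n{deliberation}\n\n"
        f"Respond to the user request now:\n{user_prompt}"
    )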

For example, if a user asks the model how to create a bomb, deliberative alignment prompts the model to recall OpenAI’s policy on harmful content. The model then deliberates over the prompt and declines to provide an answer, often with an apologetic and informative response explaining why it cannot assist.

Example from OpenAI’s research on deliberative alignment (Image credit: OpenAI)

This approach addresses one of AI safety’s central challenges: balancing refusal of unsafe prompts against the ability to answer legitimate ones. Over-refusal, where a model restricts responses too aggressively, is just as problematic as under-refusal, where it fails to block harmful queries. Deliberative alignment aims to strike that balance.

Synthetic Data and Scalable Alignment

One of the most innovative aspects of deliberative alignment is its reliance on synthetic data. Rather than employing human annotators to create labeled training examples—a costly and time-intensive process—OpenAI used internal AI reasoning models to generate training data. These models created examples of chain-of-thought responses that referenced OpenAI’s safety policy, while another internal model, dubbed “judge,” assessed the quality of these examples.

This data was then used to fine-tune o1 and o3 through supervised learning, teaching them to integrate safety policy references into their reasoning processes. By automating data generation, OpenAI was able to scale its alignment efforts more efficiently without sacrificing quality.
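
In code, the pipeline OpenAI describes might look something like the sketch below. The function names and the scoring threshold are our own assumptions for illustration; OpenAI has not released its implementation.

# Illustrative sketch of the synthetic-data loop described above.
# All names and the 0.8 threshold are assumptions, not OpenAI's code.

def generate_cot_example(prompt: str) -> str:
    """Hypothetical internal reasoning model that writes a chain-of-thought
    answer citing the safety policy (stand-in, not a real API)."""
    raise NotImplementedError

def judge_score(prompt: str, answer: str) -> float:
    """Hypothetical 'judge' model rating an example's quality in [0, 1]."""
    raise NotImplementedError

def build_sft_dataset(prompts: list[str], threshold: float = 0.8) -> list[dict]:
    # Keep only the examples the judge rates highly; these become the
    # supervised fine-tuning targets that teach o1 and o3 to cite policy.
    dataset = []
    for prompt in prompts:
        answer = generate_cot_example(prompt)
        if judge_score(prompt, answer) >= threshold:
            dataset.append({"prompt": prompt, "completion": answer})
    return dataset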

Template OpenAI gave its internal reasoning model to generate synthetic data (Image credit: OpenAI)

The “judge” model also played a role in reinforcement learning, another post-training phase that optimized the o-series models’ responses further. While these methods aren’t new, their application with synthetic data represents a significant step toward scalable and cost-effective AI alignment.
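
That reinforcement-learning step can be pictured as a loop in which the judge acts as a reward model. The sketch below reuses the judge_score stub from the previous example; model.sample and model.update are likewise assumptions, not real APIs.

def rl_step(model, prompt: str) -> None:
    """One hypothetical RL update using the judge as a reward model.
    `model.sample` and `model.update` are assumed methods; `judge_score`
    is the stub defined in the previous sketch."""
    answer = model.sample(prompt)          # roll out a candidate response
    reward = judge_score(prompt, answer)   # judge rates policy compliance
    model.update(prompt, answer, reward)   # reinforce high-reward behavior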

Real-World Impact and Challenges

The introduction of deliberative alignment has made the o-series models some of OpenAI’s safest to date. On the Pareto benchmark—a test designed to measure resistance against common jailbreaks like asking the model to act as a deceased relative with questionable expertise—o1-preview outperformed competitors such as GPT-4o, Gemini 1.5 Flash, and Claude 3.5 Sonnet.

Yet, challenges remain. AI safety is an inherently subjective field, with critics like David Sacks, Elon Musk, and Marc Andreessen arguing that some safety measures amount to censorship. OpenAI, however, maintains that its goal is to prevent harm while enabling productive and creative use of its models.

One of the biggest hurdles is accounting for the vast number of ways users can phrase unsafe requests. OpenAI’s safeguards must be robust enough to detect malicious intent without overgeneralizing and blocking legitimate queries. For instance, blocking all prompts containing the word “bomb” would prevent users from asking historical or scientific questions about the atom bomb.

Looking Ahead

Deliberative alignment is a promising step forward in AI safety, but its true potential will only become evident once o3 is publicly available, a milestone expected in 2025. By training models to deliberate over safety policies during inference, OpenAI has created a framework that could help future AI systems align more closely with human values.

As AI continues to grow more powerful and autonomous, these safety measures will become increasingly critical. OpenAI’s approach—integrating safety into the reasoning process itself—might set a new standard for how we build and deploy responsible AI.

Stay tuned for the rollout of o3 and its anticipated impact on the evolving landscape of AI safety and alignment.
