OpenAI has begun the rollout of ChatGPT’s Advanced Voice Mode, marking a significant leap in AI interaction. On Tuesday, the company released the alpha version of this new feature to a select group of ChatGPT Plus users, with a broader rollout expected in the fall of 2024.
The initial showcase of GPT-4o’s voice in May took the tech world by storm. Audiences were astounded by the speed and uncanny human likeness of the responses, particularly the voice named Sky, which bore a striking resemblance to actress Scarlett Johansson. However, Johansson quickly distanced herself from the project, hiring legal counsel to protect her likeness after declining several requests from OpenAI CEO Sam Altman to use her voice. OpenAI has since denied using Johansson’s voice but removed the Sky voice from its demo following the controversy. The release of Advanced Voice Mode was subsequently delayed in June to enhance safety measures.
Limited Launch with Advanced Features
Fast forward to today: OpenAI has resumed the rollout, albeit in a limited capacity. Not all of the features demonstrated during the Spring Update, such as video and screen-sharing, are included in this alpha release. For now, a select group of premium users will get their first taste of Advanced Voice Mode.
A New Era in Conversational AI
OpenAI claims that Advanced Voice Mode represents a significant advance over the Voice Mode currently available in ChatGPT. The older version chains three separate models: one for voice-to-text conversion, GPT-4 for text processing, and a third for text-to-voice conversion. In contrast, GPT-4o's multimodal design handles all of these steps in a single, cohesive model, resulting in significantly lower latency and more natural, fluid conversations. GPT-4o can also detect and respond to emotional nuances in users' voices, such as sadness, excitement, or even singing.
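To make the difference concrete, here is a minimal sketch. It assumes the publicly documented OpenAI Python SDK for the legacy pipeline (Whisper transcription, a GPT-4 chat completion, and the text-to-speech endpoint); the single-call GPT-4o function at the end is purely hypothetical, since OpenAI has not published the interface behind Advanced Voice Mode.

```python
from openai import OpenAI

client = OpenAI()  # assumes the openai Python package and an OPENAI_API_KEY env var


def legacy_voice_reply(audio_path: str) -> bytes:
    """Old Voice Mode: three models chained together; every hop adds latency
    and strips tone and emotion out of the user's voice."""
    # 1) Speech-to-text
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    # 2) Text reasoning with GPT-4
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    # 3) Text-to-speech
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=chat.choices[0].message.content,
    )
    return speech.content


def advanced_voice_reply(audio_in: bytes) -> bytes:
    """Advanced Voice Mode, conceptually: one GPT-4o request, audio in and
    audio out, so emotional cues in the input can shape the spoken reply.
    Placeholder only; OpenAI has not published this interface."""
    ...
```

Collapsing the pipeline removes the hand-offs between models, which is where most of the old mode's latency, and much of the lost vocal nuance, came from.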
Hands-On Experience for ChatGPT Plus Users
During this pilot phase, ChatGPT Plus users will be the first to experience the hyperrealistic capabilities of Advanced Voice Mode. Although TechCrunch has yet to test the feature, it has pledged a comprehensive review once it gains access.
OpenAI is taking a cautious approach by gradually releasing the new voice feature and closely monitoring its usage. Selected users will receive a notification within the ChatGPT app, followed by an email with detailed instructions on how to use the new feature.
Rigorous Testing and Safety Measures
Since the May demo, OpenAI has subjected GPT-4o’s voice capabilities to extensive testing, involving over 100 external red teamers across 45 different languages. The company has promised a detailed report on these safety measures in early August.
To mitigate the risk of misuse, such as voice deepfakes, Advanced Voice Mode will be limited to four preset voices (Juniper, Breeze, Cove, and Ember) created in collaboration with professional voice actors. The controversial Sky voice will not be available. OpenAI spokesperson Lindsay McCallum emphasized that ChatGPT is designed to prevent impersonation of individuals, whether private individuals or public figures, by restricting outputs to these preset voices.
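On the client side, that restriction amounts to a simple allowlist. The sketch below is an assumption: the voice names come from OpenAI's announcement, but the enforcement logic is illustrative, not OpenAI's published implementation.

```python
# Hypothetical allowlist check; the voice names are OpenAI's announced presets,
# but this enforcement logic is illustrative, not OpenAI's actual code.
PRESET_VOICES = {"juniper", "breeze", "cove", "ember"}


def select_voice(requested: str) -> str:
    """Accept only one of the four preset voices; reject anything else,
    such as a request to imitate a specific person."""
    voice = requested.strip().lower()
    if voice not in PRESET_VOICES:
        raise ValueError(f"{requested!r} is not an available preset voice")
    return voice


select_voice("Juniper")  # -> "juniper"
```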
Navigating Legal and Ethical Challenges
In a further proactive move, OpenAI has introduced filters that block requests to generate music or other copyrighted audio. This comes in the wake of several legal challenges AI companies have faced over copyright infringement. Audio models like GPT-4o open a new frontier for potential legal disputes, particularly from record labels known for their litigious nature; AI song generators like Suno and Udio have already been sued, underscoring the importance of these preventive measures.
Looking Ahead
As OpenAI continues to refine and expand ChatGPT’s capabilities, the gradual introduction of hyperrealistic voice features marks a pivotal moment in the evolution of AI interactions. While the initial release is limited, it offers a glimpse into the future of seamless, emotionally intelligent conversations with AI.
Stay tuned as OpenAI rolls out these groundbreaking features and sets new standards for AI communication.