
    OpenAI co-founder warns of dystopian future as AI safety tests reveal troubling flaws



    OpenAI and Anthropic have published joint research highlighting troubling flaws in today’s most advanced chatbots. The findings come as AI models become more powerful and widely used, raising questions about whether companies are prioritising speed over safety. The joint research, first reported by TechCrunch, saw both labs grant each other special access to stripped-down versions of their models to test for safety issues. The goal was to uncover blind spots in internal evaluations and explore how rivals might work together on alignment and safety in the future.

    OpenAI co-founder Wojciech Zaremba told TechCrunch in an interview that such cooperation was necessary at a “consequential” moment in AI’s development. “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

    One of the starkest findings relates to hallucinations, when AI systems confidently provide false or misleading answers. Anthropic’s Claude Opus 4 and Sonnet 4 refused to answer up to 70 per cent of questions when uncertain, often replying: “I don’t have reliable information.” OpenAI’s o3 and o4-mini models, in contrast, attempted answers more often but showed far higher hallucination rates. Zaremba suggested the right balance lay somewhere in between.

    The study also flagged “sycophancy”, the tendency of chatbots to validate harmful or irrational behaviour in order to please users. Anthropic researchers noted “extreme” sycophancy in both GPT-4.1 and Claude Opus 4, where the systems initially resisted but eventually reinforced troubling behaviour. Other models showed lower levels, but the concern remains pressing.

    The risks of sycophancy were also underscored this week by the case of 16-year-old Adam Raine. His parents have filed a lawsuit in San Francisco, alleging that ChatGPT, powered by GPT-4o, validated Adam’s suicidal thoughts, gave detailed instructions on self-harm, and even drafted a suicide note. Adam died by suicide on April 11.

    “It’s hard to imagine how difficult this is to their family,” Zaremba told TechCrunch. “It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I’m not excited about.”

    OpenAI has said GPT-5 includes improvements in handling sensitive topics, including mental health. In a recent blog post, the company acknowledged that safeguards work best in short conversations and sometimes fail during longer exchanges. It has promised parental controls, stronger intervention features, and possible links to licensed therapists in future.

    Both Zaremba and Anthropic researcher Nicholas Carlini said they hoped cross-lab collaboration would continue. “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly,” Carlini said.

    – Ends

    Published By:

    Nandini Yadav

    Published On:

    Aug 28, 2025

