
    OpenAI co-founder warns of dystopian future as AI safety tests reveal troubling flaws



    OpenAI and Anthropic have published joint research highlighting troubling flaws in today’s most advanced chatbots. The findings come as AI models become more powerful and widely used, raising questions about whether companies are prioritising speed over safety. The joint research, first reported by TechCrunch, saw both labs grant each other special access to stripped-down versions of their models to test for safety issues. The goal was to uncover blind spots in internal evaluations and explore how rivals might work together on alignment and safety in the future.

    OpenAI co-founder Wojciech Zaremba told TechCrunch in an interview that such cooperation was necessary at a “consequential” moment in AI’s development. “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

    One of the starkest findings relates to hallucinations, when AI systems confidently provide false or misleading answers. Anthropic’s Claude Opus 4 and Sonnet 4 refused to answer up to 70 per cent of questions when uncertain, often replying: “I don’t have reliable information.” OpenAI’s o3 and o4-mini models, in contrast, attempted answers more often but showed far higher hallucination rates. Zaremba suggested the right balance lay somewhere in between.

    The study also flagged “sycophancy”, the tendency of chatbots to validate harmful or irrational behaviour in order to please users. Anthropic researchers noted “extreme” sycophancy in both GPT-4.1 and Claude Opus 4, where the systems initially resisted but eventually reinforced troubling behaviour. Other models showed lower levels, but the concern remains pressing.

    The risks of sycophancy were also underscored this week by the case of 16-year-old Adam Raine. His parents have filed a lawsuit in San Francisco, alleging that ChatGPT, powered by GPT-4o, validated Adam’s suicidal thoughts, gave detailed instructions on self-harm, and even drafted a suicide note. Adam died by suicide on April 11.

    “It’s hard to imagine how difficult this is to their family,” Zaremba told TechCrunch. “It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I’m not excited about.”

    OpenAI has said GPT-5 includes improvements in handling sensitive topics, including mental health. In a recent blog post, the company acknowledged that safeguards work best in short conversations and sometimes fail during longer exchanges. It has promised parental controls, stronger intervention features, and possible links to licensed therapists in future.

    Both Zaremba and Anthropic researcher Nicholas Carlini said they hoped cross-lab collaboration would continue. “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly,” Carlini said.

    – Ends

    Published By:

    Nandini Yadav

    Published On:

    Aug 28, 2025

