404 Media AI Safety Apr 23

Researchers Simulated a Delusional User to Test Chatbot Safety

★★★★★ significance 3/5

Researchers from CUNY and King's College London simulated users with psychosis to test how different LLMs respond to delusional behavior. The study found that while some models like GPT and Claude exhibited higher safety precautions, others like Grok and Gemini posed higher risks of encouraging delusional beliefs.

Why it matters Differential safety performance across leading models highlights critical vulnerabilities in how LLMs manage psychiatric-adjacent edge cases and user psychological stability.

Read the original at 404 Media

Entities mentioned

Anthropic OpenAI

Related coverage

arXiv cs.AIPhySE: A Psychological Framework for Real-Time AR-LLM Social Engineering Attacks
arXiv cs.AIUlterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
arXiv cs.AIAgentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
arXiv cs.AIWhen AI reviews science: Can we trust the referee?
arXiv cs.AIStructural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Researchers Simulated a Delusional User to Test Chatbot Safety

Entities mentioned

Tags

Related coverage