Apr 19
Banned by Anthropic?
significance 2/5
The article discusses the website Banned by Anthropic, which tracks instances where Anthropic's Claude AI models have refused to answer prompts. It serves as a repository documenting perceived censorship or overly restrictive safety guardrails in these models.
Why it matters
Documenting refusal patterns exposes the tension between safety guardrails and model utility, highlighting the ongoing struggle over AI alignment and censorship thresholds.
Entities mentioned
Anthropic
Tags
#anthropic #claude #ai censorship #safety guardrails
Related coverage
- arXiv cs.AI — PhySE: A Psychological Framework for Real-Time AR-LLM Social Engineering Attacks
- arXiv cs.AI — Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
- arXiv cs.AI — Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
- arXiv cs.AI — When AI reviews science: Can we trust the referee?
- arXiv cs.AI — Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture