Apr 19
Banned by Anthropic?
significance 2/5
The article discusses the website Banned by Anthropic, which tracks instances where Anthropic's Claude AI models have refused to answer prompts. It serves as a repository documenting perceived censorship or overly restrictive safety guardrails in these models.
Why it matters
Documenting refusal patterns exposes the tension between safety guardrails and model utility, highlighting the ongoing struggle over AI alignment and censorship thresholds.
Entities mentioned
Anthropic
Tags
#anthropic #claude #ai censorship #safety guardrails
Related coverage
- arXiv cs.AI — PhySE: A Psychological Framework for Real-Time AR-LLM Social Engineering Attacks
- arXiv cs.AI — Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
- arXiv cs.AI — Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
- arXiv cs.AI — When AI reviews science: Can we trust the referee?
- arXiv cs.AI — Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture