Apr 22
Mechanistic Anomaly Detection via Functional Attribution
★★★★★
significance 3/5
The paper introduces a new method for mechanistic anomaly detection by framing it as a functional attribution problem using influence functions. This approach effectively detects backdoors, adversarial attacks, and out-of-distribution samples across both vision models and LLMs.
Why it matters
Functional attribution provides a scalable pathway for identifying latent vulnerabilities and backdoors in increasingly complex, large-scale model architectures.
Tags
#anomaly detection #influence functions #backdoor detection #llm securityRelated coverage
- Global South OpportunitiesPivotal Research Fellowship 2026 (Q3): AI Safety Research Opportunity - Global South Opportunities
- arXiv cs.AIAn Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement
- arXiv cs.AIPExA: Parallel Exploration Agent for Complex Text-to-SQL
- arXiv cs.AIThe Power of Power Law: Asymmetry Enables Compositional Reasoning
- arXiv cs.AIOn the Existence of an Inverse Solution for Preference-Based Reductions in Argumentation