11h ago
StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning
★★★☆☆
significance 3/5
Researchers introduce StoryTR, a new benchmark for video moment retrieval that requires Theory of Mind (ToM) reasoning to understand narrative intent. The paper also presents 'Shorts-Moment', a 7B-parameter model trained via an agentic data pipeline to better decode subtle multimodal cues in short-form video.
Why it matters
Bridging the gap between visual pattern recognition and cognitive-level narrative intent marks a critical step toward truly agentic video understanding.
Tags
#video retrieval #theory of mind #multimodal #narrative reasoning #benchmark
Related coverage
- Pivotal Research Fellowship 2026 (Q3): AI Safety Research Opportunity (Global South Opportunities)
- An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement (arXiv cs.AI)
- PExA: Parallel Exploration Agent for Complex Text-to-SQL (arXiv cs.AI)
- The Power of Power Law: Asymmetry Enables Compositional Reasoning (arXiv cs.AI)
- On the Existence of an Inverse Solution for Preference-Based Reductions in Argumentation (arXiv cs.AI)