GPT-4o

2026-04-27

When Cow Urine Cures Constipation on YouTube: Limits of LLMs in Detecting Culture-specific Health Misinformation

arXiv cs.CL research ★★★★★

Researchers investigated the limits of major LLMs in detecting culture-specific health misinformation, using Indian traditional medicine discourse as a case study. The study found that LLMs trained on Western-centric data struggle to analyze nuanced, culturally embedded misinformation rhetoric.

2026-04-24

EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval

arXiv cs.CL research ★★★★★

Researchers introduce EngramaBench, a new benchmark designed to evaluate how large language models manage long-term conversational memory. The study compares different memory architectures, including graph-structured systems and vector retrieval, against full-context prompting.

2026-04-20

Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts

arXiv cs.AI research ★★★★★

Researchers investigated using vision-language models such as GPT-4o and Gemini-1.5-Flash to automate the generation of crash diagrams from police reports. The study evaluated how well the models translate text-based accident descriptions into spatial visualizations, specifically for complex multi-lane roundabouts.

Coverage