GMM | Yuxuan Tang

Paper: ViG-RAG: Video-aware Graph Retrieval-Augmented Generation via Temporal and Semantic Hybrid Reasoning PDF: AAAI Proceedings PDF Code: AI-Researcher-Team/ViG-RAG Background Long-video RAG is harder than text RAG because video evidence is not just a list of documents. Useful information may be distributed across: visual scenes; speech transcripts; entities and events; temporal order; uncertain or noisy observations. If we simply split the video into independent chunks and retrieve by static text similarity, two problems appear: ...