ViG-RAG

Paper: ViG-RAG: Video-aware Graph Retrieval-Augmented Generation via Temporal and Semantic Hybrid Reasoning
PDF: AAAI Proceedings PDF
Code: AI-Researcher-Team/ViG-RAG
Background
Long-video RAG is harder than text RAG because video evidence is not just a list of documents. Useful information may be distributed across visual scenes, speech transcripts, entities and events, temporal order, and uncertain or noisy observations. If we simply split the video into independent chunks and retrieve by static text similarity, two problems appear: ...

May 9, 2026 · 13 min

AdaVideoRAG

Paper: AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Code: xzc-zju/AdaVideoRAG
Background
Long-video understanding is hard because the useful evidence is sparse, long-range, and often spread across multiple modalities: visual content, speech, scene text, and temporal relations. RAG is a natural fit here. Instead of feeding the whole video to the MLLM every time, the system can first build a searchable memory, retrieve relevant evidence, and then answer with a smaller context. But a fixed VideoRAG pipeline is not ideal. Easy questions may not need retrieval at all, while hard questions may need structured graph reasoning. ...

May 9, 2026 · 13 min