Benchmarks for Streaming Video Understanding

This post is a small index of the benchmarks that appear repeatedly in recent streaming-video and long-video VLM papers. The main split is simple. Online streaming benchmarks test whether the model can answer while the video is still coming in. Offline long-video benchmarks test long-context video understanding, but usually assume the whole video is already available. Standard video QA benchmarks are useful for comparability, but they are not the real target of streaming-memory papers. The newer VideoRAG papers add another emphasis: ...

May 2, 2026 · Updated May 10, 2026 · 2 min

Long Streaming Video Understanding Pipeline

Related papers:
ReKV: Paper / Code
StreamKV: Paper / Code
InfiniPot-V: Paper / Code
StreamMem: Paper
LiveVLM: Paper / Code
StreamingTOM: Paper / Code
rLiVS: Paper / Code

Core Question

All of these papers are trying to solve the same systems problem:

When a video stream keeps arriving, the user question is not known yet, and GPU memory is limited, how should a Video-LLM process the stream, compress memory, retrieve evidence, and generate an answer with low latency?

The methods look different if we read them one by one: KV cache retrieval, semantic chunking, TaR / VaN, chat-template proxy queries, VSB, CTR, OQM, caption RAG, and so on. ...

April 27, 2026 · 13 min