Survey

Related papers:
ReKV: Paper / Code
StreamKV: Paper / Code
InfiniPot-V: Paper / Code
StreamMem: Paper
LiveVLM: Paper / Code
StreamingTOM: Paper / Code
rLiVS: Paper / Code

Core Question

All of these papers try to solve the same systems problem: when a video stream keeps arriving, the user's question is not yet known, and GPU memory is limited, how should a Video-LLM process the stream, compress memory, retrieve evidence, and generate an answer with low latency?

Read one by one, the methods look different: KV cache retrieval, semantic chunking, TaR / VaN, chat-template proxy queries, VSB, CTR, OQM, caption RAG, and so on. ...
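
To make the core question concrete, here is a minimal sketch of the two halves every method has to fill in. The first is a query-agnostic write path that keeps memory bounded while frames arrive. The second is a query-dependent read path that runs only once the question shows up. Everything here is a hypothetical illustration chosen for brevity: the class `StreamingMemory`, the "merge the most similar neighbouring slots" compression rule, and cosine top-k retrieval. None of it is the method of ReKV, StreamKV, or any other paper listed above. It also assumes one feature vector per frame from some vision encoder, whereas real systems operate on KV caches or visual tokens inside the model.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """One compressed memory slot covering frames [start, end] (hypothetical)."""
    start: int
    end: int
    feature: list[float]
    count: int = 1  # number of raw frames averaged into this slot


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class StreamingMemory:
    """Hypothetical bounded memory: compression on write ignores the question,
    retrieval on read uses it. Not any specific paper's algorithm."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.entries: list[MemoryEntry] = []

    def append(self, t: int, feature: list[float]) -> None:
        # The question is unknown while frames arrive, so compression can only
        # exploit redundancy between temporally adjacent slots.
        self.entries.append(MemoryEntry(t, t, list(feature)))
        while len(self.entries) > self.capacity:
            self._merge_most_similar_neighbours()

    def _merge_most_similar_neighbours(self) -> None:
        # Pick the adjacent pair with the highest cosine similarity
        # (ties go to the earliest pair) and replace it with a
        # count-weighted average spanning both time ranges.
        i = max(
            range(len(self.entries) - 1),
            key=lambda j: cosine(self.entries[j].feature, self.entries[j + 1].feature),
        )
        a, b = self.entries[i], self.entries[i + 1]
        n = a.count + b.count
        merged = [(x * a.count + y * b.count) / n for x, y in zip(a.feature, b.feature)]
        self.entries[i:i + 2] = [MemoryEntry(a.start, b.end, merged, n)]

    def retrieve(self, query: list[float], k: int) -> list[MemoryEntry]:
        # Once the question arrives, rank slots by similarity to its embedding,
        # then return the top-k in temporal order as evidence for the answer.
        top = sorted(self.entries, key=lambda e: cosine(e.feature, query), reverse=True)[:k]
        return sorted(top, key=lambda e: e.start)


mem = StreamingMemory(capacity=3)
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8], [1.0, 1.0]]
for t, f in enumerate(frames):
    mem.append(t, f)
evidence = mem.retrieve([0.0, 1.0], k=1)
```

Merging only neighbouring slots keeps the memory in temporal order and caps it at `capacity` slots no matter how long the stream runs. The price is that detail inside a merged span is lost before anyone knows whether the question will need it. That is exactly the tension the listed methods resolve in different ways.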

Benchmarks for Streaming Video Understanding

Long Streaming Video Understanding Pipeline