StreamChat

Paper: Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Code: hmxiong/StreamChat

Background

Most Video-LLMs are still awkward in a real streaming setting. Offline video QA usually assumes: the whole video is already available; the question is known before inference; the interaction is single-turn. But a streaming assistant has a different problem: video frames keep arriving; the user may ask questions at arbitrary timestamps; the system should remember previous conversation turns; the answer should come back with low latency. This is close to the motivation of ReKV, Flash-VStream, LiveVLM, and rLiVS, but StreamChat chooses a different abstraction. ...

May 2, 2026 · 12 min

StreamMem

Paper: StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

Background

Streaming video understanding is hard because the model has to process frames as they arrive, without knowing: how long the video will be; what future user questions will ask; which past details will become important later. For long videos, the visual tokens and their KV cache keep growing over time. Even if a long-context MLLM can technically accept many tokens, storing and attending to all historical KV entries is still expensive. ...

April 24, 2026 · 10 min