StreamKV

Background Streaming video question-answering (StreamingVQA) requires a model to continuously process incoming video, preserve useful historical context, and answer questions online with low latency. ReKV showed that video QA can be reformulated as a two-step process: first retrieve the relevant KV caches, then answer using the retrieved KV. But ReKV still has several weaknesses: it uses uniform segmentation, which may cut through semantic boundaries; it keeps essentially the entire historical visual context, so memory usage remains large; and its retrieval strategy is not flexible enough, especially when the useful information is distributed differently across layers. Core Idea StreamKV extends the ReKV line in two directions at the same time: ...
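The retrieve-then-answer idea above can be sketched as segment-level KV retrieval: rank cached video segments by similarity to the question, then hand only the top-k segments' KV caches to the answering step. This is a minimal sketch, not ReKV's actual implementation; `retrieve_kv`, the segment summary embeddings, and the string KV placeholders are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_kv(question_emb, segments, top_k=2):
    """Rank cached segments by similarity to the question and
    return the KV caches of the top_k segments, in temporal order."""
    ranked = sorted(
        range(len(segments)),
        key=lambda i: cosine(question_emb, segments[i]["emb"]),
        reverse=True,
    )
    keep = sorted(ranked[:top_k])  # restore temporal order for the decoder
    return [segments[i]["kv"] for i in keep]

# Toy cache: 3 segments, each with a 2-d summary embedding and a KV handle.
segments = [
    {"emb": [1.0, 0.0], "kv": "kv_seg0"},
    {"emb": [0.0, 1.0], "kv": "kv_seg1"},
    {"emb": [0.7, 0.7], "kv": "kv_seg2"},
]
retrieve_kv([1.0, 0.2], segments, top_k=2)  # → ['kv_seg0', 'kv_seg2']
```

In a real system the segment embeddings would come from the vision encoder and the KV handles would point at per-layer key/value tensors, but the ranking-and-select logic is the same.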

April 22, 2026 · 8 min

ReKV

Background Consider the problem of streaming video question-answering (StreamingVQA); it presents three challenges. Efficient Video Encoding: incoming frames must be processed efficiently, without access to future frames and without frequently revisiting distant past frames. Video Context Preservation: the model must preserve relevant information from earlier frames. Real-Time Response: the model must provide accurate answers with minimal delay. Core Idea The structure of the attention computation makes it possible to decouple video encoding from question answering, so we can precompute the KV cache during encoding and reuse it at QA time. ...
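The decoupling described above can be illustrated with a toy cache: each frame is encoded exactly once as it arrives, and any number of later questions attend over the stored KV instead of re-encoding the video. This is a hedged sketch under assumed names; `StreamingCache`, `encode_segment`, and the string KV stand-ins are illustrative, not ReKV's API.

```python
def encode_segment(frames):
    """Stand-in for a transformer layer's KV projection of video tokens."""
    return [(f"k{f}", f"v{f}") for f in frames]

class StreamingCache:
    """Toy KV cache: ingest frames once, answer questions many times."""

    def __init__(self):
        self.kv = []

    def ingest(self, frames):
        # Encoding touches each incoming frame exactly once (streaming).
        self.kv.extend(encode_segment(frames))

    def answer(self, question):
        # QA reuses the cached KV instead of re-encoding the video.
        return f"{question} -> attends over {len(self.kv)} cached KV pairs"

cache = StreamingCache()
cache.ingest([0, 1, 2])   # frames arrive in a stream
cache.ingest([3, 4])
cache.answer("What happened first?")  # → '... attends over 5 cached KV pairs'
```

Because `answer` never calls `encode_segment`, the per-question cost depends only on attention over the cache, which is what makes low-latency online answering possible.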

April 22, 2026 · 3 min