ReKV | Yuxuan Tang

Paper: Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Code: Becomebright/ReKV

Background

Streaming video question-answering (StreamingVQA) presents three challenges:

- Efficient Video Encoding: the model must process incoming frames efficiently, without access to future frames and without frequently revisiting distant past frames.
- Video Context Preservation: the model must preserve relevant information from earlier frames.
- Real-Time Response: the model must give accurate answers with minimal delay.

Core Idea

The way attention is computed makes it possible to decouple video encoding from question answering: the keys and values (KV) of the video tokens can be computed in advance, as the frames arrive, and then reused when a question is asked.

...
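The decoupling described under Core Idea can be illustrated with a toy example. This is only a minimal sketch, not the ReKV implementation: it uses a single attention head, one layer, and pure Python. The weight matrices `W_Q`, `W_K`, `W_V`, the `KVCache` class, and the helper names are all hypothetical, and the retrieval step named in the paper's title is left out. It only checks that question tokens attending to a precomputed video KV cache produce the same output as running causal attention over the full video-plus-question sequence.

```python
import math

# Hypothetical 2x2 projection weights, chosen only for illustration.
W_Q = [[0.5, -0.2], [0.1, 0.3]]
W_K = [[0.4, 0.1], [-0.3, 0.2]]
W_V = [[1.0, 0.0], [0.5, -1.0]]


def project(x, w):
    """Multiply a row vector x by a matrix w (given as a list of rows)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def causal_attention(tokens):
    """Baseline: recompute everything; token i attends to tokens 0..i."""
    keys = [project(t, W_K) for t in tokens]
    values = [project(t, W_V) for t in tokens]
    return [attend(project(t, W_Q), keys[: i + 1], values[: i + 1])
            for i, t in enumerate(tokens)]


class KVCache:
    """Stores the keys and values of video tokens as they stream in."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, token):
        self.keys.append(project(token, W_K))
        self.values.append(project(token, W_V))


def answer_with_cache(cache, question_tokens):
    """Question tokens attend to the cached video KV plus their own KV."""
    keys, values = list(cache.keys), list(cache.values)  # leave the cache untouched
    outputs = []
    for t in question_tokens:
        keys.append(project(t, W_K))
        values.append(project(t, W_V))
        outputs.append(attend(project(t, W_Q), keys, values))
    return outputs


video = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy "frame" embeddings
question = [[0.2, -0.1], [0.3, 0.7]]           # toy question embeddings

# Video encoding: happens once, before any question is known.
cache = KVCache()
for frame in video:
    cache.append(frame)

# Question answering: reuse the cache instead of re-encoding the video.
cached = answer_with_cache(cache, question)
full = causal_attention(video + question)[len(video):]

max_diff = max(abs(a - b) for f, c in zip(full, cached) for a, b in zip(f, c))
print("outputs match:", max_diff < 1e-12)
```

Because the cache is copied rather than modified inside `answer_with_cache`, the same encoded video can serve several questions in this sketch. In a multi-layer causal model the same reuse holds for the usual reason: a video token's hidden states never depend on tokens that come after it.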