InfiniPot-V

Paper: InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Code: aiha-lab/InfiniPot-V

Background

Streaming video understanding is more constrained than offline long-video understanding. In offline settings, the model can see the whole video first, and sometimes the user query as well, before deciding how to compress tokens or the KV cache. In streaming settings, however, frames arrive continuously, future queries are unknown, and memory is fixed, yet the KV cache still grows roughly linearly with time. These constraints are exactly what make many existing KV compression methods awkward for real streaming scenarios. ...

April 23, 2026 · 11 min