Xnhyacinth / Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
# Large Language Model Based Long Context Modeling Papers and Blogs

📝 Paper | 📄 List | 📚 Notions

This repository includes papers and blogs about Efficient Transformers, KV Cache, Length Extrapolation, Long-Term Memory, Retrieval-Augmented Generation (RAG), Compress, Long Text Generation, Long Video, Long CoT, and Evaluation for Long Context Modeling.

🔥 Must-read papers for LLM-based Long Context Modeling. 🔥⚡🔥

Thanks to all the great contributors on GitHub! 🚀🤝🚀

I have the privilege of joining [**LCLM-Horizon**] and collaborating with them on a comprehensive scholarly survey (*A Comprehensive Survey on Long Context Language Modeling*) and repository (A-Comprehensive-Survey-For-Long-Context-Language-Modeling) dedicated to **Long Context Language Modeling**. I look forward to collaborating with them to advance research and deepen understanding in this area!

If you find our repository and survey useful for your research, please consider citing the following paper:

## Contents

- Large Language Model Based Long Context Modeling Papers and Blogs
- Contents
- 📢 News
  - Week Papers
  - Month Papers
- 📜 Papers
  - 1. Survey Papers
  - 2. Efficient Attention
    - 2.1 Sparse Attention
    - 2.2 Linear Attention
    - 2.3 Hierarchical Attention
    - 2.4 IO-Aware Attention
  - 3. Recurrent Transformers
  - 4. State Space Models
  - 5. Long-Context LLMs
    - 5.1 Length Extrapolation
    - 5.2 Test-Time Training
  - 6. Long Term Memory
  - 7. RAG and ICL
  - 8. Agent
  - 9. Compress
    - 9.1 Context
    - 9.2 Model
    - 9.3 Long CoT
    - 9.4 Latent
  - 10. Long Video and Image
    - 10.1 Offline
    - 10.2 Streaming
    - 10.3 Frame Selection
    - 10.4 Vision Language Action
    - 10.5 Agentic
  - 11. Benchmark and Evaluation
    - 11.1 LLM
    - 11.2 MLLM
    - 11.3 Agentic
  - 12. Long Text Generation
  - 13. Long CoT
    - 13.1 LLM
    - 13.2 MLLM
  - 14. Speculative Decoding
  - 15. Technical Report
  - 16. Blogs
- Acknowledgements
- Contributors
- Star History

## 📢 News

### Week Papers

- **[2026.03.23]**
  - Paper: HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
  - Paper: VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
  - Paper: CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management
  - Paper: ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding
  - Paper: Adaptive Greedy Frame Selection for Long Video Understanding
- **[2026.03.18]**
  - Paper: Symphony: A Cognitively-Inspired Multi-Agent System for Long-Video Understanding
  - Paper: Shot-Aware Frame Sampling for Video Understanding
  - Paper: VideoAtlas: Navigating Long-Form Video in Logarithmic Compute
  - Paper: Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models
  - Paper: Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress
  - Paper: Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
- **[2026.02.27]**
  - Paper: OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
  - Paper: XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression
- **[2026.02.26]**
  - Paper: See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
  - Paper: TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
- **[2026.01.27]**
  - Paper: Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
  - Paper: Soft Tail-dropping for Adaptive Visual Tokenization
  - Paper: HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
  - Paper: BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model
  - Paper: LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
- **[2026.01.26]**
  - Paper: HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding
- **[2026.01.22]**
  - Paper: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
  - Paper: RePo: Language Models with Context Re-Positioning
  - Paper: Fast-weight Product Key Memory
  - Paper: HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
- **[2026.01.21]**
  - Paper: Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring
  - Paper: Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
  - Paper: Beyond Accuracy: Evaluating Grounded Visual Evidence in Thinking with Images
  - Paper: Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
  - Paper: AgentOCR: Reimagining Agent History via Optical Self-Compression
- **[2026.01.19]**
  - Paper: Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
  - Paper: Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
- **[2026.01.16]**
  - Paper: Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
- **[2026.01.15]**
  - Paper: See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval
  - Paper: STEP3-VL-10B Technical Report
- **[2026.01.13]**
  - Paper: Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
  - Paper: Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
  - Paper: MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  - Paper: Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs
- **[2026.01.12]**
  - Paper: MMViR: A Multi-Modal and Multi-Granularity Representation for Long-range Video Understanding
  - Paper: VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
- **[2025.12.17]**
  - Paper: Zoom-Zero: Reinforced C…