amusi / CVPR2026-Papers-with-Code
A collection of CVPR 2026 papers and open-source projects
Repository Summary (README)
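CVPR 2026 Papers and Open-Source Projects (Papers with Code)
CVPR 2026 decisions are now available on OpenReview! Acceptance rate: 25.42% = 4090 / 16092
Note 1: Everyone is welcome to open an issue to share CVPR 2026 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
Scan the QR code to join the CVer Academic Exchange Group and get the latest CVPR 2026 work! This is the largest computer vision AI Knowledge Planet community. It is updated daily and shares the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join now and start learning!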

【CVPR 2026 Open-Source Paper Index】
- 3DGS(Gaussian Splatting)
- Agent
- Avatars
- Backbone
- CLIP
- Mamba
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- Embodied AI
- Spatial Intelligence
- NAS
- OCR
- NeRF
- DETR
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Object Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- 3D Visual Grounding
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Remote Sensing
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- Video Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Low-light Image Enhancement
- Scene Graph Generation
- Image Retrieval
- Style Transfer
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Compressive Sensing
- Datasets
- New Tasks
- Others
3DGS(Gaussian Splatting)
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting
- Paper: https://arxiv.org/abs/2602.20933
- Code:
- Project: https://sk-fun.fun/DropAnSH-GS
Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking
- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/
FastGS: Training 3D Gaussian Splatting in 100 Seconds
- Paper: https://arxiv.org/pdf/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/
Agent
Avatars
Backbone
CLIP
Mamba
GAN
OCR
NeRF
DETR
Prompt
Multimodal Large Language Models (MLLM)
Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
- Paper: https://arxiv.org/abs/2603.05075
- Code:
- Project: https://any2any-mllm.github.io/unim/
Large Language Models (LLM)
Embodied AI
Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI
- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/
Spatial Intelligence
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B
NAS
ReID (Re-Identification)
MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
Diffusion Models
Vision Transformer
Vision-Language
StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
ApET: Approximation-Error Guided Token Compression for Efficient VLMs
Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
Object Detection
Anomaly Detection
Object Tracking
Medical Image
Medical Image Segmentation
MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation
- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg
Autonomous Driving
Open-Vocabulary Domain Generalization in Urban-Scene Segmentation
U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
3D Point Cloud
CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation
3D Object Detection
3D Semantic Segmentation
Low-level Vision
Super-Resolution
Denoising
Image Denoising
3D Human Pose Estimation
3D Visual Grounding
Image Generation
ExpPortrait: Expressive Portrait Generation via Personalized Representation
- Paper: https://arxiv.org/abs/2602.19900
- Code:
Video Generation
Image Editing
Video Editing
3D Generation
3D Reconstruction
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM
Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning
- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r
RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing
Human Motion Generation
Video Understanding
Remote Sensing
Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation
- Paper: https://arxiv.org/abs/2602.19863
- Code: None
Knowledge Distillation
Depth Estimation
Stereo Matching
Low-light Image Enhancement
Image Compression
Video Compression
UniComp: Rethinking Video Compression Through Informational Uniqueness
Scene Graph Generation
Image Retrieval
PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing
- Paper: https://arxiv.org/abs/2603.04598
- Code:
Style Transfer
Image Quality Assessment
Video Quality Assessment
Compressive Sensing
Datasets
Others
Decoupling Defense Strategies for Robust Image Watermarking
- Paper: https://arxiv.org/abs/2602.20053
- Code: None
Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery
- Paper: https://arxiv.org/abs/2602.19910
- Code:
The Invisible Gorilla Effect in Out-of-distribution Detection
- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect
SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images
- Paper: https://arxiv.org/abs/2602.20412
- Code:
RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces
- Paper: https://arxiv.org/abs/2602.20618
- Code:
Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models
- Paper:
- Code:
GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement
- Paper: https://arxiv.org/abs/2603.05095
- Code:
FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation
Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning