
amusi / CVPR2026-Papers-with-Code

A collection of CVPR 2026 papers and open-source projects

22,014 stars
2,781 forks
22 issues


Repository Summary (README)


CVPR 2026 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2026 decisions are now available on OpenReview! 25.42% = 4090 / 16092

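Note 1: Everyone is welcome to submit an issue to share CVPR 2026 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

Scan the QR code to join the [CVer Academic Exchange Group] and get the latest work from CVPR 2026 and beyond! This is the largest computer vision AI Knowledge Planet (知识星球) community. It is updated daily and shares the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and related areas as soon as they appear. Join now and start learning!

[CVPR 2026 Open-Source Paper Directory]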

3DGS (Gaussian Splatting)

Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting

Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking

FastGS: Training 3D Gaussian Splatting in 100 Seconds

Agent

Avatars

Backbone

CLIP

Mamba

GAN

OCR

NeRF

DETR

Prompt

Multimodal Large Language Models (MLLM)

Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark

Large Language Models (LLM)

Embodied AI

Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI

Spatial Intelligence

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

NAS

ReID (Re-Identification)

MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification

Diffusion Models

Vision Transformer

Vision-Language

StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

ApET: Approximation-Error Guided Token Compression for Efficient VLMs

Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

Object Detection

Anomaly Detection

Object Tracking

Medical Image

Medical Image Segmentation

MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation

Autonomous Driving

Open-Vocabulary Domain Generalization in Urban-Scene Segmentation

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

3D Point Cloud

CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation

3D Object Detection

3D Semantic Segmentation

Low-level Vision

Super-Resolution

Denoising

Image Denoising

3D Human Pose Estimation

3D Visual Grounding

Image Generation

ExpPortrait: Expressive Portrait Generation via Personalized Representation

Video Generation

Image Editing

Video Editing

3D Generation

3D Reconstruction

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing

Human Motion Generation

Video Understanding

Remote Sensing

Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation

Knowledge Distillation

Depth Estimation

Stereo Matching

Low-light Image Enhancement

Image Compression

Video Compression

UniComp: Rethinking Video Compression Through Informational Uniqueness

Scene Graph Generation

Image Retrieval

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing

Style Transfer

Image Quality Assessment

Video Quality Assessment

Compressive Sensing

Datasets

Others

Decoupling Defense Strategies for Robust Image Watermarking

Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery

The Invisible Gorilla Effect in Out-of-distribution Detection

SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images

RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces

Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models

  • Paper:
  • Code:

GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement

FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation

Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning