983632847 / Awesome-Multimodal-Object-Tracking
A continuously updated project to track the latest progress in the field of multi-modal object tracking. This project focuses solely on single-object tracking.
## Repository Overview (README excerpt)
# Awesome Multi-modal Object Tracking

> **A Comprehensive Survey: Awesome Multi-modal Object Tracking.** Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang. [paper] [homepage] [Chinese commentary]

> **Abstract:** *Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language, and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications such as autonomous driving and intelligent surveillance. In recent years, MMOT has received increasing attention. However, existing MMOT algorithms mainly focus on two modalities (e.g., RGB+depth, RGB+thermal infrared, and RGB+language). To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality. Additionally, some large-scale multi-modal tracking benchmarks have been established by simultaneously providing more than two modalities, such as vision-language-audio (e.g., WebUAV-3M) and vision-depth-language (e.g., UniMod1K). To track the latest progress in MMOT, we conduct a comprehensive investigation in this report. Specifically, we first divide existing MMOT tasks into five main categories, i.e., RGBL tracking, RGBE tracking, RGBD tracking, RGBT tracking, and miscellaneous (RGB+X), where X can be any modality, such as language, depth, and event. Then, we analyze and summarize each MMOT task, focusing on widely used datasets and mainstream tracking algorithms based on their technical paradigms (e.g., self-supervised learning, prompt learning, knowledge distillation, generative models, and state space models). Finally, we maintain a continuously updated paper list for MMOT at this https URL.*

> **Awesome MMOT:** A continuously updated project to track the latest progress in multi-modal object tracking (MMOT). This project focuses solely on single-object tracking.
If this repository can bring you some inspiration, we would feel greatly honored. If you have any suggestions, please feel free to contact: andyzhangchunhui@gmail.com.

> **UPDATE:** Our survey covers common paradigms of multi-modal object tracking, including RGBL, RGBE, RGBD, RGBT, RGB-Sonar, RGB-NIR, miscellaneous (RGB+X) tracking, Embodied Visual Tracking (EVT), and Hyperspectral Object Tracking (HOT). We welcome researchers to submit pull requests and become contributors to this project. If you like our project, please give us a star ⭐ on GitHub.

:fire: The Awesome Visual Object Tracking (VOT) project is at Awesome-VOT.

## :collision: Highlights

- 2026.01.23: We released the UAV-Anti-UAV dataset v1.5, with both training and test sets available (Project).
- 2025.12.08: The paper of UAV-Anti-UAV & MambaSTS was online (arXiv).
- 2025.04.28: The paper of UW-COT220 & VL-SAM2 was accepted by the CVPR 2025 Workshop (arXiv, Outstanding Paper).
- 2025.04.02: We released UW-COT220 & VL-SAM2, with both training and testing code available (Project).
- 2025.04.02: The paper of VL-SOT500 & COST was online (arXiv, Project).
- 2025.02.28: The Awesome Visual Object Tracking project started at Awesome-VOT.
- 2025.01.20: The technical report for UW-COT220 & VL-SAM2 was updated (arXiv, Zhihu).
- 2024.09.26: WebUOT-1M was accepted by NeurIPS 2024, and its extended version, UW-COT220, was online (arXiv).
- 2024.05.30: The paper of WebUOT-1M was online (arXiv).
- 2024.05.24: The report of the Awesome MMOT project was online (arXiv, Zhihu).
- 2024.05.20: The Awesome MMOT project started.
## Contents

- Survey
- Embodied Visual Tracking
  - Datasets
  - Papers
- Vision-Language Tracking (RGBL Tracking)
  - Datasets
  - Papers
- RGBE Tracking
  - Datasets
  - Papers
- RGBD Tracking
  - Datasets
  - Papers
- RGBT Tracking
  - Datasets
  - Papers
- Miscellaneous (RGB+X)
  - Datasets
  - Papers
- Hyperspectral Object Tracking
  - Datasets
  - Papers
- Others (RGBNIR, RGBS, etc.)
- Awesome Repositories for MMOT

## :fire: Citation

If you find our work useful in your research, please consider citing:

## Survey

- :boom: **Awesome MMOT:** Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang. "Awesome Multi-modal Object Tracking." ArXiv (2024). [Paper] [MMOT Project]
- Pengyu Zhang, Dong Wang, Huchuan Lu. "Multi-modal Visual Tracking: Review and Experimental Comparison." ArXiv (2022). [paper]
- Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu. "A Survey for Deep RGBT Tracking." ArXiv (2022). [paper]
- Jinyu Yang, Zhe Li, Song Yan, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen, Ling Shao. "RGBD Object Tracking: An In-depth Review." ArXiv (2022). [paper]
- Chenglong Li, Andong Lu, Lei Liu, Jin Tang. "Multi-modal Visual Tracking: A Survey." Journal of Image and Graphics (2023). [paper]
- Ou Zhou, Ying Ge, Dawei Zhang, Zhonglong Zheng. "A Survey of RGB-Depth Object Tracking." Journal of Computer-Aided Design & Computer Graphics (2024). [paper]
- ZhiHao Zhang, Jun Wang, Zhuli Zang, Lei Jin, Shengjie Li, Hao Wu, Jian Zhao, Zhang Bo. "Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective." ACM Transactions on Multimedia Computing, Communications and Applications (2024). [paper]
- **MV-RGBT & MoETrack:** Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler. "Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method." TIP (2025). [paper] [code]
- Xingchen Zhang, Ping Ye, Henry Leung, Ke Gong, Gang Xiao. "Object Fusion Tracking Based on Visible and Infrared Images: A Comprehensive Review." Information Fusion (2024). [paper]
- Mingzheng Feng, Jianbo Su. "RGBT Tracking: A Comprehensive Review." Information Fusion (2024). [paper]
- Haiping Zhang, Di Yuan, Xiu Shu, Zhihui Li, Qiao Liu, Xiaojun Chang, Zhenyu He, Guangming Shi. "A Comprehensive Review of RGB…