facebookresearch / co-tracker
CoTracker is a model for tracking any point (pixel) on a video.
## Repository Overview (README excerpt)
# CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

**Meta AI Research, GenAI**; **University of Oxford, VGG**

Nikita Karaev, Iurii Makarov, Jianyuan Wang, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

Project Page | Paper #1 | Paper #2 | X Thread | BibTeX

**CoTracker** is a fast transformer-based model that can track any point in a video. It brings to tracking some of the benefits of optical flow. CoTracker can track:

- **Any pixel** in a video
- A **quasi-dense** set of pixels together
- Points selected manually or sampled on a grid in any video frame

Try these tracking modes for yourself with our Colab demo or in the Hugging Face Space 🤗.

**Updates:**

- [January 21, 2025] 📦 The Kubric dataset used for CoTracker3 is now available! It contains **6,000 high-resolution sequences** (512×512 px, 120 frames) with slight camera motion, rendered using the Kubric engine. Check it out on Hugging Face Datasets.
- [October 15, 2024] 📣 We're releasing CoTracker3! State-of-the-art point tracking with a lightweight architecture trained on 1000× less data than previous top-performing models. Code for the baseline models and the pseudo-labelling pipeline is available in the repo, as well as model checkpoints. Check out our paper for more details.
- [September 25, 2024] CoTracker2.1 is now available! This model performs better on the TAP-Vid benchmarks and follows the architecture of the original CoTracker. Try it out!
- [June 14, 2024] We have released the code for VGGSfM, a model for recovering camera poses and 3D structure from any image sequence based on point tracking! VGGSfM is the first fully differentiable SfM framework; it unlocks scalability and outperforms conventional SfM methods on standard benchmarks.
- [December 27, 2023] CoTracker2 is now available! It can track many more points jointly (up to **265×265**!) and has a cleaner, more memory-efficient implementation.
It also supports online processing. See the updated paper for more details. The old version remains available here.
- [September 5, 2023] You can now run our Gradio demo locally.

## Quick start

The easiest way to use CoTracker is to load a pretrained model from PyTorch Hub.

**Offline mode** processes the whole video in a single pass. **Online mode** is more memory-efficient and allows the processing of longer videos. Note, however, that the basic online example assumes the video length is known in advance; see the online demo for an example of tracking an online stream of unknown length.

**Visualize predicted tracks:** after installing CoTracker, you can render the predicted tracks on top of the input video.

We offer a number of other ways to interact with CoTracker:

- Interactive Gradio demo:
  - A demo is available in the Hugging Face Space 🤗.
  - You can also run the Gradio demo locally after installing the required packages.
- Jupyter notebook:
  - You can run the notebook in Google Colab.
  - Or explore the notebook shipped with the repository.
- You can install CoTracker _locally_ and then:
  - Run an *offline* demo with 10 × 10 points sampled on a grid on the first frame of a video (the results are saved as a video).
  - Run an *online* demo.

A GPU is strongly recommended for using CoTracker locally.

## Installation Instructions

You can use a pretrained model via PyTorch Hub, as described above, or install CoTracker from this GitHub repo. Installing from the repo is the best option if you need to run the local demo or to evaluate and train CoTracker. Ensure you have both _PyTorch_ and _TorchVision_ installed on your system, and follow the official instructions for installation. We strongly recommend installing both PyTorch and TorchVision with CUDA support, although CoTracker can run on CPU for small tasks.
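The PyTorch Hub route from the quick start can be sketched as follows. This is a minimal sketch, not the repository's own snippet: the hub entry-point names (`cotracker3_offline` / `cotracker3_online`) are inferred from the release notes above, the `(frame, x, y)` query layout is an assumption, and the grid helper simply mirrors the 10 × 10 demo grid.

```python
# Hedged sketch: entry-point names and the (frame, x, y) query layout
# are assumptions, not verified against the repository.

def load_cotracker(online: bool = False):
    """Download a pretrained CoTracker3 model from PyTorch Hub."""
    import torch  # deferred import: the grid helper below needs no torch
    name = "cotracker3_online" if online else "cotracker3_offline"
    return torch.hub.load("facebookresearch/co-tracker", name)


def make_grid_queries(grid_size: int, width: int, height: int, frame: int = 0):
    """Return grid_size**2 (frame, x, y) query points sampled on a
    regular grid over a width x height frame (grid_size >= 2)."""
    return [
        (frame, j * (width - 1) / (grid_size - 1),
                i * (height - 1) / (grid_size - 1))
        for i in range(grid_size)   # rows (y)
        for j in range(grid_size)   # columns (x)
    ]
```

Given a `(batch, frames, channels, height, width)` video tensor, the model would then be called with the video and these queries (stacked into a tensor); consult the repository demos for the exact call signature.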
## Install a Development Version

You can manually download all CoTracker3 checkpoints (the baseline and scaled models, with both single and sliding-window architectures) from the links below and place them in the checkpoints folder. You can also download CoTracker3 checkpoints trained only on Kubric. For the old checkpoints, see the corresponding section.

## Evaluation

To reproduce the results presented in the paper, download the following datasets:

- TAP-Vid
- Dynamic Replica

Install the necessary dependencies, then run the evaluation of the online model on TAP-Vid DAVIS, and likewise for the offline model. We run evaluations jointly on all the target points at a time for faster inference; with such evaluations, the numbers are similar to those presented in the paper. If you want to reproduce the exact numbers from the paper, add the corresponding flag.

These are the numbers that you should be able to reproduce using the released checkpoints and the current version of the codebase:

| | Kinetics, $\delta_\text{avg}^\text{vis}$ | DAVIS, $\delta_\text{avg}^\text{vis}$ | RoboTAP, $\delta_\text{avg}^\text{vis}$ | RGB-S, $\delta_\text{avg}^\text{vis}$ |
| :---: | :---: | :---: | :---: | :---: |
| CoTracker2, 27.12.23 | 61.8 | 74.6 | 69.6 | 73.4 |
| CoTracker2.1, 25.09.24 | 63.0 | 76.1 | 70.6 | 79.6 |
| CoTracker3 offline, 15.10.24 | 67.8 | **76.9** | 78.0 | **85.0** |
| CoTracker3 online, 15.10.24 | **68.3** | 76.7 | **78.8** | 82.7 |

## Training

### Baseline

To train CoTracker as described in our paper, you first need to generate annotations for the Google Kubric MOVi-f dataset. Instructions for annotation generation can be found here, and a discussion of dataset generation is available in the linked issue. Once you have the annotated dataset, make sure you have followed the steps for the evaluation setup and install the training dependencies. You can then launch training on Kubric. Our model was trained for 50,000 iterations on 32 GPUs (4 nodes with 8 GPUs each).
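The evaluation results above report $\delta_\text{avg}^\text{vis}$, which in the TAP-Vid convention is the fraction of ground-truth-visible points predicted within a pixel threshold of the ground truth, averaged over the thresholds 1, 2, 4, 8, and 16 px. A plain-Python sketch of that metric as we read the protocol (an illustration, not the repository's evaluation code):

```python
import math

# TAP-Vid-style pixel thresholds.
THRESHOLDS = (1, 2, 4, 8, 16)


def delta_avg_vis(pred, gt, visible):
    """Average, over the pixel thresholds, of the fraction of
    ground-truth-visible points whose predicted (x, y) position lies
    within that threshold of the ground truth. Returns a percentage."""
    fractions = []
    for thr in THRESHOLDS:
        hits = total = 0
        for (px, py), (qx, qy), vis in zip(pred, gt, visible):
            if not vis:
                continue  # occluded points are excluded from this metric
            total += 1
            if math.hypot(px - qx, py - qy) <= thr:
                hits += 1
        fractions.append(hits / total if total else 0.0)
    return 100.0 * sum(fractions) / len(fractions)
```

For example, a single visible point predicted 3 px off the ground truth counts only for the 4, 8, and 16 px thresholds, giving 60.0.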
Modify _dataset_root_ and _ckpt_path_ accordingly before running the training command; for training on 4 nodes, add the corresponding flag. Launch commands are provided for training both the online and the offline model on Kubric.

### Fine-tuning with pseudo labels

In order to launch training with pseudo-labelling, you need to colle…