
NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

4,858 stars
698 forks
344 issues
Python · C++ · CUDA

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing NVIDIA-AI-IOT/torch2trt in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/NVIDIA-AI-IOT/torch2trt)

Repository Overview (README excerpt)


torch2trt

> What models are you using, or hoping to use, with TensorRT? Feel free to join the discussion here.

torch2trt is a PyTorch to TensorRT converter which utilizes the TensorRT Python API. The converter is

• Easy to use - Convert modules with a single function call to torch2trt
• Easy to extend - Write your own layer converter in Python and register it with @tensorrt_converter

If you find an issue, please let us know!

> Please note, this converter has limited coverage of TensorRT / PyTorch. We created it primarily to easily optimize the models used in the JetBot project. If you find the converter helpful with other models, please let us know.

Usage

Below are some usage examples; for more, check out the notebooks.
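Convert and execute

We can convert a module with a single function call and then execute the returned TRTModule just like the original PyTorch model. The following is a minimal sketch; the alexnet model and input shape are illustrative choices, not requirements:

```python
import torch
from torch2trt import torch2trt
from torchvision.models import alexnet

# create a regular PyTorch model (any module with supported layers)
model = alexnet(pretrained=True).eval().cuda()

# create example input data; torch2trt traces the model with it
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT by passing the model and example inputs
model_trt = torch2trt(model, [x])

# execute the returned TRTModule just like the original model
y = model(x)
y_trt = model_trt(x)

# the outputs should agree to within numerical tolerance
print(torch.max(torch.abs(y - y_trt)))
```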
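Save and load

We can save the model as a state_dict, and load the saved model into a TRTModule. A sketch of that round trip, continuing from the example above (the file name is arbitrary):

```python
import torch
from torch2trt import TRTModule

# save the converted model as a state_dict
torch.save(model_trt.state_dict(), 'alexnet_trt.pth')

# load the saved engine back into a fresh TRTModule
model_trt = TRTModule()
model_trt.load_state_dict(torch.load('alexnet_trt.pth'))
```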
Models

We tested the converter against these models using the test.sh script; you can generate the results by running that script yourself.

> The results below show the throughput in FPS. You can find the raw output, which includes latency, in the benchmarks folder.

| Model | Nano (PyTorch) | Nano (TensorRT) | Xavier (PyTorch) | Xavier (TensorRT) |
|-------|:--------------:|:---------------:|:----------------:|:-----------------:|
| alexnet | 46.4 | 69.9 | 250 | 580 |
| squeezenet1_0 | 44 | 137 | 130 | 890 |
| squeezenet1_1 | 76.6 | 248 | 132 | 1390 |
| resnet18 | 29.4 | 90.2 | 140 | 712 |
| resnet34 | 15.5 | 50.7 | 79.2 | 393 |
| resnet50 | 12.4 | 34.2 | 55.5 | 312 |
| resnet101 | 7.18 | 19.9 | 28.5 | 170 |
| resnet152 | 4.96 | 14.1 | 18.9 | 121 |
| densenet121 | 11.5 | 41.9 | 23.0 | 168 |
| densenet169 | 8.25 | 33.2 | 16.3 | 118 |
| densenet201 | 6.84 | 25.4 | 13.3 | 90.9 |
| densenet161 | 4.71 | 15.6 | 17.2 | 82.4 |
| vgg11 | 8.9 | 18.3 | 85.2 | 201 |
| vgg13 | 6.53 | 14.7 | 71.9 | 166 |
| vgg16 | 5.09 | 11.9 | 61.7 | 139 |
| vgg19 | | | 54.1 | 121 |
| vgg11_bn | 8.74 | 18.4 | 81.8 | 201 |
| vgg13_bn | 6.31 | 14.8 | 68.0 | 166 |
| vgg16_bn | 4.96 | 12.0 | 58.5 | 140 |
| vgg19_bn | | | 51.4 | 121 |

Setup

> Note: torch2trt depends on the TensorRT Python API. On Jetson, this is included with the latest JetPack. For desktop, please follow the TensorRT Installation Guide. You may also try installing torch2trt inside one of the NGC PyTorch docker containers for Desktop or Jetson.

Step 1 - Install the torch2trt Python library

To install the torch2trt Python library, clone the repository and run its setup script (python setup.py install).

Step 2 (optional) - Install the torch2trt plugins library

To install the torch2trt plugins library, build and install it with CMake. This includes support for some layers which may not be supported natively by TensorRT. Once this library is found on the system, the associated layer converters in torch2trt are implicitly enabled.

> Note: torch2trt now maintains plugins as an independent library compiled with CMake. This makes compiled TensorRT engines more portable. If needed, the deprecated plugins (which depend on PyTorch) may still be installed by calling python setup.py install --plugins.

Step 3 (optional) - Install experimental community contributed features

To install torch2trt with experimental community contributed features under torch2trt.contrib, such as Quantization Aware Training (QAT), follow the contrib install instructions in the repository. This enables you to run the QAT example located there.

How does it work?

This converter works by attaching conversion functions (like convert_ReLU) to the original PyTorch functional calls (like torch.nn.ReLU.forward). The sample input data is passed through the network, just as before, except now whenever a registered function (torch.nn.ReLU.forward) is encountered, the corresponding converter (convert_ReLU) is also called afterwards. The converter is passed the arguments and return value of the original PyTorch function, as well as the TensorRT network that is being constructed. The input tensors to the original PyTorch function are modified to have an attribute _trt, which is the TensorRT counterpart to the PyTorch tensor. The conversion function uses this _trt to add layers to the TensorRT network, and then sets the _trt attribute on the relevant output tensors. Once the model is fully executed, the tensors it returns are marked as outputs of the TensorRT network, and the optimized TensorRT engine is built.

How to add (or override) a converter

Here we show how to add a converter for the ReLU module using the TensorRT Python API; a sketch appears at the end of this page. The converter takes one argument, a ConversionContext, which contains the following:

• ctx.network - The TensorRT network that is being constructed.
• ctx.method_args - Positional arguments that were passed to the specified PyTorch function. The _trt attribute is set for relevant input tensors.
• ctx.method_kwargs - Keyword arguments that were passed to the specified PyTorch function.
• ctx.method_return - The value returned by the specified PyTorch function.

The converter must set the _trt attribute where relevant. Please see this folder for more examples.

See also

• JetBot - An educational AI robot based on NVIDIA Jetson Nano
• JetRacer - An educational AI racecar using NVIDIA Jetson Nano
• JetCam - An easy to use Python camera interface for NVIDIA Jetson
• JetCard - An SD card image for web programming AI projects with NVIDIA Jetson Nano
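As a companion to the "How to add (or override) a converter" section above, here is a minimal sketch of a ReLU converter registration. It assumes the ConversionContext fields described there; because forward is a bound method, ctx.method_args[0] is the module itself and the input tensor sits at index 1:

```python
import tensorrt as trt
from torch2trt import tensorrt_converter

@tensorrt_converter('torch.nn.ReLU.forward')
def convert_ReLU(ctx):
    # for a bound forward(self, input) call, args are (module, input)
    input = ctx.method_args[1]
    output = ctx.method_return

    # add the equivalent TensorRT activation layer, wiring in the
    # TensorRT tensor (_trt) that was attached to the PyTorch input
    layer = ctx.network.add_activation(
        input=input._trt, type=trt.ActivationType.RELU)

    # expose the layer's output so downstream converters can use it
    output._trt = layer.get_output(0)
```

Once registered, this converter runs whenever torch.nn.ReLU.forward is encountered during a torch2trt conversion, overriding any previously registered ReLU converter.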