Best Open Source CUDA Libraries

A curated list of the most popular GitHub repositories tagged with cuda. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

70,857 · Python

#2 hashcat/hashcat

World's fastest and most advanced password recovery utility

25,462 · C

#3 sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

23,633 · Python

#4 NVIDIA/nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs

17,495

#5 tracel-ai/burn

Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability.

14,404 · Rust

#6 vosen/ZLUDA

CUDA on non-NVIDIA GPUs

13,947 · Rust

#7 isl-org/Open3D

Open3D: A Modern Library for 3D Data Processing

13,330 · C++

#8 NVIDIA/TensorRT-LLM

TensorRT LLM provides an easy-to-use Python API for defining large language models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components for building Python and C++ runtimes that orchestrate inference execution efficiently.

12,918 · Python

#9 xlite-dev/LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

9,693 · Cuda

#10 rapidsai/cudf

cuDF - GPU DataFrame Library

9,495 · C++

#11 Oneflow-Inc/oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

9,394 · C++

#12 NVIDIA/cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

9,302 · C++

#13 replicate/cog

Containers for machine learning

9,246 · Go

#14 jamiepine/voicebox

The open-source voice synthesis studio powered by Qwen3-TTS.

9,144 · TypeScript

#15 NVIDIA/cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

8,865 · C

#16 catboost/catboost

A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.

8,811 · C++

#17 LMCache/LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

6,910 · Python

#18 XuehaiPan/nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

6,578 · Python