
Best Open Source CUDA Libraries

A curated list of the most popular GitHub repositories tagged with cuda. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

73,416 · Python

#2 sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

26,081 · Python

#3 hashcat/hashcat

World's fastest and most advanced password recovery utility

25,601 · C

#4 NVlabs/instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

17,315 · Cuda

#5 kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

15,343 · Shell

#6 NVIDIA/TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

13,426 · Python

#7 xlite-dev/LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

9,920 · Cuda

#8 rapidsai/cudf

cuDF - GPU DataFrame Library

9,543 · C++

#9 NVIDIA/cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

9,447 · C++

#10 Oneflow-Inc/oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

9,389 · C++

#11 replicate/cog

Containers for machine learning

9,268 · Go

#12 NVIDIA/cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

8,961 · C

#13 catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

8,845 · C++

#14 NVIDIA/warp

A Python framework for GPU-accelerated simulation, robotics, and machine learning.

6,534 · Python

#15 arrayfire/arrayfire

ArrayFire: a general purpose GPU library.

4,868 · C++

#16 iree-org/iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

3,725 · C++

#17 apache/mahout

Apache Mahout - an environment for quickly creating scalable, performant machine learning applications.

2,282 · Rust

#18 pykeen/pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings

1,979 · Python

#19 tenstorrent/tt-metal

🤘 TT-NN operator library, and TT-Metalium low-level kernel programming model.

1,415 · C++

#20 ForceInjection/AI-fundermentals

AI fundamentals: GPU architecture, CUDA programming, large language model basics, and AI Agent topics

1,082 · HTML

#21 Devsh-Graphics-Programming/Nabla

Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android

683 · C++

#22 xlite-dev/ffpa-attn

🤖FFPA: Extend FlashAttention-2 w/ Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA.

268 · Cuda

#23 invergent-ai/surogate

Training/Fine-tuning at the speed of light

170 · C++

#24 glotzerlab/fresnel

Publication quality path tracing in real time.

127 · C++