Best Open Source CUDA Libraries
A curated list of the most popular GitHub repositories tagged with cuda. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
#2 sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
#3 hashcat/hashcat
World's fastest and most advanced password recovery utility
#4 NVlabs/instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
#5 kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
#6 NVIDIA/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
#7 xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
#8 rapidsai/cudf
cuDF - GPU DataFrame Library
#9 NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
#10 Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
#11 replicate/cog
Containers for machine learning
#12 NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
#13 catboost/catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
#14 NVIDIA/warp
A Python framework for GPU-accelerated simulation, robotics, and machine learning.
#15 arrayfire/arrayfire
ArrayFire: a general purpose GPU library.
#16 iree-org/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
#17 apache/mahout
Apache Mahout - an environment for quickly creating scalable, performant machine learning applications.
#18 pykeen/pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
#19 tenstorrent/tt-metal
TT-NN operator library and TT-Metalium low-level kernel programming model.
#20 ForceInjection/AI-fundermentals
AI fundamentals: GPU architecture, CUDA programming, large language model basics, and AI Agent-related topics
#21 Devsh-Graphics-Programming/Nabla
Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android
#22 xlite-dev/ffpa-attn
🤖FFPA: Extend FlashAttention-2 w/ Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA.
#23 invergent-ai/surogate
Training/Fine-tuning at the speed of light
#24 glotzerlab/fresnel
Publication quality path tracing in real time.