Best Open Source Inference Libraries
A curated list of the most popular GitHub repositories tagged with inference. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
#2 ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
#3 deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
#4 hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
#5 google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
#6 sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
#7 Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms
#8 SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
#9 gvergnaud/ts-pattern
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
#10 NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
#11 openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
#12 RunanywhereAI/runanywhere-sdks
Production-ready toolkit to run AI locally
#13 xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
#14 oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
#15 dusty-nv/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
#16 LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
#17 gcanti/io-ts
Runtime type system for IO decoding/encoding