Best Open Source Inference Libraries
A curated list of the most popular GitHub repositories tagged with inference. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
#2 ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
#3 deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
#4 hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
#5 google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
#6 sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
#7 stas00/ml-engineering
Machine Learning Engineering Open Book
#8 gvergnaud/ts-pattern
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
#9 openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
#10 xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
#11 oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
#12 dusty-nv/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
#13 vllm-project/vllm-omni
A framework for efficient model inference with omni-modality models
#14 kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
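#15 tencentmusic/cube-studio
cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform covering the full MLOps workflow: compute leasing platform, online notebook development, drag-and-drop task pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, an annotation platform with automated labeling, SFT fine-tuning / reward model / reinforcement learning training for large models such as deepseek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. It supports Chinese domestic CPU/GPU/NPU hardware and the Ascend ecosystem, RDMA, and distributed frameworks including pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.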
#16 NVIDIA-AI-IOT/torch2trt
An easy-to-use PyTorch to TensorRT converter
#17 PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
#18 vllm-project/vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
#19 larq/compute-engine
Highly optimized inference engine for Binarized Neural Networks
#20 llm-d/llm-d-router
llm-d Router: The intelligent entry point for inference requests
#21 chigwell/llm7.io
LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.
#22 novitalabs/pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.
#23 zhongkaifu/TensorSharp
A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows, macOS, and Linux with full GPU capability.