Best Open Source Inference Libraries

A curated list of the most popular GitHub repositories tagged with inference. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

73,416 · Python

#2 ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

47,618 · C++

#3 deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

41,835 · Python

#4 hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

41,361 · Python

#5 google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

34,172 · C++

#6 sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

26,081 · Python

#7 stas00/ml-engineering

Machine Learning Engineering Open Book

17,414 · Python

#8 gvergnaud/ts-pattern

🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

14,837 · TypeScript

#9 openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

9,899 · C++

#10 xorbitsai/inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

9,135 · Python

#11 oumi-ai/oumi

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

8,911 · Python

#12 dusty-nv/jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

8,755 · C++

#13 kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

4,937 · C++

#14 tencentmusic/cube-studio

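Cube Studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models, covering the full MLOps pipeline: compute leasing, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, a labeling platform with automated annotation, SFT fine-tuning / reward-model / reinforcement-learning training for large models such as DeepSeek, multi-node large-model inference with vLLM / Ollama / MindIE, private knowledge bases, and an AI model marketplace. It supports domestic Chinese CPU/GPU/NPU hardware and the Ascend ecosystem, RDMA, and distributed frameworks including PyTorch, TF, MXNet, DeepSpeed, Paddle, ColossalAI, Horovod, Ray, and Volcano.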

4,887 · Python

#15 NVIDIA-AI-IOT/torch2trt

An easy to use PyTorch to TensorRT converter

4,858 · Python

#16 vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

4,420 · Python

#17 vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

1,949 · Python

#18 jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

1,235 · C++

#19 alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

1,095 · Cuda

#20 waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

900 · Python

#21 jjang-ai/mlxstudio

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code all in one, plus OpenClaw (Anthropic API)

534

#22 kaito-project/aikit

🏗️ Fine-tune, build, and deploy open-source LLMs easily!

518 · Go
