Best Open Source Inference Libraries

A curated list of the most popular GitHub repositories tagged with inference. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

73,416 Python

#2 ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

47,618 C++

#3 deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

41,835 Python

#4 hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

41,361 Python

#5 google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

34,172 C++

#6 sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

26,081 Python

#7 stas00/ml-engineering

Machine Learning Engineering Open Book

17,414 Python

#8 gvergnaud/ts-pattern

🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

14,837 TypeScript

#9 openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

9,899 C++

#10 xorbitsai/inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

9,135 Python

#11 oumi-ai/oumi

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

8,911 Python

#12 dusty-nv/jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

8,755 C++

#13 kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

4,937 C++

#14 tencentmusic/cube-studio

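cube studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models, covering the full MLOps pipeline: a compute leasing platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, a labeling platform with automated annotation, SFT fine-tuning / reward model / reinforcement learning training for large models such as deepseek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. It supports Chinese domestic CPU/GPU/NPU hardware and the Ascend ecosystem, RDMA, and distributed frameworks including pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.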

4,887 Python

#15 NVIDIA-AI-IOT/torch2trt

An easy-to-use PyTorch to TensorRT converter

4,858 Python

#16 vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

4,420 Python

#17 vllm-project/vllm-ascend

Community-maintained hardware plugin for vLLM on Ascend

1,949 Python

#18 jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

1,235 C++

#19 alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

1,095 Cuda

#20 waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

900 Python

#21 jjang-ai/mlxstudio

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)

534

#22 kaito-project/aikit

🏗️ Fine-tune, build, and deploy open-source LLMs easily!

518 Go