Best Open-Source Multimodal Libraries
A curated list of the most popular GitHub repositories tagged with multimodal.
#1 Mintplex-Labs/anything-llm
The all-in-one desktop & Docker AI application with built-in RAG, AI agents, a no-code agent builder, MCP compatibility, and more.
#2 bytedance/UI-TARS-desktop
The open-source multimodal AI agent stack, connecting cutting-edge AI models with agent infrastructure.
#3 haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
#4 microsoft/unilm
Large-scale self-supervised pre-training across tasks, languages, and modalities.
#5 jina-ai/serve
☁️ Build multimodal AI applications with a cloud-native stack.
#6 deepseek-ai/Janus
Janus series: unified multimodal understanding and generation models.
#7 modelscope/ms-swift
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
#8 RunanywhereAI/runanywhere-sdks
A production-ready toolkit for running AI models locally.
#9 apache/seatunnel
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.