Best Open-Source Multimodal Libraries
A curated list of the most popular GitHub repositories tagged with multimodal.
#1 Mintplex-Labs/anything-llm
The all-in-one desktop & Docker AI application with built-in RAG, AI agents, a no-code agent builder, MCP compatibility, and more.
#2 bytedance/UI-TARS-desktop
The open-source multimodal AI agent stack, connecting cutting-edge AI models with agent infrastructure.
#3 haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
#4 microsoft/unilm
Large-scale self-supervised pre-training across tasks, languages, and modalities.
#5 jina-ai/serve
☁️ Build multimodal AI applications with a cloud-native stack.
#6 deepseek-ai/Janus
Janus series: unified multimodal understanding and generation models.
#7 modelscope/ms-swift
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
#8 RunanywhereAI/runanywhere-sdks
A production-ready toolkit for running AI models locally.
#9 apache/seatunnel
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.