Best Open Source Distributed Libraries
A curated list of the most popular GitHub repositories tagged "distributed". Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
#2 ClickHouse/ClickHouse
ClickHouse® is a real-time analytics database management system
#3 mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, MCP, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference
#4 milvus-io/milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
#5 ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
#6 nextcloud/server
☁️ Nextcloud server, a safe home for all your data
#7 surrealdb/surrealdb
A scalable, distributed, collaborative, document-graph database, for the realtime web
#8 xuxueli/xxl-job
A distributed task scheduling framework (the XXL-JOB distributed task scheduling platform).
#9 ageron/handson-ml
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
#10 taosdata/TDengine
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
#11 dianping/cat
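CAT is a foundational component for server-side projects. It provides clients in multiple languages, including Java, C/C++, Node.js, Python, and Go, and is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system, and so on). It gives every Meituan-Dianping business line rich performance metrics, health status, real-time alerting, and more.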
#12 teambit/bit
AI-powered development workspaces with reusable components, architectural clarity and zero overhead.
#13 lightgbm-org/LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
#14 Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
#15 orbitdb/orbitdb
Peer-to-Peer Databases for the Decentralized Web
#16 hazelcast/hazelcast
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
#17 GreptimeTeam/greptimedb
The open-source Observability 2.0 database. One engine for metrics, logs, and traces — replacing Prometheus, Loki & ES.
#18 microsoft/FluidFramework
Library for building distributed, real-time collaborative web applications
#19 ydb-platform/ydb
YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
#20 crate/crate
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
#21 apache/datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
#22 apache/flink-agents
Flink Agents is an Agentic AI framework based on Apache Flink
#23 tonbo-io/ursula
Distributed event stream server over HTTP, backed by S3.