Best Open Source Distributed Training Libraries
A curated list of the most popular GitHub repositories tagged with distributed training.
#1 GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
#2 huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, plus pretrained weights for ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.
#3 PaddlePaddle/Paddle
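PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)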
#4 PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with an awesome model zoo.
#5 Netflix/metaflow
Build, Manage and Deploy AI/ML Systems
#6 skypilot-org/skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).