Best Open Source Data Pipeline Libraries
A curated list of the most popular GitHub repositories tagged with data pipelines. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
#2 apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
#3 dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
#4 apache/dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low code.
#5 Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
#6 mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.