Best Open Source Data Science Libraries

A curated list of the most popular GitHub repositories tagged with data science.

#1 microsoft/ML-For-Beginners

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

83,829 stars · Jupyter Notebook

#2 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

70,618 stars · TypeScript

#3 scikit-learn/scikit-learn

scikit-learn: machine learning in Python

65,186 stars · Python

#4 keras-team/keras

Deep Learning for humans

63,864 stars · Python

#5 Asabeneh/30-Days-Of-Python

The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

58,405 stars · Python

#6 pandas-dev/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

47,933 stars · Python

#7 GokuMohandas/Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

46,391 stars · Jupyter Notebook

#8 apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

44,349 stars · Python

#9 streamlit/streamlit

Streamlit — A faster way to build and share data apps.

43,570 stars · Python

#10 gradio-app/gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

41,779 stars · Python

#11 ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

41,416 stars · Python

#12 microsoft/Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

33,972 stars · Jupyter Notebook

#13 explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

33,228 stars · Python

#14 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

31,785 stars

#15 eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

30,861 stars · Python

#16 Lightning-AI/pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

30,855 stars · Python

#17 AMAI-GmbH/AI-Expert-Roadmap

Roadmap to becoming an Artificial Intelligence Expert in 2022

30,751 stars · JavaScript

#18 donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

28,880 stars · Python

#19 eugeneyan/applied-ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

28,694 stars

#20 CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

28,481 stars · Jupyter Notebook

#21 academic/awesome-datascience

📝 An awesome Data Science repository to learn from and apply to real-world problems.

28,413 stars

#22 d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

28,194 stars · Python

#23 fastai/fastbook

The fastai book, published as Jupyter Notebooks

24,594 stars · Jupyter Notebook

#24 plotly/dash

Data Apps & Dashboards for Python. No JavaScript Required.

24,507 stars · Python

#25 lukasmasuch/best-of-ml-python

🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

23,240 stars

#26 sinaptik-ai/pandas-ai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

23,211 stars · Python

#27 matplotlib/matplotlib

matplotlib: plotting with Python

22,467 stars · Python

#28 PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

21,650 stars · Python

#29 recommenders-team/recommenders

Best Practices on Recommendation Systems

21,455 stars · Python

#30 afshinea/stanford-cs-229-machine-learning

VIP cheatsheets for Stanford's CS 229 Machine Learning

19,274 stars

#31 marimo-team/marimo

A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.

19,249 stars · Python

#32 dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

14,983 stars · Python

#33 microsoft/nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

14,341 stars · Python

#34 virgili0/Virgilio

Your new Mentor for Data Science E-Learning.

14,314 stars · Jupyter Notebook

#35 oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

14,239 stars

#36 mwaskom/seaborn

Statistical data visualization in Python

13,739 stars · Python

#37 visenger/awesome-mlops

A curated list of references for MLOps

13,711 stars

#38 ydataai/ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

13,387 stars · Python

#39 jpmorganchase/python-training

Python training for business analysts and traders

12,721 stars · Jupyter Notebook

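#40 tangyudi/Ai-Learn

An AI learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying course materials. Suitable for complete beginners and geared toward job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning, deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more.

12,642 stars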

#41 rasbt/python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

12,590 stars · Jupyter Notebook

#42 trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

12,579 stars · Java

#43 dair-ai/ML-Papers-of-the-Week

🔥 Highlighting the top ML papers every week.

12,241 stars

#44 chiphuyen/machine-learning-systems-design

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

9,986 stars · HTML

#45 microsoft/computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.

9,827 stars · Jupyter Notebook

#46 alexeygrigorev/data-science-interviews

Data science interview questions and answers

9,775 stars · HTML

#47 yzhao062/pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

9,722 stars · Python

#48 pycaret/pycaret

An open-source, low-code machine learning library in Python

9,698 stars · Jupyter Notebook

#49 drivendataorg/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

9,685 stars · Python

#50 tflearn/tflearn

Deep learning library featuring a higher-level API for TensorFlow.

9,606 stars · Python