
Best Open Source Data Science Libraries

A curated list of the most popular GitHub repositories tagged with data science. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 microsoft/ML-For-Beginners

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

84,512 stars · Jupyter Notebook

#2 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

70,995 stars · TypeScript

#3 scikit-learn/scikit-learn

scikit-learn: machine learning in Python

65,429 stars · Python

#4 keras-team/keras

Deep Learning for humans

63,932 stars · Python

#5 Asabeneh/30-Days-Of-Python

The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

59,617 stars · Python

#6 pandas-dev/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

48,165 stars · Python

#7 GokuMohandas/Made-With-ML

Learn how to develop, deploy and iterate on production-grade ML applications.

46,810 stars · Jupyter Notebook

#8 apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

44,671 stars · Python

#9 streamlit/streamlit

Streamlit: a faster way to build and share data apps.

43,918 stars · Python

#10 gradio-app/gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

42,034 stars · Python

#11 ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

41,784 stars · Python

#12 microsoft/Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

34,258 stars · Jupyter Notebook

#13 explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

33,343 stars · Python

#14 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

32,271 stars

#15 eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

31,046 stars · Python

#16 Lightning-AI/pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

30,933 stars · Python

#17 AMAI-GmbH/AI-Expert-Roadmap

Roadmap to becoming an Artificial Intelligence Expert in 2022

30,813 stars · JavaScript

#18 donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

28,928 stars · Python

#19 eugeneyan/applied-ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

28,715 stars

#20 academic/awesome-datascience

📝 An awesome Data Science repository to learn and apply for real world problems.

28,642 stars

#21 CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

28,435 stars · Jupyter Notebook

#22 d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

28,422 stars · Python

#23 fastai/fastbook

The fastai book, published as Jupyter Notebooks

24,742 stars · Jupyter Notebook

#24 marimo-team/marimo

A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.

19,718 stars · Python

#25 afshinea/stanford-cs-229-machine-learning

VIP cheatsheets for Stanford's CS 229 Machine Learning

19,291 stars

#26 akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

17,387 stars · Python

#27 dair-ai/ML-YouTube-Courses

📺 Discover the latest machine learning / AI courses on YouTube.

17,120 stars

#28 stefan-jansen/machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.

16,765 stars · Jupyter Notebook

#29 ipython/ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

16,686 stars · Python

#30 bharathgs/Awesome-pytorch-list

A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

16,419 stars

#31 piskvorky/gensim

Topic Modelling for Humans

16,373 stars · Python

#32 treeverse/dvc

🦉 Data Versioning and ML Experiments

15,454 stars · Python

#33 dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

15,112 stars · Python

#34 HugoBlox/kit

⚡ The Open Research Copilot. Build high-perf Portfolios, Lab Sites & Docs in Markdown + Jupyter. 100% Data Control. 🦫 The open-source Copilot for data scientists. One-click deployment.

9,859 stars · HTML

#35 microsoft/computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.

9,835 stars · Jupyter Notebook

#36 alexeygrigorev/data-science-interviews

Data science interview questions and answers

9,821 stars · HTML

#37 yzhao062/pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

9,748 stars · Python

#38 drivendataorg/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

9,737 stars · Python

#39 pycaret/pycaret

An open-source, low-code machine learning library in Python

9,712 stars · Jupyter Notebook

#40 sktime/sktime

A unified framework for machine learning with time series

9,635 stars · Python

#41 tflearn/tflearn

Deep learning library featuring a higher-level API for TensorFlow.

9,596 stars · Python

#42 skypilot-org/skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).

9,582 stars · Python

#43 rapidsai/cudf

cuDF - GPU DataFrame Library

9,543 stars · C++

#44 goplus/xgo

XGo is a programming language that reads like plain English. But it's also incredibly powerful: it lets you leverage assets from C/C++, Go, Python, and JavaScript/TypeScript, creating a unified software engineering ecosystem. Our vision is to enable everyone to become a builder of the world.

9,407 stars · Go

#45 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,256 stars · Python

#46 unit8co/darts

A Python library for user-friendly forecasting and anomaly detection on time series.

9,255 stars · Python

#47 blue-yonder/tsfresh

Automatic extraction of relevant features from time series

9,146 stars · Jupyter Notebook

#48 activeloopai/deeplake

The GPU-native, sandboxed Postgres for AI agents

9,037 stars · C++

#49 catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

8,845 stars · C++

#50 lazyprogrammer/machine_learning_examples

A collection of machine learning examples and tutorials.

8,841 stars · Python