Best Open Source Data Science Libraries
A curated list of the most popular GitHub repositories tagged with data science. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 microsoft/ML-For-Beginners
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
#2 apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
#3 scikit-learn/scikit-learn
scikit-learn: machine learning in Python
#4 keras-team/keras
Deep Learning for humans
#5 Asabeneh/30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. The challenge may take more than 100 days; follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
#6 pandas-dev/pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
#7 GokuMohandas/Made-With-ML
Learn how to develop, deploy and iterate on production-grade ML applications.
#8 apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
#9 streamlit/streamlit
Streamlit — A faster way to build and share data apps.
#10 gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
#11 ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
#12 microsoft/Data-Science-For-Beginners
10 Weeks, 20 Lessons, Data Science for All!
#13 explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
#14 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
#15 eriklindernoren/ML-From-Scratch
Machine Learning From Scratch. Bare-bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
#16 Lightning-AI/pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
#17 AMAI-GmbH/AI-Expert-Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2022
#18 donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
#19 eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
#20 academic/awesome-datascience
An awesome Data Science repository for learning data science and applying it to real-world problems.
#21 CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
#22 d2l-ai/d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
#23 fastai/fastbook
The fastai book, published as Jupyter Notebooks
#24 marimo-team/marimo
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
#25 afshinea/stanford-cs-229-machine-learning
VIP cheatsheets for Stanford's CS 229 Machine Learning
#26 akfamily/akshare
AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
#27 dair-ai/ML-YouTube-Courses
📺 Discover the latest machine learning / AI courses on YouTube.
#28 stefan-jansen/machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
#29 ipython/ipython
Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
#30 bharathgs/Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
#31 piskvorky/gensim
Topic Modelling for Humans
#32 treeverse/dvc
🦉 Data Versioning and ML Experiments
#33 dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
#34 HugoBlox/kit
⚡ The Open Research Copilot. Build high-perf Portfolios, Lab Sites & Docs in Markdown + Jupyter. 100% Data Control. 🦫 The open-source Copilot for data scientists. One-click deployment.
#35 microsoft/computervision-recipes
Best Practices, code samples, and documentation for Computer Vision.
#36 alexeygrigorev/data-science-interviews
Data science interview questions and answers
#37 yzhao062/pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
#38 drivendataorg/cookiecutter-data-science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
#39 pycaret/pycaret
An open-source, low-code machine learning library in Python
#40 sktime/sktime
A unified framework for machine learning with time series
#41 tflearn/tflearn
Deep learning library featuring a higher-level API for TensorFlow.
#42 skypilot-org/skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
#43 rapidsai/cudf
cuDF - GPU DataFrame Library
#44 goplus/xgo
XGo is a programming language that reads like plain English. But it's also incredibly powerful — it lets you leverage assets from C/C++, Go, Python, and JavaScript/TypeScript, creating a unified software engineering ecosystem. Our vision is to enable everyone to become a builder of the world.
#45 pymupdf/PyMuPDF
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
#46 unit8co/darts
A Python library for user-friendly forecasting and anomaly detection on time series.
#47 blue-yonder/tsfresh
Automatic extraction of relevant features from time series.
#48 activeloopai/deeplake
The GPU-native, sandboxed Postgres for AI agents
#49 catboost/catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.
#50 lazyprogrammer/machine_learning_examples
A collection of machine learning examples and tutorials.