Best Open Source Data Science Libraries
A curated list of the most popular GitHub repositories tagged with data science. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 microsoft/ML-For-Beginners
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
#2 apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
#3 scikit-learn/scikit-learn
scikit-learn: machine learning in Python
#4 keras-team/keras
Deep Learning for humans
#5 Asabeneh/30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
#6 pandas-dev/pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
#7 GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
#8 apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
#9 streamlit/streamlit
Streamlit — A faster way to build and share data apps.
#10 gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
#11 ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
#12 microsoft/Data-Science-For-Beginners
10 Weeks, 20 Lessons, Data Science for All!
#13 explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
#14 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
#15 eriklindernoren/ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
#16 Lightning-AI/pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
#17 AMAI-GmbH/AI-Expert-Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2022
#18 donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
#19 eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
#20 CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
#21 academic/awesome-datascience
An awesome Data Science repository to learn from and apply to real-world problems.
#22 d2l-ai/d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
#23 fastai/fastbook
The fastai book, published as Jupyter Notebooks
#24 plotly/dash
Data Apps & Dashboards for Python. No JavaScript Required.
#25 lukasmasuch/best-of-ml-python
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
#26 sinaptik-ai/pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
#27 matplotlib/matplotlib
matplotlib: plotting with Python
#28 PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
#29 recommenders-team/recommenders
Best Practices on Recommendation Systems
#30 afshinea/stanford-cs-229-machine-learning
VIP cheatsheets for Stanford's CS 229 Machine Learning
#31 marimo-team/marimo
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
#32 dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
#33 microsoft/nni
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
#34 virgili0/Virgilio
Your new Mentor for Data Science E-Learning.
#35 oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
#36 mwaskom/seaborn
Statistical data visualization in Python
#37 visenger/awesome-mlops
A curated list of references for MLOps
#38 ydataai/ydata-profiling
One-line data quality profiling and exploratory data analysis for pandas and Spark DataFrames.
#39 jpmorganchase/python-training
Python training for business analysts and traders
#40 tangyudi/Ai-Learn
An artificial intelligence learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying course materials, taking learners from zero background to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, deep learning, computer vision, natural language processing, algorithms, PyTorch, TensorFlow, TensorFlow 2, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.
#41 rasbt/python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
#42 trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
#43 dair-ai/ML-Papers-of-the-Week
🔥Highlighting the top ML papers every week.
#44 chiphuyen/machine-learning-systems-design
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`
#45 microsoft/computervision-recipes
Best Practices, code samples, and documentation for Computer Vision.
#46 alexeygrigorev/data-science-interviews
Data science interview questions and answers
#47 yzhao062/pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
#48 pycaret/pycaret
An open-source, low-code machine learning library in Python
#49 drivendataorg/cookiecutter-data-science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
#50 tflearn/tflearn
Deep learning library featuring a higher-level API for TensorFlow.