
Best Open Source Data Mining Libraries

A curated list of the most popular GitHub repositories tagged with data mining.

#1 eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

30,861 · Python

#2 JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

28,981 · Python

#3 academic/awesome-datascience

An awesome Data Science repository for learning data science and applying it to real-world problems.

28,413

#4 EthicalML/awesome-production-machine-learning

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

20,160

#5 microsoft/LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

18,095 · C++

#6 tangyudi/Ai-Learn

An artificial intelligence learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying course materials. It starts from zero background and is geared toward job readiness. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.

12,642

#7 rasbt/python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

12,590 · Jupyter Notebook

#8 yzhao062/pyod

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

9,722 · Python

#9 sktime/sktime

A unified framework for machine learning with time series

9,528 · Python

#10 yzhao062/anomaly-detection-resources

Anomaly detection books, papers, videos, and toolboxes. Last updated in late 2025 to cover LLM and VLM work.

9,173 · Python

#11 catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

8,811 · C++