Best Open Source Data Analysis Libraries
A curated list of the most popular GitHub repositories tagged with data analysis. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
#2 scikit-learn/scikit-learn
scikit-learn: machine learning in Python
#3 pandas-dev/pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
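#4 sansan0/TrendRadar
⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload: your AI public opinion monitoring assistant and trending-topic filter. Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI analysis briefings are pushed straight to your phone, and it can also connect to the MCP architecture to power AI natural-language conversational analysis, sentiment insight, and trend prediction. Supports Docker, with data self-hosted locally or in the cloud. Integrates smart push notifications through WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, Slack, and other channels.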
#5 metabase/metabase
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
#6 streamlit/streamlit
Streamlit — A faster way to build and share data apps.
#7 gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
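#8 666ghj/BettaFish
Weiyu (微舆): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.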
#9 gchq/CyberChef
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
#10 microsoft/Data-Science-For-Beginners
10 Weeks, 20 Lessons, Data Science for All!
#11 AMAI-GmbH/AI-Expert-Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2022
#12 dataease/dataease
🔥 An open-source BI tool that anyone can use, and a powerful data visualization tool. An open-source BI tool alternative to Tableau.
#13 lukasmasuch/best-of-ml-python
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
#14 sinaptik-ai/pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
#15 airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
#16 allinurl/goaccess
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
#17 ydataai/ydata-profiling
One line of code for data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
#18 tangyudi/Ai-Learn
An artificial intelligence learning roadmap that collects nearly 200 hands-on cases and projects, with free companion course materials, taking complete beginners through to job-ready practice. Covers Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, and other popular areas, including data mining, data science, algorithms, PyTorch, TensorFlow, TensorFlow 2, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.
#19 yzhao062/pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
#20 rapidsai/cudf
cuDF - GPU DataFrame Library
#21 K-Dense-AI/claude-scientific-skills
A set of ready-to-use scientific skills for Claude
#22 flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
#23 rhiever/Data-Analysis-and-Machine-Learning-Projects
Repository of teaching materials, code, and data for my data analysis and machine learning projects.