Best Open Source Data Libraries
A curated list of the most popular GitHub repositories tagged with "data".
#1 Asabeneh/30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. This challenge may take more than 100 days, so go at your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
#2 TanStack/query
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
#3 run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
#4 metabase/metabase
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
#5 DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
#6 SheetJS/sheetjs
📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
#7 vercel/swr
React Hooks for Data Fetching
#8 D4Vinci/Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
#9 akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)
#10 fivethirtyeight/data
Data and code behind the articles and graphics at FiveThirtyEight
#11 prestodb/presto
The official home of the Presto distributed SQL query engine for big data
#12 faker-js/faker
Generate massive amounts of fake data in the browser and Node.js
#13 bchavez/Bogus
📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
#14 rawgraphs/rawgraphs-app
A web interface to create custom vector-based visualizations on top of RAWGraphs core
#15 mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
#16 cloudquery/cloudquery
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
#17 ckan/ckan
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
#18 tinyplex/tinybase
A reactive data store & sync engine.
#19 speedyapply/2026-AI-College-Jobs
2026 AI/ML internship & new graduate job list updated daily
#20 ArroyoSystems/arroyo
Distributed stream processing engine in Rust
#21 lk-geimfari/mimesis
Mimesis is a fast Python library for generating fake data in multiple languages.
#22 spiceai/spiceai
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
#23 projectnessie/nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
#24 Canner/wren-engine
The open context engine for AI agents, supporting 15+ data sources. Built on Rust and Apache DataFusion.
#25 jser/jser.info
JSer.info data repository
#26 glotzerlab/signac
Manage large and heterogeneous data spaces on the file system.