Best Open Source Data Libraries
A curated list of the most popular GitHub repositories tagged with data. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 Asabeneh/30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. The challenge may take more than 100 days, so go at your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
#2 TanStack/query
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
#3 run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
#4 metabase/metabase
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
#5 DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
#6 SheetJS/sheetjs
📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
#7 vercel/swr
React Hooks for Data Fetching
#8 D4Vinci/Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
#9 akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
#10 fivethirtyeight/data
Data and code behind the articles and graphics at FiveThirtyEight
#11 prestodb/presto
The official home of the Presto distributed SQL query engine for big data
#12 faker-js/faker
Generate massive amounts of fake data in the browser and node.js
#13 bchavez/Bogus
📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
#14 rawgraphs/rawgraphs-app
A web interface to create custom vector-based visualizations on top of RAWGraphs core
#15 mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
#16 ckan/ckan
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
#17 tinyplex/tinybase
A reactive data store & sync engine.
#18 speedyapply/2026-AI-College-Jobs
2026 AI/ML internship & new graduate job list updated daily
#19 ArroyoSystems/arroyo
Distributed stream processing engine in Rust
#20 lk-geimfari/mimesis
Mimesis is a fast Python library for generating fake data in multiple languages.
#21 rilldata/rill
The fastest business intelligence tool for humans and agents.
#22 odota/core
OpenDota: Open source Dota 2 data platform with automated replay parsing
#23 enviodev/hyperindex
🚢 Ultra-Fast Multichain Indexer
#24 easystats/datawizard
Magic potions to clean and transform your data 🧙
#25 conveyordata/data-product-portal
Data Product Portal created by Dataminded