
Best Open Source Data Libraries

A curated list of the most popular GitHub repositories tagged with "data". Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 Asabeneh/30-Days-Of-Python

The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. In practice the challenge may take more than 100 days, so work at your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

59,617 · Python

#2 TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

48,845 · TypeScript

#3 run-llama/llama_index

LlamaIndex is the leading document agent and OCR platform

47,731 · Python

#4 metabase/metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

46,428 · Clojure

#5 DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

40,540 · Jupyter Notebook

#6 SheetJS/sheetjs

📗 SheetJS Spreadsheet Data Toolkit. New home: https://git.sheetjs.com/SheetJS/sheetjs

36,201

#7 vercel/swr

React Hooks for Data Fetching

32,338 · TypeScript

#8 D4Vinci/Scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

30,564 · Python

#9 akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)

17,387 · Python

#10 fivethirtyeight/data

Data and code behind the articles and graphics at FiveThirtyEight

17,302 · Jupyter Notebook

#11 prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,668 · Java

#12 faker-js/faker

Generate massive amounts of fake data in the browser and Node.js

14,979 · TypeScript

#13 bchavez/Bogus

📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

9,628 · C#

#14 rawgraphs/rawgraphs-app

A web interface to create custom vector-based visualizations on top of RAWGraphs core

8,939 · JavaScript

#15 mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,677 · Python

#16 ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

4,983 · Python

#17 tinyplex/tinybase

A reactive data store & sync engine.

4,962 · TypeScript

#18 speedyapply/2026-AI-College-Jobs

2026 AI/ML internship and new-graduate job list, updated daily

4,891

#19 ArroyoSystems/arroyo

Distributed stream processing engine in Rust

4,842 · Rust

#20 lk-geimfari/mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

4,798 · Python

#21 rilldata/rill

The fastest business intelligence tool for humans and agents.

2,636 · Go

#22 odota/core

OpenDota: Open source Dota 2 data platform with automated replay parsing

1,616 · TypeScript

#23 enviodev/hyperindex

🚢 Ultra-Fast Multichain Indexer

520 · ReScript

#24 easystats/datawizard

Magic potions to clean and transform your data 🧙

235 · R

#25 conveyordata/data-product-portal

Data Product Portal created by Dataminded

201 · Python