
Best Open Source Data Libraries

A curated list of the most popular GitHub repositories tagged with data. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 Asabeneh/30-Days-Of-Python

The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

59,617 · Python

#2 TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

48,845 · TypeScript

#3 run-llama/llama_index

LlamaIndex is the leading document agent and OCR platform

47,731 · Python

#4 metabase/metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

46,428 · Clojure

#5 DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

40,540 · Jupyter Notebook

#6 SheetJS/sheetjs

📗 SheetJS Spreadsheet Data Toolkit. New home: https://git.sheetjs.com/SheetJS/sheetjs

36,201

#7 vercel/swr

React Hooks for Data Fetching

32,338 · TypeScript

#8 D4Vinci/Scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

30,564 · Python

#9 akfamily/akshare

AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!

17,387 · Python

#10 fivethirtyeight/data

Data and code behind the articles and graphics at FiveThirtyEight

17,302 · Jupyter Notebook

#11 prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,668 · Java

#12 faker-js/faker

Generate massive amounts of fake data in the browser and Node.js

14,979 · TypeScript

#13 bchavez/Bogus

📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

9,628 · C#

#14 rawgraphs/rawgraphs-app

A web interface to create custom vector-based visualizations on top of RAWGraphs core

8,939 · JavaScript

#15 mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,677 · Python

#16 cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

6,377 · Go

#17 ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

4,983 · Python

#18 tinyplex/tinybase

A reactive data store & sync engine.

4,962 · TypeScript

#19 speedyapply/2026-AI-College-Jobs

2026 AI/ML internship & new graduate job list updated daily

4,891

#20 ArroyoSystems/arroyo

Distributed stream processing engine in Rust

4,842 · Rust

#21 lk-geimfari/mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

4,798 · Python

#22 spiceai/spiceai

A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

2,879 · Rust

#23 projectnessie/nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

1,452 · Java

#24 Canner/wren-engine

The open context engine for AI agents that supports 15+ data sources. Built on Rust and Apache DataFusion.

650 · Java

#25 jser/jser.info

JSer.info data repository

159 · HTML

#26 glotzerlab/signac

Manage large and heterogeneous data spaces on the file system.

142 · Python