Best Open Source dataset Libraries
A curated list of the most popular GitHub repositories tagged with dataset. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 public-apis/public-apis
A collective list of free APIs
399,378 stars · Python
#2 HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
26,476 stars · TypeScript
#3 joke2k/faker
Faker is a Python package that generates fake data for you.
19,193 stars · Python
#4 ConardLi/easy-dataset
A powerful tool for creating datasets for LLM fine-tuning, RAG, and evaluation.
13,392 stars · JavaScript
#5 zalandoresearch/fashion-mnist
An MNIST-like fashion product database and benchmark.
12,650 stars · Python
#6 brightmart/nlp_chinese_corpus
Large-scale Chinese corpus for natural language processing (NLP).
9,854 stars
#7 NirantK/awesome-project-ideas
Curated list of machine learning, NLP, vision, and recommender systems project ideas.
8,942 stars
#8 googlecreativelab/quickdraw-dataset
Documentation on how to access and use the Quick, Draw! Dataset.
6,658 stars