open-edge-platform / datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
Repository Overview (README excerpt)
Dataset Management Framework (Datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

Features

- Dataset reading, writing, and conversion in any direction. Supported formats:
  - CIFAR-10/100
  - Cityscapes
  - COCO
  - CVAT
  - ImageNet
  - Kitti
  - LabelMe
  - LFW
  - MNIST
  - Open Images
  - PASCAL VOC
  - TF Detection API
  - YOLO

  Other formats and documentation for them can be found here.
- Dataset building:
  - merging multiple datasets into one
  - dataset filtering by custom criteria, for instance:
    - remove polygons of a certain class
    - remove images without annotations of a specific class
    - remove annotations from images
    - keep only vertically oriented images
    - remove small-area bounding boxes from annotations
  - annotation conversions, for instance:
    - polygons to instance masks and vice versa
    - apply a custom colormap to mask annotations
    - rename or remove dataset labels
  - splitting a dataset into multiple subsets, such as train, val, and test:
    - random split
    - task-specific splits based on annotations, which keep the initial label and attribute distributions:
      - for classification tasks, based on labels
      - for detection tasks, based on bounding boxes
      - for re-identification tasks, based on labels, avoiding the same IDs appearing in both the training and test splits
- Dataset quality checking:
  - simple checking for errors
  - comparison with model inference
  - merging and comparison of multiple datasets
  - annotation validation based on the task type (classification, etc.)
- Dataset comparison
- Dataset statistics (image mean and std, annotation statistics)

> Check the design document for a full list of features.
> Check the user manual for usage instructions.

Contributing

Feel free to open an issue if you think something needs to be changed. You are welcome to participate in development; instructions are available in our contribution guide.
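The filtering criteria listed above (e.g. removing annotations of a certain class, then dropping images left without annotations) can be sketched in plain Python. This is an illustrative model only, with hypothetical names and a plain-dict item representation; Datumaro itself expresses filters as XPath-like strings over items and annotations rather than Python predicates.

```python
def filter_dataset(items, item_pred=None, ann_pred=None):
    """Keep annotations passing ann_pred, then keep items passing item_pred.

    Each item is modeled as {"id": ..., "annotations": [{"type": ..., "label": ...}, ...]}.
    (Illustrative sketch, not Datumaro's API.)
    """
    out = []
    for item in items:
        anns = item["annotations"]
        if ann_pred is not None:
            anns = [a for a in anns if ann_pred(a)]
        new_item = {**item, "annotations": anns}
        if item_pred is None or item_pred(new_item):
            out.append(new_item)
    return out


# Example criteria: remove polygons of class "person",
# then remove images left without any annotations.
no_person_polygons = lambda a: not (a["type"] == "polygon" and a["label"] == "person")
has_annotations = lambda item: bool(item["annotations"])
```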
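The task-specific split that keeps label distributions can be illustrated with a small stratified-split sketch: shuffle the items of each label independently, then slice each label's items by the subset ratios. This is a minimal sketch with hypothetical helper names, not Datumaro's splitter implementation.

```python
import random
from collections import defaultdict

def stratified_split(items, ratios=(("train", 0.5), ("val", 0.2), ("test", 0.3)), seed=0):
    """Split (item_id, label) pairs into subsets, keeping each label's
    share roughly equal across subsets (classification-style task split)."""
    by_label = defaultdict(list)
    for item_id, label in items:
        by_label[label].append(item_id)

    rng = random.Random(seed)
    splits = defaultdict(list)
    for label, ids in by_label.items():
        rng.shuffle(ids)  # randomize within each label group
        start = 0
        for i, (name, ratio) in enumerate(ratios):
            # last subset takes the remainder so nothing is dropped
            end = len(ids) if i == len(ratios) - 1 else start + round(ratio * len(ids))
            splits[name].extend((item_id, label) for item_id in ids[start:end])
            start = end
    return dict(splits)
```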
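The "image mean and std" statistic amounts to a per-channel aggregation over every pixel of every image. The sketch below shows the idea on images modeled as lists of (r, g, b) tuples; it is an assumption-laden illustration, not Datumaro's implementation.

```python
import math

def dataset_mean_std(images):
    """Per-channel mean and standard deviation over all pixels of all images.

    Each image is modeled as a list of (r, g, b) pixel tuples.
    Uses the sum / sum-of-squares identity: var = E[x^2] - E[x]^2.
    """
    n = 0
    s = [0.0, 0.0, 0.0]   # per-channel sum of values
    s2 = [0.0, 0.0, 0.0]  # per-channel sum of squared values
    for img in images:
        for px in img:
            n += 1
            for c in range(3):
                s[c] += px[c]
                s2[c] += px[c] * px[c]
    mean = [s[c] / n for c in range(3)]
    # clamp at 0 to guard against tiny negative values from rounding
    std = [math.sqrt(max(s2[c] / n - mean[c] ** 2, 0.0)) for c in range(3)]
    return mean, std
```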