lightly-ai / lightly-studio

Curate, Annotate, and Manage Your Data in LightlyStudio.

690 stars
16 forks
33 issues
Python, TypeScript, Svelte

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing lightly-ai/lightly-studio in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/lightly-ai/lightly-studio)

Repository Overview (README excerpt)

Curate, Annotate, and Manage Your Data in LightlyStudio.

---

Welcome to LightlyStudio!

We at **Lightly** created **LightlyStudio**, an open-source tool that unifies your data workflows, from curation through annotation to management, in a single tool. Since we're big fans of Rust, we used it to speed things up. You can work with COCO and ImageNet on a MacBook Pro with M1 and 16GB of memory!

> **Note:** LightlyStudio is pre-1.0. Expect occasional breaking changes as we iterate quickly. We're targeting a v1.0 release in March 2026 and will keep changes documented in the changelog.

💻 Installation

Runs on **Python 3.9 or higher** on Windows, Linux and macOS.

Supported features:

| Feature / Task | Classification | Detection | Sem. Segmentation | Inst. Segmentation | Captions (img+text) | Video | Keypoints | 3D Point Clouds | Text |
|----------------|:--------------:|:---------:|:-----------------:|:------------------:|:-------------------:|:-----:|:---------:|:---------------:|:----:|
| Visualisation  | 🛠️ | ✅ | 🛠️ | ✅ | ✅ | ✅ | ❌ | 🛠️ | 🛠️ |
| Filtering      | 🛠️ | ✅ | ✅ | 🛠️ | ✅ | ✅ | ❌ | 🛠️ | 🛠️ |
| Labeling       | 🛠️ | ✅ | 🛠️ | 🛠️ | ✅ | 🛠️ | ❌ | ❌ | 🛠️ |

✅ - supported
🛠️ - support in progress (ETA
❌ - not yet supported

🚀 Quickstart

The examples below download the required example data the first time you run them. You can also directly use your own image, video, or YOLO/COCO dataset.

Image Folder

To run an example using an image-only dataset, create a file named with the following contents:

Run the script with . Now you can inspect samples in the app.

Notebook / Colab

For Jupyter or Google Colab, you can run the same image folder flow inside a notebook cell and embed the UI.

Jupyter:

Colab:

**Tagging by Folder Structure**

When using , you can automatically assign tags based on your folder structure.
The folder hierarchy is **relative to the argument** you provide. See our documentation for more information.

Video Folder

Create a file named with the following contents:

Run the script with . Now you can inspect videos in the app. The same call also accepts cloud storage URLs such as after installing .

YOLO Object Detection

To run an object detection example using a YOLO dataset, create a file named :

Run the script with . Now you can inspect samples with their assigned annotations in the app.

COCO Instance Segmentation

To run an instance segmentation example using a COCO dataset, create a file named :

Run the script via . Now you can inspect samples with their assigned annotations in the app.

COCO Captions

To run a caption example using a COCO dataset, create a file named :

Run the script with . Now you can inspect samples with their assigned captions in the app.

🐍 Python Interface

LightlyStudio has a powerful Python interface. You can not only index datasets but also query and manipulate them using code.

☁️ Using Cloud Storage

To load images or videos directly from a cloud storage provider (like AWS S3, GCS, etc.), you must first install the required dependencies:

This installs the necessary libraries: s3fs (for S3), gcsfs (for GCS), and adlfs (for Azure). Our tool uses the fsspec library, which also supports other file systems. If you need a different provider (like FTP, SSH, etc.), you can find the required library in the fsspec documentation and install it manually (e.g., pip install sftpfs).

**Current Support Limitations for Annotations (Labels):** Cloud-hosted annotations are currently supported for COCO object detection and instance segmentation; other dataset importers still expect local files.

Dataset

The dataset is the main entity of the Python interface. It is used to set up the dataset, start the GUI, run queries, and perform selections. It holds the connection to the database file.
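Returning to the cloud storage note above, the mapping from a storage URL to the fsspec backend package it needs can be sketched in a few lines. This is only an illustration, not part of LightlyStudio: the helper `required_package` and the table `BACKEND_PACKAGES` are hypothetical names, and the URL schemes listed are the ones these packages register with fsspec.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: URL scheme -> package providing the fsspec backend.
# The package names come from the text above; the schemes are the protocol
# names those packages register with fsspec.
BACKEND_PACKAGES = {
    "s3": "s3fs",     # AWS S3
    "gs": "gcsfs",    # Google Cloud Storage
    "gcs": "gcsfs",
    "az": "adlfs",    # Azure
    "abfs": "adlfs",
}


def required_package(url):
    """Return the backend package a URL needs, or None for a local path."""
    scheme = urlsplit(url).scheme.lower()
    # An empty scheme is a plain path; a one-letter scheme is a Windows drive.
    if len(scheme) <= 1 or scheme == "file":
        return None
    if scheme not in BACKEND_PACKAGES:
        raise ValueError(
            f"No known backend for '{scheme}://'; check the fsspec documentation."
        )
    return BACKEND_PACKAGES[scheme]
```

With this sketch, `required_package("s3://my-bucket/images/")` returns `"s3fs"`, while a local folder such as `/data/images` returns `None` because no extra backend is needed.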
Reusing a dataset and appending data

Datasets persist in a DuckDB file ( by default). All tags, annotations, captions, metadata, and embeddings are saved, so you can stop and resume anytime. Use to reopen existing datasets:

**Notes:**
• The first time you run this script, a new database is created and the data is indexed.
• If you add more images to the folder, only the new data is indexed.
• All annotations, tags, and metadata persist across sessions as long as the file in the working directory exists.

Custom database path

To use a different database file, initialize the database manager before creating datasets:

Sample

A sample is a single data instance; a dataset holds references to all of its samples. You can access samples individually and read or write a sample's attributes.

Dataset Query

Dataset queries are a combination of filtering, sorting, and slicing operations. **Expressions** are used to define them.

Selection

LightlyStudio offers a premium feature to perform automated data selection. Contact us to get access to premium features. Selecting the right subset of your data can save labeling cost and training time while improving model quality. Selection in LightlyStudio automatically picks the most useful samples: those that are both representative (typical) and diverse (novel). You can mix and match these strategies to fit your goal: stable core data, edge cases, or fixing class imbalances.

🤝 Contribute

We welcome contributions! Please check our issues page for current tasks and improvements, or propose new issues yourself.

💬 Contact