
NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

5,644 stars
660 forks
237 issues
C++ · Python · CUDA

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing NVIDIA/DALI in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/NVIDIA/DALI)
Preview: Analyzed by RepoMind

Repository Overview (README excerpt)


|License| |Documentation| |Format|

NVIDIA DALI
===========

.. overview-begin-marker-do-not-remove

The NVIDIA Data Loading Library (DALI) is a GPU-accelerated library for data loading and
pre-processing to accelerate deep learning applications. It provides a collection of highly
optimized building blocks for loading and processing image, video and audio data. It can be
used as a portable drop-in replacement for built-in data loaders and data iterators in
popular deep learning frameworks.

Deep learning applications require complex, multi-stage data processing pipelines that
include loading, decoding, cropping, resizing, and many other augmentations. These data
processing pipelines, which are currently executed on the CPU, have become a bottleneck,
limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the
GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput
of the input pipeline. Features such as prefetching, parallel execution, and batch
processing are handled transparently for the user.

In addition, the deep learning frameworks have multiple data pre-processing implementations,
resulting in challenges such as portability of training and inference workflows, and code
maintainability. Data processing pipelines implemented using DALI are portable because they
can easily be retargeted to TensorFlow, PyTorch, and PaddlePaddle.

.. image:: /dali.png
   :width: 800
   :align: center
   :alt: DALI Diagram

DALI in action:

.. container:: dali-tabs

   **Pipeline mode:**

   .. code-block:: python

      from nvidia.dali.pipeline import pipeline_def
      import nvidia.dali.types as types
      import nvidia.dali.fn as fn
      from nvidia.dali.plugin.pytorch import DALIGenericIterator
      import os

      # To run with different data, see documentation of nvidia.dali.fn.readers.file
      # points to https://github.com/NVIDIA/DALI_extra
      data_root_dir = os.environ['DALI_EXTRA_PATH']
      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')


      def loss_func(pred, y):
          pass


      def model(x):
          pass


      def backward(loss, model):
          pass


      @pipeline_def(num_threads=4, device_id=0)
      def get_dali_pipeline():
          images, labels = fn.readers.file(
              file_root=images_dir, random_shuffle=True, name="Reader")
          # decode data on the GPU
          images = fn.decoders.image_random_crop(
              images, device="mixed", output_type=types.RGB)
          # the rest of processing happens on the GPU as well
          images = fn.resize(images, resize_x=256, resize_y=256)
          images = fn.crop_mirror_normalize(
              images,
              crop_h=224,
              crop_w=224,
              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
              mirror=fn.random.coin_flip())
          return images, labels


      train_data = DALIGenericIterator(
          [get_dali_pipeline(batch_size=16)],
          ['data', 'label'],
          reader_name='Reader'
      )

      for i, data in enumerate(train_data):
          x, y = data[0]['data'], data[0]['label']
          pred = model(x)
          loss = loss_func(pred, y)
          backward(loss, model)

   **Dynamic mode:**

   .. code-block:: python

      import os

      import nvidia.dali.types as types
      import nvidia.dali.experimental.dynamic as ndd
      import torch

      # To run with different data, see documentation of ndd.readers.File
      # points to https://github.com/NVIDIA/DALI_extra
      data_root_dir = os.environ['DALI_EXTRA_PATH']
      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')


      def loss_func(pred, y):
          pass


      def model(x):
          pass


      def backward(loss, model):
          pass


      reader = ndd.readers.File(file_root=images_dir, random_shuffle=True)
      for images, labels in reader.next_epoch(batch_size=16):
          images = ndd.decoders.image_random_crop(
              images, device="gpu", output_type=types.RGB)
          # the rest of processing happens on the GPU as well
          images = ndd.resize(images, resize_x=256, resize_y=256)
          images = ndd.crop_mirror_normalize(
              images,
              crop_h=224,
              crop_w=224,
              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
              mirror=ndd.random.coin_flip(),
          )
          x = torch.as_tensor(images)
          y = torch.as_tensor(labels.gpu())
          pred = model(x)
          loss = loss_func(pred, y)
          backward(loss, model)

Highlights
----------

- Easy-to-use functional style Python API.
- Multiple data formats support - LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV,
  FLAC, OGG, H.264, VP9 and HEVC.
- Portable across popular deep learning frameworks: TensorFlow, PyTorch, PaddlePaddle, JAX.
- Supports CPU and GPU execution.
- Scalable across multiple GPUs.
- Flexible graphs let developers create custom pipelines.
- Extensible for user-specific needs with custom operators.
- Accelerates image classification (ResNet-50), object detection (SSD) workloads as well as
  ASR models (Jasper, RNN-T).
- Allows direct data path between storage and GPU memory with __.
- Easy integration with __ with __.
- Open source.

.. overview-end-marker-do-not-remove

----

DALI success stories:
---------------------

- __: __
- __
- __
- __
- __

----

DALI Roadmap
------------

__ a high-level overview of our 2024 plan. You should be aware that this roadmap may change
at any time and the order of its items does not reflect any type of priority.

We strongly encourage you to comment on our roadmap and provide us feedback on the
mentioned GitHub issue.

----

Installing DALI
---------------

To install the latest DALI release for the latest CUDA version (12.x)::

   pip install nvidia-dali-cuda120
   # or
   pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda120

DALI requires __ supporting the appropriate CUDA version. In case of DALI based on CUDA 12,
it requires __ to be installed.

DALI comes preinstalled in the __, __, and __ containers on __.

For other installation paths (TensorFlow plugin, older CUDA version, nightly and weekly
builds, etc), and specific requirements please refer to the __.

To build DALI from source, please refer to the __.

----

Examples and Tutorials
----------------------

An introdu…
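The overview above notes that DALI's execution engine handles prefetching, parallel execution, and batching transparently. As a rough conceptual sketch only (plain Python with a background thread and a bounded queue, not DALI's actual implementation or API), the idea of prefetching is that the next batch is prepared while the consumer is still working on the current one:

```python
import threading
import queue
import time


def prefetching_loader(batches, depth=2):
    """Yield batches from `batches`, preparing up to `depth` of them
    ahead of time on a background producer thread."""
    q = queue.Queue(maxsize=depth)
    _END = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            q.put(batch)  # blocks once `depth` batches are already queued
        q.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is _END:
            return
        yield batch


def decode_and_augment(i):
    # stand-in for the real decode/resize/normalize work
    time.sleep(0.01)
    return [i] * 4  # a hypothetical "batch" of four samples


# batches arrive in order, already prepared ahead of consumption
loaded = list(prefetching_loader(decode_and_augment(i) for i in range(3)))
```

Here `prefetching_loader`, `depth`, and `decode_and_augment` are illustrative names, not DALI symbols; DALI performs the equivalent work (plus parallel decoding on the GPU) inside its own engine, controlled by pipeline parameters such as `num_threads` in the snippets above.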