
leandromoreira / digital_video_introduction

A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸

16,187 stars
1,384 forks
19 issues
Jupyter Notebook · Shell


## Repository Overview (README excerpt)


🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸

## Intro

A gentle introduction to video technology. Although it's aimed at software developers and engineers, we want to make it easy **for anyone to learn**. The idea was born during a mini workshop for newcomers to video technology.

The goal is to introduce some digital video concepts with a **simple vocabulary, lots of visual elements and practical examples** when possible, and to make this knowledge available everywhere. Please feel free to send corrections, suggestions and improvements.

There will be **hands-on** sections which require you to have **docker installed** and this repository cloned.

> **WARNING**: when you see a command like the ones used in the hands-on sections, it means we're running a **containerized version** of that program, which already includes all the needed requirements. All the **hands-on exercises should be performed from the folder where you cloned** this repository. For the **jupyter examples** you must start the server, copy the URL, and open it in your browser.

## Changelog

- added DRM system
- released version 1.0.0
- added simplified Chinese translation
- added FFmpeg oscilloscope filter example
- added Brazilian Portuguese translation
- added Spanish translation

## Index

- Intro
- Index
- Basic terminology
- Other ways to encode a color image
- Hands-on: play around with image and color
- DVD is DAR 4:3
- Hands-on: Check video properties
- Redundancy removal
- Colors, Luminance and our eyes
- Color model
- Converting between YCbCr and RGB
- Chroma subsampling
- Hands-on: Check YCbCr histogram
- Frame types
- I Frame (intra, keyframe)
- P Frame (predicted)
- Hands-on: A video with a single I-frame
- B Frame (bi-predictive)
- Hands-on: Compare videos with B-frame
- Summary
- Temporal redundancy (inter prediction)
- Hands-on: See the motion vectors
- Spatial redundancy (intra prediction)
- Hands-on: Check intra predictions
- How does a video codec work?
- What? Why? How?
- History
- The birth of AV1
- A generic codec
- 1st step - picture partitioning
- Hands-on: Check partitions
- 2nd step - predictions
- 3rd step - transform
- Hands-on: throwing away different coefficients
- 4th step - quantization
- Hands-on: quantization
- 5th step - entropy coding
- VLC coding
- Arithmetic coding
- Hands-on: CABAC vs CAVLC
- 6th step - bitstream format
- H.264 bitstream
- Hands-on: Inspect the H.264 bitstream
- Review
- How does H.265 achieve a better compression ratio than H.264?
- Online streaming
- General architecture
- Progressive download and adaptive streaming
- Content protection
- How to use jupyter
- Conferences
- References

## Basic terminology

An **image** can be thought of as a **2D matrix**. If we also think about **color**, we can extend this idea and see the image as a **3D matrix**, where the **additional dimension** is used to provide **color data**.

If we choose to represent these colors using the primary colors (red, green and blue), we define three planes: the first for **red**, the second for **green**, and the last for **blue**.

We'll call each point in this matrix a **pixel** (picture element). A pixel represents the **intensity** (usually a numeric value) of a given color. For example, a **red pixel** means 0 of green, 0 of blue and the maximum of red. A **pink pixel** can be formed from a combination of the three colors. Using a numeric range from 0 to 255, a pink pixel is defined by **Red=255, Green=192 and Blue=203**.

> #### Other ways to encode a color image
>
> Many other models can be used to represent the colors that make up an image. We could, for instance, use an indexed palette, where only a single byte is needed to represent each pixel instead of the 3 needed for the RGB model. In such a model we could use a 2D matrix instead of a 3D matrix to represent our colors; this would save memory but yield fewer color options.
> For instance, look at the picture below. The first face is fully colored; the others are the red, green, and blue planes (shown as gray tones). We can see that the **red color** is the one that **contributes most** (the brightest parts of the second face) to the final color, while the **blue color** contribution can mostly be seen **only in Mario's eyes** (last face) and parts of his clothes. Also see how **all planes contribute little** (the darkest parts) to **Mario's mustache**.

Each color intensity requires a certain number of bits; this quantity is known as the **bit depth**. Let's say we spend **8 bits** (accepting values from 0 to 255) per color (plane); then we have a **color depth** of **24 bits** (8 bits * 3 planes R/G/B), and we can also infer that we could use 2 to the power of 24 different colors.

**It's great** to learn how an image is captured from the world into bits.

Another property of an image is its **resolution**, which is the number of pixels in each dimension. It is often presented as width × height, for example the **4×4** image below.

> #### Hands-on: play around with image and color
>
> You can play around with image and colors using jupyter (python, numpy, matplotlib, etc.).
>
> You can also learn how image filters (edge detection, sharpen, blur...) work.

Another property we can see while working with images or video is the **aspect ratio**, which simply describes the proportional relationship between the width and height of an image or pixel. When people say a movie or picture is **16:9** they usually are referring to the **Display Aspect Ratio (DAR)**; however, individual pixels can also have different shapes, and we call this the **Pixel Aspect Ratio (PAR)**.
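The pixel, bit depth and resolution ideas above can be sketched with a small numpy snippet (the pink RGB values and the 4×4 size come from the text; the variable names are just illustrative):

```python
import numpy as np

# A 4x4 RGB image is a 3D matrix: height x width x 3 color planes.
image = np.zeros((4, 4, 3), dtype=np.uint8)

# A pure red pixel: maximum red, 0 green, 0 blue.
image[0, 0] = [255, 0, 0]

# A pink pixel: Red=255, Green=192, Blue=203.
image[1, 1] = [255, 192, 203]

# The resolution is the pixel count in each dimension.
height, width, planes = image.shape
print(f"resolution: {width}x{height}")   # resolution: 4x4

# With 8 bits per plane we get a 24-bit color depth,
# i.e. 2**24 distinct representable colors.
bit_depth_per_plane = 8
color_depth = bit_depth_per_plane * planes
print(f"colors: {2 ** color_depth}")     # colors: 16777216
```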
> #### DVD is DAR 4:3
>
> Although the real resolution of a DVD is 704x480, it still keeps a 4:3 aspect ratio because it has a PAR of 10:11 (704×10 / 480×11 = 4:3).

Finally, we can define a **video** as a **succession of *n* frames** over **time**, which can be seen as another dimension; *n* is then the frame rate, or frames per second (FPS). The number of bits per second needed to show a video i…
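The DVD arithmetic above can be verified with a quick sketch, assuming the usual relation DAR = (width / height) × PAR (the helper name is hypothetical):

```python
from fractions import Fraction

def display_aspect_ratio(width, height, par=Fraction(1, 1)):
    """DAR = storage aspect ratio (width/height) times the pixel aspect ratio."""
    return Fraction(width, height) * par

# DVD: 704x480 stored pixels with a 10:11 pixel aspect ratio.
dar = display_aspect_ratio(704, 480, Fraction(10, 11))
print(dar)  # 4/3 -> the familiar 4:3

# Square pixels (PAR 1:1): 1920x1080 gives the familiar 16:9.
print(display_aspect_ratio(1920, 1080))  # 16/9
```

Using `Fraction` keeps the ratios exact, so 704×10 / 480×11 reduces cleanly to 4/3 instead of a float like 1.333….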