# Docker Model Runner
## Repository Overview (README excerpt)
Docker Model Runner (DMR) makes it easy to manage, run, and deploy AI models using Docker. Designed for developers, Docker Model Runner streamlines the process of pulling, running, and serving large language models (LLMs) and other AI models directly from Docker Hub or any OCI-compliant registry.

## Overview

This package supports the Docker Model Runner in Docker Desktop and Docker Engine.

## Installation

### Docker Desktop (macOS and Windows)

For macOS and Windows, install Docker Desktop: https://docs.docker.com/desktop/

Docker Model Runner is included in Docker Desktop.

### Docker Engine (Linux)

For Linux, install Docker Engine from the official Docker repository. Docker Model Runner is included in Docker Engine when installed from Docker's official repositories.

## Verifying Your Installation

To verify that Docker Model Runner is available, check that the `docker model` command responds. If it is not available, see the troubleshooting section below.

## Troubleshooting: Docker Installation Source

If you encounter errors indicating that the `docker model` command is not found:

- **Check your Docker installation source:** if your Docker packages come from your distribution's own repositories, you will need to reinstall from Docker's official repositories.
- **Remove the distro version and install from Docker's official repository.**
- **For NVIDIA DGX systems:** if Docker came pre-installed, verify it is from Docker's official repositories. If not, follow the reinstallation steps above.
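The verification and troubleshooting steps above can be scripted. The sketch below is illustrative, not part of the project; it assumes the runner is exposed as a `docker model` subcommand, so adjust it if your installation differs:

```python
import shutil
import subprocess

def docker_on_path() -> bool:
    """True if a `docker` CLI binary can be found on PATH."""
    return shutil.which("docker") is not None

def model_runner_available() -> bool:
    """Best-effort check that the Model Runner CLI responds.

    Assumes a `docker model` subcommand; a non-zero exit status (or a
    missing docker binary) means it is unavailable, which usually points
    at a distro-packaged Docker install rather than Docker's official one.
    """
    if not docker_on_path():
        return False
    try:
        proc = subprocess.run(
            ["docker", "model", "version"],
            capture_output=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0

if __name__ == "__main__":
    if model_runner_available():
        print("Docker Model Runner: available")
    else:
        print("Docker Model Runner: not available (see troubleshooting above)")
```

Either outcome is fine on a development machine; the point is that a failing check maps directly onto the troubleshooting path described above.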
For more details refer to: https://docs.docker.com/ai/model-runner/get-started/

## Prerequisites

Before building from source, ensure you have the following installed:

- **Go 1.25+** - Required for building both model-runner and model-cli
- **Git** - For cloning repositories
- **Make** - For using the provided Makefiles
- **Docker** (optional) - For building and running containerized versions
- **CGO dependencies** - Required for model-runner's GPU support:
  - On macOS: Xcode Command Line Tools (`xcode-select --install`)
  - On Linux: gcc/g++ and development headers
  - On Windows: MinGW-w64 or Visual Studio Build Tools

## Building the Complete Stack

### Step 1: Clone and Build model-runner (Server/Daemon)

The binary will be created in the current directory. This is the backend server that manages models.

### Step 2: Build model-cli (Client)

## Testing the Complete Stack End-to-End

> **Note:** We use port 13434 in these examples to avoid conflicts with Docker Desktop's built-in Model Runner, which typically runs on port 12434.

### Option 1: Local Development (Recommended for Contributors)

- **Start model-runner in one terminal.**
- **Use model-cli in another terminal.**

### Option 2: Using Docker

- **Build and run model-runner in Docker.**
- **Connect with model-cli.**

## Additional Resources

- Model Runner Documentation
- Model CLI README
- Model Specification
- Community Slack Channel

## Using the Makefile

This project includes a Makefile to simplify common development tasks. Docker targets require Docker Desktop >= 4.41.0.
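The Docker Desktop version floor above (>= 4.41.0) is easy to get wrong with plain string comparison, since "4.9" sorts after "4.41" lexicographically. A small illustrative checker, not part of the project; feed it the version reported by your Docker Desktop installation:

```python
def at_least(version: str, minimum: str = "4.41.0") -> bool:
    """Compare dotted version strings numerically, component by component.

    Assumes plain numeric components of equal count (e.g. "4.41.0");
    pre-release suffixes are not handled.
    """
    parse = lambda v: [int(part) for part in v.split(".")]
    return parse(version) >= parse(minimum)

print(at_least("4.41.0"))  # True: meets the floor exactly
print(at_least("4.9.1"))   # False: 9 < 41 numerically
```

String comparison would get the second case wrong, which is why the components are converted to integers before comparing.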
Run the help target for a full list, but the key targets are:

- Build the Go application
- Build the CLI (Docker plugin)
- Build and install the CLI as a Docker plugin
- Generate CLI documentation
- Run the application locally
- Clean build artifacts
- Run tests
- **Run all CI validations locally** (lint, test, shellcheck, go mod tidy)
- Run Go linting with golangci-lint
- Run shellcheck validation on shell scripts
- Run integration tests (requires Docker)
- Build the Docker image for the current platform
- Run the application in a Docker container with TCP port access and mounted model storage
- Show all available targets and configuration options

## Running in Docker

The application can be run in Docker with the following features enabled by default:

- TCP port access (default port 8080)
- Persistent model storage in a local directory

This will:

- Create a model storage directory in your current working directory (or use the specified path)
- Mount this directory into the container
- Start the service on port 8080 (or the specified port)
- Store all downloaded models in the host's directory so they persist between container runs

## llama.cpp integration

The Docker image includes the llama.cpp server binary from a prebuilt llama.cpp server image. You can specify which version of that image to use, as well as the target OS, architecture, and acceleration type, via build variables.

Default values:

- image version: latest
- acceleration: cpu

Available variants:

- CPU-optimized version
- CUDA-accelerated version for NVIDIA GPUs
- ROCm-accelerated version for AMD GPUs
- MUSA-accelerated version for MTHREADS GPUs
- CANN-accelerated version for Ascend NPUs

## vLLM integration

The Docker image also supports vLLM as an alternative inference backend.
## Building the vLLM variant

### Build Arguments

The vLLM variant supports the following build arguments:

- **VLLM_VERSION**: The vLLM version to install
- **VLLM_CUDA_VERSION**: The CUDA version suffix for the wheel
- **VLLM_PYTHON_TAG**: The Python compatibility tag (compatible with Python 3.8+)

### Multi-Architecture Support

The vLLM variant supports both x86_64 (amd64) and aarch64 (arm64) architectures. The build process automatically selects the appropriate prebuilt wheel for the target platform:

- **linux/amd64**: uses x86_64 wheels
- **linux/arm64**: uses aarch64 wheels

### Updating to a New vLLM Version

The vLLM wheels are sourced from the official vLLM GitHub Releases, which provide prebuilt wheels for each release version.

## API Examples

The Model Runner exposes a REST API over a TCP port, so you can interact with it using curl or any other HTTP client. When TCP access is enabled, regular HTTP requests work, and the response contains the model's reply.

## Features

- **Automatic GPU Detection**: Automatically configures NVIDIA GPU support if available
- **Persistent Caching**: Models are c…
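The curl examples in the API Examples section above did not survive this excerpt, but the request shape can be sketched. The snippet below assumes an OpenAI-compatible chat completions endpoint at an assumed path, the TCP port 13434 used earlier, and a sample model name; check the documentation linked earlier for the exact path and model identifiers:

```python
import json
import urllib.request

BASE_URL = "http://localhost:13434"  # TCP port chosen when starting the runner
ENDPOINT = "/engines/v1/chat/completions"  # assumed OpenAI-compatible path

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat completion request mirroring the curl examples."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_reply(response_body: bytes) -> str:
    """Pull the model's reply out of an OpenAI-style response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    req = build_request("ai/smollm2", "Say hello in one sentence.")
    print("POST", req.full_url)
    # Uncomment once a runner is listening on BASE_URL:
    # with urllib.request.urlopen(req) as resp:
    #     print(extract_reply(resp.read()))
```

The actual HTTP call is left commented out so the sketch is safe to run without a server; with a runner listening, `extract_reply` pulls the model's text out of the standard `choices[0].message.content` field.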