Repository Overview (README excerpt)
AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It is intended as a unified home for customer operator-level requests, so it can match different customers' needs: developers focus on the operators, and customers integrate this op collection into their own private or public frameworks. Feature summary:
• C++-level API
• Python-level API
• The underlying kernels may come from Triton, CK, or hand-written assembly
• Not just inference kernels, but also training kernels and GEMM+communication kernels, allowing workarounds for any architecture limitation in any kernel-framework combination

**Installation**
If you happen to forget `--recursive` during `git clone`, you can initialize the submodules afterwards with `git submodule update --init --recursive`.

**FlyDSL (Optional)**
AITER's FusedMoE supports FlyDSL-based kernels for mixed-precision MoE (e.g., A4W4). FlyDSL is optional; when it is not installed, AITER automatically falls back to CK kernels. All optional dependencies can also be installed at once.

**Triton-based Communication (Iris)**
AITER supports GPU-initiated communication using the Iris library, which enables high-performance Triton-based communication primitives such as reduce-scatter and all-gather. Install AITER with Triton communication support enabled; for more details, see docs/triton_comms.md.
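The reduce-scatter and all-gather collectives mentioned above have simple semantics that can be sketched in plain NumPy, simulating each GPU rank as a list entry. This is an illustration of what the collectives compute, not Iris's actual API:

```python
import numpy as np

# Simulated per-rank buffers: 4 "GPUs", each holding a vector of 8 elements.
world_size, n = 4, 8
rng = np.random.default_rng(0)
bufs = [rng.standard_normal(n) for _ in range(world_size)]

def reduce_scatter(bufs):
    """Each rank ends up with one reduced shard of the elementwise sum."""
    total = np.sum(bufs, axis=0)             # elementwise reduction across ranks
    return np.split(total, len(bufs))        # one contiguous shard per rank

def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    full = np.concatenate(shards)
    return [full.copy() for _ in shards]

# reduce-scatter followed by all-gather is equivalent to an all-reduce
# (which the table below describes as Reduce + Broadcast).
shards = reduce_scatter(bufs)
out = all_gather(shards)
assert all(np.allclose(o, np.sum(bufs, axis=0)) for o in out)
```

This decomposition (all-reduce = reduce-scatter + all-gather) is the standard way ring-based collectives are built, which is why the two primitives are the natural building blocks for a communication library.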
Run operators supported by aiter
A number of op tests are provided, and each can be run individually.

| **Ops** | **Description** |
|---|---|
| ELEMENT WISE | Elementwise ops: + - * / |
| SIGMOID | sigmoid(x) = 1 / (1 + e^-x) |
| AllREDUCE | Reduce + Broadcast |
| KVCACHE | W_K W_V |
| MHA | Multi-Head Attention |
| MLA | Multi-head Latent Attention with KV-Cache layout |
| PA | Paged Attention |
| FusedMoe | Mixture of Experts |
| QUANT | BF16/FP16 -> FP8/INT4 |
| RMSNORM | Root mean square normalization |
| LAYERNORM | x = (x - μ) / (σ² + ε)^0.5 |
| ROPE | Rotary Position Embedding |
| GEMM | D = αAB + βC |