zml / zml

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

3,257 stars
125 forks
29 issues
Languages: Zig, Starlark, Python

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing zml/zml in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/zml/zml)

Repository Overview (README excerpt)

Website | Getting Started | Documentation | Discord | Contributing

[ZML]: https://zml.ai/
[Getting Started]: #getting-started
[Documentation]: https://docs.zml.ai
[Contributing]: ./CONTRIBUTING.md
[Discord]: https://discord.gg/6y72SN2E7H

Bonjour 👋

At ZML, we are creating exciting AI products on top of our high-performance AI inference stack. Our stack is built for production, using the amazing Zig language, MLIR, and the power of Bazel.

Take me straight to getting started or give me a taste 🥐!

---

We're happy to share!

We're very happy to share our inference stack with the world and hope it allows you, too, to build cool and exciting AI projects.

To give you a glimpse of what you can do with ZML, here is an early demo: it shows a prototype running a Llama 3 model sharded across 1 NVIDIA RTX 4090, 1 AMD 6800XT, and 1 Google Cloud TPU v2. All accelerators were hosted in different locations, with activations being passed over a VPN. All processes used the same model code, cross-compiled on a Mac and copied onto the servers. For more inspiration, see the examples below or check out the examples folder.

Getting started

Prerequisites

We use Bazel to build ZML and its dependencies. The only prerequisite is Bazel itself, which we recommend downloading through Bazelisk, a version manager for Bazel.

**Please note:** if you do not wish to install Bazel system-wide, we provide bazel.sh, which downloads Bazel to your home folder and runs it from there.

**Install Bazel** (recommended) on macOS or Linux.

Run a pre-packaged model

We have implemented a variety of example models in ZML; see our reference implementations in the examples folder.

MNIST

The classic handwritten-digit recognition task. The model is asked to recognize a handwritten digit that has been converted to a 28x28-pixel monochrome image. Running the example will download a pre-trained model and the test dataset; the program will then load the model, compile it, and classify a randomly picked example from the test dataset.
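The excerpt above elides the actual command. A minimal sketch of how such an example is typically launched with Bazel, assuming the example is exposed as a `//mnist` target inside the examples folder (the target name is an assumption; check the examples folder for the real one):

```shell
# Hedged sketch: run the MNIST example via Bazel with optimizations.
# The //mnist target name is an assumption, not verified against the repo.
cd examples
bazel run -c opt //mnist
```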
Meta Llama 3.1 8B

This model has restrictions, see here. It **requires approval from Meta on Hugging Face**, which can take a few hours to be granted. Once you've been granted access, you're ready to download the gated model. First, download the model using the huggingface-cli; note that you don't need to install it yourself, you can just use the packaged version. Then you can run the model on the command line. You can also try a larger variant if you have enough memory.

Meta Llama 3.2 1B

Like the 8B model above, this model also requires approval. See here for access requirements. For a larger 3.2 model, you can also try a bigger variant.

Running models on GPU / TPU

You can compile models for accelerator runtimes by appending one or more of the following runtime flags to the command line when compiling / running a model:

• NVIDIA CUDA
• AMD ROCm
• Google TPU
• AWS Trainium/Inferentia 2
• **Avoid CPU:** skipping compilation for CPU cuts down compilation time.

So, to run the Llama 3.1 8B model from above on a host with an NVIDIA GPU, append the CUDA runtime flag (and optionally disable CPU compilation).

Run tests

A taste of ZML

MNIST

Tagged Tensors

Where to go next

You might want to check out more examples, read through the documentation directly on GitHub, or, for the full rendering experience, browse the online documentation with its included API reference.

Contributing

See [here][Contributing].

License

ZML is licensed under the Apache 2.0 license.

Thanks to our contributors
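To make the accelerator selection described above concrete: in Bazel-based projects like ZML, runtimes are typically toggled via build settings passed on the command line. A hedged sketch, assuming flag names of the form `--@zml//runtimes:<name>=<bool>` and a hypothetical `//llama` target (both are assumptions; consult the ZML documentation for the exact spellings):

```shell
# Hedged sketch: run a Llama example with the CUDA runtime enabled and
# CPU compilation disabled (skipping CPU cuts compile time).
# Flag spellings and the //llama target are assumptions, not verified.
cd examples
bazel run -c opt //llama \
  --@zml//runtimes:cuda=true \
  --@zml//runtimes:cpu=false
```

The same pattern applies to the other runtimes listed in the GPU / TPU section (ROCm, TPU, Trainium/Inferentia), swapping the runtime name in the flag.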