state-spaces / mamba
Mamba SSM architecture
Repository Overview (README excerpt)
Mamba

> **Mamba: Linear-Time Sequence Modeling with Selective State Spaces**\
> Albert Gu*, Tri Dao*\
> Paper: https://arxiv.org/abs/2312.00752

> **Transformers are SSMs: Generalized Models and Efficient Algorithms**\
> **Through Structured State Space Duality**\
> Tri Dao*, Albert Gu*\
> Paper: https://arxiv.org/abs/2405.21060

About

Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.

Installation

• [Option] : an efficient implementation of a simple causal Conv1d layer used inside the Mamba block.
• : the core Mamba package.
• : To install core Mamba package and causal-conv1d.
• : To install core Mamba package and dev dependencies.

It can also be built from source with from this repository.

Try passing to if installation encounters difficulties either when building from source or installing from PyPI. Common complaints that can be resolved in this way include PyTorch versions, but other cases exist as well.

Other requirements:

• Linux
• NVIDIA GPU
• PyTorch 1.12+
• CUDA 11.6+

For AMD cards, see additional prerequisites below.

Usage

We expose several levels of interface with the Mamba model.

Selective SSM

Mamba is based on a selective SSM layer, which is the focus of the paper (Section 3; Algorithm 2).

Source: ops/selective_scan_interface.py.

Mamba Block

The main module of this repository is the Mamba architecture block wrapping the selective SSM.

Source: modules/mamba_simple.py.

Usage:

Mamba-2

The Mamba-2 block is implemented at modules/mamba2.py.
A simpler version is at modules/mamba2_simple.py.

The usage is similar to Mamba(-1):

SSD

A minimal version of the inner SSD module (Listing 1 from the Mamba-2 paper) with conversion between "discrete" and "continuous" SSM versions is at modules/ssd_minimal.py.

Mamba Language Model

Finally, we provide an example of a complete language model: a deep sequence model backbone (with repeating Mamba blocks) + language model head.

Source: models/mixer_seq_simple.py.

This is an example of how to integrate Mamba into an end-to-end neural network. This example is used in the generation scripts below.

Pretrained Models

Pretrained models are uploaded to Hugging Face: , , , , , , , , , , , , trained on 300B tokens on the Pile, as well as (trained on 600B tokens on the SlimPajama dataset).

The models will be autodownloaded by the generation script below.

These models were trained on the Pile, and follow the standard model dimensions described by GPT-3 and followed by many open source models:

| Parameters | Layers | Model dim. |
|------------|--------|------------|
| 130M       | 24     | 768        |
| 370M       | 48     | 1024       |
| 790M       | 48     | 1536       |
| 1.4B       | 48     | 2048       |
| 2.8B       | 64     | 2560       |

(The layer count of Mamba doubles that of a Transformer with similar size, as two Mamba blocks are needed for each "layer" (MHA block + MLP block) of a Transformer.)

Note: these are base models trained only for 300B tokens, without any form of downstream modification (instruction tuning, etc.). Performance is expected to be comparable to or better than other architectures trained on similar data, but not to match larger or fine-tuned models.

Evaluations

To run zero-shot evaluations of models (corresponding to Table 3 of the paper), we use the lm-evaluation-harness library.

• Install by .
• Run evaluation with (more documentation at the lm-evaluation-harness repo):

To reproduce the results on the model reported in the blogposts:

To run evaluations on Mamba-2 models, simply replace the model names:

Note that the result of each task might differ from reported values by 0.1-0.3 due to noise in the evaluation process.

Inference

The script benchmarks/benchmark_generation_mamba_simple.py

• autoloads a model from the Hugging Face Hub,
• generates completions of a user-specified prompt,
• benchmarks the inference speed of this generation.

Other configurable options include the top-p (nucleus sampling) probability, and the softmax temperature.

Examples

To test generation latency (e.g. batch size = 1) with different sampling strategies:

To test generation throughput with random prompts (e.g. large batch size):

With Mamba-2, you just need to change the model name:

Troubleshooting

Precision

Our models were trained using PyTorch AMP for mixed precision. AMP keeps model parameters in float32 and casts to half precision when necessary. On the other hand, other frameworks like DeepSpeed store parameters in float16 and upcast when necessary (e.g. for optimizer accumulation).

We've observed that higher precision for the main model parameters may be necessary, because SSMs are sensitive to their recurrent dynamics. If you are experiencing instabilities, as a first step please try a framework storing parameters in fp32 (such as AMP).

Initialization

Some parts of the model have initializations inherited from prior work on S4 models. For example, the $\Delta$ parameter has a targeted range by initializing the bias of its linear projection. However, some frameworks may have post-initialization hooks (e.g. setting all bias terms in modules to zero). If this is the case, you may have to add custom logic (e.g. this line turns off re-initializing in our trainer, but would be a no-op in any other framework) that is specific to the training framework.
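As a rough illustration of the initialization point above, the sketch below shows why a hook that zeroes biases can matter. It assumes, following the Mamba paper, that $\Delta$ is produced by applying softplus to a linear projection plus a bias, and that the bias is set through the inverse of softplus so that the initial step size falls in a target range. The function names and the range endpoints (0.001 to 0.1, with a floor of 1e-4) are illustrative assumptions, not a copy of the repository's code.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    """Return x such that softplus(x) == y, for y > 0."""
    # log(exp(y) - 1) rewritten as y + log(1 - exp(-y)) for stability.
    return y + math.log(-math.expm1(-y))


def init_dt_bias(n, dt_min=1e-3, dt_max=1e-1, dt_floor=1e-4, seed=0):
    """Hypothetical helper: biases whose softplus lies in [dt_min, dt_max]."""
    rng = random.Random(seed)
    log_min, log_max = math.log(dt_min), math.log(dt_max)
    biases = []
    for _ in range(n):
        # Sample the target step size log-uniformly in the assumed range.
        dt = math.exp(rng.random() * (log_max - log_min) + log_min)
        dt = max(dt, dt_floor)
        # Store the pre-activation value that softplus maps back to dt.
        biases.append(inverse_softplus(dt))
    return biases


biases = init_dt_bias(8)
steps = [softplus(b) for b in biases]  # initial step sizes with zero input
zeroed_step = softplus(0.0)            # step size if a hook zeroes the bias

print("targeted steps:", [round(s, 4) for s in steps])
print("step after zeroing the bias:", round(zeroed_step, 3))  # ln 2, about 0.693
```

Under these assumptions, a hook that resets every bias to zero would start each step size at softplus(0) = ln 2, well outside the targeted range, which is why such hooks may need to be disabled for this parameter.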
Additional Prerequisites for AMD cards

Patching ROCm

If you are on ROCm 6.0, run the following steps to avoid errors during compilation. This is not required for ROCm 6.1 onwards.

• Locate your ROCm installation directory. This is typically found at , but may vary depending on your installation.
• Apply the Patch. Run with in case you encounter permission issues.

Citation

If you use this codebase, or otherwise find our work valuabl…