Stability-AI / StableLM
StableLM: Stability AI Language Models
Repository Overview (README excerpt)
*“A Stochastic Parrot, flat design, vector art” (Stable Diffusion XL)*

This repository contains Stability AI's ongoing development of the StableLM series of language models and will be continuously updated with new checkpoints. The following provides an overview of all currently available models. More coming soon.

News

*September 29, 2023*

• Released the StableLM-3B-4E1T model under CC BY-SA-4.0.

*August 5, 2023*

• Released patched StableLM-Alpha v2 models with 3B and 7B parameters.

*April 28, 2023*

• Released StableVicuna-13B, our RLHF fine-tune of Vicuna-13B v0, which itself is a fine-tune of LLaMA-13B. Delta weights over the original LLaMA model are released under CC BY-NC-SA-4.0.

*April 20, 2023*

• Released the initial set of StableLM-Alpha models, with 3B and 7B parameters. Base models are released under CC BY-SA-4.0.
• Try chatting with our 7B model on Hugging Face Spaces.

Models

StableLM-3B-4E1T

> Technical Report: StableLM-3B-4E1T

StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. Given prior success in this area (Tay et al., 2023 and Taylor et al., 2022), we train on 1 trillion (1T) tokens for 4 epochs, following the observations of Muennighoff et al. (2023) in "Scaling Data-Constrained Language Models", where they find that "training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data." Further inspiration for the token count is taken from "Go smol or go home" (De Vries, 2023), which suggests that a 2.96B model trained for 2.85 trillion tokens achieves a similar loss to a Chinchilla compute-optimal 9.87B language model ($k_n = 0.3$).
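A quick back-of-the-envelope check of the token budget described above: 1T unique tokens repeated for 4 epochs gives the 4T training tokens listed in the model table. The sketch below also estimates tokens per parameter and total training compute. The 6·N·D FLOP approximation and the roughly 20 tokens-per-parameter Chinchilla heuristic are common rules of thumb assumed here for illustration; they are not figures reported by Stability AI.

```python
# Back-of-the-envelope training arithmetic for StableLM-3B-4E1T.
# From the README: ~1T unique tokens, 4 epochs, and the exact parameter
# count listed in the model table.
# ASSUMPTIONS (not from the README): the 6*N*D training-FLOP estimate and
# the ~20 tokens/parameter Chinchilla rule of thumb.

UNIQUE_TOKENS = 1_000_000_000_000   # 1T unique tokens per epoch
EPOCHS = 4                          # multi-epoch regime
PARAMS = 2_795_443_200              # parameter count from the model table

CHINCHILLA_TOKENS_PER_PARAM = 20    # assumed rule-of-thumb ratio


def total_training_tokens(unique_tokens: int, epochs: int) -> int:
    """Tokens seen during training when the same data is repeated `epochs` times."""
    return unique_tokens * epochs


def approx_training_flops(params: int, tokens: int) -> float:
    """Common 6*N*D estimate of dense-transformer training compute."""
    return 6.0 * params * tokens


tokens = total_training_tokens(UNIQUE_TOKENS, EPOCHS)
tokens_per_param = tokens / PARAMS
chinchilla_tokens = CHINCHILLA_TOKENS_PER_PARAM * PARAMS

print(f"total training tokens:      {tokens:.2e}")
print(f"tokens per parameter:       {tokens_per_param:.0f}")
print(f"Chinchilla-heuristic tokens: {chinchilla_tokens:.2e}")
print(f"approx. training FLOPs:     {approx_training_flops(PARAMS, tokens):.2e}")
```

At roughly 1,431 tokens per parameter, the run sits about 70 times above the ~20 tokens-per-parameter heuristic, which is consistent with the smaller-model, more-tokens direction the README cites.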
| Size | StableLM-3B-4E1T | Training Tokens | Parameters    |
|------|------------------|-----------------|---------------|
| 3B   | checkpoint       | 4T              | 2,795,443,200 |

Model Architecture

The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture, with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |

• **Position Embeddings**: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput, following Black et al. (2022).
• **Normalization**: LayerNorm (Ba et al., 2016) with learned bias terms, as opposed to RMSNorm (Zhang & Sennrich, 2019).
• **Tokenizer**: GPT-NeoX (Black et al., 2022).

Training Data

The dataset comprises a filtered mixture of open-source large-scale datasets available on the Hugging Face Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without *Books3* and other subsets, and StarCoder (Li et al., 2023).

> Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.

Training Details

Please refer to the provided YAML configuration file for complete hyperparameter settings and to the technical report for further details.

Downstream Results

The following zero-shot evaluations are performed using the `lm-bench` branch of Stability AI's fork. Full JSONs can be found in the directory.
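The quantities behind the Model Architecture table can be sanity-checked with a short sketch. With a hidden size of 2560 and 32 heads, each head has 2560 / 32 = 80 dimensions, so applying rotary embeddings to the first 25% touches 20 of them. The `partial_rotary` helper is a hypothetical plain-Python illustration using the GPT-NeoX "rotate half" pairing with an assumed base of 10,000. The parameter count also relies on values this README does not state, assumed here: a padded GPT-NeoX vocabulary of 50,304, a SwiGLU MLP with intermediate size 6,912, untied input/output embeddings, and bias-free linear layers (LayerNorms keep their bias, per the Normalization bullet). Under those assumptions the total reproduces the 2,795,443,200 in the table.

```python
import math

# Values from the architecture table in this README.
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
NUM_HEADS = 32
ROTARY_PCT = 0.25  # rotary applied to the first 25% of each head's dimensions

HEAD_DIM = HIDDEN_SIZE // NUM_HEADS         # 80
ROTARY_DIMS = int(HEAD_DIM * ROTARY_PCT)    # 20

# ASSUMPTIONS (not stated in this README): padded GPT-NeoX vocab of 50,304,
# SwiGLU MLP with intermediate size 6,912, untied embeddings, no biases in
# linear layers, LayerNorm with both weight and bias.
VOCAB_SIZE = 50_304
INTERMEDIATE_SIZE = 6_912


def count_parameters() -> int:
    """Total parameters under the assumptions above."""
    embed = VOCAB_SIZE * HIDDEN_SIZE              # input token embedding
    lm_head = VOCAB_SIZE * HIDDEN_SIZE            # untied output projection
    attn = 4 * HIDDEN_SIZE * HIDDEN_SIZE          # q, k, v, o projections
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE     # gate, up, down (SwiGLU)
    norms = 2 * (2 * HIDDEN_SIZE)                 # two LayerNorms: weight + bias
    per_layer = attn + mlp + norms
    final_norm = 2 * HIDDEN_SIZE                  # final LayerNorm: weight + bias
    return embed + lm_head + NUM_LAYERS * per_layer + final_norm


def partial_rotary(x: list[float], position: int,
                   rotary_dims: int = ROTARY_DIMS,
                   base: float = 10_000.0) -> list[float]:
    """Rotate the first `rotary_dims` entries of one head vector (hypothetical helper).

    Uses the "rotate half" pairing: element i is paired with element
    i + rotary_dims // 2. Dimensions from `rotary_dims` onward pass through unchanged.
    """
    half = rotary_dims // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2 * i / rotary_dims)  # per-pair frequency
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


print("head dim:", HEAD_DIM, "rotary dims:", ROTARY_DIMS)
print("parameters:", f"{count_parameters():,}")
```

Only 20 of the 80 dimensions in each query/key head receive the sin/cos rotation; the remaining 60 pass through untouched, which is presumably where the throughput benefit cited from Black et al. (2022) comes from.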
| Pre-Trained Model | Average | ARC Challenge | ARC Easy | BoolQ | HellaSwag (✱) | LAMBADA OpenAI | OpenBookQA | PIQA | SciQ | Winogrande |
| ------------------------------------- |:-------:|:-------------:|:--------:|:-----:|:-------------:|:--------------:|:----------:|:-----:|:-----:|:----------:|
| meta-llama/Llama-2-13b-hf | 71.77 | 48.63 | 79.50 | 80.52 | 79.36 | 76.77 | 35.40 | 79.05 | 94.50 | 72.22 |
| huggyllama/llama-7b | 68.84 | 41.89 | 75.25 | 75.05 | 76.22 | 73.55 | 34.40 | 78.67 | 94.60 | 69.93 |
| meta-llama/Llama-2-7b-hf | 68.75 | 43.00 | 76.26 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 93.60 | 69.61 |
| Qwen/Qwen-7B | 67.91 | 45.39 | 67.38 | 74.56 | 88.85 (?) | 69.67 | 32.20 | 73.99 | 93.20 | 65.98 |
| tiiuae/falcon-7b | 67.83 | 40.27 | 74.41 | 73.55 | 76.35 | 74.56 | 30.60 | 79.49 | 94.00 | 67.25 |
| mosaicml/mpt-7b | 67.36 | 40.53 | 74.92 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 93.70 | 68.03 |
| **stabilityai/stablelm-3b-4e1t** | 66.93 | 37.80 | 72.47 | 75.63 | 73.90 | 70.64 | 31.40 | 79.22 | 94.80 | 66.54 |
| baichuan-inc/Baichuan2-7B-Base | 66.93 | 42.24 | 75.00 | 73.09 | 72.29 | 70.99 | 30.40 | 76.17 | 94.60 | 67.56 |
| stabilityai/stablelm-base-alpha-7b-v2 | 66.89 | 38.48 | 73.19 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 93.90 | 68.82 |
| openlm-research/open_llama_7b_v2 | 66.32 | 38.82 | 71.93 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 93.80 | 65.82 |
| microsoft/phi-1_5 | 65.57 | 44.45 | 76.14 | 74.53 | 62.62 | 52.75 | 37.60 | 76.33 | 93.20 | 72.53 |
| EleutherAI/gpt-neox-20B | 65.57 | 37.88 | 72.90 | 69.48 | 71.43 | 71.98 | 29.80 | 77.42 | 93.10 | 66.14 |
| togethercomputer/RedPajama-INCITE-7B-Base | 65.07 | 37.71 | 72.35 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 92.70 | 64.33 |
| cerebras/btlm-3b-8k-base (§) | 63.59 | 34.90 | 70.45 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 92.90 | 64.96 |
| EleutherAI/pythia-12b | 62.69 | 31.83 | 70.20 | 67.31 | 67.38 | 70.64 | 26.40 | 76.28 | 90.20 | 64.01 |
| openlm-research/open_llama_3b_v2 | 62.43 | 33.87 | 67.59 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 92.40 | 62.90 |
| EleutherAI/gpt-j-6B | 62.34 | 33.96 | 66.96 | 65.44 | 66.24 | 68.23 | 29.00 | 75.57 | 91.50 | 64.17 |
| stabilityai/stablelm-base-alpha-3b-v2 | 62.19 | 32.42 | 67.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 92.10 | 62.12 |
| facebook/opt-6.7b | 61.85 | 30.72 | 65.66 | 66.02 | 67.20 | 67.65 | 27.60 | 76.33 | 90.10 | 65.35 |
| EleutherAI/pythia-6.9b | 60.58 | 31.83 | 67.21 | 64.01 | 63.88 | 67.01 | 25.80 | 75.08 |…