# Repository Overview (README excerpt)
## Training Open Instruction-Following Language Models

This repo serves as an open effort on instruction-tuning and post-training popular pretrained language models on publicly available datasets. We release this repo and will keep updating it with:

- Code for finetuning language models with the latest techniques and instruction datasets in a unified format.
- Code for DPO, preference finetuning, and reinforcement learning with verifiable rewards (RLVR).
- Checkpoints and other useful artifacts that we build in our exploration.

We also support some evaluations natively in the codebase, but these are now unmaintained; we instead suggest using OLMES, which we used for TÜLU 3.

The latest details on open post-training are found in *TÜLU 3: Pushing Frontiers in Open Language Model Post-Training*.

Please see our first paper, *How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources*, for more thoughts behind this project and our initial findings, and our second paper, *Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2*, for results using Llama-2 models and direct preference optimization. We are still working on more models. For more recent results involving PPO and DPO, please see our third paper, *Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback*.

Try some of the models we train with Open Instruct.
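The RLVR stage mentioned above replaces a learned reward model with a programmatic check of the model's output. A minimal sketch of such a verifier in plain Python (the regex-based final-answer extraction here is an illustrative simplification, not this repo's actual implementation):

```python
import re

def verifiable_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the response's final answer matches the ground truth, else 0.0.

    This toy verifier extracts the last number in the response and compares
    it to the reference answer, a common setup for math-style problems.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

# The policy is then trained with RL to maximize this binary reward,
# rather than a scalar score from a learned reward model.
print(verifiable_reward("The answer is 42", "42"))  # 1.0
print(verifiable_reward("I think it's 41", "42"))   # 0.0
```

Because the reward is computed by a deterministic check rather than another model, it cannot be "reward-hacked" in the usual sense, which is the appeal of RLVR for tasks with checkable answers.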
There is a free demo, or download them from HuggingFace:

| **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** | **OLMo-2 7B** | **OLMo-2 13B** |
|---|---|---|---|---|
| **Base Model** | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B | allenai/OLMo2-7B-1124 | allenai/OLMo-2-13B-1124 |
| **SFT** | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT |
| **DPO** | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO |
| **Final Models (RLVR)** | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct |
| **Reward Model (RM)** | allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) | allenai/OLMo-2-1124-7B-RM | (Same as 7B) |

## News

- **[2024-11-22]** We released *TÜLU 3: Pushing Frontiers in Open Language Model Post-Training* and updated our entire stack of open post-training recipes with both Llama 3.1 and OLMo 2.
- **[2024-07-01]** We released *Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback* and have majorly updated our codebase to support new models and package versions.
- **[2023-11-27]** We released *Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2*. Check out our models here. We have added a DPO finetuning script for replicating our results.
- **[2023-09-26]** We switched to using the official alpaca-eval library to run AlpacaFarm evaluation, but use regenerated, longer reference outputs.
This will change our numbers reported in the paper. We will update the paper soon.
- **[2023-09-25]** Supported using vLLM for our evaluations, which speeds up evaluation by 10x.
- **[2023-09-17]** Supported LoRA and QLoRA finetuning. See here for more details.
- **[2023-08-18]** Added support for ToxiGen/TruthfulQA evaluation. Check our for examples of running them.
- **[2023-08-08]** Supported several new instruction datasets, including LIMA / WizardLM / Open-Orca. See the preparation script for details. Performance hasn't been evaluated yet.
- **[2023-08-06]** Supported LLaMa 2 finetuning and FlashAttention-2 by bumping the versions of transformers and many other dependencies.
- **[2023-06-29]** Added licensing info for our released models.
- **[2023-06-09]** Released Tülu (a suite of LLaMa models fully finetuned on a strong mix of datasets) and many other checkpoints on HuggingFace [[Links]](#released-checkpoints).
- **[2023-06-09]** Initial release of the codebase containing the training and evaluation code for our arxiv paper.

## Setup

Our setup follows our Dockerfile. *Note that Open Instruct is a research codebase and does not guarantee backward compatibility.*

### Installation with uv

We use uv for installation and running code. You can install with .

- **Docker installation**: You can also use the Dockerfile to build a Docker image. You can build the image with the following command: If you are internally at AI2, you may launch experiments using our always-up-to-date auto-built image .

## Training

After having set up the environment, you are ready to launch some experiments. We provide a few examples below. To learn more about how to reproduce the Tulu 3 models, please refer to the Tulu 3 README. The instructions and documentation for Tulu 1 and Tulu 2 are in the Tulu 1 and 2 README.

### Finetuning

You can run the following command to get started:

**OLMo-core SFT**: For supported models (OLMo, OLMoE, Qwen3), we recommend the more GPU-efficient OLMo-core SFT implementation.
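The preference-tuning stage that follows SFT uses direct preference optimization (DPO), which trains directly on chosen/rejected response pairs instead of first fitting a reward model. The standard per-example DPO loss can be sketched in plain Python; the β value and log-probabilities below are illustrative placeholders, not values from this codebase:

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * implicit reward margin).

    Each argument is a response's summed log-probability under the trained
    policy or the frozen reference model.
    """
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With zero margin the loss is log(2); it shrinks as the policy comes to
# prefer the chosen response more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In a real training run these log-probabilities are computed in batch over token sequences and the loss is backpropagated through the policy only; the reference model stays frozen.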
See for the list of supported models.

### Preference Tuning

### Reinforcement Learning with Verifiable Rewards (RLVR)

## Contamination checks

We release our scripts for measuring the overlap between instruction tuning datasets and evaluation datasets in . See the README for more details.

## Developing

When submitting a PR to this repo, we check the core code in for style with the following: Run the tests with .

### Pre-commit hooks

To automatically run linting and formatting on each commit: To run on all files (recommended after initial setup):

## Repo structure

## Licensing

This codebase is licensed under Apache 2.0 as given in LICENSE. Th…