# greyhaven-ai / autocontext

A recursive, self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task.
## Repository Overview (README excerpt)
### Closed-loop control plane for agent improvement

autocontext is a closed-loop control plane for improving agent behavior over repeated runs. It executes tasks, evaluates outcomes, updates persistent knowledge, and can distill successful behavior into cheaper local runtimes. The goal is to move from frontier-model exploration toward validated, reusable, lower-cost execution.

### Why It Exists

Most agent systems start every run cold. They do not reliably carry forward what worked, what failed, and what should change next. autocontext adds that missing feedback loop:

- run the task
- analyze what happened
- persist validated lessons
- use those lessons in the next run
- optionally train and route to local models when the task is stable enough

### How It Works

Each generation runs through a structured multi-agent loop in which each agent role:

- proposes a strategy or artifact for the task
- explains what happened and why
- turns that analysis into playbook updates and future hints
- proposes tools, harness improvements, or structural changes
- gates what knowledge is allowed to persist

Strategies are then evaluated through scenario execution, staged validation, and gating. Weak changes are rolled back. Successful changes accumulate into reusable knowledge.

### Choose An Entry Point

- Want the full control plane, dashboard, scenario runner, and training loop? Start with the Python package in .
- Want a lighter Node/TypeScript toolkit for judging outputs, running improvement loops, queueing work, or exposing MCP tools? Start with .
- Want to wire another agent into autocontext? Start with the CLI-first guide in .
- Want to contribute or point a coding agent at the repo? Read and .
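The run / analyze / persist / reuse cycle above can be sketched in a few lines of Python. Everything here (the `Playbook` class, function names, and the fake task outcome) is illustrative, not the actual autocontext API:

```python
from dataclasses import dataclass, field

@dataclass
class Playbook:
    """Persistent knowledge carried across runs (illustrative stand-in)."""
    lessons: list[str] = field(default_factory=list)

def run_task(task: str, playbook: Playbook) -> dict:
    # In autocontext this would dispatch to an agent runtime; here we
    # fake an outcome that only succeeds once lessons have accumulated.
    return {"task": task, "ok": bool(playbook.lessons)}

def analyze(result: dict) -> list[str]:
    # Turn the outcome into candidate lessons for the next generation.
    if not result["ok"]:
        return [f"retry {result['task']} with stricter validation"]
    return []

def gate(candidates: list[str]) -> list[str]:
    # Only validated lessons are allowed to persist.
    return [c for c in candidates if "validation" in c]

def generation(task: str, playbook: Playbook) -> dict:
    result = run_task(task, playbook)          # run the task
    candidates = analyze(result)               # analyze what happened
    playbook.lessons.extend(gate(candidates))  # persist validated lessons
    return result

pb = Playbook()
first = generation("parse logs", pb)   # cold start: fails
second = generation("parse logs", pb)  # lessons applied: succeeds
```

The point of the sketch is the shape of the loop, not the logic inside it: each generation reads the playbook produced by earlier generations, and only gated lessons survive into the next run.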
### What's New

- GEPA-inspired ASI/Pareto optimizer wired into improvement loop
- Component sensitivity profiling and credit assignment
- Pluggable scoring backends with Elo and Glicko support
- Novelty exploration and multi-basin playbook branching
- Cost-aware loop control and long-run presets

### Core Capabilities

- Persistent playbooks, hints, tools, reports, and progress snapshots across runs
- Staged validation, harness synthesis, and harness-aware execution
- Scenario families for simulation, investigation, workflow, coordination, negotiation, artifact editing, operator-in-the-loop, tool-fragility, and schema-evolution tasks
- Frontier-to-local distillation with MLX on Apple Silicon
- Runtime routing across Anthropic, OpenAI-compatible backends, Ollama, vLLM, MLX, and Pi-based runtimes
- OpenClaw-facing APIs and agent integration surfaces
- CLI, API server, dashboard, and TypeScript/TUI surfaces for operators and external agents

### Quick Start From Source

The Python application lives in , and most , , , and commands should be run from there. That creates a local run, writes artifacts under and , and works without external API keys.

Run with Anthropic:

Start the API server and dashboard:

Then open . Use the repo-level as the reference for available settings.

### Installable Packages

The repo publishes two installable packages with different scopes:

- Python package:
- TypeScript package:

The Python package exposes the full control-plane CLI ( , , , , , , , and more). The TypeScript package exposes a narrower CLI focused on evaluation, improvement loops, queueing, and MCP serving for Node runtimes.

### Which Package Should You Use?

| If you want to... | Start here | Why |
|---|---|---|
| Run the full multi-generation control plane | autocontext/README.md | Python has the dashboard, API server, training loop, scenario scaffolding, export/import, and full CLI surface. |
| Embed judging or improvement loops in a Node app | ts/README.md | The TypeScript package is smaller and focused on judge-based workflows, queueing, and MCP serving. |
| Point an external agent at autocontext | autocontext/docs/agent-integration.md | It documents the CLI-first contract, JSON output, MCP usage, and SDK options. |
| Grab copy-paste integration snippets | examples/README.md | The examples cover Python CLI, Claude Code MCP, Python SDK, and TypeScript library usage. |
| Catch up on recent repo evolution | CHANGELOG.md | It summarizes the release and current unreleased work. |

### Common Workflows

- Run the generation loop:
- Inspect runs: ,
- Scaffold a custom scenario:
- Export training data:
- Train a local model:
- Start the API server:
- Start the MCP server:
- Wait on a monitor condition:

remains a typed scenario family for capability discovery and experimentation, but autocontext does not scaffold executable operator-loop runtimes. Use datasets, tools, or live-agent experiments instead of harness-owned escalation scripts.

MLX training is host-only on Apple Silicon macOS. If you want a sandboxed OpenClaw agent to trigger training, use the file-based host watcher flow documented in autocontext/docs/mlx-training.md.
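For reference, the Elo scoring mentioned under "What's New" updates ratings from pairwise comparisons, which fits judge-based workflows where one strategy's output is preferred over another's. Below is a minimal, self-contained sketch of the standard Elo update; it is not autocontext's actual backend interface, whose names are not shown in this excerpt:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b); score_a is 1.0 win, 0.5 draw, 0.0 loss."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Two strategies start equal; strategy A wins one head-to-head comparison.
a, b = elo_update(1000.0, 1000.0, 1.0)
# a -> 1016.0, b -> 984.0
```

A pluggable backend would expose this update (or Glicko's, which additionally tracks rating uncertainty) behind a common scoring interface so loops can swap rating systems without changing the comparison logic.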
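The frontier-to-local routing described above (explore on a frontier model, then route stable tasks to a cheaper local runtime) can be sketched as a simple threshold policy. The registry, costs, and threshold here are hypothetical; autocontext's real routing spans Anthropic, OpenAI-compatible backends, Ollama, vLLM, MLX, and Pi-based runtimes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Runtime:
    name: str
    cost_per_call: float  # relative cost, hypothetical numbers

FRONTIER = Runtime("frontier", cost_per_call=1.00)
LOCAL = Runtime("local-mlx", cost_per_call=0.01)

def route(task_success_rate: float, min_stability: float = 0.9) -> Runtime:
    """Send stable tasks to the cheap local runtime, unstable ones to frontier."""
    return LOCAL if task_success_rate >= min_stability else FRONTIER

print(route(0.95).name)  # stable task -> local runtime
print(route(0.40).name)  # unstable task -> frontier runtime
```

In a cost-aware loop, the success-rate signal would come from the same staged validation that gates playbook changes, so routing decisions ride on already-validated evidence rather than a separate metric.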
### Repository Layout

- : Python package, CLI, API server, dashboard, training loop
- : published TypeScript package, CLI, and MCP-compatible tooling
- : interactive terminal UI
- : docs landing page and maintainer checklists
- : copy-paste integration snippets for package users and external agents
- : Docker, Fly.io, and bootstrap scripts
- : shared protocol artifacts
- : repo maintenance and generation scripts

### Where To Look Next

- Docs overview: docs/README.md
- Analytics and adoption: docs/analytics.md
- Python package guide: autocontext/README.md
- TypeScript package guide: ts/README.md
- Copy-paste examples: examples/README.md
- External agent integration: autocontext/docs/agent-integration.md
- Recent changes: CHANGELOG.md
- Contributor setup: CONTRIBUTING.md
- Repo agent guide: AGENTS.md
- MLX host training and OpenClaw bridge: autocontext/docs/mlx-training.md
- Sandbox and executor notes: autocontext/docs/sandbox.md
- License: LICENSE

### Note

This repo was previously named . Some historical references may still use the older name or issue prefixes.