
alexgreensh / token-optimizer

Find the ghost tokens. Fix them. Survive compaction. Avoid context quality decay.

91 stars
8 forks
0 issues
Python · TypeScript · HTML

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing alexgreensh/token-optimizer in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/alexgreensh/token-optimizer)

Repository Overview (README excerpt)


Your AI is getting dumber and you can't see it. Find the ghost tokens. Survive compaction. Track the quality decay.

Opus 4.6 drops from 93% to 76% accuracy across a 1M context window. Compaction loses 60-70% of your conversation. Ghost tokens burn through your plan limits on every single message. Token Optimizer tracks the degradation, cuts the waste, checkpoints your decisions before compaction fires, and tells you what to fix.

Install

Then in Claude Code:

Also available as a script installer:

Works on Claude Code and OpenClaw. Each platform gets its own native plugin (Python for Claude Code, TypeScript for OpenClaw). No bridging, no shared runtime, zero cross-platform dependencies.

Why install this first?

Every Claude Code session starts with invisible overhead: system prompt, tool definitions, skills, MCP servers, CLAUDE.md, MEMORY.md. A typical power user burns 50-70K tokens before typing a word. At 200K context, that's 25-35% gone. At 1M, it's "only" 5-7%, but the problems compound:

• **Quality degrades as context fills.** MRCR drops from 93% to 76% across 256K to 1M. Your AI gets measurably dumber with every message.
• **You hit rate limits faster.** Ghost tokens count toward your plan's usage caps on every message, cached or not. 50K overhead × 100 messages = 5M tokens burned on nothing.
• **Compaction is catastrophic.** 60-70% of your conversation is gone per compaction. After 2-3 compactions: 88-95% cumulative loss. And each compaction means re-sending all that overhead again.
• **Higher effort = faster burn.** More thinking tokens per response means you hit compaction sooner, which means more total tokens consumed across the session.

Token Optimizer tracks all of this: quality score, degradation bands, compaction loss, drift detection. Zero context tokens consumed (it runs as external Python).

> **"But doesn't removing tokens hurt the model?"** No. Token Optimizer removes structural waste (duplicate configs, unused skill frontmatter, bloated files), not useful context. It also actively *measures* quality: the 7-signal quality score tells you if your session is degrading, and Smart Compaction checkpoints your decisions before auto-compact fires. Most users see quality scores *improve* after optimization because the model has more room for real work.

---

NEW in v2.6: Per-Turn Analytics and Cost Intelligence

| Feature | What You Get |
|---------|--------------|
| **Per-turn token breakdown** | Click any session to see input/output/cache per API call. Spike detection highlights context jumps. |
| **Cost per session** | Every session shows estimated API cost. Daily totals in the trends view. |
| **Four-tier pricing** | Anthropic API, Vertex Global, Vertex Regional (+10%), AWS Bedrock. Set once, all costs update. |
| **Cache visualization** | Stacked bars showing the input vs output vs cache-read vs cache-write split. See how well prompt caching works. |
| **Session quality overlay** | Color-coded quality scores on every session. Green = healthy, yellow = degrading, red = trouble. |
| **Kill stale sessions** | Terminates zombie headless sessions. Dashboard shows kill buttons with a clear explanation. |
| **Live agent tracking** | Status bar shows running subagents with model, description, and elapsed time. Spot misrouted models instantly. |
| **Session duration warning** | Appears in the status bar only when quality drops below 75. Contextual, not noise. |

---

What questions can you ask?

| Command | What You Get |
|---------|--------------|
| | **"Am I in trouble?"** 10-second answer: context health, degradation risk, biggest token offenders, which model to use. |
| | **"Is everything installed correctly?"** Score out of 10. Broken hooks, missing components, exact fix commands. |
| | **"Has my setup grown?"** Side-by-side comparison vs your last snapshot. Catches config creep before it costs you. |
| | **"How healthy is this session?"** 7-signal analysis of your live conversation. Stale reads, wasted tokens, compaction damage. |
| | **"Where are my tokens going?"** Full per-component breakdown. Every skill, every MCP server, every config file. |
| | **"What happened each turn?"** Per-message token + cost breakdown with spike detection. |
| | **"What am I paying?"** View or switch between Anthropic/Vertex/Bedrock pricing tiers. |
| | **"Clean up zombies."** Terminate headless sessions running 12+ hours. |
| | **"What's actually being used?"** Skill adoption, model mix, overhead trajectory over time. |
| | **"Where do I start?"** Detects 8 named anti-patterns and recommends specific fixes. |
| | **"Show me everything."** Interactive HTML dashboard with all analytics. |
| | **"Fix it for me."** Interactive audit with 6 parallel agents. Guided fixes with diffs and backups. |

Quality Scoring (7 signals)

| Signal | Weight | What It Means For You |
|--------|--------|-----------------------|
| **Context fill** | 20% | How close are you to the degradation cliff? Based on published MRCR benchmarks. |
| **Stale reads** | 20% | Files you read earlier have changed. Your AI is working with outdated info. |
| **Bloated results** | 20% | Tool outputs that were never used. Wasting context on noise. |
| **Compaction depth** | 15% | Each compaction loses 60-70% of your conversation. After 2: 88% gone. |
| **Duplicates** | 10% | The same system reminders injected over and over. Pure waste. |
| **Decision density** | 8% | Are you having a real conversation, or is it mostly overhead? |
| **Agent efficiency** | 7% | Are your subagents pulling their weight, or just burning tokens? |

Degradation bands in the status bar:

• Green (<50% fill): peak quality zone
• Yellow (50-70%): degradation starting
• Orange (70-80%): quality dropping
• Red (80%+): severe, consider /clear

What Degradation Actually Looks Like

This is a real session: 708 messages, 2 compactions, 88% of the original context gone. Without the quality score, you'd have no idea.

---

The Problem

Every message you send to Claude Code re-sends ever…
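The quality-score weights, status-bar bands, and compaction-loss figures above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the plugin's actual code; the function and signal names are hypothetical.

```python
# Hypothetical sketch of the 7-signal quality score, degradation bands, and
# cumulative compaction loss described above -- not the plugin's real code.

WEIGHTS = {  # weights from the Quality Scoring table (sum to 1.0)
    "context_fill": 0.20,
    "stale_reads": 0.20,
    "bloated_results": 0.20,
    "compaction_depth": 0.15,
    "duplicates": 0.10,
    "decision_density": 0.08,
    "agent_efficiency": 0.07,
}

def quality_score(signals: dict) -> float:
    """Weighted 0-100 score; each signal is a 0-100 health value."""
    return sum(WEIGHTS[name] * signals.get(name, 100.0) for name in WEIGHTS)

def band(fill_pct: float) -> str:
    """Map context fill (%) to the status-bar degradation bands."""
    if fill_pct < 50:
        return "green"   # peak quality zone
    if fill_pct < 70:
        return "yellow"  # degradation starting
    if fill_pct < 80:
        return "orange"  # quality dropping
    return "red"         # severe, consider /clear

def cumulative_loss(per_compaction: float, n: int) -> float:
    """Fraction of the original conversation lost after n compactions."""
    return 1 - (1 - per_compaction) ** n
```

At a 65% loss per compaction, `cumulative_loss(0.65, 2)` comes out to about 0.88, matching the "after 2: 88% gone" figure in the signals table; the 60-70% per-compaction range over 2-3 compactions brackets the quoted 88-95% cumulative loss.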