
RelayPlane / proxy

Open source cost intelligence proxy for AI agents. Cut costs ~80% with smart model routing. Dashboard, policy engine, 11 providers. MIT licensed.

View on GitHub · 104 stars · 14 forks · 4 issues · TypeScript, JavaScript, Shell

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing RelayPlane/proxy in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.
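As a rough illustration of the on-demand loading idea (not RepoMind's actual implementation), an index of lazy file loaders can pull whole files into the model context only when an analysis requests them, instead of pre-chunking everything as RAG pipelines do. All names here are invented for the sketch:

```typescript
// Hypothetical sketch: a lazy repository index. Files are read only when an
// analysis asks for them, and they enter the context as complete units rather
// than as retrieval fragments.
interface RepoIndex {
  files: Map<string, () => string>; // path -> deferred loader
}

function buildIndex(entries: Array<[string, string]>): RepoIndex {
  // Wrapping each file body in a closure defers "loading" until first use.
  return { files: new Map(entries.map(([path, text]) => [path, () => text])) };
}

function loadContext(index: RepoIndex, paths: string[]): string {
  // Concatenate whole files so the model sees intact source, not chunks.
  return paths
    .filter((p) => index.files.has(p))
    .map((p) => `// FILE: ${p}\n${index.files.get(p)!()}`)
    .join("\n\n");
}
```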

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/RelayPlane/proxy)

Repository Overview (README excerpt)


@relayplane/proxy

A **Node.js npm LLM proxy** that sits between your AI agents and providers. Drop-in replacement for OpenAI and Anthropic base URLs — no Docker, no Python. Tracks every request, shows where the money goes, and offers configurable task-aware routing — all running **locally, for free**.

**The npm-native LLM proxy for Node.js developers.** Works with Claude Code, Cursor, OpenClaw, and any tool that supports the OpenAI or Anthropic APIs.

**Free, open-source proxy features:**

• 📊 **Per-request cost tracking** across 11 providers
• 💰 **Cache-aware cost tracking** - accurately tracks Anthropic prompt caching with cache read savings, creation costs, and true per-request costs
• 🔀 **Configurable task-aware routing** (complexity-based, cascade, model overrides)
• 🛡️ **Circuit breaker** - if the proxy fails, your agent doesn't notice
• 📈 **Local dashboard** - cost breakdown, savings analysis, provider health, agent breakdown
• 💵 **Budget enforcement** - daily/hourly/per-request spend limits with block, warn, downgrade, or alert actions
• 🔍 **Anomaly detection** - catches runaway agent loops, cost spikes, and token explosions in real time
• 🔔 **Cost alerts** - threshold alerts at configurable percentages, webhook delivery, alert history
• ⬇️ **Auto-downgrade** - automatically switches to cheaper models when budget thresholds are hit
• 📦 **Aggressive cache** - exact-match response caching with gzipped disk persistence
• 🤖 **Per-agent cost tracking** - identifies agents by system prompt fingerprint and tracks cost per agent
• 📝 **Content logging** - dashboard shows system prompt preview, user message, and response preview per request
• 🔐 **OAuth passthrough** - correctly forwards the required auth headers for Claude Max subscription users (OpenClaw compatible)
• 🧠 **Osmosis mesh** - collective learning layer that shares anonymized routing signals across users (on by default, opt-out supported)
• 🔧 **systemd/launchd service** - for always-on operation with auto-restart
• 🏥 **Health watchdog** - endpoint with uptime tracking and active probing
• 🛡️ **Config resilience** - atomic writes, automatic backup/restore, credential separation

> **Cloud dashboard available separately** - see Cloud Dashboard & Pro Features below. Your prompts always stay local.

Quick Start

Works with any agent framework that talks to the OpenAI or Anthropic APIs. Point your client at the proxy's base URL and it handles the rest.

What's New in v1.8.14+

**Breaking changes for upgraders:**

• **Telemetry is now ON by default.** Previously opt-in. Anonymous metadata (model, tokens, cost, latency) is sent to power the cloud dashboard. Your prompts and responses are never collected. It can be disabled.
• **Mesh is now ON by default.** Your proxy contributes anonymized routing data to the collective network. Free users get provider health alerts. Pro users get full routing intelligence. It can be disabled.
• **Cloud dashboard is now free.** Previously required a paid plan. Access your data at relayplane.com/dashboard. If you prefer the old behavior, it can be restored.

Supported Providers

**Anthropic** · **OpenAI** · **Google Gemini** · **xAI/Grok** · **OpenRouter** · **DeepSeek** · **Groq** · **Mistral** · **Together** · **Fireworks** · **Perplexity**

Configuration

RelayPlane reads configuration from a config file; the path can be overridden with an environment variable. All configuration is optional - sensible defaults are applied for every field. The proxy merges your config with its defaults via deep merge, so you only need to specify what you want to change.

Architecture

How It Works

RelayPlane is a local HTTP proxy: you point your agent at it by overriding the provider base URL.
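A minimal sketch of what that redirection looks like from the client side. The proxy address (`http://localhost:4000`) and the request shape are assumed placeholders; the real values are elided in this excerpt:

```typescript
// Build an OpenAI-style chat request against an arbitrary base URL. Swapping
// the provider for a local proxy is only a base-URL change; the agent code
// that constructs the request body is untouched.
function buildChatRequest(
  baseUrl: string,
  model: string,
  prompt: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/v1/chat/completions`, // same path the real provider exposes
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// Direct: buildChatRequest("https://api.openai.com", ...)
// Via proxy (placeholder port): the proxy sees, prices, and routes the call.
const viaProxy = buildChatRequest("http://localhost:4000", "gpt-4o-mini", "hi");
```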
The proxy:

• **Intercepts** your LLM API requests
• **Classifies** the task using heuristics (token count, prompt patterns, keyword matching - no LLM calls)
• **Routes** to the configured model based on classification and your routing rules (or passes through to the original model by default)
• **Forwards** the request directly to the LLM provider (your prompts go straight to the provider, not through RelayPlane servers)
• **Records** token counts, latency, and cost locally for your dashboard

**Default behavior is passthrough** - requests go to whatever model your agent requested. Routing (cascade, complexity-based) is configurable and must be explicitly enabled.

Complexity-Based Routing

The proxy classifies incoming requests by complexity (simple, moderate, complex) based on prompt length, token patterns, and the presence of tools. Each tier maps to a different model.

**How classification works:**

• **Simple** - short prompts, straightforward Q&A, basic code tasks
• **Moderate** - multi-step reasoning, code review, analysis with context
• **Complex** - architecture decisions, large codebases, tasks with many tools, long prompts with evaluation/comparison language

The classifier scores requests based on message count, total token length, tool usage, and content patterns (e.g., words like "analyze", "compare", "evaluate" increase the score). This happens locally - no prompt content is sent anywhere.

Model Overrides

Map any model name to a different one. Useful for silently redirecting expensive models to cheaper alternatives without changing your agent configuration. Overrides are applied before any other routing logic. The original requested model is logged for tracking.

Cascade Mode

Start with the cheapest model and escalate only when the response shows uncertainty or refusal. This gives you the cost savings of a cheap model with a safety net.

Escalation trigger options:

| Value | Triggers escalation when... |
|-------|-----------------------------|
| | Response contains hedging language ("I'm not sure", "it's hard to say", "this is just a guess") |
| | Model refuses to help ("I can't assist with that", "as an AI") |
| | The request fails outright |

A separate setting caps how many times the proxy will retry with a more expensive model. The cascade walks through the model array in order, starting from the first. Each escalatio…
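The cascade behavior described above might look like the following sketch. The trigger phrases, function signatures, and option handling are invented for illustration; the actual option values and defaults are not shown in this excerpt:

```typescript
// Hypothetical cascade loop: try the cheapest model first and climb the
// ladder whenever the response trips an escalation trigger.
type Trigger = "uncertainty" | "refusal";

const HEDGES = ["i'm not sure", "it's hard to say", "this is just a guess"];
const REFUSALS = ["i can't assist with that", "as an ai"];

function detectTrigger(text: string): Trigger | null {
  const lower = text.toLowerCase();
  if (HEDGES.some((h) => lower.includes(h))) return "uncertainty";
  if (REFUSALS.some((r) => lower.includes(r))) return "refusal";
  return null;
}

async function cascade(
  models: string[],                        // ordered cheapest -> most capable
  call: (model: string) => Promise<string>, // injected LLM call (placeholder)
  maxEscalations: number,                   // cap on retries up the ladder
): Promise<string> {
  let last = "";
  const attempts = Math.min(models.length, maxEscalations + 1);
  for (let i = 0; i < attempts; i++) {
    try {
      last = await call(models[i]);
      if (detectTrigger(last) === null) return last; // confident answer: stop
    } catch {
      // Request failed outright: treat as an escalation trigger and retry
      // with the next, more expensive model.
    }
  }
  return last; // best effort after exhausting the ladder
}
```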