# Skillgrade

The easiest way to evaluate your Agent Skills. Tests that AI agents correctly discover and use your skills.

See examples/ — superlint (simple) and angular-modern (TypeScript grader).

## Quick Start

**Prerequisites**: Node.js 20+, Docker

**1. Initialize** — go to your skill directory (must have ) and scaffold:

Generates with AI-powered tasks and graders. Without an API key, creates a well-commented template.

**2. Edit** — customize for your skill (see the eval.yaml Reference).

**3. Run** — the agent is auto-detected from your API key: → Gemini, → Claude, → Codex. Override with .

**4. Review** — reports are saved to . Override with .

## Presets

| Flag | Trials | Use Case |
|------|--------|----------|
|  | 5 | Quick capability check |
|  | 15 | Reliable pass-rate estimate |
|  | 30 | High-confidence regression detection |

## Options

| Flag | Description |
|------|-------------|
|  | Run specific evals by name (comma-separated) |
|  | Run only graders of a given type ( or ) |
|  | Override trial count |
|  | Run trials concurrently |
|  | Override agent (default: auto-detect from API key) |
|  | Override provider |
|  | Output directory (default: ) |
|  | Verify graders using reference solutions |
|  | CI mode: exit non-zero if below threshold |
|  | Pass-rate threshold for CI mode |
|  | Show CLI results after running |

## eval.yaml Reference

String values ( , , ) support **file references** — if the value is a valid file path, its contents are read automatically.

## Graders

### Deterministic

Runs a command and parses JSON from stdout.

Output format: (0.0–1.0) and are required; is optional.

**Bash example:**

> Use for arithmetic — is not available in .

### LLM Rubric

Evaluates the agent's session transcript against qualitative criteria.

Uses Gemini or Anthropic based on the available API key. Override with the field.

### Combining Graders

Final reward =

## CI Integration

Use in CI — the runner is already an ephemeral sandbox, so Docker adds overhead without benefit.
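As a minimal sketch of the deterministic-grader contract described above — a command whose stdout is parsed as JSON — the script below grades whether an output file exists. The exact required JSON field names are elided in this excerpt, so `reward` (0.0–1.0) and `reason` are assumptions here; check the eval.yaml reference for the real contract.

```shell
#!/usr/bin/env bash
# Hypothetical deterministic grader: emit a JSON verdict on stdout.
# Field names "reward" and "reason" are assumptions, not confirmed
# by the README excerpt.
grade() {
  local target="$1"  # file the task instruction told the agent to produce
  if [ -f "$target" ]; then
    echo '{"reward": 1.0, "reason": "output file present"}'
  else
    echo '{"reward": 0.0, "reason": "output file missing"}'
  fi
}

# Example invocation (hypothetical file name):
grade fixed.txt
```

Because the grader communicates only through its JSON stdout, it stays independent of how the agent reached the result.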
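The README's advice to grade outcomes rather than steps can be sketched as a diff-based grader: compare the file the agent produced against a reference result instead of checking which commands it ran. File names and the `reward` JSON field are hypothetical illustrations, not the tool's confirmed schema.

```shell
#!/usr/bin/env bash
# Hypothetical outcome-based grader: reward 1.0 only if the produced file
# matches the expected content, regardless of how the agent got there.
grade_outcome() {
  local expected="$1" actual="$2"
  if diff -q "$expected" "$actual" >/dev/null 2>&1; then
    echo '{"reward": 1.0}'
  else
    echo '{"reward": 0.0}'
  fi
}
```

A check like this survives changes in the agent's strategy, which step-by-step command matching would not.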
Exits with code 1 if the pass rate falls below (default: 0.8).

> **Tip**: Use (the default) for local development to protect your machine. In CI, is faster and simpler.

## Environment Variables

| Variable | Used by |
|----------|---------|
|  | Agent execution, LLM grading, |
|  | Agent execution, LLM grading, |
|  | Agent execution (Codex), |

Variables are also loaded from in the skill directory. Shell values override . All values are **redacted** from persisted session logs.

## Best Practices

- **Grade outcomes, not steps.** Check that the file was fixed, not that the agent ran a specific command.
- **Instructions must name output files.** If the grader checks for , the instruction must tell the agent to save as .
- **Validate graders first.** Use with a reference solution before running real evals.
- **Start small.** 3–5 well-designed tasks beat 50 noisy ones.

For a comprehensive guide on writing high-quality skills, check out skills-best-practices. You can also install the skill creator skill to help author skills:

## License

MIT

---

*Inspired by SkillsBench and Demystifying Evals for AI Agents.*