HolmesGPT / holmesgpt
SRE Agent - CNCF Sandbox Project
AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing HolmesGPT/holmesgpt in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Overview (README excerpt)
Crawler viewHolmesGPT — The CNCF SRE Agent Installation | Docs | Open-source AI agent for investigating production incidents and finding root causes. Works with any stack — Kubernetes, VMs, cloud providers, databases, and SaaS platforms. We are a Cloud Native Computing Foundation sandbox project. Originally created by Robusta.Dev, with major contributions from Microsoft. • **Petabyte-scale data**: Server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of context windows • **Memory-safe execution**: Per-tool memory limits, streaming large results to disk, and automatic output budgeting prevent OOM kills when querying large observability datasets • **Deep integrations**: Prometheus, Grafana, Datadog, Kubernetes, and many more—plus any REST API • **Bidirectional alert integrations**: Fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira—and write findings back • **Any LLM provider**: OpenAI, Anthropic, Azure, Bedrock, Gemini, and more • **No Kubernetes required**: Works with any infrastructure — VMs, bare metal, cloud services, or containers • **Operator mode**: Optionally run as a Kubernetes operator for automated investigations How it Works HolmesGPT uses an **agentic loop** to query live observability data from multiple sources and identify root causes. 🔗 Data Sources HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built-in. Add your own. | Data Source | Notes | |-------------|-------| | **AKS** | Azure Kubernetes Service cluster and node health diagnostics | | **ArgoCD** | Get status, history and manifests and more of apps, projects and clusters | | **AWS** | RDS events, instances, slow query logs, and more (MCP) | | **Azure** | Azure resources and diagnostics (MCP) | | **Azure SQL** | Database health, performance, connections, and slow queries | | **Confluence** | Private runbooks and documentation | | **Confluence (MCP)** | Private runbooks and documentation (MCP) | | **Coralogix** | Retrieve logs for any resource | | **Datadog** | Query logs, metrics, and traces | | **Docker** | Get images, logs, events, history and more | | **Elasticsearch / OpenSearch** | Query logs, cluster health, shard and index diagnostics | | **GCP** | Google Cloud Platform resources (MCP) | | **GitHub** | Repositories, issues, and pull requests (MCP) | | **Grafana** | Query and analyze dashboard configurations and panels | | **Helm** | Release status, chart metadata, and values | | **Internet** | Public runbooks, community docs etc | | **Kafka** | Fetch metadata, list consumers and topics or find lagging consumer groups | | **Kubernetes** | Pod logs, K8s events, and resource status (kubectl describe) | | **Kubernetes Remediation (MCP)** | Apply fixes like scaling, rollbacks, and resource edits (MCP) | | **Loki** | Query logs for Kubernetes resources or any query | | **MariaDB** | MariaDB database queries and diagnostics (MCP) | | **MongoDB** | Query data, diagnose performance, inspect schemas, find slow operations | | **MongoDB Atlas** | Cluster health, slow queries, and performance diagnostics | | **NewRelic** | Investigate alerts, query tracing data | | **OpenShift** | Projects, routes, builds, security context constraints, and deployment configs | | **Prefect (MCP)** | Workflow orchestration monitoring, flow runs, and worker health (MCP) | | **Prometheus** | Investigate alerts, query metrics and generate PromQL queries | | **RabbitMQ** | Partitions, memory/disk alerts, troubleshoot split-brain scenarios and more | | **Robusta** | Multi-cluster monitoring, historical change data, runbooks, PromQL graphs and more | | **ServiceNow** | Query tables and incident records | | **Sentry** | Error tracking, issues, and performance monitoring (MCP) | | **Slab** | Team knowledge base and runbooks on demand | | **Splunk** | Log search and analysis (MCP) | | **SQL Databases** | PostgreSQL, MySQL, ClickHouse, MariaDB, SQL Server, SQLite | | **Tempo** | Fetch trace info, debug issues like high latency in application | See the full list of built-in toolsets for additional integrations including Cilium, KubeVela, Notion, and more. 🚀 End-to-End Automation HolmesGPT can fetch alerts/tickets to investigate from external systems, then write the analysis back to the source or Slack. | Integration | Status | Notes | |-------------------------|-----------|-------| | Slack | ✅ | Demo. Available via Robusta.dev (commercial platform) | | Microsoft Teams | ✅ | Available via Robusta.dev (commercial platform) | | Prometheus/AlertManager | ✅ | Robusta SaaS or HolmesGPT CLI | | PagerDuty | ✅ | HolmesGPT CLI only | | OpsGenie | ✅ | HolmesGPT CLI only | | Jira | ✅ | HolmesGPT CLI only | | GitHub | ✅ | HolmesGPT CLI only | Installation Read the installation documentation to learn how to install HolmesGPT. Supported LLM Providers Read the LLM Providers documentation to learn how to set up your LLM API key. Using HolmesGPT See the walkthrough documentation for usage guides, including: • Interactive mode for asking questions and follow-ups • Investigating Prometheus alerts • CI/CD troubleshooting 🔐 Data Privacy By design, HolmesGPT has **read-only access** and respects RBAC permissions. It is safe to run in production environments. License Distributed under the Apache 2.0 License. See LICENSE for more information. Community Join our community to discuss the HolmesGPT roadmap and share feedback: • Community Meetups Support If you have any questions, feel free to message us on HolmesGPT Slack Channel How to Contribute Please read our CONTRIBUTING.md for guidelines and instructions. For help, contact us on Slack or ask DeepWiki AI your questions. Please make sure to follow the CNCF code of conduct - details here.