amruth-sn / kong
The world's first agentic reverse engineer.
Repository Overview (README excerpt)
# Kong: The Agentic Reverse Engineer

**LLM orchestration for reverse engineering binaries**

## What is Kong?

Most tasks follow a linear relationship: the harder the task, the longer it takes. Reverse engineering (and binary analysis) is a task where the actual difficulty is often modest, but the time-to-execute can be on the order of hours, or even days, for a binary with just a couple hundred functions. Kong automates the mechanical layer, building on Ghidra, the NSA's open-source reverse engineering framework.

Kong can take a fully obfuscated, stripped binary and run a full analysis pipeline: triaging functions, building call-graph context, recovering types and symbols through LLM-guided decompilation, and writing the results back into Ghidra's program database. The output is a binary in which formerly anonymous code is now readable, with recovered structs, parameter names, and calling conventions.

## Why this exists

Stripped binaries lose all the context that makes code readable: function names, type information, variable names, struct layouts. Recovering that context is the bulk of the work in most RE tasks, and it is largely pattern matching: recognizing standard library functions, inferring types from usage, propagating names through call graphs. LLMs are good at exactly this kind of pattern matching.

But pointing an LLM at raw decompiler output and asking "what does this do?" gives you mediocre results. The model lacks calling context, cross-reference information, and the broader picture of how the binary is structured. In addition, most obfuscated binaries employ aggressive techniques specifically intended to defeat reverse engineering.

Kong solves this by building rich context windows from Ghidra's program analysis (call graphs, cross-references, string references, data flow) before ever touching the LLM, then orchestrating the analysis in dependency order so each function benefits from its callees already being named.
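The dependency-ordered scheduling described above amounts to a topological sort of the call graph, analyzing leaf functions first. A minimal sketch of that ordering (function and variable names here are illustrative, not Kong's actual API):

```python
from collections import defaultdict, deque

def bottom_up_order(call_graph):
    """Order functions so every callee precedes its callers.

    call_graph: dict mapping caller -> set of callees (a hypothetical
    shape; Kong derives the real graph from Ghidra's program database).
    Functions left over after the sort belong to cycles (mutual
    recursion) and are appended last as a fallback.
    """
    pending = {f: set(callees) for f, callees in call_graph.items()}
    # Make sure pure leaves appear as nodes even if never listed as callers.
    for callees in call_graph.values():
        for c in callees:
            pending.setdefault(c, set())
    callers_of = defaultdict(set)
    for f, callees in pending.items():
        for c in callees:
            callers_of[c].add(f)
    # Start with leaf functions: no unresolved callees.
    ready = deque(f for f, callees in pending.items() if not callees)
    order = []
    while ready:
        f = ready.popleft()
        order.append(f)
        for caller in callers_of[f]:
            pending[caller].discard(f)
            if not pending[caller] and caller not in order and caller not in ready:
                ready.append(caller)
    # Anything still pending sits on a cycle; analyze it last.
    order.extend(f for f in pending if f not in order)
    return order

graph = {"main": {"parse", "run"}, "run": {"parse"}, "parse": set()}
print(bottom_up_order(graph))  # ['parse', 'run', 'main']
```

With this ordering, by the time `main` reaches the LLM, `parse` and `run` already carry recovered names and signatures that can be included in its context window.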
Additionally, Kong introduces its own first-of-its-kind agentic deobfuscation pipeline.

## In Action

## Features

- **Fully Autonomous Pipeline**: A single command runs the complete analysis: triage, function analysis, cleanup, semantic synthesis, and export. No manual intervention required.
- **In-Process Ghidra Integration**: Runs Ghidra's analysis engine in-process via PyGhidra and JPype. No server, no RPC, no subprocess overhead. Direct access to the program database.
- **Call-Graph-Ordered Analysis**: Functions are analyzed bottom-up from the call graph. Leaf functions are named first, so callers benefit from already-resolved context in their decompilation.
- **Rich Context Windows**: Each LLM prompt includes the target function's decompilation plus cross-references, string references, caller/callee signatures, and neighboring data; not just raw decompiler output in isolation.
- **Semantic Synthesis**: A post-analysis pass that unifies naming conventions across the binary, synthesizes struct definitions from field access patterns, and resolves inconsistencies between independently analyzed functions.
- **Signature Matching**: Known standard library and cryptographic functions are identified by pattern before LLM analysis, skipping expensive inference for functions with known identities.
- **Syntactic Normalization**: Decompiler output is cleaned up (modulo recovery, negative literal reconstruction, dead assignment removal) before reaching the LLM, reducing noise and token waste.
- **Agentic Deobfuscation**: Kong uses an agentic deobfuscation pipeline that can identify and remove obfuscation techniques (control flow flattening, bogus control flow, instruction substitution, string encryption, VM protection, etc.) from the decompiler output.
- **Eval Framework**: Built-in evaluation harness that scores analysis output against ground-truth source code, measuring symbol accuracy (word-based Jaccard) and type accuracy (signature component scoring).
- **Multi-Provider LLM Support**: Works with Anthropic (Claude) and OpenAI (GPT-4o) out of the box. An interactive setup wizard configures providers, and smart routing auto-selects whichever has a valid key.
- **Cost Tracking**: Tracks token usage and costs per model across providers, with provider-aware pricing.

## Supported Architectures

Kong works with most Ghidra-decompilable binaries (for now, more to come). Confidence by architecture and source language:

| Architecture | C | C++ | Go | Rust |
|---|---|---|---|---|
| x86 | High | High | Medium | Medium |
| x86-64 | High | High | Medium | Medium |
| ARM (32-bit) | High | High | Medium | Low |
| AArch64 | High | High | Medium | Low |
| MIPS | Medium | Medium | Low | Low |
| PowerPC | Medium | Medium | Low | Low |

**High**: Kong reliably decompiles, deobfuscates, and recovers names, types, and structure.
**Medium**: Decompilation is usable but noisier. Expect partial recovery and lower confidence scores.
**Low**: Decompilation has significant gaps; results will stay incomplete, noisy, or unreadable.

**Note**: Larger binaries mean more functions, higher LLM cost, and longer time to completion. Confidence also tends to drop as binary size grows, so keep this in mind when analyzing larger binaries.

## Architecture

Kong uses a five-phase pipeline orchestrated by a supervisor that coordinates triage, parallel analysis, and post-processing.

## How it works

**Triage** enumerates all functions in the binary, classifies them by size (trivial / small / medium / large), builds the call graph, detects the source language, and runs signature matching against known standard library and crypto functions. Functions matched by signature are marked as resolved and skip LLM analysis entirely.

**Analysis** processes functions in bottom-up order from the call graph using a work queue.
For each function, Kong builds a context window from Ghidra's program database (decompilation, cross-references, string references, and the signatures of already-analyzed callees), normalizes the decompiler output, and sends it to the LLM for name, type, and parameter recovery. If obfusc…
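The context-window assembly described above can be sketched roughly as follows. The field names, prompt layout, and section headers are assumptions for illustration, not Kong's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class FunctionContext:
    # Illustrative fields; the real data comes from Ghidra's program database.
    name: str
    decompilation: str
    string_refs: list = field(default_factory=list)
    callee_signatures: list = field(default_factory=list)
    callers: list = field(default_factory=list)

def build_prompt(ctx: FunctionContext) -> str:
    """Assemble a context window: the decompiled body plus the surrounding
    evidence (resolved callees, strings, callers) the model needs to
    propose a name, parameter names, and types."""
    parts = [f"Target function: {ctx.name}", "--- Decompilation ---", ctx.decompilation]
    if ctx.callee_signatures:
        parts += ["--- Resolved callees ---", *ctx.callee_signatures]
    if ctx.string_refs:
        parts += ["--- String references ---", *ctx.string_refs]
    if ctx.callers:
        parts += ["--- Called from ---", *ctx.callers]
    parts.append("Suggest a descriptive name, parameter names, and types.")
    return "\n".join(parts)

ctx = FunctionContext(
    name="FUN_00401a2c",
    decompilation="int FUN_00401a2c(char *s) { return strtol(s, 0, 10); }",
    callee_signatures=["long strtol(const char *s, char **end, int base)"],
    string_refs=['"invalid port"'],
)
prompt = build_prompt(ctx)
```

Because analysis runs bottom-up, the `callee_signatures` section already contains recovered names rather than `FUN_*` placeholders, which is what lifts the LLM's output above raw-decompilation prompting.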