
projectdiscovery / katana

A next-generation crawling and spidering framework.

16,076 stars · 1,028 forks · 47 issues
Languages: Go, JavaScript, Shell

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing projectdiscovery/katana in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/projectdiscovery/katana)

Repository Overview (README excerpt)

Crawler view

A next-generation crawling and spidering framework.

Features • Installation • Usage • Scope • Config • Filters • Join Discord

Features

• Fast and fully configurable web crawling
• **Standard** and **Headless** mode
• **JavaScript** parsing / crawling
• Customizable **automatic form filling**
• **Scope control** - preconfigured fields / regex
• **Customizable output** - preconfigured fields
• INPUT - **STDIN**, **URL** and **LIST**
• OUTPUT - **STDOUT**, **FILE** and **JSON**

Installation

katana requires Go 1.25+ to install successfully. If you encounter any installation issues, try the latest available version of Go, as the minimum required version may have changed. Run the install command or download a pre-compiled binary from the release page.

More options to install / run katana:

Docker
• Install / update docker to the latest tag
• Run katana in standard mode using docker
• Run katana in headless mode using docker

Ubuntu
• It's recommended to install the prerequisites first
• Then install katana

Usage

Running katana with the help option displays help for the tool and lists all the switches it supports.

Input for katana

**katana** requires a **URL** or **endpoint** to crawl and accepts single or multiple inputs. An input URL can be provided via an option, with multiple values comma-separated; **file** input is supported via a list option, and piped input (stdin) is also supported.

• URL input
• Multiple URL input (comma-separated)
• List input
• STDIN (piped) input

Crawling Mode

Standard Mode

Standard crawling mode uses Go's standard `net/http` library under the hood to handle HTTP requests/responses. This mode is much faster because it avoids browser overhead.
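The install and input options above can be sketched as shell commands. The `go install` path is katana's documented install command; the target URLs and `url_list.txt` file are illustrative placeholders:

```shell
# Install katana (requires Go 1.25+ on the PATH)
go install github.com/projectdiscovery/katana/cmd/katana@latest

# Or pull and run the Docker image instead
docker pull projectdiscovery/katana:latest
docker run projectdiscovery/katana:latest -u https://tesla.com

# Single URL input
katana -u https://tesla.com

# Multiple URL input (comma-separated)
katana -u https://tesla.com,https://google.com

# List input from a file
katana -list url_list.txt

# STDIN (piped) input
echo https://tesla.com | katana
```

No extra flags are needed for standard mode; the commands above crawl with Go's native HTTP client.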
Still, it analyzes the HTTP response body as-is, without any JavaScript or DOM rendering, potentially missing post-DOM-rendered endpoints or asynchronous endpoint calls that can happen in complex web applications depending on, for example, browser-specific events.

Headless Mode

Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:

• The HTTP fingerprint (TLS and user agent) fully identifies the client as a legitimate browser.
• Better coverage, since endpoints are discovered by analyzing both the standard raw response, as in the previous mode, and the browser-rendered one with JavaScript enabled.

Headless crawling is optional and can be enabled with a dedicated option. Other headless CLI options:

• Run the headless Chrome browser with the **no-sandbox** option, useful when running as the root user.
• Run the headless Chrome browser without incognito mode, useful when using the local browser.
• When crawling in headless mode, additional Chrome options can be specified via an option.

Captcha Solving

Katana supports automatic captcha detection and solving during headless crawling. When a captcha page is encountered, katana identifies the captcha provider, solves it via an external service, and continues crawling.

Supported captcha types: **reCAPTCHA v2**, **reCAPTCHA v3**, **reCAPTCHA Enterprise**, **Cloudflare Turnstile**, **hCaptcha**

• Option to specify the captcha solver provider.
• API key for the captcha solver provider.

The provider and key can also be set via environment variables.

Scope Control

Crawling can be endless if not scoped, so katana ships with multiple ways to define the crawl scope.

• The handiest option defines scope with a predefined field name. Its values:
  • crawling scoped to the root domain name and all subdomains (default)
  • crawling scoped to the given (sub)domain
  • crawling scoped to a domain-name keyword
• For advanced scope control, an option with **regex** support can be used. For multiple in-scope rules, file input with multiline strings / regexes can be passed.
• For defining what not to crawl, a separate option is available that also supports **regex** input. For multiple out-of-scope rules, file input with multiline strings / regexes can be passed.
• Katana scopes to the target by default; an option disables this and lets it crawl the internet.
• By default, when a scope option is used it also applies to the links displayed as output, so **external URLs are excluded by default**. An option overrides this behavior and displays all the external URLs found on in-scope URLs / endpoints.

Crawler Configuration

Katana comes with multiple options to configure and control the crawl the way we want.

• Option to define the depth to follow the URLs for crawling; the more d _...truncated for preview_
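The headless, scope-control, and depth options described above can be sketched as follows. The flag names come from katana's help output at the time of writing and may differ between versions; all targets and patterns are illustrative:

```shell
# Headless crawling; -no-sandbox is useful when running as root
katana -u https://tesla.com -headless -no-sandbox

# Field scope: rdn (root domain + subdomains, default), fqdn, dn
katana -u https://tesla.com -fs rdn

# Advanced in-scope / out-of-scope control with regex support
katana -u https://tesla.com -cs login -cos logout

# Disable scoping entirely and crawl the internet
katana -u https://tesla.com -ns

# Display external (out-of-scope) URLs found on in-scope pages
katana -u https://tesla.com -do

# Crawler configuration: maximum depth to follow
katana -u https://tesla.com -d 3
```

Multiple `-cs` / `-cos` rules can also be supplied from a file, one string or regex per line, as noted above.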