odvcencio / gotreesitter
Pure Go tree-sitter runtime
Repository Overview (README excerpt)
gotreesitter

Pure-Go tree-sitter runtime. No CGo, no C toolchain. Cross-compiles to any OS/architecture target that Go supports.

gotreesitter loads the same parse-table format that tree-sitter's C runtime uses. Grammar tables are extracted from upstream grammar files, compressed into binary blobs, and deserialized on first use. 206 grammars ship in the registry.

Motivation

Every Go tree-sitter binding in the ecosystem depends on CGo:

• Cross-compilation requires a C cross-toolchain per target. Some cross-builds from a Linux host, and any Windows build without MSYS2/MinGW, will not link.
• CI images must carry a C compiler and the grammar's C sources. Building fails for downstream users who don't have a C compiler.
• The Go race detector, coverage instrumentation, and fuzzer cannot see across the CGo boundary. Bugs in the C runtime or in FFI marshaling are invisible to these tools.

gotreesitter eliminates the C dependency entirely. The parser, lexer, query engine, incremental reparsing, arena allocator, external scanners, and tree cursor are all implemented in Go. The only input is the grammar blob.

Quick start

A registry lookup resolves a filename to the appropriate language.

Queries

The query engine supports the full S-expression pattern language: structural quantifiers (`*`, `+`, `?`), alternation (`[ ]`), field constraints, negated fields, the anchor operator (`.`), and all standard predicates. See Query API.

Typed query codegen

A generator produces type-safe Go wrappers from query files. Given a query, it emits a typed Go struct; multi-pattern queries generate one struct per pattern, with conversion helpers.

Multi-language documents (injection parsing)

Parse documents with embedded languages (HTML+JS+CSS, Markdown+code fences, Vue/Svelte templates). Injection parsing supports static and dynamic (capture-based) language detection, recursive nested injections, and incremental reparse with child tree reuse.

Source rewriting

Collect source-level edits and apply them atomically, producing edit records for incremental reparse. Applying the rewrite returns both the new source bytes and the edit records.
A convenience helper applies each edit to the existing tree and returns source ready for incremental reparsing.

Incremental reparsing

The incremental parser walks the old tree's spine, identifies the edit region, and reuses unchanged subtrees by reference. Only the invalidated span is re-lexed and re-parsed. Both leaf and non-leaf subtrees are eligible for reuse; non-leaf reuse is driven by pre-goto state tracking on interior nodes, so the parser can skip entire subtrees without re-deriving their contents. When no edit has occurred, the incremental parse detects the nil edit with a pointer check and returns in single-digit nanoseconds with zero allocations.

Tree cursor

The tree cursor maintains an explicit frame stack. Parent, child, and sibling movement are O(1) with zero allocations; sibling traversal indexes directly into the parent's child slice. Movement methods cover basic parent, child, and sibling moves, named-only variants, field-based moves, and position-based moves.

Cursors hold direct pointers into tree nodes, so recreate a cursor after any operation that modifies or replaces the tree, including an incremental reparse.

Highlighting

Tagging

Benchmarks

All measurements below use the same workload: a generated Go source file with 500 functions. Numbers are medians from 10 runs.

| Runtime | Full parse | Incremental (1-byte edit) | Incremental (no edit) |
|---|---:|---:|---:|
| Native C (pure C runtime) | 1.76 ms | 102.3 μs | 101.7 μs |
| CGo binding (C runtime via cgo) | ~2.0 ms | ~130 μs | n/a |
| gotreesitter (pure Go) | 4.20 ms | 1.49 μs | 2.18 ns |

On this workload:

• Full parse is ~2.4x slower than native C.
• Incremental single-byte edits are ~69x faster than native C (~87x faster than CGo).
• No-edit reparses are ~46,600x faster than native C, with zero allocations.
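The speedup figures in the summary are plain ratios of medians. As a quick check, the Go program below recomputes them from the ns/op medians listed in the raw benchmark output; the CGo medians are marked approximate in the source, so the ratio against CGo is approximate as well. The constant and function names are illustrative only.

```go
package main

import "fmt"

// Median ns/op values from the raw benchmark output.
// The CGo figure is approximate (~) in the source table.
const (
	nativeFullNs   = 1_764_436.0
	nativeEditNs   = 102_336.0
	nativeNoEditNs = 101_740.0
	cgoEditNs      = 130_000.0
	goFullNs       = 4_197_811.0
	goEditNs       = 1_490.0
	goNoEditNs     = 2.181
)

// speedups returns the four ratios quoted in the benchmark summary.
func speedups() (fullSlowdown, editVsC, editVsCGo, noEditVsC float64) {
	fullSlowdown = goFullNs / nativeFullNs  // full parse: how much slower pure Go is than native C
	editVsC = nativeEditNs / goEditNs       // 1-byte edit: how much faster pure Go is than native C
	editVsCGo = cgoEditNs / goEditNs        // 1-byte edit: how much faster pure Go is than CGo
	noEditVsC = nativeNoEditNs / goNoEditNs // no-edit reparse: how much faster pure Go is than native C
	return
}

func main() {
	full, editC, editCGo, noEdit := speedups()
	fmt.Printf("full parse:  %.1fx slower than native C\n", full)
	fmt.Printf("1-byte edit: %.0fx faster than native C\n", editC)
	fmt.Printf("1-byte edit: %.0fx faster than CGo\n", editCGo)
	fmt.Printf("no edit:     %.0fx faster than native C\n", noEdit)
}
```

Working from the raw ns/op values gives roughly 46,648x for the no-edit case, which the summary rounds to ~46,600x; the other three ratios round to 2.4x, 69x, and 87x.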
Raw benchmark output

| Benchmark | Median ns/op | B/op | allocs/op |
|---|---:|---:|---:|
| Native C full parse | 1,764,436 | n/a | n/a |
| Native C incremental (1-byte edit) | 102,336 | n/a | n/a |
| Native C incremental (no edit) | 101,740 | n/a | n/a |
| CGo full parse | ~1,990,000 | 600 | 6 |
| CGo incremental (1-byte edit) | ~130,000 | 648 | 7 |
| gotreesitter full parse | 4,197,811 | 585 | 7 |
| gotreesitter incremental (1-byte edit) | 1,490 | 1,584 | 9 |
| gotreesitter incremental (no edit) | 2.181 | 0 | 0 |

Benchmark matrix

For repeatable multi-workload tracking, the benchmark matrix emits machine-readable results, a summary, and raw logs.

Supported languages

206 grammars ship in the registry. All 206 produce error-free parse trees on smoke samples; running the repository's status command reports the current results.

• 116 external scanners (hand-written Go implementations of upstream C scanners)
• 7 hand-written Go token sources (authzed, c, cpp, go, java, json, lua)
• Remaining languages use the DFA lexer generated from grammar tables

Parse quality

Each registry entry carries a quality field:

| Quality | Meaning |
|---|---|
| … | All scanner and lexer components present. Parser has full access to the grammar. |
| … | Missing external scanner. DFA lexer handles what it can; external tokens are skipped. |
| … | Cannot parse. |

The top quality level means the parser has every component the grammar requires. It does not guarantee error-free trees on all inputs: grammars with high GLR ambiguity may produce syntax errors on very large or deeply nested constructs because of parser safety limits (iteration cap, stack depth cap, node count cap). These limits scale with input size. Check the resulting tree for errors at runtime.
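The advice above amounts to two independent checks: the grammar's quality level, and whether the produced tree contains errors. The sketch below shows one way a caller might combine them. `ParseQuality`, its constants, `ParseResult`, and `trustTree` are hypothetical stand-ins that mirror the three rows of the quality table; they are not gotreesitter's API.

```go
package main

import "fmt"

// ParseQuality is a hypothetical stand-in for the registry's quality field,
// ordered from best to worst to mirror the rows of the quality table.
type ParseQuality int

const (
	QualityComplete       ParseQuality = iota // all scanner and lexer components present
	QualityMissingScanner                     // external scanner missing; external tokens skipped
	QualityUnusable                           // cannot parse
)

// ParseResult is a hypothetical summary of one parse: the grammar's quality
// level and whether the resulting tree contains error nodes.
type ParseResult struct {
	Quality  ParseQuality
	HasError bool
}

// trustTree decides whether a tree can be used as-is. Even a complete grammar
// can hit parser safety limits on huge or deeply nested input, so the error
// flag is checked regardless of quality level.
func trustTree(r ParseResult) (bool, string) {
	switch {
	case r.Quality == QualityUnusable:
		return false, "grammar cannot parse"
	case r.HasError:
		return false, "tree contains syntax errors"
	case r.Quality == QualityMissingScanner:
		return true, "usable with caution: external tokens were skipped"
	default:
		return true, "complete grammar, error-free tree"
	}
}

func main() {
	for _, r := range []ParseResult{
		{QualityComplete, false},
		{QualityComplete, true},
		{QualityMissingScanner, false},
		{QualityUnusable, false},
	} {
		ok, why := trustTree(r)
		fmt.Println(ok, why)
	}
}
```

Checking the error flag even for the top quality level reflects the caveat above: a complete grammar guarantees that every component is present, not that every input parses cleanly.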
Full language list (206)

Query API

| Feature | Status |
|---|---|
| Compile + execute | supported |
| Cursor streaming | supported |
| Structural quantifiers (`*`, `+`, `?`) | supported |
| Alternation (`[ ]`) | supported |
| Field matching | supported |
| … / … | supported |
| … / … | supported |
| … / … | supported |
| … | supported |
| … / … | supported |
| … | supported |
| … / … | supported |
| … / … | supported |
| … / … | supported |
| … | supported |
| … | supported |
| … / … directives | parsed and accepted |
| (read metadata from matc…