scylladb / scylla-bench
Repository Overview (README excerpt)
scylla-bench

scylla-bench is a benchmarking tool for Scylla written in Go. It aims to minimise client overhead and to provide a wide range of test scenarios.

Install

The recommended way to install scylla-bench is to download the repository and then install it from source.

Installing the tool directly, without a local checkout, is not recommended. If you do that, a scylla-bench binary will be built without ScyllaDB's fork of the gocql driver, and shard-awareness won't work, because replace directives are not honored in that case: https://github.com/golang/go/issues/30354

Docker

scylla-bench can be built and run using Docker, which is useful for local development, testing, and deployment scenarios.

Building a Local Docker Image

To build a local Docker image for development:

The Dockerfile supports multiple build targets:
• Default target: minimal image with the static binary (~8.6MB)
• Development image with debugging tools (gdb, delve debugger)
• Alternative production build for SCT (Scylla Cluster Tests)

Building Specific Targets

Running with Docker

Debug Mode with Docker

The debug image includes the delve debugger and development tools.

Docker Environment Variables

The Docker images support several environment variables, covering:
• Go runtime debugging options (pre-configured for optimal performance)
• Binary path (automatically configured)
• Timezone (defaults to UTC)

Docker Image Variants

| Image Target | Size | Use Case | Debugging Tools |
|--------------|------|----------|-----------------|
| | ~8.6MB | Production, CI/CD | ❌ |
| | ~500MB | Development, troubleshooting | ✅ (gdb, delve, vim) |
| | ~8.6MB | SCT integration testing | ❌ |

Development Workflow with Docker

Usage

Schema adjustments

The default scylla-bench schema for regular columns looks like this:

scylla-bench allows configuring the number of partitions, the number of rows in a partition, and the size of a single row.
Each of these is controlled by its own flag.

Modes

scylla-bench can operate in several modes, selected by a flag, which determine what kind of requests are sent to the server. Some of the modes allow further configuration.

Write mode

The behaviour in this mode depends on the configured number of rows per request. If it is set to 1 (the default), scylla-bench sends simple single-row INSERT requests. Otherwise, writes are sent in unlogged batches, each containing at most that many insertions. All writes in a single batch refer to the same partition. As a consequence, in some configurations the number of rows written in a single request can actually be smaller than the configured value.

Counter update mode

Counter updates are written to a separate column family. Each request updates all five counters in a row, and only one row per request is supported.

Read mode

Read mode is essentially split into four sub-modes and offers the most configurability. The default requests resemble single-partition paging queries: there is a lower bound on clustering keys and a limit, which can be adjusted using a flag. It is possible to send requests without a lower bound if the corresponding flag is set. The limit can be replaced by an upper bound (also selected by a flag); in this case scylla-bench chooses the upper bound so that the expected number of rows equals the number specified by the corresponding flag. Finally, scylla-bench can send a request with an IN restriction (again selected by a flag), and the number of requested clustering keys equals that same configured value.

Counter read mode

Counter read mode works in exactly the same way as regular read mode (with the same configuration flags available), except that it reads data from the counter table.

Scan mode

Scan the entire table. This mode does not allow the workload to be configured (it has its own workload). The scan mode allows the token space to be split into a user-configurable number of sub-ranges and these sub-ranges to be queried concurrently.
The algorithm used is the one described in Avi's efficient range scans blog post. The number of sub-ranges that the token space is split into can be set with the -range-count flag. The recommended value is:

-range-count = (nodes in cluster) ✕ (cores in node) ✕ 300

The number of sub-ranges read concurrently can be set with the -concurrency flag, as usual. The recommended concurrency is:

-concurrency = range-count/100

For more details on these numbers, see the blog post mentioned above. Essentially, the following query is executed:

SELECT * FROM scylla_bench.test WHERE token(pk) >= ? AND token(pk) <= ?

[…]

Then each goroutine prepends new rows to its partitions (partitions are chosen in a round-robin manner). Newer rows have smaller clustering keys than older ones. Once a partition reaches the configured number of rows, the goroutine switches to a new partition key. This means that the total partition count will be larger than the configured partition count, since in the time series workload that flag specifies only the number of partitions to which data is written concurrently. The rate at which rows are written depends on the maximum request rate flag, which must be specified in this workload. Since that flag sets the total maximum request rate of the whole client, the rate at which rows are appended to a single partition may be lower. The actual per-partition rate is printed in the scylla-bench configuration output. scylla-bench also prints "Start timestamp", which is needed if a time series read load is running.

Read mode

The time series workload in read mode is meant to be run simultaneously with time series writes. It requires specifying the start timestamp and the per-partition write rate, both of which are printed by scylla-bench running in write mode. The time series workload in read mode chooses partition and clustering keys randomly from the range that has been written up to this point (using the start timestamp and write rate). The distribution can be either uniform or half-normal, with the latest rows being most likely.
Note that if the effective write rate is lower than the specified one th…
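Returning to the scan mode sizing guidance (-range-count = nodes ✕ cores ✕ 300 and -concurrency = range-count/100), the arithmetic can be worked through with a small Go sketch. The function names and the 3-node, 8-core cluster are illustrative assumptions, and clamping the concurrency to at least 1 is an addition made here for very small clusters, not something the README states.

```go
package main

import "fmt"

// RecommendedRangeCount applies the README's rule of thumb:
// (nodes in cluster) x (cores in node) x 300.
func RecommendedRangeCount(nodes, coresPerNode int) int {
	return nodes * coresPerNode * 300
}

// RecommendedConcurrency applies range-count/100. The lower bound of 1
// is an assumption added here so the result is always usable.
func RecommendedConcurrency(rangeCount int) int {
	c := rangeCount / 100
	if c < 1 {
		c = 1
	}
	return c
}

func main() {
	// Hypothetical cluster: 3 nodes with 8 cores each.
	rangeCount := RecommendedRangeCount(3, 8)
	concurrency := RecommendedConcurrency(rangeCount)
	fmt.Printf("-range-count %d -concurrency %d\n", rangeCount, concurrency)
	// Prints: -range-count 7200 -concurrency 72
}
```

For that example cluster, 3 ✕ 8 ✕ 300 gives 7200 sub-ranges, of which 72 would be read concurrently.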