Mirror of https://github.com/valkey-io/valkey.git, synced 2026-05-06 13:36:47 -04:00, commit 39036c7c06
## Background

Add structured-dataset loading to `valkey-benchmark`:

- Support CSV and TSV file formats.
- Use `__field:fieldname__` placeholders, replaced with the corresponding field from the dataset file.
- Support natural content of varying length.
- Allow mixed placeholder usage, combining dataset fields with random generators.
- Enable automatic field discovery from CSV/TSV headers.
- Use `--maxdocs` to limit how much of the dataset is loaded.

Rather than modifying the existing placeholder system, we detect field placeholders and switch to a separate code path that builds commands from scratch using `valkeyFormatCommandArgv()`. This ensures:

- Zero impact on existing functionality
- Full support for variable-size content
- Thread-safe atomic record iteration
- Compatibility with pipelining and threading modes

__Usage examples__

```sh
# Strings - simple key-value with dataset fields
./valkey-benchmark --dataset products.csv -n 10000 SET product:__rand_int__ "__field:name__"

# Sets - unique collections from dataset
./valkey-benchmark --dataset categories.csv -n 10000 SADD tags:__rand_int__ "__field:category__"

# CSV dataset with document limit
./valkey-benchmark --dataset wiki.csv --maxdocs 100000 -n 50000 HSET doc:__rand_int__ title "__field:title__" body "__field:abstract__"

# Mixed placeholders (dataset + random)
./valkey-benchmark --dataset terms.csv -r 5000000 -n 50000 HSET search:__rand_int__ term "__field:term__" score __rand_1st__
```

__Full-Text Search Benchmarking__

```sh
# Search hit scenarios (existing terms)
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Search miss scenarios (non-existent terms)
./valkey-benchmark --dataset miss_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Query variations
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "@title:__field:term__"
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__*"
```

__Benchmark Results__

Test environment:

- __Instance:__ AWS c7i.16xlarge, 64 vCPU
- __Dataset:__ 5M+ Wikipedia XML documents, 5.8 GB in memory

| Configuration | Throughput | CPU Usage | Wall Time | Memory Peak |
|---------------|------------|-----------|-----------|-------------|
| Single-threaded, P1 | 93,295 RPS | 99% | 71.4s | 5.8GB |
| Multi-threaded (10), P1 | 93,332 RPS | 137% | 71.5s | 5.8GB |
| Single-threaded, P10 | 274,499 RPS | 96% | 36.1s | 5.8GB |
| Multi-threaded (4), P10 | 344,589 RPS | 161% | 32.4s | 5.8GB |

---------

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>