Mirror of https://github.com/valkey-io/valkey.git, synced 2026-05-06 13:36:47 -04:00, commit 39036c7c06
## Background

Add structured-dataset loading to `valkey-benchmark`:

- Support CSV and TSV file formats.
- Use `__field:fieldname__` placeholders, replaced with the corresponding field from the dataset file.
- Support natural content of varying length.
- Allow mixed placeholder usage, combining dataset fields with random generators.
- Enable automatic field discovery from CSV/TSV headers.
- Use `--maxdocs` to limit how much of the dataset is loaded.

Rather than modifying the existing placeholder system, we detect field placeholders and switch to a separate code path that builds commands from scratch using `valkeyFormatCommandArgv()`. This ensures:

- Zero impact on existing functionality
- Full support for variable-size content
- Thread-safe atomic record iteration
- Compatibility with pipelining and threading modes

__Usage examples__

```sh
# Strings - simple key-value with dataset fields
./valkey-benchmark --dataset products.csv -n 10000 SET product:__rand_int__ "__field:name__"

# Sets - unique collections from dataset
./valkey-benchmark --dataset categories.csv -n 10000 SADD tags:__rand_int__ "__field:category__"

# CSV dataset with document limit
./valkey-benchmark --dataset wiki.csv --maxdocs 100000 -n 50000 HSET doc:__rand_int__ title "__field:title__" body "__field:abstract__"

# Mixed placeholders (dataset + random)
./valkey-benchmark --dataset terms.csv -r 5000000 -n 50000 HSET search:__rand_int__ term "__field:term__" score __rand_1st__
```

__Full-Text Search Benchmarking__

```sh
# Search hit scenarios (existing terms)
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Search miss scenarios (non-existent terms)
./valkey-benchmark --dataset miss_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Query variations
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "@title:__field:term__"
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__*"
```

__Benchmark Results__

Test environment:

- __Instance:__ AWS c7i.16xlarge, 64 vCPU
- __Dataset:__ 5M+ Wikipedia XML documents, 5.8 GB in memory

| Configuration | Throughput | CPU Usage | Wall Time | Memory Peak |
|---------------|------------|-----------|-----------|-------------|
| Single-threaded, P1 | 93,295 RPS | 99% | 71.4s | 5.8GB |
| Multi-threaded (10), P1 | 93,332 RPS | 137% | 71.5s | 5.8GB |
| Single-threaded, P10 | 274,499 RPS | 96% | 36.1s | 5.8GB |
| Multi-threaded (4), P10 | 344,589 RPS | 161% | 32.4s | 5.8GB |

---------

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>