13778 Commits

Author SHA1 Message Date
Ran Shidlansik fea0b4064c Fix invalid memory access in RESTORE with malformed zipmap (CVE-2026-25243) (#3619)
Root cause: zipmapValidateIntegrity() and zipmapNext() use different
methods to calculate pointer advancement for length-encoded fields.
Validation reads the actual encoded size via
zipmapGetEncodedLengthSize() (which returns 5 for the 0xFE prefix), but
zipmapRawKeyLength() (used by zipmapNext during hash conversion)
recalculates via zipmapEncodeLength() which returns 1 for decoded
lengths < 254. A crafted zipmap with an overlong 5-byte encoding for a
small length passes validation but causes a 4-byte pointer mismatch in
zipmapNext(), leading to heap buffer over-reads during the
zipmap-to-listpack conversion.

Fix: add sanity checks in zipmapValidateIntegrity() to reject entries
where the decoded length < ZIPMAP_BIGLEN (254) but the encoding uses
more than 1 byte. This is applied to both field-name and value lengths.

Test: added a regression test in tests/unit/dump.tcl that crafts a
RESTORE payload with a 2-entry zipmap where the first field uses an
overlong 5-byte length encoding for value 3. Post-patch, this is cleanly
rejected by zipmapValidateIntegrity(). Pre-patch, the misaligned
zipmapNext() reads garbage (confirmed via server log: "Hash zipmap with
dup elements, or big length (0)") which also produces an error, so the
test serves as a defense-in-depth regression anchor rather than a strict
pass/fail differentiator. The actual heap over-read is detectable with
AddressSanitizer builds.

Signed-off-by: ikolomi <ikolomin@amazon.com>
Co-authored-by: ikolomi <ikolomin@amazon.com>
8.0.8
2026-05-05 16:48:38 -07:00
Ran Shidlansik c7c92db43b Delay full sync during yielding Lua scripts to prevent use-after-free (CVE-2026-23631) (#3625)
During a full sync, the functions/scripting engine is freed right before
loading the RDB from the primary. If a Lua script is still running and
yielding via the long-command mechanism at that moment, the freed engine
can be accessed when the script resumes, causing a use-after-free.

Add a guard at the top of replicaReceiveRDBFromPrimaryToMemory() to
check isInsideYieldingLongCommand() and return early, deferring the sync
processing until the script completes.

No validating test was added because the vulnerability is a race
condition between a yielding Lua script and a replication event handler,
which cannot be reliably triggered in a deterministic Tcl test.

Signed-off-by: ikolomi <ikolomin@amazon.com>
Co-authored-by: ikolomi <ikolomin@amazon.com>
2026-05-05 15:47:16 -07:00
sananes 797c626046 Fix SIGSEGV in VM_GetLRU/SetLRU/GetLFU/SetLFU on NULL key (#3610)
## Fix SIGSEGV in VM_GetLRU, VM_SetLRU, VM_GetLFU, VM_SetLFU on NULL key

### Description

`VM_GetLRU`, `VM_SetLRU`, `VM_GetLFU`, and `VM_SetLFU` crash with
SIGSEGV when passed a NULL `ValkeyModuleKey` pointer. This happens
because all four functions dereference `key->value` without first
checking if `key` itself is NULL.

When a module opens a nonexistent key in `VALKEYMODULE_READ` mode,
`VM_OpenKey` returns NULL. If a module passes that NULL pointer into any
of these functions, the server crashes.

### Reproduction

```
valkey-server --loadmodule tests/modules/misc.so
valkey-cli test.getlru nonexistent_key
# Server crashes: SIGSEGV (signal 11)
```

### Fix

**`src/module.c`** — Add a `!key` guard before dereferencing
`key->value` in all four functions:

```c
// Before:
if (!key->value) return VALKEYMODULE_ERR;

// After:
if (!key || !key->value) return VALKEYMODULE_ERR;
```

**`tests/modules/misc.c`** — Add early return after
`open_key_or_reply()` in `test_getlru`, `test_setlru`, `test_getlfu`,
and `test_setlfu`. The helper already sends the error reply to the
client when the key is not found, so the command handler just needs to
stop processing:

```c
ValkeyModuleKey *key = open_key_or_reply(ctx, argv[1], VALKEYMODULE_READ|VALKEYMODULE_OPEN_KEY_NOTOUCH);
if (!key) return VALKEYMODULE_OK;
```

### After fix

```
valkey-cli test.getlru nonexistent_key
(error) key not found
# Server stays up
```

Signed-off-by: Yaron Sananes <yaron.sananes@gmail.com>
2026-05-04 23:17:39 +03:00
Brad Bebee 8891441ab9 Fix checkPrefixCollisionsOrReply returning non-zero on self-overlap (#3583) 2026-05-03 11:44:09 -07:00
Sarthak Aggarwal f2f4e5dbfc Run ASan Tests on run-extra-tests label (#3512)
It's important to enable ASan on the run-extra-tests label so we can
catch some of the bugs in PRs before they are merged into unstable.


Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2026-05-01 12:10:29 +08:00
Rain Valentine cea9354b56 Big Endian: add daily workflow UT job and fix UTs (#3330)
Big endian support on Valkey is "best effort" and not guaranteed, but we
haven't been doing any regular testing at all afaik. This PR adds a job
to the daily workflow to run UTs on an emulated big endian platform.
Integration tests failed excessively because of how slow emulation is.

I fixed several problems with tests and improved UT coverage of key
points where endian byte order matters - and fwiw I didn't find any
bugs. I think the main coverage gap remaining after this is RDB
serialization (maybe little endian <-> big endian round trips?)

There are a couple of lines of endian-specific code for #3166, and this
change can test them.

Signed-off-by: Rain Valentine <rsg000@gmail.com>
2026-05-01 12:09:23 +08:00
FAN PEI 46d37e4d5e Fix off-by-one boundary in lpEncodeBacklen() for 3 values (#3601)
The function lpEncodeBacklen() uses `<= 127` for the 1-byte case but `<
16383`, `< 2097151`, and `< 268435455` for the subsequent cases. This
means the exact values 16383, 2097151, and 268435455 (i.e. 2^14-1,
2^21-1, 2^28-1) unnecessarily use one extra byte than needed:

- `l < 16383` → `16383` (2^14-1) uses 3 bytes instead of 2
- `l < 2097151` → `2097151` (2^21-1) uses 4 bytes instead of 3
- `l < 268435455` → `268435455` (2^28-1) uses 5 bytes instead of 4

The decoding side (`lpDecodeBacklen`) is unaffected since it parses
continuation bits continuously without discrete range checks.

This is not a correctness issue and has no impact on data integrity,
since encoding and decoding use the same boundaries, but it wastes up
to 1 byte per affected entry.

Signed-off-by: fanpei91 <fanpei91@gmail.com>
2026-05-01 12:06:38 +08:00
Saurabh K 54bdf5737b Handle NULL pointer in streamTrim listpack delta calculation (#3591)
When XTRIM marks the last entry in a listpack node as deleted, lpNext()
returns NULL after the lp-count field (EOF). The delta calculation (p -
lp) on a NULL pointer is undefined behavior and produces a garbage
pointer, corrupting the listpack. A subsequent XREAD hitting the
corrupted node triggers the lpValidateNext assertion failure and crashes
the server.

Guard the delta calculation with a NULL check so the while(p) loop
terminates naturally when the last entry is reached.

Fixes #3569

Signed-off-by: Saurabh Kher <saurabh@amazon.com>
Co-authored-by: Saurabh Kher <saurabh@amazon.com>
2026-05-01 12:05:01 +08:00
chenshi cba05103de Fix: prevent NULL dereference crash in connectSlotExportJob when target node disappears (#3596)
### Summary

This PR fixes a NULL pointer dereference (SIGSEGV) in
`connectSlotExportJob()` (`src/cluster_migrateslots.c`) that can crash a
Valkey cluster node, causing a denial-of-service condition.

### Root Cause

When `CLUSTER MIGRATESLOTS` is issued, a migration job is created with
state `SLOT_EXPORT_CONNECTING`. On the next `clusterCron()` tick,
`proceedWithSlotMigration()` calls `connectSlotExportJob()`, which looks
up the target node via `clusterLookupNode()`.

`clusterLookupNode()` can legitimately return `NULL` — for example, if
the target node is removed from the cluster (e.g. via `CLUSTER FORGET`)
between the time the migration job is created and the time the cron
fires. This is a realistic race condition in any cluster topology change
scenario.

The return value was **never checked**, so the subsequent call to
`getNodeDefaultReplicationPort(n)` immediately dereferences the NULL
pointer, crashing the process:

```c
// Before fix — vulnerable
clusterNode *n = clusterLookupNode(job->target_node_name, CLUSTER_NAMELEN);
int port = getNodeDefaultReplicationPort(n);  // SIGSEGV if n == NULL
serverLog(..., n->ip, port);                  // second dereference
```

Signed-off-by: chenshi5012 <chenshi5012@163.com>
2026-04-30 16:25:58 -07:00
Jeff Duffy 72fc5b14b1 Fix compilation error: replace deprecated je_calloc with zcalloc_num (#3592)
Fixes #1905

## Summary
The direct use of `je_calloc` in `src/allocator_defrag.c` causes
compilation failures on systems (e.g., Arch Linux with GCC 14.2.1) where
`calloc` is marked as deprecated and `-Werror=deprecated` is enabled.

## Changes
Replace the two `je_calloc` calls in `allocatorDefragInit()` with
`zcalloc_num`, which is the proper Valkey allocation wrapper that
provides the same semantics (num × size with zero-fill) without directly
invoking the deprecated `calloc` symbol.

## Testing
- Build compiles cleanly
- Integration tests pass (unit/memefficiency, defrag, unit/other — 51
passed, 0 failed)

Signed-off-by: jaduffy <jaduffy@amazon.com>
2026-04-30 15:59:19 -04:00
Jacob Murphy 81639e3975 fix: validate key count before allocating result in keyspec (#3598)
In `getKeysUsingKeySpecs`, when extracting keys based on the
`KSPEC_FK_KEYNUM` spec (as in the `EVAL` command), the server read the
number of keys from the arguments and calculated the expected end index.

However, it called `getKeysPrepareResult` to allocate memory for the
result array before validating whether `last` was within the bounds of
the actual arguments provided.

If a client sent a command with a huge declared number of keys (e.g.,
`COMMAND GETKEYS EVAL "return 1" 2147483647 key1`), the server would
allocate a massive amount of memory. Since running with
`vm.overcommit_memory` enabled is recommended, this allocation would NOT
normally trigger an OOM (we never write to the array, so no physical
memory is committed), but with overcommit disabled it can.

You can reproduce it with:

```
$ prlimit --as=1073741824 src/valkey-server --save ""
...
384270:M 30 Apr 2026 04:27:24.456 * Ready to accept connections tcp

...
<in valkey-cli>
127.0.0.1:6379> command getkeys eval "return 1" 2147483647 key1

...
<in server log>
384270:M 30 Apr 2026 04:29:26.950 # Out Of Memory allocating 17179869176 bytes!
```

## Solution

* Moved the bounds check `if (last >= argc || last < first || first >=
argc)` so it executes before the call to `getKeysPrepareResult`,
preventing the large allocation on invalid input.
* To further catch issues like this, protected against integer overflow
during the calculation of `last` by using a `long long` temporary
variable. If it exceeds INT_MAX or falls below INT_MIN, the spec is
marked invalid immediately.

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2026-04-30 11:21:43 -07:00
abmathur-ie 7e2a2f7c4a fix(cluster): Remove per-call srand in clusterManagerNodePrimaryRandom (#3586)
clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
2026-04-30 18:33:34 +03:00
Ping Xie 5b7ac66918 Fix verify-provenance action pin (#3594) 2026-04-29 21:30:40 -07:00
Ping Xie 98724dda08 Update provenance action to refine layer2 exemption policies (#3593) 2026-04-29 17:06:57 -07:00
abmathur-ie 678a06d216 Set errno on EOF in syncRead and propagate it in logs (#3580)
When read() returns 0 (EOF/connection closed) in syncRead(), errno is
not set by POSIX, so it retains a stale value (typically 0). This causes
callers using connGetLastError() to log strerror(0) which is the
misleading string "Success".

Set errno = ECONNRESET on EOF in syncRead(), matching the existing
pattern used for the timeout case (errno = ETIMEDOUT).

Also set conn->last_errno = errno in connSocketSyncWrite,
connSocketSyncRead, and connSocketSyncReadLine wrappers, matching the
pattern used by their async counterparts connSocketWrite and
connSocketRead.

After this fix, replica logs will show:
  "I/O error reading bulk count from PRIMARY: Connection reset by peer"
instead of the misleading:
  "I/O error reading bulk count from PRIMARY: Success"

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Signed-off-by: djk1027 <djk9510271@gmail.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Daejun Kim <djk9510271@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2026-04-29 14:13:40 -07:00
bandalgomsu 7817ca8a73 Fix GEOSEARCH BYPOLYGON leak on invalid COUNT (#3568)
Free BYPOLYGON points before returning from invalid COUNT parsing paths
in GEOSEARCH/GEOSEARCHSTORE.

Closes #3567

---------

Signed-off-by: Su Ko <rhtn1128@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2026-04-29 13:39:12 -04:00
Jim Brunner ad404cd266 fix compile warning in util.c (#3585)
Address this compile warning:
```c
    CC util.o
util.c:638:1: warning: ‘no_sanitize’ attribute directive ignored [-Wattributes]
 __attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) {
 ^~~~~~~~~~~~~
```
Addresses portability concerns around these attributes.

---------

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2026-04-29 09:31:04 -07:00
VoletiRam 39036c7c06 Add structured datasets loading capability in valkey benchmark (#2823)
## Background

Add structured datasets loading capability. Support CSV and TSV file
formats. Use `__field:fieldname__` placeholders to replace the
corresponding fields from the dataset file. Support natural content size
of varying length. Allow mixed placeholder usage combining dataset
fields with random generators. Enable automatic field discovery from
CSV/TSV headers. Use `--maxdocs` to limit the dataset loading.

Rather than modifying the existing placeholder system, we detect field
placeholders and switch to a separate code path that builds commands
from scratch using `valkeyFormatCommandArgv()`. This ensures:

- Zero impact on existing functionality
- Full support for variable-size content
- Thread-safe atomic record iteration
- Compatible with pipelining and threading modes

__Usage examples__

```sh
# Strings - Simple key-value with dataset fields
./valkey-benchmark --dataset products.csv -n 10000 SET product:__rand_int__ "__field:name__"

# Sets - Unique collections from dataset
./valkey-benchmark --dataset categories.csv -n 10000 SADD tags:__rand_int__ "__field:category__"

# CSV dataset with document limit
./valkey-benchmark --dataset wiki.csv --maxdocs 100000 -n 50000 HSET doc:__rand_int__ title "__field:title__" body "__field:abstract__"

# Mixed placeholders (dataset + random)
./valkey-benchmark --dataset terms.csv -r 5000000 -n 50000 HSET search:__rand_int__ term "__field:term__" score __rand_1st__
```

__Full-Text Search Benchmarking__

```sh
# Search hit scenarios (existing terms)
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Search miss scenarios (non-existent terms)
./valkey-benchmark --dataset miss_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__"

# Query variations
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "@title:__field:term__"
./valkey-benchmark --dataset search_terms.csv -n 50000 FT.SEARCH rd0 "__field:term__*"
```

__Benchmark Results__


Test environment:
__Instance:__ AWS c7i.16xlarge, 64 vCPU

Test Dataset: 5M+ Wikipedia XML documents, 5.8GB memory

| Configuration | Throughput | CPU Usage | Wall Time | Memory Peak |
|---------------|------------|-----------|-----------|-------------|
| Single-threaded, P1 | 93,295 RPS | 99% | 71.4s | 5.8GB |
| Multi-threaded (10), P1 | 93,332 RPS | 137% | 71.5s | 5.8GB |
| Single-threaded, P10 | 274,499 RPS | 96% | 36.1s | 5.8GB |
| Multi-threaded (4), P10 | 344,589 RPS | 161% | 32.4s | 5.8GB |

---------

Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>
2026-04-29 09:18:37 -07:00
Jim Brunner 16ed690fec fix LTO compilation warning in eval (#3584)
I noticed this LTO compile warning in the eval code. The compiler seems
to be getting confused about an sds length even though it is checked
above. Just added an assert to clarify.

The warning:
```c
    LINK valkey-server
eval.c: In function ‘evalExtractShebangFlags’:
eval.c:263:27: warning: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
             *out_engine = zcalloc(engine_name_len + 1);
                           ^
zmalloc.c:324:7: note: in a call to allocation function ‘valkey_calloc’ declared here
 void *zcalloc(size_t size) {
       ^
```

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2026-04-28 23:13:56 -07:00
Binbin bef46dacc1 Skip cluster resharding test under valgrind (#3574)
This change was introduced in #3382. This test is already very slow on
its own. Under valgrind it gets slow enough that the per-node restart
step lets primaries be marked FAIL and triggers failovers, after which
"Verify slaves consistency" no longer holds since it assumes the original
topology.

It was never run under valgrind before and exercises nothing valgrind
meaningfully covers, so just tag it valgrind:skip.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2026-04-29 10:27:04 +08:00
Daejun Kim 8091c6c10a Remove redundant count division in genericHgetallCommand (#3573)
The argument `count /= 2` modifies `count` as a side effect, and the
following `count /= 2` divides it again unnecessarily.
Since `count` is not used after this point, fix it by using `count / 2`
without the side effect and remove the redundant second assignment.

Signed-off-by: djk1027 <djk9510271@gmail.com>
2026-04-28 11:43:56 +03:00
Jun Yeong Kim ff80b2d1dc Migrate the remaining cluster tests to the new framework and remove legacy files (#2297) (#3382)
Migrated the remaining cluster tests to tests/unit/cluster/ to use the same
framework for all cluster tests. Cleaned up the obsolete cluster test framework
files and updated the CI workflows to use the new unified test runner.

Changes:
  Moved and mapped 6 test files:
  - 03-failover-loop.tcl → Merged into existing failover.tcl
  - 04-resharding.tcl → resharding.tcl
  - 12-replica-migration-2.tcl + 12.1-replica-migration-3.tcl →
  replica-migration-slow.tcl
  - 07-replica-migration.tcl → Merged into existing replica-migration.tcl
  - 28-cluster-shards.tcl → Merged into existing cluster-shards.tcl

Other changes:
  - Converted old framework APIs (e.g., K, RI) to new framework APIs (e.g., R, srv)
  - Added process_is_alive check in cluster_util.tcl to fix an exception in
  failover tests caused by executing ps on dead processes
  - Heavy tests (resharding, replica-migration-slow) marked with slow tag and
  wrapped in run_solo to prevent resource contention in sanitizer environments
  - replica-migration-slow marked with valgrind:skip tag since it is very slow
  - Removed the entire tests/cluster/ directory including run.tcl, cluster.tcl,
  includes/, and helpers/
  - Kept runtest-cluster as a wrapper script (exec ./runtest --cluster "$@")
  - Removed ./runtest-cluster calls from .github/workflows/daily.yml as cluster
  tests are now included in ./runtest

Closes #2297.

Signed-off-by: Jun Yeong Kim <junyeonggim5@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2026-04-27 17:31:37 +08:00
eifrah-aws 6dbb7f81a9 Fix remove cached eval scripts on engine unregister (#3503)
Remove eval script cache entries that belong to a scripting engine when
that engine is unregistered. This prevents the eval cache from retaining
dangling engine pointers and keeps the tracked script memory in sync
after engine shutdown.

The scripting engine unregister path now invokes a new eval cleanup
helper, which scans the cached scripts, drops matching entries from the
LRU list and dictionary, and adjusts cache memory accounting accordingly.

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2026-04-27 11:29:20 +08:00
Jacob Murphy 28ecbd204f Ensure client slot migration pointer is cleared during reset (#3554)
If not cleared, the job may no longer be valid by the time the client
goes to cleanup. This dangling reference could cause a crash if you set
slot-migration-log-max-len to 0 and are very unlucky.

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2026-04-27 11:05:35 +08:00
Binbin a3e44a55d3 Fix lua-enable-insecure-api default value cannot be changed to yes (#3548)
The default value of lua-enable-insecure-api cannot be safely changed
from no to yes due to two issues:

1. In createEngineContext(), lua_enable_insecure_api was hardcoded to 0
before initializing Lua states, so deprecated APIs (newproxy, setfenv,
   getfenv) were never registered in the global table regardless of the
   actual config value. Once the global table is locked, the config
   change has no effect.

2. lua_insecure_api_current was initialized to 0 (struct zero-init) and
   never synced with the final config value. If the default was changed
   to yes(1), a subsequent CONFIG SET no would see both values as 0 and
   skip the evalReset() call in updateLuaEnableInsecureApi().

Fix by reading the real config via isLuaInsecureAPIEnabled() in
createEngineContext() before Lua state initialization, and syncing
lua_insecure_api_current after all config sources (default, config file,
command-line args) are applied.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2026-04-27 11:04:14 +08:00
Binbin ac9ca9de3d Fix rdmaServer leaks when create listen cm id error (#3557)
When creating the listen cm id fails, we should jump to the `error`
label so the resources are freed:
```
error:
    if (listen_cmid) rdma_destroy_id(listen_cmid);
    if (listen_channel) rdma_destroy_event_channel(listen_channel);
    ret = ANET_ERR;

end:
    freeaddrinfo(servinfo);
    return ret;
}
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2026-04-27 11:01:52 +08:00
charsyam bb88665578 hashtable: fix dismissHashtable madvise size (#3533)
The bug was in dismissHashtable(), which computes the size passed to
zmadvise_dontneed() for the top-level hashtable tables.

ht->tables[i] points to a contiguous array of bucket objects, but the
code used sizeof(bucket *) instead of sizeof(bucket) when calculating
the length. That means it treated the allocation like an array of
pointers rather than an array of buckets.

As a result, the advised range was much smaller than the actual table
allocation. On 64-bit builds, bucket is 64 bytes while bucket * is 8
bytes, so only about one eighth of the table was covered. This does not
usually break correctness, but it defeats the purpose of the function:
after a fork, we want to tell the kernel that the hashtable pages are no
longer needed so we reduce copy-on-write overhead. With the wrong size,
most of the table memory was never included in that hint.

The fix is to use sizeof(bucket) so the full top-level bucket array is
passed to zmadvise_dontneed().

Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
2026-04-26 19:45:50 -07:00
Hanxi Zhang edc0d26ada Strip LTO flags from static Lua module build (#3555)
### Summary

The daily CI sanitizer jobs with clang are failing during the build
step.
When the static Lua module is built with `-flto`, the `.o` files contain
LLVM bitcode that gets archived into `libvalkeylua.a`. The system linker
cannot read this bitcode, causing build failures:

`/usr/bin/ld:
/home/runner/work/valkey/valkey/src/modules/lua/libvalkeylua.a: member
/home/runner/work/valkey/valkey/src/modules/lua/libvalkeylua.a(debug_lua.o)
in archive is not an object`

The previous fix (#3546) pinned clang to version 17, but this was
insufficient: the issue is not just a version mismatch but that the
system linker fundamentally cannot read LTO bitcode from `.a` archives.

Example failure:
https://github.com/valkey-io/valkey/actions/runs/24865821147/job/72801509768

### Fix

Strip LTO flags from OPTIMIZATION in the Lua module Makefile using
`override`.

Tested: https://github.com/hanxizh9910/valkey/actions/runs/24913834442

---------

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
2026-04-26 19:28:21 -07:00
Ping Xie c861184762 Implement Provenance Guard (#3109)
This PR bootstraps Valkey's provenance guard integration.

The provenance guard is a content-based similarity detection system that helps maintain proper code provenance by comparing incoming PR changes against fingerprint databases built from Redis commits and PRs. The matching logic now lives in the external `valkey-io/verify-provenance` action repository; this PR wires Valkey to that action and seeds the required database branch.

Key features:
  * Content-based detection: Uses normalized diff fingerprints and fuzzy matching to detect similar changes, including cases where files have moved or been refactored.
  * Externalized action logic: The check and refresh implementation is maintained in `valkey-io/verify-provenance` and is pinned by exact commit SHA from Valkey workflows.
  * Provenance Guard workflow: Runs on PR activity to check incoming changes against the provenance databases and report potential matches.
  * Daily Refresh workflow: Runs daily to refresh PR fingerprints and commits updated data back to `verify-provenance-db`.
  * Dedicated DB branch: Stores provenance databases on the orphan `verify-provenance-db` branch, separate from Valkey source code.
  * Privacy-first storage: Stores compressed non-reversible fingerprints, not source code.

The initial `verify-provenance-db` branch has been bootstrapped with fingerprints of Redis commits and PRs.

---------

Signed-off-by: Ping Xie <pingxie@outlook.com>
2026-04-26 14:36:18 -07:00
Rain Valentine a7d495352a extra UT
Signed-off-by: Rain Valentine <rsg000@gmail.com>
2026-04-24 15:40:54 -07:00
Rain Valentine 54980ece3a hashtable iterator safety: invalidate on exhaustion
Signed-off-by: Rain Valentine <rsg000@gmail.com>
2026-04-24 15:40:54 -07:00
Sarthak Aggarwal d2db0c268c Fix module commandresult event cleanup during unsubscribe and module unload (#3545)
This follows up on the commandresult API work and fixes cleanup around
unsubscribe and module unload.

The main issue was that command-result event listeners could leave stale
state behind. On unload, we removed the listeners themselves but didn’t
fully update the fast-path listener counters. Separately, unsubscribing
with a NULL callback could behave badly if the listener wasn’t present
anymore. In practice, that meant later commands could still walk into
command-result event handling after the module was supposed to be
cleaned up.

Failed in Daily as well yesterday:
https://github.com/valkey-io/valkey/actions/runs/24753491944/job/72421581610#step:10:852
Related Failures:
https://github.com/valkey-io/valkey/pull/2936#issuecomment-4290490199

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2026-04-23 19:10:20 -07:00
Harkrishn Patro cb2cfdd4e0 Revert "Pin clang to version 17 in sanitizer CI jobs" (#3556)
Reverts valkey-io/valkey#3546

This didn't help fix the build issue. Follow up PR is performed on
https://github.com/valkey-io/valkey/pull/3555

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
2026-04-23 19:01:52 -07:00
Hanxi Zhang 7db5b70737 Pin clang to version 17 in sanitizer CI jobs (#3546)
### Analysis
The daily CI sanitizer jobs with clang are failing during the build
step.

The `ubuntu-latest` runner now has clang 18, but the LLVM gold plugin
is still version 17. When the static Lua module is built with `-flto`,
the `.o` files contain LLVM 18 bitcode that the gold plugin (v17) cannot
read:
`bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown
attribute kind (91) (Producer: 'LLVM18.1.3' Reader: 'LLVM 17.0.6')
`
Example failure:
https://github.com/valkey-io/valkey/actions/runs/24753491944/job/72421581512

### Fix
Pin the sanitizer jobs to `clang-17` so the compiler and gold plugin
versions match.
Tested(successfully built):
https://github.com/hanxizh9910/valkey/actions/runs/24859845008

### Note
If `clang-17` is removed from the `ubuntu-latest` image in the future,
we may need to add an explicit install step.
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
2026-04-23 16:14:24 -07:00
Sarthak Aggarwal 5abf79e0e3 Add zmalloc_aligned() and fix SPMC queue buffer alignment (#3504)
The SPMC queue from #3324 needs each `spmcCell` to be cache-line
aligned, but plain `zmalloc()` does not guarantee that in all build
configurations.

This change introduces `zmalloc_cache_aligned()` and uses it for the
SPMC queue buffer allocation in `spmcInit()`.

Failing CI: https://github.com/valkey-io/valkey/actions/runs/24374139344

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2026-04-23 11:46:22 -07:00
charsyam 9709843446 Optimize HGETDEL to pause auto shrink when deleting multiple items (#3535)
Match HGETDEL with the existing batch-delete pattern used by HDEL.

HDEL already pauses hashtable auto-shrink while deleting multiple fields
so shrink evaluation is deferred until the batch completes. HGETDEL was
missing the same optimization even though it also deletes fields in a loop.

Pause auto-shrink for hashtable-encoded hashes before the HGETDEL delete
loop and resume it once afterwards. This preserves observable behavior
and reduces redundant shrink work for multi-field deletes.

Same as #3144.

Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
2026-04-23 12:56:52 +08:00
Madelyn Olson 651c40a89e Fix FD leak in connSocketBlockingConnect on timeout (#3541)
## Summary
Fix a file descriptor leak in `connSocketBlockingConnect()` when
`aeWait()` times out.

## Bug
When `anetTcpNonBlockConnect()` succeeds but `aeWait()` times out (e.g.,
MIGRATE to an unreachable host), the fd is leaked because it was never
assigned to `conn->fd`. The caller's `connClose()` checks `conn->fd !=
-1` and skips cleanup.

## Fix
Assign `conn->fd = fd` immediately after `anetTcpNonBlockConnect()`
succeeds, before `aeWait()`. This way the caller's normal `connClose()`
cleanup path handles the fd on any error, which is consistent with how
the rest of the connection lifecycle works.

TLS connections also benefit since `connTLSBlockingConnect` delegates to
this function for the TCP layer.

## Reproducer
```
valkey-cli SET key hello
# Repeat against unreachable host:
for i in $(seq 1 30); do valkey-cli MIGRATE 192.0.2.1 6379 key 0 500; done
# Check: /proc/<pid>/fd shows 30 leaked socket fds
```

*This issue was generated by AI but verified, with love, by a human.*

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2026-04-23 12:34:34 +08:00
Binbin c403eecd5b Fix double free in stream consumer PEL loading with corrupt RDB data (#3498)
There is a double free issue in the code. The error handling path called
both decrRefCount(o) and streamFreeNACK(nack), but the nack was obtained
from cgroup->pel via raxFind and is still referenced there. decrRefCount(o)
frees it through freeStream -> streamFreeCG -> raxFreeWithCallback(cg->pel, zfree),
so the explicit streamFreeNACK(nack) causes a double free.

Remove the redundant streamFreeNACK(nack) call and add a regression
test with a crafted corrupt payload that triggers the duplicate consumer
PEL entry path.

This was introduced in 492d8d0961.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2026-04-23 12:32:10 +08:00
Roshan Khatri 04896c1e6d Deflake many-slot-migration under valgrind (#3462)
## Problem
`Fix cluster` in `tests/unit/cluster/many-slot-migration.tcl` has been
timing out daily on valgrind jobs since April 3, 2026. The test runs 10
cluster nodes under valgrind, migrating 40,000 keys across 1,000 slots —
too much work for valgrind-instrumented builds.

The slowdown is caused by #3366 (dict→hashtable wrapper). Under `-O0`
(valgrind builds), the `static inline` wrappers become real function
calls that valgrind instruments, adding ~75% overhead to hot paths like
`dictSize`. This compounds across 10 valgrind processes over a 20-minute
migration test. No impact on production builds (`-O2` inlines everything).

## Fix
Scale the test workload down under valgrind: 10,000 keys / 250 slots
instead of 40,000 / 1,000. Normal runs are unchanged. Still exercises
the same cluster repair path.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: sarthakaggarwal97 <sarthakaggarwal97@users.noreply.github.com>
2026-04-23 12:31:32 +08:00
Deepak Nandihalli 3ab9d9797e Fix race condition during async client freeing with IO threading enabled (#3458)
When the close_asap flag is set, reset the bytes read to 0.

In readToQueryBuf, c->nread represents the number of bytes read. When
the close_asap flag is set, c->nread isn't reset to 0, which breaks the
invariant: IO threads then incorrectly think there is data to read,
resulting in a crash. This change fixes the bug.

To elaborate on the race possible:

1. Let's say that a IO thread job for reading query from a client got
enqueued as part of a epoll -
https://github.com/valkey-io/valkey/blob/unstable/src/io_threads.c#L417.
2. Later the client gets freed async and is marked as close_asap -
https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L2175
3. While processing the io_thread job for the client, it invokes
iothreadReadQueryFromClient. Here,
[`readToQueryBuf`](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6497)
returns as a no-op since the client is marked close-asap. Also, the
c->nread is not reset to 0 and could contain the value from a previous
read.
4. Later parseInputBuffer [gets
invoked](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6514).
5. The parseInputBuffer then [accesses the
query_buf](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L3864).
The query_buf here would be NULL, having been reset by
resetSharedQueryBuf as part of beforeNextClient.

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>
2026-04-22 17:39:44 -07:00
Sarthak Aggarwal 03c2d4c2a2 Stabilize diskless no-drop replication test (#3511)
This deflakes all variants of `diskless replicas drop during rdb pipe`.

The main issue turned out to be that the test was too sensitive to
timing and log ordering under TLS, not that the core behavior was wrong.
This keeps the same five subcases (no, slow, fast, all, timeout) but
makes them much less CI-fragile.

CI passes 200 times:
https://github.com/sarthakaggarwal97/valkey/actions/runs/24547258515

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
Co-authored-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
2026-04-22 00:14:18 +02:00
Yang Zhao fc00f7be03 Fix VLA warning in io_threads (#3518)
https://github.com/valkey-io/valkey/pull/3324 introduced `BATCH_SIZE` as
a const int local variable and used it as an array bound. Clang 17
rejects this with:
```
io_threads.c:305:22: error: variable length array folded to constant array as an extension [-Werror,-Wgnu-folding-constant]
  305 |     void *batch_jobs[BATCH_SIZE];
      |                      ^~~~~~~~~~
1 error generated.
make[1]: *** [io_threads.o] Error 1
make: *** [all] Error 2

```
Old Clang versions do not emit this warning, maybe that is why the CI
passed. Fix by promoting `BATCH_SIZE` to a file-scope `#define`.

Signed-off-by: Yang Zhao <zymy701@gmail.com>
2026-04-21 13:07:58 -07:00
martinrvisser 6444717517 Module command result callback addition (#2936)
## Add Command Result Event Notifications for Modules

### Summary

1. Adds new server events `ValkeyModuleEvent_CommandResultSuccess` and
`ValkeyModuleEvent_CommandResultFailure` that can notify subscribed
modules after command execution. This enables modules to implement audit
logging, error monitoring, performance tracking, and observability
without modifying core server code.
2. Adds new server event `ValkeyModuleEvent_CommandResultACLDenied` for
commands rejected by ACL. Together with PR #2237 this covers auditing of
authentication and authorisation.

### Motivation

There is currently no module API to observe command outcomes after
execution or to capture ACL denied commands. Modules that need audit
logging or error monitoring have no mechanism to be notified when
commands succeed or fail, what arguments were used, how long they took,
or how many keys were modified. This feature fills that gap using the
existing `ValkeyModule_SubscribeToServerEvent()` infrastructure.

### API

#### Events

| Event | Description |
|---|---|
| `ValkeyModuleEvent_CommandResultSuccess` | Fired after a command
completes successfully |
| `ValkeyModuleEvent_CommandResultFailure` | Fired after a command
returns an error |
| `ValkeyModuleEvent_CommandResultACLDenied` | Fired after a command is
rejected by ACL |

These are separate events (not sub-events), so modules can, for example,
subscribe only to failures without incurring any callback overhead for
successful commands.

#### Event Data: `ValkeyModuleCommandResultInfo`

The `data` pointer passed to the callback can be cast to
`ValkeyModuleCommandResultInfo`:

```c
typedef struct ValkeyModuleCommandResultInfo {
    uint64_t version;           /* Version of this structure for ABI compat. */
    const char *command_name;   /* Full command name (e.g., "SET", "CLIENT|LIST"). */
    long long duration_us;      /* Execution duration in microseconds. */
    long long dirty;            /* Number of keys modified. */
    uint64_t client_id;         /* Client ID that executed the command. */
    int is_module_client;       /* 1 if command was from RM_Call, 0 otherwise. */
    int argc;                   /* Number of command arguments. */
    ValkeyModuleString **argv;  /* Command arguments array (zero-copy, read-only). */
    int acl_deny_reason;        /* ACL_DENIED_CMD/KEY/CHANNEL/AUTH; 0 for non-ACL events */
    const char *acl_object;     /* Denied resource name (key/channel); NULL for CMD/AUTH */
} ValkeyModuleCommandResultInfoV1;
```

The struct is versioned (`VALKEYMODULE_COMMANDRESULTINFO_VERSION`) for
forward-compatible API evolution.

### Usage Example

```c
/* Callback receives events for whichever event(s) you subscribed to */
void OnCommandResult(ValkeyModuleCtx *ctx, ValkeyModuleEvent eid,
                     uint64_t subevent, void *data) {
    VALKEYMODULE_NOT_USED(ctx);
    VALKEYMODULE_NOT_USED(subevent);

    ValkeyModuleCommandResultInfo *info = (ValkeyModuleCommandResultInfo *)data;
    if (info->version != VALKEYMODULE_COMMANDRESULTINFO_VERSION) return;

    int failed = (eid.id == VALKEYMODULE_EVENT_COMMAND_RESULT_FAILURE);

    /* Access fields directly */
    printf("command=%s status=%s duration=%lldus dirty=%lld client=%llu\n",
           info->command_name,
           failed ? "FAIL" : "OK",
           info->duration_us,
           info->dirty,
           info->client_id);

    /* Access argv (read-only, zero-copy) */
    for (int i = 0; i < info->argc; i++) {
        size_t len;
        const char *arg = ValkeyModule_StringPtrLen(info->argv[i], &len);
        printf("  argv[%d] = %.*s\n", i, (int)len, arg);
    }
}

/* Subscribe in ValkeyModule_OnLoad or at runtime */

/* Option A: command failures only (recommended for audit logging) */
ValkeyModule_SubscribeToServerEvent(ctx,
    ValkeyModuleEvent_CommandResultFailure, OnCommandResult);

/* Option B: command successes only */
ValkeyModule_SubscribeToServerEvent(ctx,
    ValkeyModuleEvent_CommandResultSuccess, OnCommandResult);

/* Option C: both command outcomes */
ValkeyModule_SubscribeToServerEvent(ctx,
    ValkeyModuleEvent_CommandResultSuccess, OnCommandResult);
ValkeyModule_SubscribeToServerEvent(ctx,
    ValkeyModuleEvent_CommandResultFailure, OnCommandResult);

/* Subscribe to ACL Denied */
ValkeyModule_SubscribeToServerEvent(ctx,
        ValkeyModuleEvent_CommandResultACLDenied, OnCommandResult);

/* Unsubscribe: pass a NULL callback */
ValkeyModule_SubscribeToServerEvent(ctx,
    ValkeyModuleEvent_CommandResultFailure, NULL);
```

### Design Decisions

- **Separate events instead of sub-events**: Modules subscribing only to
failures have zero overhead for successful commands (~2ns listener-list
check vs ~30ns callback invocation per command). This is critical since
success events fire on the hot path of every command.
- **Stack-allocated info struct**: The `ValkeyModuleCommandResultInfoV1`
is built on the stack; no heap allocation per event.
- **Zero-copy argv**: Arguments are passed directly from the client's
argv array. Any integer-encoded arguments (from `tryObjectEncoding()`
during command execution) are decoded to string-encoded objects before
being passed to the callback, ensuring compatibility with
`ValkeyModule_StringPtrLen()`.
- **Early exit**: If no modules are subscribed to any server events, the
event firing function returns immediately before building the info
struct.
- **Uses existing server event infrastructure**: Follows the
`ValkeyModule_SubscribeToServerEvent()` pattern used by all other server
events, rather than introducing a new callback mechanism.

### Files Changed

| File | Change |
|---|---|
| `src/valkeymodule.h` | Event IDs, event constants,
`ValkeyModuleCommandResultInfoV1` struct |
| `src/module.c` | `moduleFireCommandResultEvent()`, event
documentation, event version entries |
| `src/module.h` | Function declaration |
| `src/server.c` | Call `moduleFireCommandResultEvent()` from `call()`
after command execution |
| `src/server.c` | Call to `moduleFireCommandACLDeniedEvent` in
`processCommand` after ACL rejection |
| `tests/modules/commandresult.c` | Test module exercising the full API |
| `tests/unit/moduleapi/commandresult.tcl` | Integration tests |

---------

Signed-off-by: martinrvisser <mvisser@hotmail.com>
Signed-off-by: martinrvisser <martinrvisser@users.noreply.github.com>
Co-authored-by: Ricardo Dias <rjd15372@gmail.com>
2026-04-21 09:14:14 -04:00
Dietrich Daroch 9d51f5ff8a Document VALKEYCLI_HOST/PORT variables in help (#3520)
Follow-up to #3402 as we missed documenting this.

---

Signed-off-by: Dietrich Daroch <Dietrich@Daroch.me>
2026-04-21 11:52:54 +02:00
eifrah-aws 0327c27131 Add Static Module Support (#3392)
Add a build option to compile the Lua scripting engine as a static
module and wire the server to load it directly at startup when enabled.
The module load path now resolves on-load and on-unload entry points
from the main binary, and the module lifecycle keeps those callbacks so
unload works without a shared library handle.

The Lua module build was updated to support both static and shared
variants, with the static path exporting visible wrapper symbols and
linking the server with the module archive. While touching the Lua code,
a few internal symbols were renamed for consistency and the monotonic
time helper was clarified.

Note that this PR addresses the LUA module, but it can be applied to
other "core" modules (like: Bloom, Json, Search and others). With this
change, it will be easier to ship Valkey bundle with modules.

Areas touched:

* CMake
* Makefile
* Lua scripting module
* Core module loading

**Generated by CodeLite**

---------

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2026-04-20 14:45:57 +03:00
Daniil Kashapov 269b1c5eda Improve COB memory tracking with copy avoidance (#3306)
This improves COB memory tracking when using copy avoidance for bulk
string replies. This fix addresses underestimation of client memory
usage that occurred when reply buffers stored pointers to shared `robj`
instead of copying data.
IO threads calculate actual reply sizes by calling `sdslen()` on strings
before writing, for that we need atomic `tracked_for_cob` flag in
payload headers to prevent race conditions and double accounting.

See #2396

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
2026-04-20 14:45:51 +03:00
Madelyn Olson 4a42c95853 Fix HPERSIST RESP protocol violation on wrong-type key (#3516)
`hpersistCommand` calls `addReplyArrayLen` before `lookupKeyWrite` +
`checkType`. When HPERSIST targets a non-hash key, the server writes a
RESP array header followed by a WRONGTYPE error — a malformed response
that permanently desynchronizes the client connection.

This moves `lookupKeyWrite` + `checkType` before `addReplyArrayLen`,
matching the pattern used by every other HFE command (e.g.
`hgetdelCommand`, `hexpireGenericCommand`).

Added a test for HPERSIST on a wrong-type key.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2026-04-17 17:02:04 -07:00
Sarthak Aggarwal 109ef346f3 [Flaky Tests] Avoid re-triggering io-thread activation (#3509)
The test was accidentally waking the IO threads while trying to check
that they had gone idle.

After the recent IO-thread refactor in #3324, the
[test](https://github.com/valkey-io/valkey/pull/3324/changes#diff-21314ec3a338f739eab1536f91f528d1efe7c6a93935a71b9c02f77a3858f121R112)
started forcing `io-threads-always-active`, and its repeated `INFO`
polling counted as fresh activity. So instead of just observing the
worker threads, the test kept reactivating them and then flaked.

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
Co-authored-by: Sarthak Aggarwal <25262500+sarthakaggarwal97@users.noreply.github.com>
2026-04-17 17:00:43 -07:00
Roshan Khatri b2d08c9ef9 Fix use-after-unload crash in test auth module's blocking thread (#3464)
## Problem

The test `Test module aof save on server start from empty` in
`tests/unit/moduleapi/hooks.tcl` sporadically crashes with `I/O error
reading reply`.

**Frequency:** 2 out of 15 days (March 26 on
`centosstream9-tls-module-no-tls`, April 8 on `fedorarawhide-jemalloc`).

**Example failing run:**
https://github.com/valkey-io/valkey/actions/runs/24110987718/job/70345236353

## Root Cause

The crash is a **use-after-unload** in the auth test module's blocking
authentication thread, NOT a timing issue in the AOF test.

The crash log from April 8 shows:
```
71112:M 00:42:59.710 * Module testacl unloaded
71112:M 00:42:59.711 # crashed by signal: 11, si_code: 1
71112:M 00:42:59.711 # Crashed running the instruction at: 0x7f9dc717384b
```

The sequence:
1. `blocking_auth_cb` spawns a background thread
(`AuthBlock_ThreadMain`) that sleeps 500ms
2. Thread wakes, calls `ValkeyModule_UnblockClient()` → main thread
processes unblock, decrements `module->blocked_clients`
3. Auth command completes, test calls `r module unload testacl`
4. `moduleUnloadInternal` checks `blocked_clients == 0`; if true, it
proceeds with `dlclose()`
5. **But the background thread is still executing cleanup code**
(freeing strings, returning from function)
6. Thread returns into unmapped memory → **SIGSEGV**

The `invalidFunctionWasCalled` in the stack trace is the crash handler's
safety stub, and the crashing address `0x7f9dc717384b` is in the
unmapped auth.so address space.

## Fix

Track the background thread ID and `pthread_join()` it in
`ValkeyModule_OnUnload` before the module is dlclose'd. This ensures the
thread has fully exited before the code is unmapped.

The key insight is that `ValkeyModule_UnblockClient()` signals "auth is
done" but not "thread is done" — the thread still has cleanup code to
execute after that call. `pthread_join()` is the correct synchronization
point because it only returns after the thread has fully exited.

No mutex is needed since both `blocking_auth_cb` (which creates the
thread) and `OnUnload` (which joins it) run on the main event loop
thread.

Changes to `tests/modules/auth.c`:
- Add global `blocking_auth_tid` and `blocking_auth_tid_valid` flag
- Set `blocking_auth_tid_valid = 1` after successful `pthread_create`
- In `OnUnload`, `pthread_join` the thread if one was created

## Testing

Ran `unit/moduleapi/hooks` 100 loops on rpm-distros and ubuntu runners —
**all passed**:
- **Workflow run:**
https://github.com/roshkhatri/valkey/actions/runs/24164276124
- **Config:** `--loops 100 --single unit/moduleapi/hooks` on
`almalinux8`, `almalinux9`, `fedoralatest`, `fedorarawhide`,
`centosstream9`, `ubuntu-jemalloc`, `ubuntu-arm`
- **Result:** 7/7 jobs passed, zero failures across 700 total test iterations

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2026-04-17 10:05:03 -07:00
Viktor Söderqvist 8a91a12398 Unique samples in hashtableSampleEntries (#3460)
Instead of "scanning" random bucket chains using a random cursor for
each scan call, start at a random cursor and then continue sampling
buckets in scan order. The scan stops when we have sampled all elements,
so the cursor never wraps around to sample the same buckets again. This
ensures that we don't get any duplicate samples.

The functions hashtableRandomEntry and hashtableFairRandomEntry keep
their old behavior via a separate sampling function; the fairness tests
fail if they use the modified hashtableSampleEntries.

This restores the behavior of dictGetSomeKeys (which is now an alias of
hashtableSampleEntries) and deflakes the test case:

Gossip count scales with higher percentage of
`cluster-message-gossip-perc`
    in tests/unit/cluster/packet.tcl

Fixes #3454

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2026-04-16 16:53:47 +02:00