When read() returns 0 (EOF/connection closed) in syncRead(), errno is
not set by POSIX, so it retains a stale value (typically 0). This causes
callers using connGetLastError() to log strerror(0) which is the
misleading string "Success".
Set errno = ECONNRESET on EOF in syncRead(), matching the existing
pattern used for the timeout case (errno = ETIMEDOUT).
Also set conn->last_errno = errno in connSocketSyncWrite,
connSocketSyncRead, and connSocketSyncReadLine wrappers, matching the
pattern used by their async counterparts connSocketWrite and
connSocketRead.
After this fix, replica logs will show:
"I/O error reading bulk count from PRIMARY: Connection reset by peer"
instead of the misleading:
"I/O error reading bulk count from PRIMARY: Success"
---------
Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Signed-off-by: djk1027 <djk9510271@gmail.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Daejun Kim <djk9510271@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Free BYPOLYGON points before returning from invalid COUNT parsing paths
in GEOSEARCH/GEOSEARCHSTORE.
Closes #3567
---------
Signed-off-by: Su Ko <rhtn1128@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Address this compile warning:
```c
CC util.o
util.c:638:1: warning: ‘no_sanitize’ attribute directive ignored [-Wattributes]
__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) {
^~~~~~~~~~~~~
```
Addresses portability concerns around these attributes.
---------
Signed-off-by: Jim Brunner <brunnerj@amazon.com>
I noticed this LTO compile warning in the eval code. It looks like the
compiler gets confused about an sds length, even though it is checked
above. Just added an assert to clarify.
The warning:
```c
LINK valkey-server
eval.c: In function ‘evalExtractShebangFlags’:
eval.c:263:27: warning: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
*out_engine = zcalloc(engine_name_len + 1);
^
zmalloc.c:324:7: note: in a call to allocation function ‘valkey_calloc’ declared here
void *zcalloc(size_t size) {
^
```
Signed-off-by: Jim Brunner <brunnerj@amazon.com>
This change was introduced in #3382. This test is already very slow on
its own. Under valgrind it gets slow enough that the per-node restart
step lets primaries be marked FAIL and triggers failovers, after which
"Verify slaves consistency" no longer holds since it assumes the original
topology.
It was never run under valgrind before and exercises nothing valgrind
meaningfully covers, so just tag it valgrind:skip.
Signed-off-by: Binbin <binloveplay1314@qq.com>
The argument `count /= 2` modifies `count` as a side effect, and the
following `count /= 2` divides it again unnecessarily.
Since `count` is not used after this point, fix it by using `count / 2`
without the side effect and remove the redundant second assignment.
Signed-off-by: djk1027 <djk9510271@gmail.com>
Migrated the remaining cluster tests to tests/unit/cluster/ to use the same
framework for all cluster tests. Cleaned up the obsolete cluster test framework
files and updated the CI workflows to use the new unified test runner.
Changes:
Moved and mapped 6 test files:
- 03-failover-loop.tcl → Merged into existing failover.tcl
- 04-resharding.tcl → resharding.tcl
- 12-replica-migration-2.tcl + 12.1-replica-migration-3.tcl →
replica-migration-slow.tcl
- 07-replica-migration.tcl → Merged into existing replica-migration.tcl
- 28-cluster-shards.tcl → Merged into existing cluster-shards.tcl
Other changes:
- Converted old framework APIs (e.g., K, RI) to new framework APIs (e.g., R, srv)
- Added process_is_alive check in cluster_util.tcl to fix an exception in
failover tests caused by executing ps on dead processes
- Heavy tests (resharding, replica-migration-slow) marked with slow tag and
wrapped in run_solo to prevent resource contention in sanitizer environments
- replica-migration-slow marked with valgrind:skip tag since it is very slow
- Removed the entire tests/cluster/ directory including run.tcl, cluster.tcl,
includes/, and helpers/
- Kept runtest-cluster as a wrapper script (exec ./runtest --cluster "$@")
- Removed ./runtest-cluster calls from .github/workflows/daily.yml as cluster
tests are now included in ./runtest
Closes #2297.
Signed-off-by: Jun Yeong Kim <junyeonggim5@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Remove eval script cache entries that belong to a scripting engine when
that engine is unregistered. This prevents the eval cache from retaining
dangling engine pointers and keeps the tracked script memory in sync
after engine shutdown.
The scripting engine unregister path now invokes a new eval cleanup
helper, which scans the cached scripts, drops matching entries from the
LRU list and dictionary, and adjusts cache memory accounting accordingly.
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
If not cleared, the job may no longer be valid by the time the client
goes to cleanup. This dangling reference could cause a crash if you set
slot-migration-log-max-len to 0 and are very unlucky.
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
The default value of lua-enable-insecure-api cannot be safely changed
from no to yes due to two issues:
1. In createEngineContext(), lua_enable_insecure_api was hardcoded to 0
before initializing Lua states, so deprecated APIs (newproxy, setfenv,
getfenv) were never registered in the global table regardless of the
actual config value. Once the global table is locked, the config
change has no effect.
2. lua_insecure_api_current was initialized to 0 (struct zero-init) and
never synced with the final config value. If the default was changed
to yes (1), a subsequent CONFIG SET no would see both values as 0 and
skip the evalReset() call in updateLuaEnableInsecureApi().
Fix by reading the real config via isLuaInsecureAPIEnabled() in
createEngineContext() before Lua state initialization, and syncing
lua_insecure_api_current after all config sources (default, config file,
command-line args) are applied.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Here we should jump to the `error` label to free the resources:
```
error:
if (listen_cmid) rdma_destroy_id(listen_cmid);
if (listen_channel) rdma_destroy_event_channel(listen_channel);
ret = ANET_ERR;
end:
freeaddrinfo(servinfo);
return ret;
}
```
Signed-off-by: Binbin <binloveplay1314@qq.com>
The bug was in dismissHashtable(), which computes the size passed to
zmadvise_dontneed() for the top-level hashtable tables.

ht->tables[i] points to a contiguous array of bucket objects, but the
code used sizeof(bucket *) instead of sizeof(bucket) when calculating
the length. That means it treated the allocation like an array of
pointers rather than an array of buckets.

As a result, the advised range was much smaller than the actual table
allocation. On 64-bit builds, bucket is 64 bytes while bucket * is 8
bytes, so only about one eighth of the table was covered. This does not
usually break correctness, but it defeats the purpose of the function:
after a fork, we want to tell the kernel that the hashtable pages are no
longer needed so we reduce copy-on-write overhead. With the wrong size,
most of the table memory was never included in that hint.

The fix is to use sizeof(bucket) so the full top-level bucket array is
passed to zmadvise_dontneed().
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
### Summary
The daily CI sanitizer jobs with clang are failing during the build
step.
When the static Lua module is built with `-flto`, the `.o` files contain
LLVM bitcode that gets archived into `libvalkeylua.a`. The system linker
cannot read this bitcode, causing build failures:
`/usr/bin/ld:
/home/runner/work/valkey/valkey/src/modules/lua/libvalkeylua.a: member
/home/runner/work/valkey/valkey/src/modules/lua/libvalkeylua.a(debug_lua.o)
in archive is not an object`
The previous fix (#3546) pinned clang to version 17, but this was
insufficient: the issue is not just a version mismatch but that the
system linker fundamentally cannot read LTO bitcode from `.a` archives.
Example failure:
https://github.com/valkey-io/valkey/actions/runs/24865821147/job/72801509768
### Fix
Strip LTO flags from OPTIMIZATION in the Lua module Makefile using
`override`.
Tested: https://github.com/hanxizh9910/valkey/actions/runs/24913834442
---------
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
This PR bootstraps Valkey's provenance guard integration.
The provenance guard is a content-based similarity detection system that helps maintain proper code provenance by comparing incoming PR changes against fingerprint databases built from Redis commits and PRs. The matching logic now lives in the external `valkey-io/verify-provenance` action repository; this PR wires Valkey to that action and seeds the required database branch.
Key features:
* Content-based detection: Uses normalized diff fingerprints and fuzzy matching to detect similar changes, including cases where files have moved or been refactored.
* Externalized action logic: The check and refresh implementation is maintained in `valkey-io/verify-provenance` and is pinned by exact commit SHA from Valkey workflows.
* Provenance Guard workflow: Runs on PR activity to check incoming changes against the provenance databases and report potential matches.
* Daily Refresh workflow: Runs daily to refresh PR fingerprints and commits updated data back to `verify-provenance-db`.
* Dedicated DB branch: Stores provenance databases on the orphan `verify-provenance-db` branch, separate from Valkey source code.
* Privacy-first storage: Stores compressed non-reversible fingerprints, not source code.
The initial `verify-provenance-db` branch has been bootstrapped with fingerprints of Redis commits and PRs.
---------
Signed-off-by: Ping Xie <pingxie@outlook.com>
This follows up on the commandresult API work and fixes cleanup around
unsubscribe and module unload.
The main issue was that command-result event listeners could leave stale
state behind. On unload, we removed the listeners themselves but didn’t
fully update the fast-path listener counters. Separately, unsubscribing
with a NULL callback could behave badly if the listener wasn’t present
anymore. In practice, that meant later commands could still walk into
command-result event handling after the module was supposed to be
cleaned up.
Failed in Daily as well yesterday:
https://github.com/valkey-io/valkey/actions/runs/24753491944/job/72421581610#step:10:852
Related Failures:
https://github.com/valkey-io/valkey/pull/2936#issuecomment-4290490199
---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
### Analysis
The daily CI sanitizer jobs with clang are failing during the build
step.
The `ubuntu-latest` runner now has clang 18, but the LLVM gold plugin
is still version 17. When the static Lua module is built with `-flto`,
the `.o` files contain LLVM 18 bitcode that the gold plugin (v17) cannot
read:
`bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown
attribute kind (91) (Producer: 'LLVM18.1.3' Reader: 'LLVM 17.0.6')
`
Example failure:
https://github.com/valkey-io/valkey/actions/runs/24753491944/job/72421581512
### Fix
Pin the sanitizer jobs to `clang-17` so the compiler and gold plugin
versions match.
Tested(successfully built):
https://github.com/hanxizh9910/valkey/actions/runs/24859845008
### Note
If `clang-17` is removed from the `ubuntu-latest` image in the future,
we may need to add an explicit install step.
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
The SPMC queue from #3324 needs each `spmcCell` to be cache-line
aligned, but plain `zmalloc()` does not guarantee that in all build
configurations.
This change introduces `zmalloc_cache_aligned()` and uses it for the
SPMC queue buffer allocation in `spmcInit()`.
Failing CI: https://github.com/valkey-io/valkey/actions/runs/24374139344
---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Match HGETDEL with the existing batch-delete pattern used by HDEL.
HDEL already pauses hashtable auto-shrink while deleting multiple fields
so shrink evaluation is deferred until the batch completes. HGETDEL was
missing the same optimization even though it also deletes fields in a loop.
Pause auto-shrink for hashtable-encoded hashes before the HGETDEL delete
loop and resume it once afterwards. This preserves observable behavior
and reduces redundant shrink work for multi-field deletes.
Same as #3144.
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
## Summary
Fix a file descriptor leak in `connSocketBlockingConnect()` when
`aeWait()` times out.
## Bug
When `anetTcpNonBlockConnect()` succeeds but `aeWait()` times out (e.g.,
MIGRATE to an unreachable host), the fd is leaked because it was never
assigned to `conn->fd`. The caller's `connClose()` checks `conn->fd !=
-1` and skips cleanup.
## Fix
Assign `conn->fd = fd` immediately after `anetTcpNonBlockConnect()`
succeeds, before `aeWait()`. This way the caller's normal `connClose()`
cleanup path handles the fd on any error, which is consistent with how
the rest of the connection lifecycle works.
TLS connections also benefit since `connTLSBlockingConnect` delegates to
this function for the TCP layer.
## Reproducer
```
valkey-cli SET key hello
# Repeat against unreachable host:
for i in $(seq 1 30); do valkey-cli MIGRATE 192.0.2.1 6379 key 0 500; done
# Check: /proc/<pid>/fd shows 30 leaked socket fds
```
*This issue was generated by AI but verified, with love, by a human.*
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
There is a double free issue in the code. The error handling path called
both decrRefCount(o) and streamFreeNACK(nack), but the nack was obtained
from cgroup->pel via raxFind and is still referenced there. decrRefCount(o)
frees it through freeStream -> streamFreeCG -> raxFreeWithCallback(cg->pel, zfree),
so the explicit streamFreeNACK(nack) causes a double free.
Remove the redundant streamFreeNACK(nack) call and add a regression
test with a crafted corrupt payload that triggers the duplicate consumer
PEL entry path.
This was introduced in 492d8d0961.
Signed-off-by: Binbin <binloveplay1314@qq.com>
## Problem
`Fix cluster` in `tests/unit/cluster/many-slot-migration.tcl` has been
timing out daily on valgrind jobs since April 3, 2026. The test runs 10
cluster nodes under valgrind, migrating 40,000 keys across 1,000 slots —
too much work for valgrind-instrumented builds.
The slowdown is caused by #3366 (dict→hashtable wrapper). Under `-O0`
(valgrind builds), the `static inline` wrappers become real function
calls that valgrind instruments, adding ~75% overhead to hot paths like
`dictSize`. This compounds across 10 valgrind processes over a 20-minute
migration test. No impact on production builds (`-O2` inlines everything).
## Fix
Scale the test workload down under valgrind: 10,000 keys / 250 slots
instead of 40,000 / 1,000. Normal runs are unchanged. Still exercises
the same cluster repair path.
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: sarthakaggarwal97 <sarthakaggarwal97@users.noreply.github.com>
https://github.com/valkey-io/valkey/pull/3324 introduced `BATCH_SIZE` as
a const int local variable and used it as an array bound. Clang 17
rejects this with:
```
io_threads.c:305:22: error: variable length array folded to constant array as an extension [-Werror,-Wgnu-folding-constant]
305 | void *batch_jobs[BATCH_SIZE];
| ^~~~~~~~~~
1 error generated.
make[1]: *** [io_threads.o] Error 1
make: *** [all] Error 2
```
Old Clang versions do not emit this warning, maybe that is why the CI
passed. Fix by promoting `BATCH_SIZE` to a file-scope `#define`.
Signed-off-by: Yang Zhao <zymy701@gmail.com>
## Add Command Result Event Notifications for Modules
### Summary
1. Adds new server events `ValkeyModuleEvent_CommandResultSuccess` and
`ValkeyModuleEvent_CommandResultFailure` that can notify subscribed
modules after command execution. This enables modules to implement audit
logging, error monitoring, performance tracking, and observability
without modifying core server code.
2. Adds new server event `ValkeyModuleEvent_CommandResultACLDenied` for
commands rejected by ACL. Together with PR #2237 this covers auditing of
authentication and authorisation.
### Motivation
There is currently no module API to observe command outcomes after
execution or to capture ACL denied commands. Modules that need audit
logging or error monitoring have no mechanism to be notified when
commands succeed or fail, what arguments were used, how long they took,
or how many keys were modified. This feature fills that gap using the
existing `ValkeyModule_SubscribeToServerEvent()` infrastructure.
### API
#### Events
| Event | Description |
|---|---|
| `ValkeyModuleEvent_CommandResultSuccess` | Fired after a command
completes successfully |
| `ValkeyModuleEvent_CommandResultFailure` | Fired after a command
returns an error |
| `ValkeyModuleEvent_CommandResultACLDenied` | Fired after a command is
rejected by ACL |
These are separate events (not sub-events), so modules can for example
only subscribe to failures without incurring any callback overhead for
successful commands.
#### Event Data: `ValkeyModuleCommandResultInfo`
The `data` pointer passed to the callback can be cast to
`ValkeyModuleCommandResultInfo`:
```c
typedef struct ValkeyModuleCommandResultInfo {
uint64_t version; /* Version of this structure for ABI compat. */
const char *command_name; /* Full command name (e.g., "SET", "CLIENT|LIST"). */
long long duration_us; /* Execution duration in microseconds. */
long long dirty; /* Number of keys modified. */
uint64_t client_id; /* Client ID that executed the command. */
int is_module_client; /* 1 if command was from RM_Call, 0 otherwise. */
int argc; /* Number of command arguments. */
ValkeyModuleString **argv; /* Command arguments array (zero-copy, read-only). */
int acl_deny_reason; /* ACL_DENIED_CMD/KEY/CHANNEL/AUTH; 0 for non-ACL events */
const char *acl_object; /* Denied resource name (key/channel); NULL for CMD/AUTH */
} ValkeyModuleCommandResultInfoV1;
```
The struct is versioned (`VALKEYMODULE_COMMANDRESULTINFO_VERSION`) for
forward-compatible API evolution.
### Usage Example
```c
/* Callback receives events for whichever event(s) you subscribed to */
void OnCommandResult(ValkeyModuleCtx *ctx, ValkeyModuleEvent eid,
uint64_t subevent, void *data) {
VALKEYMODULE_NOT_USED(ctx);
VALKEYMODULE_NOT_USED(subevent);
ValkeyModuleCommandResultInfo *info = (ValkeyModuleCommandResultInfo *)data;
if (info->version != VALKEYMODULE_COMMANDRESULTINFO_VERSION) return;
int failed = (eid.id == VALKEYMODULE_EVENT_COMMAND_RESULT_FAILURE);
/* Access fields directly */
printf("command=%s status=%s duration=%lldus dirty=%lld client=%llu\n",
info->command_name,
failed ? "FAIL" : "OK",
info->duration_us,
info->dirty,
info->client_id);
/* Access argv (read-only, zero-copy) */
for (int i = 0; i < info->argc; i++) {
size_t len;
const char *arg = ValkeyModule_StringPtrLen(info->argv[i], &len);
printf(" argv[%d] = %.*s\n", i, (int)len, arg);
}
}
/* Subscribe in ValkeyModule_OnLoad or at runtime */
/* Option A: command failures only (recommended for audit logging) */
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultFailure, OnCommandResult);
/* Option B: command successes only */
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultSuccess, OnCommandResult);
/* Option C: both command outcomes */
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultSuccess, OnCommandResult);
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultFailure, OnCommandResult);
/* Subscribe to ACL Denied */
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultACLDenied, OnCommandResult);
/* Unsubscribe by passing a NULL callback */
ValkeyModule_SubscribeToServerEvent(ctx,
ValkeyModuleEvent_CommandResultFailure, NULL);
```
### Design Decisions
- **Separate events instead of sub-events**: Modules subscribing only to
failures have zero overhead for successful commands (~2ns listener-list
check vs ~30ns callback invocation per command). This is critical since
success events fire on the hot path of every command.
- **Stack-allocated info struct**: The `ValkeyModuleCommandResultInfoV1`
is built on the stack; no heap allocation per event.
- **Zero-copy argv**: Arguments are passed directly from the client's
argv array. Any integer-encoded arguments (from `tryObjectEncoding()`
during command execution) are decoded to string-encoded objects before
being passed to the callback, ensuring compatibility with
`ValkeyModule_StringPtrLen()`.
- **Early exit**: If no modules are subscribed to any server events, the
event firing function returns immediately before building the info
struct.
- **Uses existing server event infrastructure**: Follows the
`ValkeyModule_SubscribeToServerEvent()` pattern used by all other server
events, rather than introducing a new callback mechanism.
### Files Changed
| File | Change |
|---|---|
| `src/valkeymodule.h` | Event IDs, event constants,
`ValkeyModuleCommandResultInfoV1` struct |
| `src/module.c` | `moduleFireCommandResultEvent()`, event
documentation, event version entries |
| `src/module.h` | Function declaration |
| `src/server.c` | Call `moduleFireCommandResultEvent()` from `call()`
after command execution |
| `src/server.c` | Call to `moduleFireCommandACLDeniedEvent` in
`processCommand` after ACL rejection |
| `tests/modules/commandresult.c` | Test module exercising the full API
|
| `tests/unit/moduleapi/commandresult.tcl` | Integration tests |
---------
Signed-off-by: martinrvisser <mvisser@hotmail.com>
Signed-off-by: martinrvisser <martinrvisser@users.noreply.github.com>
Co-authored-by: Ricardo Dias <rjd15372@gmail.com>
Add a build option to compile the Lua scripting engine as a static
module and wire the server to load it directly at startup when enabled.
The module load path now resolves on-load and on-unload entry points
from the main binary, and the module lifecycle keeps those callbacks so
unload works without a shared library handle.
The Lua module build was updated to support both static and shared
variants, with the static path exporting visible wrapper symbols and
linking the server with the module archive. While touching the Lua code,
a few internal symbols were renamed for consistency and the monotonic
time helper was clarified.
Note that this PR addresses the Lua module, but the approach can be
applied to other "core" modules (such as Bloom, Json, and Search). With
this change, it will be easier to ship a Valkey bundle with modules.
Areas touched:
* CMake
* Makefile
* Lua scripting module
* Core module loading
**Generated by CodeLite**
---------
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
This improves COB memory tracking when using copy avoidance for bulk
string replies. This fix addresses underestimation of client memory
usage that occurred when reply buffers stored pointers to shared `robj`
instead of copying data.
IO threads calculate actual reply sizes by calling `sdslen()` on strings
before writing; this requires an atomic `tracked_for_cob` flag in
payload headers to prevent race conditions and double accounting.
See #2396
---------
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
`hpersistCommand` calls `addReplyArrayLen` before `lookupKeyWrite` +
`checkType`. When HPERSIST targets a non-hash key, the server writes a
RESP array header followed by a WRONGTYPE error — a malformed response
that permanently desynchronizes the client connection.
This moves `lookupKeyWrite` + `checkType` before `addReplyArrayLen`,
matching the pattern used by every other HFE command (e.g.
`hgetdelCommand`, `hexpireGenericCommand`).
Added a test for HPERSIST on a wrong-type key.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
## Problem
The test `Test module aof save on server start from empty` in
`tests/unit/moduleapi/hooks.tcl` sporadically crashes with `I/O error
reading reply`.
**Frequency:** 2 out of 15 days (March 26 on
`centosstream9-tls-module-no-tls`, April 8 on `fedorarawhide-jemalloc`).
**Example failing run:**
https://github.com/valkey-io/valkey/actions/runs/24110987718/job/70345236353
## Root Cause
The crash is a **use-after-unload** in the auth test module's blocking
authentication thread, NOT a timing issue in the AOF test.
The crash log from April 8 shows:
```
71112:M 00:42:59.710 * Module testacl unloaded
71112:M 00:42:59.711 # crashed by signal: 11, si_code: 1
71112:M 00:42:59.711 # Crashed running the instruction at: 0x7f9dc717384b
```
The sequence:
1. `blocking_auth_cb` spawns a background thread
(`AuthBlock_ThreadMain`) that sleeps 500ms
2. Thread wakes, calls `ValkeyModule_UnblockClient()` → main thread
processes unblock, decrements `module->blocked_clients`
3. Auth command completes, test calls `r module unload testacl`
4. `moduleUnloadInternal` checks `blocked_clients == 0`; if true, it
proceeds with `dlclose()`
5. **But the background thread is still executing cleanup code**
(freeing strings, returning from function)
6. Thread returns into unmapped memory → **SIGSEGV**
The `invalidFunctionWasCalled` in the stack trace is the crash handler's
safety stub, and the crashing address `0x7f9dc717384b` is in the
unmapped auth.so address space.
## Fix
Track the background thread ID and `pthread_join()` it in
`ValkeyModule_OnUnload` before the module is dlclose'd. This ensures the
thread has fully exited before the code is unmapped.
The key insight is that `ValkeyModule_UnblockClient()` signals "auth is
done" but not "thread is done" — the thread still has cleanup code to
execute after that call. `pthread_join()` is the correct synchronization
point because it only returns after the thread has fully exited.
No mutex is needed since both `blocking_auth_cb` (which creates the
thread) and `OnUnload` (which joins it) run on the main event loop
thread.
Changes to `tests/modules/auth.c`:
- Add global `blocking_auth_tid` and `blocking_auth_tid_valid` flag
- Set `blocking_auth_tid_valid = 1` after successful `pthread_create`
- In `OnUnload`, `pthread_join` the thread if one was created
## Testing
Ran `unit/moduleapi/hooks` 100 loops on rpm-distros and ubuntu runners —
**all passed**:
- **Workflow run:**
https://github.com/roshkhatri/valkey/actions/runs/24164276124
- **Config:** `--loops 100 --single unit/moduleapi/hooks` on
`almalinux8`, `almalinux9`, `fedoralatest`, `fedorarawhide`,
`centosstream9`, `ubuntu-jemalloc`, `ubuntu-arm`
- **Result:** 7/7 jobs ✅, zero failures across 700 total test iterations
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Instead of "scanning" random bucket chains using a random cursor for
each scan call, start at a random cursor and then continue sampling
buckets in scan order. The scan stops when we have sampled all elements,
so the cursor never wraps around to sample the same buckets again. This
ensures that we don't get any duplicate samples.
The functions hashtableRandomEntry and hashtableFairRandomEntry keep
the old behavior by using a separate sampling function. The fairness
tests fail if they use the modified hashtableSampleEntries.
This restores the behavior of dictGetSomeKeys (which is now an alias of
hashtableSampleEntries) and deflakes the test case:
Gossip count scales with higher percentage of
`cluster-message-gossip-perc`
in tests/unit/cluster/packet.tcl
Fixes #3454
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The test introduced in #2227 is fragile because it unconditionally asserted
that "best ranked replica" would be logged for every shard's replica. This
assumption was incorrect for two reasons:
1. myselfIsBestRankedReplica() requires failover_failed_primary_rank == 0.
When two primaries fail simultaneously, only the shard with the lowest
shard_id gets failed_primary_rank == 0. The other shard's replicas can
only trigger "Myself become the best ranked replica" after the first
shard completes failover and the result propagates via gossip.
2. clusterAllReplicasThinkPrimaryIsFail() requires all sibling replicas
to have their MY_PRIMARY_FAIL flag propagated via gossip. If the
election delay is shorter than the gossip propagation time, the
replica may start the election before receiving the flag.
The test was also too strict and checked too many things. For example,
we can check that there is no delay before starting the failover, but we
shouldn't check that the rank 0 replica actually wins, because it may
not. We also can't be sure the other replica does a psync afterwards; it
may do a full sync instead.
Restructure the test into three focused scenarios with retry logic:
- Test 0 (3 primaries + 1 replica): The sole replica is deterministically
the best ranked replica. All conditions are trivially satisfied.
- Test 1 (3 primaries + 2 replicas): Single primary failure with two
competing replicas. failed_primary_rank is deterministically 0, but
MY_PRIMARY_FAIL gossip timing may prevent triggering in some runs.
Uses internal retry with cluster recovery to ensure at least one
successful trigger.
- Test 2 (5 primaries + 7 replicas): Two primaries failure. Verifies no
failover timeout and uses retry to cover the "best ranked replica"
path for at least one shard.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Adds cluster bus byte metrics to `CLUSTER INFO`, split by admin,
pub/sub, and module traffic. The change tracks sent and received bytes
on the cluster bus and exposes them as
`cluster_stats_*_bytes_{sent,received}`.
Example:
```
> src/valkey-cli CLUSTER INFO
cluster_state:fail
...
cluster_stats_bytes_sent:46088
cluster_stats_bytes_received:46080
cluster_stats_pubsub_bytes_sent:0
cluster_stats_pubsub_bytes_received:0
cluster_stats_module_bytes_sent:0
cluster_stats_module_bytes_received:0
total_cluster_links_buffer_limit_exceeded:0
```
Fixes: https://github.com/valkey-io/valkey/issues/1929
---------
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Minor cleanup: the function will return -1 on error.
It worked anyway before because the value is implicitly converted:
-1 => SIZE_MAX => -1 again when it's implicitly cast back to
ssize_t.
Signed-off-by: Binbin <binloveplay1314@qq.com>
This allows setting the host and port using environment variables, which
makes using valkey-cli way easier in some development environments. No
need to recall the right host/port, just let your `.env` do the work.
---
Built locally and tested with,
```fish
VALKEY_PORT=36379 ./src/valkey-cli
```
```console
127.0.0.1:36379>
```
And using both,
```fish
VALKEYCLI_HOST=(hostname).local VALKEYCLI_PORT=36379 ./src/valkey-cli
```
```console
<HOSTNAME>.local:36379>
```
Signed-off-by: Dietrich Daroch <Dietrich@Daroch.me>
#### Summary
This PR redesigns the IO threading communication model, replacing the
inefficient client-list polling approach with a high-performance,
lock-free queue architecture. This change improves throughput by
**8–17%** across various workloads and lays the groundwork for
offloading command execution to IO threads in following PRs.
### Performance Comparison: Unstable vs New IO Queues
| Type | Operation | Unstable Branch (M TPS) | New IO Queues (M TPS)|
Difference (%) |
| :--- | :--- | :--- | :--- | :--- |
| **CME**<sup>1</sup> | SET | 1.02 | 1.19 | **+16.67%** |
| **CME** | GET | 1.30 | 1.47 | **+13.08%** |
| **CMD**<sup>2</sup> | SET | 1.15 | 1.35 | **+17.39%** |
| **CMD** | GET | 1.52 | 1.64 | **+7.89%** |
<sup>1</sup> Amazon terminology for cluster mode
<sup>2</sup> Amazon terminology for standalone mode, i.e. config
`cluster-enabled no`
- Test Configuration: 8 IO threads • 400 clients • 512-byte values • 3M
keys
#### Motivation
The previous IO model had several limitations that created performance
bottlenecks:
* **Inefficient Polling:** The main thread lacks a direct notification
mechanism for completed work. Instead, it must constantly iterate
through a list of all pending clients to check their state, wasting
significant CPU cycles.
* **Manual Load Balancing:** Jobs are assigned to specific threads
upfront. This requires the main thread to predict which thread to use,
often leaving some threads idle while others are overloaded.
* **Static Scaling:** Thread activation relies on a fixed heuristic
(e.g., 1 thread per 2 events). This approach fails to adapt to varying
workloads, such as TLS connections or differing read/write sizes.
### The Solution
To address these inefficiencies, this PR replaces the single SPSC queue
used currently with three specialized queues to handle communication and
load balancing more effectively.
#### 1. Main > IO: Shared Queue (Single Producer Multi Consumer)
Single queue from the main-thread to IO threads.
* **Automatic Load Balancing:** All threads pull from the same source.
Busy threads take less work, and idle threads take more, so we don't
need to manually select a thread.
* **Adaptive Scaling:** We now use the queue depth to decide when to add
or remove threads. If the queue is full, we scale up; if it's empty, we
scale down.
* *Ignition:* To get things started before the queue fills up, we
monitor the main thread's CPU. If usage goes over 30%, we wake up the
first IO thread.
* **Implementation:** To prevent contention among consumers, each item
in the ring buffer is padded to reside in its own cache line. Sequence
numbers are utilized to indicate whether a cell is empty or populated,
allowing threads to safely claim work.
#### 2. IO > Main: The Response Channel (MPSC Queue)
We replaced the old polling loop with a response queue.
* **Faster Completion:** IO threads push completed jobs into this
queue. The main thread detects new data simply by checking if the queue
is not empty, removing the need to scan pending clients.
* **Contention Management:** To avoid lock contention, each thread
reserves a slot by atomically incrementing the tail index. In the rare
event that the queue is full, pending jobs are buffered in a local
temporary list until space becomes available.
#### 3. Main > IO (Thread-Specific): Private Inbox (SPSC Queue)
We kept the existing Single-Producer Single-Consumer (SPSC) queues for
tasks that must happen on a specific thread (like freeing memory
allocated by that thread). IO threads always check their private inbox
before looking at the shared queue.
### Changes Required
* **Async client release**
The main thread no longer busy-waits for IO threads to finish with a
client. Since the client must be popped from the multi-producer queue
before it can be released, clients with pending IO are now marked for
asynchronous closure.
* **eviction clients logic**
Updated evictClients() to account for memory pending release (clients
marked close_asap). freeClient() now returns a status code (1 for freed,
0 for async-close) to ensure the eviction loop does not over-evict by
ignoring memory that is about to be reclaimed.
* **events-per-io-thread config**
Replaced the `events-per-io-thread` configuration with
`io-threads-always-active`, as we no longer track events. Since this
config is used only for tests, no backward-compatibility issue arises.
* **packed job instead of handlers**
Jobs are now represented as tagged pointers (using lower 3 bits for job
type) instead of separate `{handler, data}` structs. This reduces memory
overhead and allows jobs to be passed through the queues as single
pointers.
* **head caching in spsc queue**
The SPSC queue now caches the `head` index on the producer side
(`head_cache`) to avoid frequent atomic loads. The producer only
refreshes from the atomic `head` when the cache indicates the queue
might be full, reducing cross-thread cache-line bouncing.
* **deferred commit in SPSC queue**
`spscEnqueue()` supports batching via a `commit` flag. Multiple jobs can
be enqueued with `commit=false`, then flushed with a single
`spscCommit()` call, reducing atomic operations and cache-line bouncing.
* **rollback on fullness check failure**
When `spmcEnqueue()` fails due to a full queue, the client state is
rolled back (e.g., `io_write_state` reset to `CLIENT_IDLE`). This
rollback approach removes the need for an expensive `isFull` check
before every enqueue: we just attempt the enqueue and revert if it
fails.
* **epoll offloading via SPSC at high thread counts**
When `active_io_threads_num > 9`, poll jobs are sent to per-thread SPSC
queues (round-robin). Since threads check their private queue first,
this ensures poll jobs are processed promptly without waiting behind
jobs in the shared SPMC queue.
* **avoid offload write before read comes back**
Added a check `if (c->io_read_state == CLIENT_PENDING_IO) return C_OK`
in `trySendWriteToIOThreads()`. In the previous per-thread SPSC
implementation, we could send consecutive read and write jobs for the
same client knowing a single thread would handle them in order. With the
shared SPMC queue, different threads may pick up the jobs, so we must
wait for the read to complete before sending a write to avoid 2 threads
handling the same client.
* **removing pending_read_list_node from client and
clients_pending_io_read/write lists from server**
Removed `pending_read_list_node` from the `client` struct and the
`clients_pending_io_read`/`clients_pending_io_write` lists from
`valkeyServer`, as the new MPSC queue eliminates the need for these
tracking structures.
* **added inst metrics for pending io jobs**
Added `instantaneous_io_pending_jobs` metric via `STATS_METRIC_IO_WAIT`
to track average queue depth over time.
* **added stat for current active threads number**
Added `active_io_threads_num` to the INFO stats output for better
visibility.
* **added internal inst metric for main-thread cpu (not available on
macOS)**
Added `STATS_METRIC_MAIN_THREAD_CPU_SYS` to track main-thread CPU usage
via `getrusage(RUSAGE_THREAD)`. This powers the "ignition" policy: when
CPU exceeds 30%, the first IO thread is activated. `RUSAGE_THREAD` is
Linux-specific, so macOS falls back to event-count heuristics.
* **added stat for pending read and writes for io**
Added `io_threaded_reads_pending` and `io_threaded_writes_pending` stats
to track how many read/write jobs are currently in-flight to IO threads.
* **added volatile for crashed**
Changed `server.crashed` from `int` to `volatile int` so the crash flag
becomes visible across threads immediately, allowing IO threads to
detect a crash and stop sending responses to the main thread, which
avoids a deadlock during crash handling.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Dan Touitou <dan.touitou@gmail.com>
This PR introduces `ValkeyModule_CallArgv`, a new low-level module API
for calling server commands. It is designed as a complement to the
existing `ValkeyModule_Call` covering two scenarios where
`ValkeyModule_Call` is suboptimal:
1. **The module already holds the arguments as an argv array.**
`ValkeyModule_Call` always allocates a new argv array, creates a new
string object for the command name, and calls
`incrRefCount`/`decrRefCount` on every argument. `ValkeyModule_CallArgv`
borrows the caller's argv array directly — no allocation, no refcount
manipulation. Ownership of the array stays with the caller.
2. **The module wants to forward the reply to its own client without
inspecting it.** `ValkeyModule_Call` always parses the raw RESP bytes
into an intermediate `ValkeyModuleCallReply` tree and then re-serializes
it back to RESP to send to the caller. `ValkeyModule_CallArgv` exposes
the raw RESP bytes directly through a callback, allowing a pass-through
module command to copy bytes from the inner command's output buffer
straight into the outer client's output buffer.
---
## New API
### `ValkeyModule_CallArgv`
```c
int ValkeyModule_CallArgv(ValkeyModuleCtx *ctx,
ValkeyModuleString **argv,
int argc,
int flags,
ValkeyModuleReplyHandlers *reply_handlers,
void *reply_ctx);
```
Calls a server command using an existing argv array. The caller retains
ownership of `argv` and its elements — they are never freed or
ref-counted by the function.
Returns `VALKEYMODULE_OK` on success or `VALKEYMODULE_ERR` on error,
with `errno` set to the same values as `ValkeyModule_Call`. The function
does **not** return a `ValkeyModuleCallReply`; the reply is delivered
through `reply_handlers` instead.
`flags` is a bitwise OR of one or more `VALKEYMODULE_CALL_ARGV_*`
constants, which correspond directly to the format-string specifiers of
`ValkeyModule_Call`:
| Flag | Equivalent format char | Meaning |
|---|---|---|
| `VALKEYMODULE_CALL_ARGV_REPLICATE` | `!` | Propagate to replicas and AOF |
| `VALKEYMODULE_CALL_ARGV_NO_AOF` | `A` | Skip AOF propagation |
| `VALKEYMODULE_CALL_ARGV_NO_REPLICAS` | `R` | Skip replica propagation |
| `VALKEYMODULE_CALL_ARGV_RESP_3` | `3` | Force RESP3 for the inner call |
| `VALKEYMODULE_CALL_ARGV_RESP_AUTO` | `0` | Match the calling client's protocol (RESP2 or RESP3) |
| `VALKEYMODULE_CALL_ARGV_RUN_AS_USER` | `C` | Apply ACL checks for the context user |
| `VALKEYMODULE_CALL_ARGV_SCRIPT_MODE` | `S` | Mark as script execution |
| `VALKEYMODULE_CALL_ARGV_NO_WRITES` | `W` | Disallow write commands |
| `VALKEYMODULE_CALL_ARGV_ERRORS_AS_REPLIES` | `E` | Return error replies instead of raising errno |
| `VALKEYMODULE_CALL_ARGV_RESPECT_DENY_OOM` | `M` | Honour deny-oom policy |
| `VALKEYMODULE_CALL_ARGV_DRY_RUN` | `D` | Dry-run mode (implies `ERRORS_AS_REPLIES`) |
| `VALKEYMODULE_CALL_ARGV_ALLOW_BLOCK` | `K` | Allow the inner command to block |
| `VALKEYMODULE_CALL_ARGV_REPLY_EXACT` | `X` | Disable reply type coercion |
---
### `ValkeyModuleReplyHandlers`
```c
typedef struct ValkeyModuleReplyHandlers {
/* Scalar types */
void (*null)(void *ctx);
void (*bulkString)(void *ctx, const char *str, size_t len);
void (*simpleString)(void *ctx, const char *str, size_t len);
void (*error)(void *ctx, const char *msg, size_t len);
void (*integer)(void *ctx, long long val);
void (*doubleVal)(void *ctx, double val);
void (*boolVal)(void *ctx, int val);
void (*bigNumber)(void *ctx, const char *str, size_t len);
void (*verbatimString)(void *ctx, const char *str, size_t len, const char *fmt);
/* Collection types — paired start/end callbacks */
void (*arrayStart)(void *ctx, size_t len);
void (*arrayEnd)(void *ctx);
void (*mapStart)(void *ctx, size_t len);
void (*mapEnd)(void *ctx);
void (*setStart)(void *ctx, size_t len);
void (*setEnd)(void *ctx);
void (*attributeStart)(void *ctx, size_t len);
void (*attributeEnd)(void *ctx);
/* Invoked on parse error */
void (*replyParsingError)(void *ctx);
/* Raw-bytes intercept — called before per-type callbacks */
int (*onRespAvailable)(void *ctx, ValkeyModuleCtx *module_ctx, const char *proto, size_t proto_len);
/* Blocking command support */
void (*onBlocked)(void *ctx, ValkeyModuleCtx *module_ctx, ValkeyModuleCallArgvBlockedHandle *handle);
void *context; /* Passed as first argument to every callback */
} ValkeyModuleReplyHandlers;
```
All fields are optional; set unused ones to `NULL`.
**`onRespAvailable`** is called first, before any per-type callback,
with the complete raw RESP reply in the `proto`/`proto_len` arguments.
Return `1` to continue into the per-type callbacks; return `0` to stop
after the raw bytes. Pass `NULL` to skip and go straight to per-type
dispatching.
**`onBlocked`** is only called when `VALKEYMODULE_CALL_ARGV_ALLOW_BLOCK`
is set and the inner command blocks. The `handle` can be passed to
`ValkeyModule_CallArgvAbort` to cancel the call. The handle is valid
until either `onRespAvailable` is invoked or
`ValkeyModule_CallArgvAbort` is called, whichever comes first.
---
### `ValkeyModule_CallArgvAbort`
```c
int ValkeyModule_CallArgvAbort(ValkeyModuleCallArgvBlockedHandle *handle);
```
Cancels a blocking call whose `onBlocked` callback has already fired.
Returns `VALKEYMODULE_OK` if the abort succeeded (neither
`onRespAvailable` nor any other reply callback will be invoked), or
`VALKEYMODULE_ERR` if the call has already completed.
---
### `ValkeyModule_ReplyRaw`
```c
int ValkeyModule_ReplyRaw(ValkeyModuleCtx *ctx, const char *proto, size_t proto_len);
```
Appends raw RESP bytes directly to the calling client's output buffer,
with no parsing or re-serialization. Intended for use inside an
`onRespAvailable` callback to implement zero-copy reply forwarding.
---
## Usage Example: `ValkeyModule_Call` vs `ValkeyModule_CallArgv`
The following two implementations of a generic pass-through proxy
command are functionally equivalent, but use different code paths
internally.
### With `ValkeyModule_Call` (existing API)
```c
int ProxyCommand(ValkeyModuleCtx *ctx, ValkeyModuleString **argv, int argc) {
if (argc < 2) return ValkeyModule_WrongArity(ctx);
/* VM_Call builds a new argv array internally, creates a string object for
* the command name, and calls incrRefCount/decrRefCount on every argument.
* The reply is parsed into a CallReply tree, then re-serialized to RESP. */
ValkeyModuleCallReply *reply =
ValkeyModule_Call(ctx, ValkeyModule_StringPtrLen(argv[1], NULL), "v",
argv + 2, (size_t)(argc - 2));
/* VM_Call returns NULL on error with errno set. */
if (reply == NULL) return ValkeyModule_ReplyWithError(ctx, "ERR command failed");
int rc = ValkeyModule_ReplyWithCallReply(ctx, reply);
ValkeyModule_FreeCallReply(reply);
return rc;
}
```
### With `ValkeyModule_CallArgv` and `onRespAvailable` (new API —
pass-through)
```c
static int forwardRawReply(void *ctx, ValkeyModuleCtx *mctx,
const char *proto, size_t proto_len) {
(void)ctx;
/* Write the raw RESP bytes directly to the outer client — no parse, no
* re-serialize, no intermediate allocation. */
ValkeyModule_ReplyRaw(mctx, proto, proto_len);
return 0; /* stop — do not invoke per-type callbacks */
}
static ValkeyModuleReplyHandlers passthrough_handlers = {
.onRespAvailable = forwardRawReply,
};
int ProxyCommand(ValkeyModuleCtx *ctx, ValkeyModuleString **argv, int argc) {
if (argc < 2) return ValkeyModule_WrongArity(ctx);
/* VM_CallArgv borrows argv directly (no copy, no refcount changes).
* The reply is written straight into the outer client's output buffer
* through the onRespAvailable callback above. */
if (ValkeyModule_CallArgv(ctx, argv + 1, argc - 1,
VALKEYMODULE_CALL_ARGV_RESP_AUTO,
&passthrough_handlers,
NULL) == VALKEYMODULE_ERR) {
return ValkeyModule_ReplyWithError(ctx, "ERR command failed");
}
return VALKEYMODULE_OK;
}
```
Use `VALKEYMODULE_CALL_ARGV_RESP_AUTO` to ensure the inner command
replies in the same protocol version as the outer client (RESP2 or
RESP3), so the raw bytes can be forwarded without modification.
---
## Internal changes
### `argv_borrowed` client flag
A new `argv_borrowed` bit is added to the client flags struct. When
`VM_CallArgv` runs the inner command on a temporary client, it sets this
flag to signal that the temp client does not own the argv array.
`freeClientArgv` and `freeClientOriginalArgv` respect the flag and skip
freeing. `resetClient` clears it unconditionally after each command.
Several string commands (`SET`, `MSET`, `APPEND`, …) that previously
encoded or reused argv objects in-place are guarded so they do not
mutate argv elements they do not own.
A debug-only assertion (enabled when `enable-debug-assert yes` is set in
the config) verifies at runtime that no command modifies the argv array
or decrements any element's ref-count when `argv_borrowed` is set.
---------
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Ricardo Dias <rjd15372@gmail.com>
This commit introduces a new configuration option `cluster-config-save-behavior`
that controls how the cluster handles nodes.conf file save failures.
The option supports two modes:
- `sync` (default): Synchronously save the config file. If the save fails,
the process exits. This maintains backward compatibility with the old
behavior (before 9.1).
- `best-effort`: Synchronously save the config file. If the save fails,
only log a warning and continue running. This allows the node to survive
disk failures (e.g., disk full, read-only filesystem) without exiting,
giving administrators time to address the issue.
Note that this modifies the behavior of #1032, whereas #1032 was "best-effort",
we have now introduced a configuration option that defaults to "sync."
See #1032 discussion for more details.
Background:
When a disk becomes read-only or full, any cluster metadata change would
trigger a nodes.conf save attempt. With the old behavior, the node would
immediately exit via clusterSaveConfigOrDie(), potentially causing multiple
nodes on the same machine to crash simultaneously, leading to cluster
unavailability.
The new `best-effort` mode addresses this by allowing nodes to continue
operating even when disk writes fail. This is particularly useful in cloud
environments where disk failures are more common due to scale.
Note: Startup-time config saves (in clusterInit and verifyClusterConfigWithData)
still use clusterSaveConfigOrDie() since disk issues at startup should cause
immediate failure.
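As a sketch, the new mode would be enabled in valkey.conf like this (placement within the config file is illustrative):

```
# Keep the node running if writing nodes.conf fails (e.g. disk full,
# read-only filesystem); the default "sync" preserves the old
# exit-on-failure behavior.
cluster-config-save-behavior best-effort
```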
Signed-off-by: Binbin <binloveplay1314@qq.com>
Previously, `redactClientCommandArgument()` forced a full copy of
`c->argv` into `original_argv` immediately, then replaced the sensitive
slot with `shared.redacted`. This meant any call to redact an argument
triggered an eager argv backup even when no rewriting was needed.
The new approach stores the indices of arguments to be redacted in a
small dynamic array (`redact_args`) on the client struct. Redaction is
applied lazily when the argv is actually consumed (e.g. by the command
log), avoiding the unnecessary argv copy on the hot path.
This change is required for the implementation of the
`ValkeyModule_CallArgv` module API function (#3122), because with
CallArgv the caller owns the `argv` array, which must not be modified by
the redact function.
---------
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Two changes to tests/unit/cluster/faster-failover.tcl:
1. FAIL detection timeout: `wait_for_condition 1000 10` → `1000 50`
(10s → 50s)
2. psync_max_retries: 1200 → 2400 normal (120s → 240s),
6000 → 12000 valgrind (600s → 1200s)
The test `The best replica can initiate an election immediately in an
automatic failover` in `tests/unit/cluster/faster-failover.tcl` has been
flaky since it was introduced on March 27, 2026 by #2227.
**Frequency:** 8 out of 15 days (Mar 27 – Apr 8), across valgrind,
sanitizer, and slow CI runners.
**Common errors:**
- `log message of "Successful partial resynchronization with primary"
not found` (timeout waiting for psync)
- `expected pattern found in srv -N log file: *best ranked replica*`
(timeout waiting for FAIL propagation)
The test spins up a 12-node cluster (5 primaries + 7 replicas), pauses
nodes, and waits for FAIL detection to propagate across all nodes before
failover + partial resync.
A previous fix attempt #3424 increased the psync timeout from 50s to
120s (600s valgrind), which reduced frequency but did not eliminate it.
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
In rdbLoadObject, when sdstrynewlen succeeds and hashtableAdd succeeds,
but lpSafeToAdd fails, field will be double-freed.
However, lpSafeToAdd only fails when len + add > LISTPACK_MAX_SAFETY_SIZE (1GB),
so this can rarely happen in practice and the impact is trivial.
Signed-off-by: charsyam <charsyam@naver.com>
rewriteConfigFormatMemory() and rewriteConfigBytesOption() used long long
parameters to represent memory bytes, but configs like maxmemory are stored
as unsigned long long and allow values up to ULLONG_MAX.
When the value exceeds LLONG_MAX (e.g. 9223372036854775808), it overflows to
negative when passed as long long, causing config rewrite to produce incorrect
output like "maxmemory -8589934592gb".
Change both functions to use unsigned long long, matching the actual semantics
of memory byte values. This is consistent with how numericConfigGet() already
handles MEMORY_CONFIG using ull2string().
Some memory-related configs are signed and can be negative:
1. maxmemory-clients is a PERCENT_CONFIG
2. slot-migration-max-failover-repl-bytes is a SIGNED_MEMORY_CONFIG (after #3443)
None of them were affected:
```
static void numericConfigRewrite(standardConfig *config, const char *name, struct rewriteConfigState *state) {
...
if (config->data.numeric.flags & PERCENT_CONFIG && value < 0) {
rewriteConfigPercentOption(state, name, -value, config->data.numeric.default_value);
} else if (config->data.numeric.flags & SIGNED_MEMORY_CONFIG && value < 0) {
rewriteConfigNumericalOption(state, name, value, config->data.numeric.default_value);
} else if (config->data.numeric.flags & MEMORY_CONFIG) {
rewriteConfigBytesOption(state, name, value, config->data.numeric.default_value);
...
```
Signed-off-by: Binbin <binloveplay1314@qq.com>