Commit e9014fd02b by Yair Gottdenker: Adding support for sharing memory between the module and the engine (#2472)

Files changed:
- valkey/tests/unit/moduleapi/hash_stringref.tcl
## Overview

Sharing memory between the module and engine reduces memory overhead by
eliminating redundant copies of stored records in the module. This is
particularly beneficial for search workloads that require indexing large
volumes of documents.

### Vectors

Vector similarity search requires storing large volumes of
high-cardinality vectors. For example, a single vector with 512
dimensions consumes 2048 bytes, and typical workloads often involve
millions of vectors. Due to the lack of a memory-sharing mechanism
between the module and the engine, valkey-search currently doubles
memory consumption when indexing vectors, significantly increasing
operational costs. This limitation introduces adoption friction and
reduces valkey-search's competitiveness.

## Memory Allocation Strategy

At a fundamental level, there are two primary allocation strategies:
- [Chosen] Module-allocated memory shared with the engine.
- Engine-allocated memory shared with the module.

For valkey-search, it is crucial that vectors reside in cache-aligned
memory to maximize SIMD optimizations. Allowing the module to allocate
memory provides greater flexibility for different use cases, though it
introduces slightly higher implementation complexity.

## Old Implementation

The old [implementation](https://github.com/valkey-io/valkey/pull/1804)
was based on ref-counting and introduced a new SDS type. After further
discussion, we
[agreed](https://github.com/valkey-io/valkey/pull/1804#issuecomment-2905115712)
to simplify the design by removing ref-counting and avoiding the
introduction of a new SDS type.

## New Implementation - Key Points

1. The engine exposes a new interface, `VM_HashSetViewValue`, which sets
a hash field's value as a view of a buffer owned by the module. The
function accepts the hash key, hash field, and a buffer along with its
length.
2. `ViewValue` is a new data type that captures the externalized buffer
and its length.


## valkey-search Usage

### Insertion
1. Upon receiving a keyspace notification for a new hash or JSON key
with an indexed vector attribute, valkey-search allocates cache-aligned
memory and deep-copies the vector value.
2. valkey-search then calls `VM_HashSetViewValue` to avoid keeping two
copies of the vector.

### Deletion
When receiving a keyspace notification for a deleted hash key or hash
field that was indexed as a vector, valkey-search deletes the
corresponding entry from the index.

### Update
Handled similarly to insertion.

---------

Signed-off-by: yairgott <yairgott@gmail.com>
Signed-off-by: Yair Gottdenker <yairg@google.com>
Signed-off-by: Yair Gottdenker <yairgott@gmail.com>
Co-authored-by: Yair Gottdenker <yairg@google.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Jim Brunner <brunnerj@amazon.com>
2025-12-19 15:55:57 +02:00

The Tcl test added by this commit (`tests/unit/moduleapi/hash_stringref.tcl`):

```tcl
set testmodule [file normalize tests/modules/hash_stringref.so]

start_server {tags {"modules"}} {
    r module load $testmodule

    test {Module hash set} {
        r del k
        # Setting a view on a field of a non-existent key must error.
        set status [catch {r hash.set_stringref k f hello} errmsg]
        assert {$status == 1}
        r hset k f hello1
        assert_equal "0" [r hash.has_stringref k f]
        r hash.set_stringref k f hello1
        assert_equal "hello1" [r hget k f]
        assert_equal "1" [r hash.has_stringref k f]
    }

    test "Unload the module - hash" {
        assert_equal {OK} [r module unload hash.stringref]
    }
}
```