mirror of
https://github.com/valkey-io/valkey.git
synced 2026-05-09 06:49:36 -04:00
e9014fd02b
## Overview

Sharing memory between the module and the engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that index large volumes of documents.

### Vectors

Vector similarity search requires storing large volumes of high-dimensional vectors. For example, a single vector with 512 dimensions consumes 2048 bytes, and typical workloads often involve millions of vectors. Because there is no memory-sharing mechanism between the module and the engine, valkey-search currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation creates adoption friction and reduces valkey-search's competitiveness.

## Memory Allocation Strategy

At a fundamental level, there are two primary allocation strategies:

- [Chosen] Module-allocated memory shared with the engine.
- Engine-allocated memory shared with the module.

For valkey-search, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.

## Old Implementation

The old [implementation](https://github.com/valkey-io/valkey/pull/1804) was based on ref-counting and introduced a new SDS type. After further discussion, we [agreed](https://github.com/valkey-io/valkey/pull/1804#issuecomment-2905115712) to simplify the design by removing ref-counting and avoiding the introduction of a new SDS type.

## New Implementation - Key Points

1. The engine exposes a new interface, `VM_HashSetViewValue`, which sets a hash value as a view of a buffer owned by the module. The function accepts the hash key, hash field, and a buffer along with its length.
2. `ViewValue` is a new data type that captures the externalized buffer and its length.

## valkey-search Usage

### Insertion

1. Upon receiving a keyspace notification for a new hash or JSON key with an indexed vector attribute, valkey-search allocates cache-aligned memory and deep-copies the vector value.
2. valkey-search then calls `VM_HashSetViewValue` so that two copies of the vector are not kept.

### Deletion

When receiving a keyspace notification for a deleted hash key or hash field that was indexed as a vector, valkey-search deletes the corresponding entry from the index.

### Update

Handled similarly to insertion.

---------

Signed-off-by: yairgott <yairgott@gmail.com>
Signed-off-by: Yair Gottdenker <yairg@google.com>
Signed-off-by: Yair Gottdenker <yairgott@gmail.com>
Co-authored-by: Yair Gottdenker <yairg@google.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Jim Brunner <brunnerj@amazon.com>
21 lines
597 B
Tcl
```tcl
set testmodule [file normalize tests/modules/hash_stringref.so]

start_server {tags {"modules"}} {
    r module load $testmodule

    test {Module hash set} {
        r del k
        set status [catch {r hash.set_stringref k f hello} errmsg]
        assert {$status == 1}
        r hset k f hello1
        assert_equal "0" [r hash.has_stringref k f]
        r hash.set_stringref k f hello1
        assert_equal "hello1" [r hget k f]
        assert_equal "1" [r hash.has_stringref k f]
    }

    test "Unload the module - hash" {
        assert_equal {OK} [r module unload hash.stringref]
    }
}
```