d7993b78d8
Introduces a new family of commands for migrating slots via replication. The procedure is driven by the source node, which pushes an AOF-formatted snapshot of the slots to the target, followed by a replication stream of changes on those slots (a la manual failover). This solution is an adaptation of the solution provided by @enjoy-binbin, combined with the solution I previously posted at #1591, modified to meet the designs we had outlined in #23.

## New commands

* `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE node-id`: Begin sending the slots via replication to the target. Multiple targets can be specified by repeating `SLOTSRANGE ... NODE ...`
* `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations
* `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations

This PR only implements "one shot" semantics with an asynchronous model. Later, "two phase" semantics (e.g. slot-level replicate/failover commands) can be added on the same core.
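As a rough illustration (not part of this PR), driving a migration from a redis-py style client might look like the sketch below; the host, port, slot range, polling loop, and node ID are all placeholders:

```python
# Hypothetical operator flow: stream slots 0-100 to another primary and
# watch the migration log. Connection details and the node ID are made up.
import time
import redis

source = redis.Redis(host="127.0.0.1", port=6379)
target_id = "0123456789012345678901234567890123456789"  # placeholder 40-char node ID

# Kick off the migration; the command returns immediately ("one shot").
source.execute_command("CLUSTER", "MIGRATESLOTS",
                       "SLOTSRANGE", "0", "100", "NODE", target_id)

# Poll the recent-migrations log until the job finishes one way or the other.
for _ in range(120):
    print(source.execute_command("CLUSTER", "GETSLOTMIGRATIONS"))
    time.sleep(1)
```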
## Slot migration jobs

Introduces the concept of a slot migration job. While active, a job tracks a connection created by the source to the target, over which the contents of the slots are sent. This connection is used for control messages as well as replicated slot data. Each job is given a 40-character random name to help uniquely identify it. All jobs, including those that finished recently, can be observed using the `CLUSTER GETSLOTMIGRATIONS` command.

## Replication

* Since the snapshot uses AOF, the snapshot can be replayed verbatim to any replicas of the target node.
* We use the same proxying mechanism used for chained replication to copy the content sent by the source node directly to the replica nodes.

## `CLUSTER SYNCSLOTS`

To coordinate the state machine transitions across the two nodes, a new command is added, `CLUSTER SYNCSLOTS`, that performs this control flow. Each end of the slot migration connection is expected to install a read handler in order to handle `CLUSTER SYNCSLOTS` commands:

* `ESTABLISH`: Begins a slot migration. Provides slot migration information to the target and authorizes the connection to write to unowned slots.
* `SNAPSHOT-EOF`: Appended to the end of the snapshot to signal that the snapshot is done being written to the target.
* `PAUSE`: Informs the source node to pause whenever it gets the opportunity.
* `PAUSED`: Added to the end of the client output buffer when the pause is performed. The pause is only performed after the buffer shrinks below a configurable size.
* `REQUEST-FAILOVER`: Requests that the source either grant or deny a failover for the slot migration. The failover is only granted if the target is still paused. Once a failover is granted, the pause is refreshed for a short duration.
* `FAILOVER-GRANTED`: Sent to the target to inform it that `REQUEST-FAILOVER` was granted.
* `ACK`: Heartbeat command used to ensure liveness.

## Interactions with other commands

* FLUSHDB on the source node (which flushes the migrating slot) will result in the source dropping the connection, which will flush the slot on the target and reset the state machine back to the beginning. The subsequent retry should succeed very quickly (the slot is now empty).
* FLUSHDB on the target will fail the slot migration. We can iterate with better handling, but for now it is expected that the operator would retry.
* Generally, FLUSHDB is expected to be executed cluster-wide, so preserving partially migrated slots doesn't make much sense.
* SCAN and KEYS are filtered to avoid exposing importing slot data.

## Error handling

* For any transient connection drop, the migration will be failed and will require the user to retry.
* If there is an OOM while reading from the import connection, we will fail the import, which will drop the importing slot data.
* If the client output buffer limit is reached on the source node, it will drop the connection, which will cause the migration to fail.
* If at any point the export loses ownership, or either node is failed over, a callback will be triggered on both ends of the migration to fail the import. The import will not reattempt with a new owner.
* The two ends of the migration routinely ping each other with SYNCSLOTS ACK messages. If at any point there is no interaction on the connection for longer than `repl-timeout`, the connection will be dropped, resulting in migration failure.
* If a failover happens, we will drop keys in all unowned slots. The migration does not persist through failovers and would need to be retried on the new source/target.
## State machine

```
Target/Importing Node State Machine
─────────────────────────────────────────────────────────────
SLOT_IMPORT_WAIT_ACK
  --(ACK)--> SLOT_IMPORT_RECEIVE_SNAPSHOT
  --(SNAPSHOT-EOF)--> SLOT_IMPORT_WAITING_FOR_PAUSED
  --(PAUSED)--> SLOT_IMPORT_FAILOVER_REQUESTED
  --(FAILOVER-GRANTED)--> SLOT_IMPORT_FAILOVER_GRANTED
  --(Takeover performed)--> SLOT_MIGRATION_JOB_SUCCESS

Any state above, on error:
  --> SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP
  --(Unowned slots cleaned up)--> SLOT_MIGRATION_JOB_FAILED

Error conditions:
  1. OOM
  2. Slot ownership change
  3. Demotion to replica
  4. FLUSHDB
  5. Connection lost
  6. No ACK from source (timeout)

Source/Exporting Node State Machine
─────────────────────────────────────────────────────────────
SLOT_EXPORT_CONNECTING
  --(Connected)--> SLOT_EXPORT_AUTHENTICATING
  --(Authenticated)--> SLOT_EXPORT_SEND_ESTABLISH
  --(ESTABLISH command written)--> SLOT_EXPORT_READ_ESTABLISH_RESPONSE
  --(Full response read (+OK))--> SLOT_EXPORT_WAITING_TO_SNAPSHOT
  --(No other child process)--> SLOT_EXPORT_SNAPSHOTTING
  --(Snapshot done)--> SLOT_EXPORT_STREAMING
  --(PAUSE)--> SLOT_EXPORT_WAITING_TO_PAUSE
  --(Buffer drained)--> SLOT_EXPORT_FAILOVER_PAUSED
  --(Failover request granted)--> SLOT_EXPORT_FAILOVER_GRANTED
  --(New topology received)--> SLOT_MIGRATION_JOB_SUCCESS

Any state above, on error:
  --> SLOT_MIGRATION_JOB_FAILED

On CANCELMIGRATION:
  --> SLOT_MIGRATION_JOB_CANCELLED

Error conditions:
  1. User sends CANCELMIGRATION
  2. Slot ownership change
  3. Demotion to replica
  4. FLUSHDB
  5. Connection lost
  6. AUTH failed
  7. ERR from ESTABLISH command
  8. Unpaused before failover completed
  9. Snapshot failed (e.g. child OOM)
  10. No ack from target (timeout)
  11. Client output buffer overrun
```

Co-authored-by: Binbin <binloveplay1314@qq.com>

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
#!/usr/bin/env python3
import os
import glob
import json
import sys

import jsonschema
import subprocess
import redis
import time
import argparse
import multiprocessing
import collections
import io
import traceback
from datetime import timedelta
from functools import partial
try:
    from jsonschema import Draft201909Validator as schema_validator
except ImportError:
    from jsonschema import Draft7Validator as schema_validator
"""
|
|
The purpose of this file is to validate the reply_schema values of COMMAND DOCS.
|
|
Basically, this is what it does:
|
|
1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c)
|
|
2. For each request-response pair, it validates the response against the request's reply_schema (obtained from COMMAND DOCS)
|
|
|
|
This script spins up a valkey-server and a valkey-cli in order to obtain COMMAND DOCS.
|
|
|
|
In order to use this file you must run the redis testsuite with the following flags:
|
|
./runtest --dont-clean --force-resp3 --log-req-res
|
|
|
|
And then:
|
|
./utils/req-res-log-validator.py
|
|
|
|
The script will fail only if:
|
|
1. One or more of the replies doesn't comply with its schema.
|
|
2. One or more of the commands in COMMANDS DOCS doesn't have the reply_schema field (with --fail-missing-reply-schemas)
|
|
3. The testsuite didn't execute all of the commands (with --fail-commands-not-all-hit)
|
|
|
|
Future validations:
|
|
1. Fail the script if one or more of the branches of the reply schema (e.g. oneOf, anyOf) was not hit.
|
|
"""
|
|
|
|
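
# For reference, each record in a .reqres file is a length-prefixed argv
# (terminated by the "__argv_end__" sentinel) followed by the raw RESP3
# reply. An illustrative record for `SET foo bar`, as consumed by the
# Request and Response classes below, would look like:
#
#   3
#   set
#   3
#   foo
#   3
#   bar
#   12
#   __argv_end__
#   +OK
#
# (every line above is terminated by \r\n on disk)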

IGNORED_COMMANDS = {
    # Commands that don't work in a req-res manner (see logreqres.c)
    "debug",  # because of DEBUG SEGFAULT
    "sync",
    "psync",
    "monitor",
    "subscribe",
    "unsubscribe",
    "ssubscribe",
    "sunsubscribe",
    "psubscribe",
    "punsubscribe",
    # Commands for which we decided not to write a reply schema
    "pfdebug",
    "lolwut",
    # Slot migration commands are not tested for RC1
    "cluster|syncslots",
    "cluster|cancelslotmigrations",
    "cluster|getslotmigrations",
    "cluster|migrateslots",
}

class Request(object):
    """
    This class represents a request (AKA command, argv) sent to the server
    """
    def __init__(self, f, docs, line_counter):
        """
        Read lines from `f` (generated by logreqres.c) and populate the argv array
        """
        self.command = None
        self.schema = None
        self.argv = []

        while True:
            line = f.readline()
            line_counter[0] += 1
            if not line:
                break
            length = int(line)
            arg = str(f.read(length))
            f.read(2)  # read \r\n
            line_counter[0] += 1
            if arg == "__argv_end__":
                break
            self.argv.append(arg)

        if not self.argv:
            return

        self.command = self.argv[0].lower()
        doc = docs.get(self.command, {})
        if not doc and len(self.argv) > 1:
            # Container command: retry the docs lookup as "command|subcommand"
            self.command = f"{self.argv[0].lower()}|{self.argv[1].lower()}"
            doc = docs.get(self.command, {})

        if not doc:
            self.command = None
            return

        self.schema = doc.get("reply_schema")

    def __str__(self):
        return json.dumps(self.argv)

class Response(object):
    """
    This class represents a RESP3 response from the server
    """
    def __init__(self, f, line_counter):
        """
        Read lines from `f` (generated by logreqres.c) and build the JSON representing the response in RESP3
        """
        self.error = False
        self.queued = False
        self.json = None

        # The first byte of each reply is the RESP3 type marker:
        # + simple string, - error, $ bulk string, : integer, , double,
        # _ null, # boolean, ! bulk error, = verbatim string, ( big number,
        # * array, ~ set, > push, % map, | attribute
        line = f.readline()[:-2]
        line_counter[0] += 1
        if line[0] == '+':
            self.json = line[1:]
            if self.json == "QUEUED":
                self.queued = True
        elif line[0] == '-':
            self.json = line[1:]
            self.error = True
        elif line[0] == '$':
            self.json = str(f.read(int(line[1:])))
            f.read(2)  # read \r\n
            line_counter[0] += 1
        elif line[0] == ':':
            self.json = int(line[1:])
        elif line[0] == ',':
            self.json = float(line[1:])
        elif line[0] == '_':
            self.json = None
        elif line[0] == '#':
            self.json = line[1] == 't'
        elif line[0] == '!':
            self.json = str(f.read(int(line[1:])))
            f.read(2)  # read \r\n
            line_counter[0] += 1
            self.error = True
        elif line[0] == '=':
            self.json = str(f.read(int(line[1:])))[4:]  # skip "txt:" or "mkd:"
            f.read(2)  # read \r\n
            line_counter[0] += 1 + self.json.count("\r\n")
        elif line[0] == '(':
            self.json = line[1:]  # big-number is actually a string
        elif line[0] in ['*', '~', '>']:  # unfortunately JSON doesn't tell the difference between a list and a set
            self.json = []
            count = int(line[1:])
            for i in range(count):
                ele = Response(f, line_counter)
                self.json.append(ele.json)
        elif line[0] in ['%', '|']:
            self.json = {}
            count = int(line[1:])
            for i in range(count):
                field = Response(f, line_counter)
                # The server allows fields to be non-strings but JSON doesn't.
                # Luckily, for any kind of response we can validate, the fields are
                # always strings (example: XINFO STREAM)
                # The reason we can't always convert to string is because of DEBUG PROTOCOL MAP
                # which anyway doesn't have a schema
                if isinstance(field.json, str):
                    field = field.json
                value = Response(f, line_counter)
                self.json[field] = value.json
            if line[0] == '|':
                # We don't care about the attributes, read the real response
                real_res = Response(f, line_counter)
                self.__dict__.update(real_res.__dict__)

    def __str__(self):
        return json.dumps(self.json)
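
# A minimal illustration of the parser above: the RESP3 map
# "%1\r\n+name\r\n$6\r\nvalkey\r\n" decodes to a Python dict, e.g.
#
#   res = Response(io.StringIO("%1\r\n+name\r\n$6\r\nvalkey\r\n"), [0])
#   assert res.json == {"name": "valkey"}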

def process_file(docs, path):
    """
    This function processes a single file generated by logreqres.c
    """
    line_counter = [0]  # A list with one integer: to force python to pass it by reference
    command_counter = dict()

    print(f"Processing {path} ...")

    # Convert file to StringIO in order to minimize IO operations
    with open(path, "r", newline="\r\n", encoding="latin-1") as f:
        content = f.read()

    with io.StringIO(content) as fakefile:
        while True:
            try:
                req = Request(fakefile, docs, line_counter)
                if not req.argv:
                    # EOF
                    break
                res = Response(fakefile, line_counter)
            except json.decoder.JSONDecodeError as err:
                print(f"JSON decoder error while processing {path}:{line_counter[0]}: {err}")
                print(traceback.format_exc())
                raise
            except Exception as err:
                print(f"General error while processing {path}:{line_counter[0]}: {err}")
                print(traceback.format_exc())
                raise

            if not req.command:
                # Unknown command
                continue

            command_counter[req.command] = command_counter.get(req.command, 0) + 1

            if res.error or res.queued:
                continue

            if req.command in IGNORED_COMMANDS:
                continue

            try:
                jsonschema.validate(instance=res.json, schema=req.schema, cls=schema_validator)
            except (jsonschema.ValidationError, jsonschema.exceptions.SchemaError) as err:
                print(f"JSON schema validation error on {path}: {err}")
                print(f"argv: {req.argv}")
                try:
                    print(f"Response: {res}")
                except UnicodeDecodeError as err:
                    print("Response: (unprintable)")
                print(f"Schema: {json.dumps(req.schema, indent=2)}")
                print(traceback.format_exc())
                raise

    return command_counter
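
# Illustrative standalone use (docs as built by fetch_schemas below; the
# path is a made-up example):
#
#   counts = process_file(docs, "tests/tmp/server.1/valkey.reqres")
#   # -> e.g. {"get": 42, "set": 17, ...}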

def fetch_schemas(cli, port, args, docs):
    redis_proc = subprocess.Popen(args, stdout=subprocess.PIPE)

    # Wait for the server to come up by retrying PING until it succeeds
    while True:
        try:
            print('Connecting to Valkey...')
            r = redis.Redis(port=port)
            r.ping()
            break
        except Exception as e:
            time.sleep(0.1)

    print('Connected')

    cli_proc = subprocess.Popen([cli, '-p', str(port), '--json', 'command', 'docs'], stdout=subprocess.PIPE)
    stdout, stderr = cli_proc.communicate()
    docs_response = json.loads(stdout)

    for name, doc in docs_response.items():
        if "subcommands" in doc:
            for subname, subdoc in doc["subcommands"].items():
                docs[subname] = subdoc
        else:
            docs[name] = doc

    redis_proc.terminate()
    redis_proc.wait()

if __name__ == '__main__':
    # Figure out where the sources are
    srcdir = os.path.abspath(os.path.dirname(os.path.abspath(__file__)) + "/../src")
    testdir = os.path.abspath(os.path.dirname(os.path.abspath(__file__)) + "/../tests")

    parser = argparse.ArgumentParser()
    parser.add_argument('--server', type=str, default='%s/valkey-server' % srcdir)
    parser.add_argument('--port', type=int, default=6534)
    parser.add_argument('--cli', type=str, default='%s/valkey-cli' % srcdir)
    parser.add_argument('--module', type=str, action='append', default=[])
    parser.add_argument('--verbose', action='store_true')
    parser.add_argument('--fail-commands-not-all-hit', action='store_true')
    parser.add_argument('--fail-missing-reply-schemas', action='store_true')
    args = parser.parse_args()

    docs = dict()

    # Fetch schemas from a Valkey instance
    print('Starting Valkey server')
    redis_args = [args.server, '--port', str(args.port)]
    for module in args.module:
        redis_args += ['--loadmodule', 'tests/modules/%s.so' % module]

    fetch_schemas(args.cli, args.port, redis_args, docs)

    # Fetch schemas from a sentinel
    print('Starting Valkey sentinel')

    # Sentinel needs a config file to start
    config_file = "tmpsentinel.conf"
    open(config_file, 'a').close()

    sentinel_args = [args.server, config_file, '--port', str(args.port), "--sentinel"]
    fetch_schemas(args.cli, args.port, sentinel_args, docs)
    os.unlink(config_file)

    missing_schema = [k for k, v in docs.items()
                      if "reply_schema" not in v and k not in IGNORED_COMMANDS]
    if missing_schema:
        print("WARNING! The following commands are missing a reply_schema:")
        for k in sorted(missing_schema):
            print(f" {k}")
        if args.fail_missing_reply_schemas:
            print("ERROR! At least one command does not have a reply_schema")
            sys.exit(1)

    start = time.time()

    # Obtain all the files to process
    paths = []
    for path in glob.glob('%s/tmp/*/*.reqres' % testdir):
        paths.append(path)

    for path in glob.glob('%s/cluster/tmp/*/*.reqres' % testdir):
        paths.append(path)

    for path in glob.glob('%s/sentinel/tmp/*/*.reqres' % testdir):
        paths.append(path)

    counter = collections.Counter()
    # Spin up several processes to handle the files in parallel
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        func = partial(process_file, docs)
        # pool.map blocks until all the files have been processed
        for result in pool.map(func, paths):
            counter.update(result)
    command_counter = dict(counter)

    elapsed = time.time() - start
    print(f"Done. ({timedelta(seconds=elapsed)})")
    print("Hits per command:")
    for k, v in sorted(command_counter.items()):
        print(f" {k}: {v}")
    not_hit = set(docs.keys()) - set(command_counter.keys()) - set(IGNORED_COMMANDS)
    if not_hit:
        if args.verbose:
            print("WARNING! The following commands were not hit at all:")
            for k in sorted(not_hit):
                print(f" {k}")
        if args.fail_commands_not_all_hit:
            print("ERROR! At least one command was not hit by the tests")
            sys.exit(1)