Phoebe Goldman aa295825f6 async-ify Wasmtime (and v8) execution (#3263)
# Description of Changes

This PR is in service of adding procedures, which will need to execute
WASM code in an `async` environment so that execution can be suspended
while making HTTP requests and the like.

Prior to this PR, `JobCores` worked by spawning an OS thread for each
database. These threads would then each be pinned to a specific core,
and in multi-tenant deployments multiple threads could be pinned to the
same core. Now, instead, we spawn one thread per available core at
startup. Each of these threads runs a single-threaded Tokio executor.
Each database is assigned to one of these executors, and runs tasks on
it via `tokio::spawn`.
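The scheme above can be sketched in miniature with plain std threads and channels; all names here are illustrative, and in the actual implementation each worker thread is pinned to a core and runs a current-thread Tokio runtime rather than a hand-rolled loop:

```rust
// Hypothetical std-only analogy of the new scheme: one worker thread per
// "core", each draining its own job queue. In the real code each worker
// runs a single-threaded Tokio executor and receives jobs via
// `tokio::spawn`.
use std::sync::mpsc::{channel, Sender};
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

struct JobCores {
    senders: Vec<Sender<Job>>,
    next: usize,
}

impl JobCores {
    fn new(num_cores: usize) -> Self {
        let senders = (0..num_cores)
            .map(|_core| {
                let (tx, rx) = channel::<Job>();
                // In the real system this thread would be pinned to
                // `_core` and drive a current-thread Tokio runtime.
                thread::spawn(move || {
                    for job in rx {
                        job();
                    }
                });
                tx
            })
            .collect();
        JobCores { senders, next: 0 }
    }

    /// Assign a database to one of the executors (round-robin here).
    fn assign(&mut self) -> Sender<Job> {
        let tx = self.senders[self.next % self.senders.len()].clone();
        self.next += 1;
        tx
    }
}
```

A database would hold the `Sender` it was assigned and submit all of its jobs through it, so that they serialize on that core.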

When we run without core pinning (usually due to having too few hardware
cores), we won't spawn any additional threads or Tokio runtimes at all;
instead we will run database jobs on the "global" Tokio executor. These
jobs may block Tokio worker threads, which might be an issue if a very
core-constrained device runs multiple databases with very long-running
reducers. If this is an issue, we could in this case instead build a
second Tokio runtime only for running database jobs, and let the OS
scheduler figure things out like it did previously.

Previously, we implemented load-balancing among the database cores by
occasionally instructing per-database threads to re-pin themselves. Now,
we instead periodically send each database a new
`tokio::runtime::Handle`, into which it will `spawn` future jobs.

Previously, it was possible for a database thread to become canceled,
most likely as a result of `ModuleHost::exit`, after which calls would
fail with `NoSuchModule`. Cancellation is no longer meaningful, as the
database holds a `Handle` to a long-lived `tokio::runtime::Runtime`,
which should always outlive the `ModuleHost`. I have added an
`AtomicBool` flag to `ModuleHost` which is flipped by `exit` and checked
by calls to maintain the previous functionality.
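A minimal sketch of that flag, with assumed names (the real `ModuleHost` carries much more state):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
struct NoSuchModule;

struct ModuleHost {
    exited: AtomicBool,
}

impl ModuleHost {
    fn exit(&self) {
        self.exited.store(true, Ordering::Relaxed);
    }

    fn call_reducer(&self) -> Result<(), NoSuchModule> {
        if self.exited.load(Ordering::Relaxed) {
            // Preserve the pre-PR behavior: calls after exit fail.
            return Err(NoSuchModule);
        }
        // ... dispatch the call to the module instance here ...
        Ok(())
    }
}
```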

Within this PR, the jobs run on the database-execution Tokio tasks are
not actually asynchronous; they will never yield. This is important
because these jobs may (will) hold a transaction open, and attempting to
swap execution to another job which wants a transaction on the same
database would be undesirable.

Note that this may regress our multi-tenant performance / fairness:
previously, in multi-tenant environments, the OS scheduler would divide
the database cores' time between the per-database threads, potentially
causing one high-load database to be interrupted in the middle of a
reducer in order to run other databases pinned to the same core. Now, a
high-load database will instead run its entire reducer to completion
before any other database gets to run.

We could, in the future, change this by instructing Wasmtime to yield
periodically, either via [epoch
interruption](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.epoch_deadline_async_yield_and_update)
or
[fuel](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.fuel_async_yield_interval),
both of which we're already configuring Wasmtime to track. We'd need (or
at least want) to (re-)introduce a queue such that we only attempt to
run one job per database at a time. I have chosen not to do so within this
patch because I felt the changeset was complex enough already, and we
have so far not treated fairness in multi-tenant environments as a high
priority.

I have also reworked our module host machinery to no longer use dynamic
dispatch and trait polymorphism to manage modules and their instances,
and instead introduced `enum Module` and `enum Instance`, each of which
has a variant for Wasm and another for V8.
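The shape of that change looks roughly like the following (illustrative only; the real variants carry Wasmtime and V8 state), replacing `Box<dyn Module>`-style dynamic dispatch with `match`-based dispatch:

```rust
// Placeholder payload types standing in for the real backend state.
struct WasmModule;
struct V8Module;
struct WasmInstance;
struct V8Instance;

enum Module {
    Wasm(WasmModule),
    V8(V8Module),
}

enum Instance {
    Wasm(WasmInstance),
    V8(V8Instance),
}

impl Module {
    fn create_instance(&self) -> Instance {
        match self {
            Module::Wasm(_) => Instance::Wasm(WasmInstance),
            Module::V8(_) => Instance::V8(V8Instance),
        }
    }
}

impl Instance {
    fn backend(&self) -> &'static str {
        match self {
            Instance::Wasm(_) => "wasm",
            Instance::V8(_) => "v8",
        }
    }
}
```

Enums keep the instances `Sized` and easy to move between threads, which matters for the send-the-instance-to-the-job-thread dance described below.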

During this rewrite, I reworked `AutoReplacingModuleInstance`, which
previously used type-erased trait generics in a way that was brittle and
hard to re-use in the new `async` context. (Specifically, the module
instance no longer lives on the job thread; rather, the database grabs
the instance and sends it to the job thread, then gets it back when the
job exits. This is necessary to allow the re-worked load balancing
described above, as we can't have a single long-lived async task.) While
refactoring, I replaced it with `ModuleInstanceManager`, which can now
maintain multiple instances of the same module. This is not yet useful,
but will become necessary with procedures, as each concurrent procedure
will need its own instance. Relatedly, I changed
`ModuleHost::on_module_thread` (used by one-off and initial subscription
queries) to no longer acquire the/an instance. I discussed this with the
team, and consensus was that "locking" the module instance in that path
was not a useful behavior, just an artifact of the previous
implementation.

I have also switched our Wasmtime configuration to set
`async_support(true)`. This causes a variety of methods, notably
`InstancePre::instantiate` and `TypedFunc::call`, to panic, and requires
that we instead call their `_async` variants. As mentioned above, I have
not yet introduced any actual asynchronicity or concurrency, so these
methods should never yield. Rather than `.await`ing their futures, I
have defined a degenerate `async` executor, `poll_once_executor`, which
polls a future exactly once, failing if it does not return
`Poll::Ready`. This means that we will panic if one of these futures
returns `Poll::Pending` unexpectedly.
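A `poll_once` executor can be written entirely with std machinery; this is a sketch of the presumed shape, not the PR's exact `poll_once_executor`:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: the future is never supposed to yield, so
// nothing should ever need waking.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Poll `fut` exactly once, panicking if it is not immediately ready.
fn poll_once<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: the no-op vtable never dereferences its data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("future returned Poll::Pending in poll_once"),
    }
}
```

Any `_async` Wasmtime call whose future completes synchronously can be driven to completion this way without an executor or `.await`.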

The previous `trait Module` had a method `initial_instances`. `Module`
is now a concrete type, and I gave it this method, but it appears to be
unused. This is causing lints to fail. I am unsure what, if anything,
that method was for.

The previous `AutoReplacingModuleInstance` called `create_instance` on
the job thread. I am unsure if this was intentional, or just an artifact
of the previous implementation, where the `AutoReplacingModuleInstance`
lived on the job thread. I have written the new `ModuleInstanceManager`
to call `create_instance` on the calling thread, but it would be easy to
move that call into the job executor if that behavior is desired.

# API and ABI breaking changes

None user-facing

# Expected complexity level and risk

4: significant rewrite of performance-sensitive fiddly concurrency code.

Note specifically in above description:
- Running database jobs on the global Tokio runtime when not using core
pinning.
- Multi-tenant fairness issue: no longer possible to interrupt a
performance-intensive database mid-reducer to run another database
pinned to the same core.
- Unused method `initial_instances`.
- Running `create_instance` on the calling thread rather than the
database thread.

# Testing


- [x] Will arrange for a bot test.
- [ ] Determine to what extent we can run with real or synthetic
multi-tenant load in a test or staging environment.

---------

Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
2025-10-08 17:02:56 +00:00