joshua-spacetime 1f0e1271a8 Pipeline js module operations (#4962)
# Description of Changes

The core motivation for this change is simple: avoid cross-thread
handoffs and synchronization on the main execution path.

Before this change, the ingress task for each websocket connection would
wait for a completion response on each request before submitting the
next request to the database. This mainly served to guarantee that
message responses were delivered in receive-order per connection.
However, it also meant that every request notified a waiting Tokio task,
which potentially incurred kernel-assisted wakeup and scheduler
overhead.

Note that this design existed mainly for historical reasons. Before the
database had a dedicated job thread, requests were not serialized
through a single queue. The module instance was gated behind a semaphore
which guaranteed mutual exclusion, but not FIFO ordering. Awaiting the
completion of each request in `ws_recv_task` was therefore the mechanism
that enforced per-connection receive-order semantics. Now that requests
are serialized through a single queue, it serves primarily as a source
of overhead.
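
As a rough illustration of why the per-request await is no longer needed
for ordering, here is a minimal sketch using only the standard library.
It is not the actual SpacetimeDB code: `Request` and `pipeline_requests`
are hypothetical names, and `std::sync::mpsc` stands in for whatever
queue the real job thread uses. The sender submits every request back to
back, and the single worker still replies in submission order.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: an id plus a channel on which to send the reply.
struct Request {
    id: u32,
    reply: mpsc::Sender<u32>,
}

/// Submits `n` requests without waiting for any completion, then returns
/// the ids in the order the replies arrived.
fn pipeline_requests(n: u32) -> Vec<u32> {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (reply_tx, reply_rx) = mpsc::channel::<u32>();

    // A single worker drains the queue, so requests run in FIFO order.
    let worker = thread::spawn(move || {
        for req in req_rx {
            // "Execute" the request, then reply.
            let _ = req.reply.send(req.id);
        }
    });

    // Pipelined submission: no await between sends.
    for id in 0..n {
        let reply = reply_tx.clone();
        req_tx.send(Request { id, reply }).unwrap();
    }

    // Close the queue so the worker exits, then collect every reply.
    drop(req_tx);
    drop(reply_tx);
    worker.join().unwrap();
    reply_rx.iter().collect()
}
```

Calling `pipeline_requests(5)` yields the ids in submission order,
because one consumer on one queue is enough to preserve it.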

Procedures are the important exception. They are not serialized through
the main worker queue. Instead, they use their own instance pool so that
they can run concurrently with other requests. However, procedures may
be composed of multiple transactions, and they may effectively yield
between transactions. This means that before this change, a procedure
that yielded would block all subsequent requests from that client until
it returned, which is undesirable.

With this change, procedures may execute out of order with respect to
other operations received on the same WebSocket. Clients that depend on
ordering must enforce it themselves by waiting for a response before
submitting the next request.
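
The reordering can be pictured with a minimal standard-library sketch.
Everything here is an assumption made for illustration: the lane layout,
the `completion_order` function, and the use of a channel `recv` to
stand in for a procedure yielding between transactions. A procedure is
started first, two reducers are submitted afterwards, and the reducers
still complete first.

```rust
use std::sync::mpsc;
use std::thread;

/// Starts one slow "procedure", then submits two "reducers", and returns
/// the labels in completion order.
fn completion_order() -> Vec<&'static str> {
    let (done_tx, done_rx) = mpsc::channel::<&'static str>();

    // Procedure lane: runs on its own thread and blocks until released.
    // The blocking recv stands in for a yield between transactions.
    let (release_tx, release_rx) = mpsc::channel::<()>();
    let proc_done = done_tx.clone();
    let procedure = thread::spawn(move || {
        release_rx.recv().unwrap();
        proc_done.send("procedure").unwrap();
    });

    // Main lane: a single worker draining a FIFO queue of reducers.
    let (queue_tx, queue_rx) = mpsc::channel::<&'static str>();
    let reducer_done = done_tx.clone();
    let worker = thread::spawn(move || {
        for name in queue_rx {
            reducer_done.send(name).unwrap();
        }
    });

    // Submitted after the procedure, yet not blocked behind it.
    queue_tx.send("reducer_a").unwrap();
    queue_tx.send("reducer_b").unwrap();
    drop(queue_tx);
    worker.join().unwrap();

    // Only now let the procedure finish.
    release_tx.send(()).unwrap();
    procedure.join().unwrap();

    drop(done_tx);
    done_rx.iter().collect()
}
```

A client that needs the procedure's effects to be visible first would
have to wait for its response before sending the reducers.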

## What changed?

### 1. Different instance managers for procedures and everything else

Procedures use a bounded instance pool where each instance is backed by
an isolate running in a thread. Reducers and all other operations are
serialized through an mpsc queue that feeds a single isolate running in
a thread.
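
A bounded pool of this kind might look roughly like the sketch below.
This is not the actual implementation: `Instance`, `InstancePool`,
`try_checkout`, and `check_in` are hypothetical names, the backing
threads and isolates are omitted, and returning `None` when the pool is
exhausted is an assumption (the real pool may wait instead).

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for an isolate-backed instance.
#[derive(Debug)]
struct Instance {
    id: usize,
}

/// Fixed-size pool: an instance is either idle here or checked out.
struct InstancePool {
    idle: Mutex<Vec<Instance>>,
}

impl InstancePool {
    /// Creates a pool holding exactly `capacity` instances.
    fn new(capacity: usize) -> Self {
        let idle = (0..capacity).map(|id| Instance { id }).collect();
        InstancePool { idle: Mutex::new(idle) }
    }

    /// Takes an idle instance, or returns `None` if all are in use.
    fn try_checkout(&self) -> Option<Instance> {
        self.idle.lock().unwrap().pop()
    }

    /// Returns an instance to the pool once its operation has finished.
    fn check_in(&self, instance: Instance) {
        self.idle.lock().unwrap().push(instance);
    }
}
```

Because no instance is ever created after `new`, the number of
concurrently running procedures can never exceed `capacity`.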

Trapped isolates are replaced inline. Only a fatal error within one of
the instance threads results in the `ModuleHost` and all its connections
being dropped. The host controller will recreate a new `ModuleHost`
lazily on the next request.
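
Inline replacement can be pictured as a worker loop that swaps in a
fresh isolate after a trap and keeps going. The sketch below is a toy
model under stated assumptions: `Outcome`, `Isolate`, and
`run_operations` are hypothetical, and the outcome of each operation is
supplied up front instead of coming from a real isolate.

```rust
/// Hypothetical outcome of running one operation on an isolate.
#[derive(Clone, Copy)]
enum Outcome {
    Completed,
    Trapped,
}

/// Hypothetical isolate; `generation` counts how often it was replaced.
struct Isolate {
    generation: u32,
}

/// Runs the operations in order on one lane. A trap replaces the isolate
/// in place and the loop continues with the next operation.
/// Returns (completed operations, final isolate generation).
fn run_operations(outcomes: &[Outcome]) -> (usize, u32) {
    let mut isolate = Isolate { generation: 0 };
    let mut completed = 0;
    for outcome in outcomes {
        match outcome {
            Outcome::Completed => completed += 1,
            Outcome::Trapped => {
                // Replace the trapped isolate; the lane itself survives.
                isolate = Isolate { generation: isolate.generation + 1 };
            }
        }
    }
    (completed, isolate.generation)
}
```

The fatal case described above, where the whole `ModuleHost` is dropped,
is deliberately not modeled here.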

### 2. New enqueue-only `ModuleHost` interface

`ClientConnection` now calls enqueue-only methods on `ModuleHost`. These
return immediately after enqueuing on the main instance lane or, in the
case of a procedure, after checking out an available instance and
starting the operation.

### 3. Separate `ModuleHost` interfaces for scheduled reducers and scheduled procedures

Scheduled reducers now target the main JS instance/worker, while
scheduled procedures go through the pool. The scheduler now
distinguishes between reducers and procedures and calls the appropriate
method.

Note that the scheduler does not pipeline its operations. It waits for
each one to complete before scheduling the next operation. This means
that a long-running procedure will block all other operations from being
scheduled. This will need to be fixed at some point, but this patch
doesn't change the current behavior.
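
To make the cost concrete, the following sketch computes when each
operation would start under a strictly sequential scheduler. It is
arithmetic only and assumes, for simplicity, that every operation is due
at time zero; `sequential_start_offsets` is a hypothetical helper, not
part of the scheduler.

```rust
/// Given each operation's duration in milliseconds, returns the offset at
/// which each one starts when the scheduler waits for the previous
/// operation to complete before starting the next.
fn sequential_start_offsets(durations_ms: &[u64]) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(durations_ms.len());
    let mut elapsed = 0u64;
    for duration in durations_ms {
        offsets.push(elapsed);
        elapsed += duration;
    }
    offsets
}
```

With durations of 5000, 10, and 10 ms, the two short operations start at
5000 and 5010 ms: both are held up by the long one ahead of them.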

### 4. Misc

This patch also names the main JS worker thread for better diagnostics.
It also disables core pinning by default and makes it an explicit
opt-in.

This last one is pretty important. The current architecture reduces
thread and context switching significantly such that naive core pinning
may perform worse than just deferring to the OS scheduler on certain
platforms. As it stands, the main motivation which led us to our
original core pinning strategy no longer exists, so we should probably
just defer to the OS until we've designed a proper scheduler that suits
our needs.

# API and ABI breaking changes

As mentioned above, with this change, procedures may execute out of
order with respect to other operations received on the same WebSocket.
Clients that depend on ordering must enforce it themselves by waiting
for a response before submitting the next request.

# Expected complexity level and risk

4

# Testing

This is mainly a performance-oriented refactor, so no additional
correctness tests were added. However, this patch does touch a lot of
code that could probably use more coverage in general. Benchmarks were
run to verify expected performance characteristics.

---------

Signed-off-by: joshua-spacetime <josh@clockworklabs.io>
Co-authored-by: Noa <coolreader18@gmail.com>
2026-05-07 01:29:32 +00:00

⚠️ Unstable Crate ⚠️

The interface of this crate is not stable and may change without notice.