# Description of Changes
Uses `jsonwebtoken v10.4.0` instead. Important changes include:
**1. Token serialization**
Old tokens with `"exp": null` are still accepted, but new no-expiry
tokens now omit `exp` instead of serializing it as `"exp": null`.
**2. OIDC/JWKS validation**
Issuer extraction now uses `jsonwebtoken::dangerous::insecure_decode`
for key discovery only, not validation. And the old `spacetimedb-jwks`
crate required every JWK to have a `kid`, but this patch does not
preserve that limitation.
# API and ABI breaking changes
I don't believe this is considered breaking, but it bears repeating that
new no-expiry tokens now serialize without `exp` instead of `"exp":
null`.
# Expected complexity level and risk
2
# Testing
- [x] Verify a legacy no-expiry token serialized as `"exp": null` still
validates.
- [x] Verify a token with an expired `exp` is still rejected.
- [x] Verify OIDC/JWKS validation works when the JWKS keys omit the
optional `kid` field.
# Description of Changes
A sort of followup to/unrevert of #4884, an alternative to #4927.
In #3832, we made FunctionBudget a different unit than EnergyQuanta, but
there were still many places where we treated them the same and directly
converted between them. I got confused by that in #4884, and now this PR
properly separates them and corrects the v8 energy calculation.
Also, since we now benchmark the same module in rust and typescript in
the form of the keynote benchmark, this patch adds a new assertion to
that test that the fuel usage of both modules is within 2X of each
other.
# Expected complexity level and risk
3 - touches energy calculation without reverting the problematic #4884,
but I'm confident it's correct this time.
# Testing
- [x] Verified that the conversion factors make sense for wasmtime and
for v8.
- [x] Assert that typescript and rust keynote bench runs have similar
cpu usage
---------
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
# Description of Changes
Adds support to Rust modules and the SpacetimeDB host for defining HTTP
handlers and registering them to routes.
## User-facing API
In a Rust module, users can annotate functions with the new macro
`#[spacetimedb::http::handler]`. A function annotated this way must
accept exactly two arguments, of types `&mut
spacetimedb::http::HandlerContext` and `spacetimedb::http::Request`
(which is a type alias for `http::Request<spacetimedb::http::Body>`. It
must also return `spacetimedb::http::Response` (which is a type alias
for `http::Response<spacetimedb::http::Body>`).
Once the user has defined an HTTP handler, they can register it to a
route by annotating a function with `#[spacetimedb::http::router]`. Such
a function must take no arguments and return a
`spacetimedb::http::Router`. (The original design put this annotation on
a `static` variable rather than a function, but that turned out to be
undesirable because it required that constructing a `Router` be
`const`.) `Router` exposes various methods for registering handlers to
routes.
All of a database's user-defined routes are exposed under
`/v1/database/:name_or_identity/route/{*path}`.
## Example
See [the new
smoketest](https://github.com/clockworklabs/SpacetimeDB/blob/phoebe/http-handlers-webhooks/crates/smoketests/tests/smoketests/http_routes.rs)
for a more exhaustive example.
A simpler example, which stores arbitrary byte data in a table via a
`POST` request, returns an ID, and then retrieves that same data via a
`GET` request with a query parameter:
```rust
#[spacetimedb::table(accessor = data)]
struct Data {
#[primary_key]
#[auto_inc]
id: u64,
body: Vec<u8>,
}
#[spacetimedb::http::handler]
fn insert(ctx: &mut HandlerContext, request: Request) -> Response {
let body: Vec<u8> = request.into_body().into_bytes().into();
let id = ctx.with_tx(|tx| tx.db.data().insert(Data { id: 0, body: body.clone() }).id);
Response::new(Body::from_bytes(format!("{id}")))
}
#[spacetimedb::http::handler]
fn retrieve(ctx: &mut HandlerContext, request: Request) -> Response {
let id = request
.uri()
.query()
.and_then(|query| query.strip_prefix("id="))
.and_then(|id| u64::from_str(id).ok())
.unwrap();
let body = ctx.with_tx(|tx| tx.db.data().id().find(id).map(|data| data.body));
if let Some(body) = body {
Response::new(Body::from_bytes(body))
} else {
Response::builder().status(404).body(Body::empty()).unwrap()
}
}
#[spacetimedb::http::router]
fn router() -> Router {
Router::new().post("/insert", insert).get("/retrieve", retrieve)
}
```
## Design and implementation notes
- As mentioned above, the router is registered via a function, not a
`static` or `const` item. This is because `static` or `const`
initializers must be `const`, and it turns out to be a pain to make all
of the `Router` constructors be `const fn`s.
- The `#[handler]` macro clobbers the original function name with a
`const` variable of type `HttpHandler`. This is unfortunate, but AFAICT
necessary, 'cause we need to pass the string identifier for the handler
to the `Router`, not the function pointer, and Rust allows no (stable
and reliable) way to get a unique string identifier out of a function
item/value, nor to attach data or implement traits for function
items/values. The alternative(s) would involve changing the signature of
the `Router` methods to have uglier and more complex callsites, e.g.
like `.get("/retrieve", retrieve::handler())`, `.get("/retrieve",
handler!(retrieve))` or `.get::<retrieve>("/retrieve")`. I believe that
registering handlers will be much more common than calling their
functions, so I've chosen to make it so that registering them gets the
convenient syntax, even though the inability to call them directly will
be somewhat surprising.
- I haven't wired up energy handling or timing metrics for handler
execution to anywhere. Procedures are still in the same boat.
- HTTP requests to user-defined routes bypass the usual SpacetimeDB auth
middleware, meaning that the host does not validate (or inspect in any
way) `Authorization` headers in requests before invoking the
user-defined handler. This is required to allow arbitrary
user-programmable handling of `Authorization` headers, including those
in formations which SpacetimeDB would reject. As a result of this,
`HandlerContext` doesn't expose a `sender` or `sender_connection_id`.
- HTTP route paths may consist only of a very restrictive set of
characters. I've chosen this set to keep our options open in the future
to add additional syntax to routes, like for registering wildcard
segments and path parameters:
- ASCII digits.
- ASCII letters.
- `-_~/`.
- The internal data structure that represents a `Router` is currently a
`Vec<Route>`, meaning that resolving a request to a route is
`O(num_routes)`. Registering a route checks against each previous route
for uniqueness, meaning that constructing a router is `O(num_routes ^
2)`. There are TODO comments to use a trie, but I think this can wait,
as I expect most databases to register few routes.
- Commit 999a7c317 contains a fix to a mostly-unrelated bug where a few
bindings introduced by the SATS derive macros were unhygienic and not in
a reserved namespace, leading to name conflicts. I discovered this
'cause I tried writing an HTTP handler named `index` to serve the
index/root of a website and it broke.
## Still TODO
- [x] Resolve various TODO comments in the diff.
- [x] Documentation.
- [x] C# bindings support.
- [x] C++ bindings support.
- [x] V8 host support.
- [x] TypeScript bindings support.
# API and ABI breaking changes
New APIs, currently flagged as `unstable`, which will eventually need
stability guarantees. No (intentional) breaking changes, or changes to
existing APIs at all.
# Expected complexity level and risk
3? Changes to our HTTP routing to support the user-defined routes.
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] New smoketest of the behavior!
- [x] I dunno, maybe try hosting a simple webpage and see how it works?
- [x] Build a test app with Stripe integration.
- @aasoni did this.
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Jason Larabie <jason@clockworklabs.io>
# Description of Changes
In keeping with the "pipeline everything" approach of SpacetimeDB, this
patch serializes multiple client updates in a single websocket message
using the v3 protocol.
# API and ABI breaking changes
None. `spacetime subscribe` was updated to use the v3 websocket api, but
it falls back to v1 if protocol negotiation fails.
# Expected complexity level and risk
2
# Testing
This patch updates `spacetime subscribe` to use the v3 websocket
protocol by default in order to get adequate coverage via the
smoketests.
# Description of Changes
In keeping with the "pipeline everything" ethos, I've replaced `recv`
with `recv_many` on `ClientConnectionReceiver` so that a client
connection's websocket send path works on a batch of messages at a time,
if possible.
# API and ABI breaking changes
None
# Expected complexity level and risk
1.5
# Testing
Refactor
# Description of Changes
The core motivation for this change is simple: avoid cross-thread
handoffs and synchronization on the main execution path.
Before this change, the ingress task for each websocket connection would
wait for a completion response on each request before submitting the
next request to the database. This was mainly used to guarantee that we
delivered message responses in receive-order per connection. However it
also meant that for every request, we notified a waiting Tokio task,
which potentially incurred kernel-assisted wakeup and scheduler
overhead.
Note this design existed mainly for historical reasons. Before the
database had a dedicated job thread, requests were not serialized
through a single queue. The module instance was gated behind a semaphore
which guaranteed mutual exclusion, but it did not guarantee FIFO
ordering. Awaiting the completion of each request in `ws_recv_task` was
therefore the mechanism that enforced per-connection receive-order
semantics. However it now serves primarily as a source of overhead.
Procedures are the important exception. They are not serialized through
the main worker queue. Instead they use their own instance pool so as to
be able to run concurrently with other requests. However procedures may
be composed of multiple transactions and they may effectively yield
between transactions. This means that before this change, if a procedure
were to yield, it would effectively block all subsequent requests from
that client until it returned which is quite undesirable.
So with this change, procedures may execute out of order with other
operations received on the same WebSocket. Hence if this is not a
desirable property, clients must enforce ordering themselves by waiting
for a response before submitting the next request.
## What changed?
### 1. Different instance managers for procedures and everything else
Procedures use a bounded instance pool where each instance is backed by
an isolate running in a thread. Reducers and all other operations are
serialized through an mpsc queue that feeds a single isolate running in
a thread.
Trapped isolates are replaced inline. Only a fatal error within one of
the instance threads results in the `ModuleHost` and all its connections
being dropped. The host controller will recreate a new `ModuleHost`
lazily on the next request.
### 2. New enqueue-only `ModuleHost` interface
`ClientConnection` now calls enqueue-only methods on `ModuleHost` which
return immediately after enqueuing on the main instance lane or in the
case of a procedure, checking out an available instance and starting the
operation.
### 3. Separate `ModuleHost` interfaces for scheduled reducers and
scheduled procedures
Scheduled reducers now target the main js instance/worker, while
scheduled procedures go through the pool. The scheduler now
distinguishes between reducers and procedures and calls the appropriate
method.
Note, the scheduler does not pipeline its operations. It waits for each
one to complete before scheduling the next operation. This means that a
long running procedure will block all other operations from being
scheduled. This will need to be fixed at some point, but this patch
doesn't change the current behavior.
### 4. Misc
This patch also names the main js worker thread for better diagnostics.
It also disables core pinning by default and makes it an explicit
opt-in.
This last one is pretty important. The current architecture reduces
thread and context switching significantly such that naive core pinning
may perform worse than just deferring to the OS scheduler on certain
platforms. As it stands, the main motivation which led us to our
original core pinning strategy no longer exists, so we should probably
just defer to the OS until we've designed a proper scheduler that suits
our needs.
# API and ABI breaking changes
As mentioned above, with this change, procedures may execute out of
order with other operations received on the same WebSocket. Hence if
this is not a desirable property, clients must enforce ordering
themselves by waiting for a response before submitting the next request.
# Expected complexity level and risk
4
# Testing
This is mainly a performance oriented refactor, so no additional
correctness tests were added. However this patch does touch a lot of
code that could probably use more coverage in general. Benchmarks were
run to verify expected performance characteristics.
---------
Signed-off-by: joshua-spacetime <josh@clockworklabs.io>
Co-authored-by: Noa <coolreader18@gmail.com>
# Description of Changes
Disclaimer: This description was written by claude:
The two `.leader()` call sites in
[`client-api/src/routes/database.rs`](crates/client-api/src/routes/database.rs)
and
[`client-api/src/routes/subscribe.rs`](crates/client-api/src/routes/subscribe.rs)
pipe `GetLeaderHostError` through `log_and_500`, which:
1. Logs every error variant at `error` level — including `Suspended`,
`Bootstrapping`, `NoLeader`, etc., which are normal operational states.
This produces noisy log lines like:
```
ERROR /app/.../client-api/src/lib.rs:623: internal error: database is
suspended
```
2. Forces every such error into a **500 Internal Server Error**
response, even when the appropriate status code is something else (e.g.
503 Service Unavailable for a suspended database).
`GetLeaderHostError` already implements
`Into<axum::response::ErrorResponse>` with the correct per-variant
mapping:
| Variant | Status |
|---|---|
| `NoSuchDatabase` | 404 Not Found |
| `LaunchError`, `Misdirected` | 500 Internal Server Error |
| `NoNodeId`, `NoLeader`, `ControlConnection`, `Suspended`,
`Bootstrapping` | 503 Service Unavailable |
The standalone implementation already uses `?`-propagation directly.
This PR makes the two client-api call sites match that pattern.
## Result
- Suspended / bootstrapping / no-leader databases now return **503**
instead of 500.
- These expected states no longer produce `error`-level log spam in the
request path. Genuinely unexpected internal errors elsewhere in the
codebase continue to log via `log_and_500` unchanged.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [ ] Deploy to staging to see if we still see this error when trying to
access a suspended database
# Description of Changes
Revert the following PRs that have caused some breakage:
```
a32cffa76 Finish refactoring out replay (#4850)
d639be0af Replay: some code motion & reuse `ReplayCommittedState` (#4849)
78d6b6f7d Update NativeAOT-LLVM infrastructure to current ABI (#4515)
d5c1738c1 Better module backtraces for panics and whatnot (#577)
6f23b19f3 Wait for database update to become durable (#4846)
81c9eab86 Add `spacetime lock/unlock` to prevent accidental database deletion (#4502)
809aebd7c Move field `replay_table_updated` to `ReplayCommittedState` (#4807)
21b58ef99 Update axum (#2713)
b5cadff7a Extract replay stuff out of `CommittedState`, part 1 (#4804)
```
I also updated the Python smoketests for breakage introduced in
https://github.com/clockworklabs/SpacetimeDB/pull/4502. Reverting that
PR caused conflicts, so this fix is more straightforward.
# API and ABI breaking changes
Maybe kind of, but we haven't released any of these.
# Expected complexity level and risk
1
# Testing
Ask @bfops about testing
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Confirmed reads applies only to subscription clients, calls to the the
HTTP API publish endpoint return a success response before the operation
is confirmed.
While we await scheduling of a new database, updates require to wait for
the update transaction to be confirmed. To allow this, the
`TransactionOffset` channel and the database's `DurableOffset` need to
be returned all the way up to the request handler.
Note that waiting for confirmation is almost always the right choice, so
can't be opted out of at the time of submission of this patch. Callers
may, however, extend the timeout after which waiting for confirmation is
cancelled.
## Motivation
Feature request: "Is there any way we can lock a module to prevent it
from being deleted? A bit concerned about some fat finger risk of
accidentally deleting prod."
## Solution
Adds a database lock mechanism. A locked database cannot be deleted
until explicitly unlocked.
### New CLI Commands
```bash
# Lock a database to prevent deletion
spacetime lock my-database
# Attempt to delete a locked database (fails with 403)
spacetime delete my-database
# Error: Database is locked and cannot be deleted. Run \`spacetime unlock\` first.
# Unlock when you actually need to delete
spacetime unlock my-database
spacetime delete my-database
```
Both commands support `--server` and `--no-config` flags, and resolve
the database from `spacetime.json` when no argument is given (same as
`spacetime delete`).
### New HTTP API
- `POST /v1/database/:name_or_identity/lock` -- Lock a database
- `POST /v1/database/:name_or_identity/unlock` -- Unlock a database
Both require the same authorization as `DELETE` (owner only).
### Implementation
- Lock state stored in a separate `database_locks` sled tree in the
standalone control DB (avoids changing the `Database` struct and needing
a data migration)
- `ControlStateReadAccess::is_database_locked()` and
`ControlStateWriteAccess::set_database_lock()` added to the trait
- `delete_database` route checks lock state before proceeding; returns
`403 Forbidden` with a descriptive message if locked
- Locking is idempotent (locking an already-locked database is a no-op,
same for unlock)
- Lock only prevents deletion, not publishing updates
### What is NOT locked
- `spacetime publish` (updating module code) still works on locked
databases
- Only `spacetime delete` is blocked
This matches the intent: protect prod from accidental destruction while
allowing normal deployments.
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
~~Axum now has what we need out of it for a websocket wrapper, so we no
longer need to duplicate `util/flat_csv.rs` and `util/websocket.rs`,
meaning we have less code to maintain.~~ Nevermind, we know send raw
frames, which axum's wrapper does not support. Makes this PR simpler.
# Expected complexity level and risk
2 - upgrading a dependency is a risk, but looking through the
[changelog](https://github.com/tokio-rs/axum/blob/main/axum/CHANGELOG.md#080)
there isn't anything that should affect us.
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] tests pass <!-- maybe a test you want to do -->
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
## Summary
The `/v1/database/:name_or_identity/sql` endpoint now calls the module's
`client_connected` reducer before executing SQL, and
`client_disconnected` after. This allows module authors to accept or
reject SQL connections based on the caller's identity, matching the
existing behavior of the `/call` endpoint.
## Motivation
Previously, the `/sql` endpoint bypassed the module's `onConnect` hook
entirely, meaning module authors had no way to restrict who could run
SQL queries against their database. The `/call` endpoint already runs
the connect hook, so this brings `/sql` to parity.
## Changes
- `crates/client-api/src/routes/database.rs`: The `sql` handler now:
1. Generates a random connection ID
2. Calls `module.call_identity_connected()` before executing SQL
3. Executes the SQL query
4. Calls `module.call_identity_disconnected()` after
5. If `client_connected` rejects the connection, returns 403 Forbidden
without executing the query
- `sql_direct()` is unchanged since it is also used by the pgwire
server, which has its own connection lifecycle.
## Behavior
- If the module defines a `client_connected` reducer that throws/errors
for a given identity, the SQL request returns `403 Forbidden`
- If no `client_connected` reducer is defined, behavior is unchanged
- The connection is always cleaned up via `client_disconnected` after
the query completes
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
The `v3` WebSocket API adds a thin transport layer around the existing
`v2` message schema so that multiple logical `ClientMessage`s can be
sent in a single WebSocket frame.
The motivation is throughput. In `v2`, each logical client message
requires its own WebSocket frame, which adds per-frame overhead in the
client runtime, server framing/compression path, and network stack.
High-throughput clients naturally issue bursts of requests, and batching
those requests into a single frame materially reduces that overhead
while preserving the existing logical message model.
`v3` keeps the `v2` message schema intact and treats batching as a
transport concern rather than a semantic protocol change. This lets the
server support both protocols cleanly:
- `v2` remains unchanged for existing clients
- `v3` allows new clients to batch logical messages without changing
reducer/procedure semantics
- inner messages are still processed in order
On the server side, this PR adds:
- `v3.bsatn.spacetimedb` protocol support
- `ClientFrame` / `ServerFrame` transport envelopes
- decoding of inbound batched client frames into ordered `v2` logical
messages
- v3 outbound framing on the server side
# API and ABI breaking changes
None. `v2` clients continue to work unchanged.
# Expected complexity level and risk
2
# Testing
Testing will be included in the patches that update the sdk bindings
# Description of Changes
Replace `.unwrap()` with `.map_err(log_and_500)?` on the
`get_database_by_identity()` call in the WebSocket subscribe handler
(`crates/client-api/src/routes/subscribe.rs`).
This was the only call site in the codebase using `.unwrap()` for this
method — all four other call sites in `database.rs` already use
`.map_err(log_and_500)?`. A transient database error during WebSocket
connection setup would panic the server instead of returning an HTTP
500.
Closes#4686
# API and ABI breaking changes
None.
# Expected complexity level and risk
1 — Single-token replacement matching the established pattern used
everywhere else.
# Testing
- [ ] Verified the fix matches the error-handling pattern used at all
other `get_database_by_identity` call sites
- [ ] Confirmed `log_and_500` is already imported in the file
# Description of Changes
Before this change, JS reducer requests borrowed a `JsInstance` from a
pool. If no idle instance was available, we created another instance,
which meant another V8 worker thread. Under load, this meant reducers
bouncing across multiple OS threads.
After this change, JS reducers go through a single long-lived
`JsInstance` fed by a FIFO queue which results in much better cache
locality. More accurately, each module now allocates a single OS thread,
on which reducers (and most operations) run. Modules do not share
workers/threads. And modules do not create multiple threads for running
reducers.
Note, the original instance pool is still used for procedures. It should
probably be bounded, but I didn't make any changes to it. It's also used
for executing views during initial subscription to avoid a reentrancy
deadlock. The latter should be fixed and moved over to the JS worker
thread at some point.
# API and ABI breaking changes
N/A
# Expected complexity level and risk
4
# Testing
```
NODE_OPTIONS="--max-old-space-size=8192" \
MAX_INFLIGHT_PER_WORKER=512 \
BENCH_PRECOMPUTED_TRANSFER_PAIRS=1000000 \
pnpm bench test-1 --seconds 10 --concurrency 50 --alpha 1.5 --connectors spacetimedb
```
```
50K TPS -> 85K TPS on m2 mac
```
# Description of Changes
This patch removes the dead legacy SQL query engine and the remaining
code that only existed to support it.
Removed:
- Old SQL compiler/type-checker and VM-based execution path in
spacetimedb-core
- `spacetimedb-vm` crate
- Dead vm specific error variants and compatibility code
- Obsolete tests, benchmarks, and config paths that still referenced the
legacy engine
Small pieces still used by the current engine were moved to their proper
homes instead of keeping the `vm` crate around. In particular,
`RelValue` was moved to `spacetimedb-execution`.
The `sqltest` crate was also updated to use the current engine. Notably
though these tests are not run in CI, however I've kept them around as
they may be beneficial as we look to expand our SQL support in the
future.
Requires codeowner review from @cloutiertyler due to the removal of the
`LICENSE` file in the (now removed) `vm` crate.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
None
---------
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Closes#4635
In `crates/client-api/src/routes/database.rs` line 1111,
`lookup_database_identity` returns a `Result` but the error was handled
with `.unwrap()`, which panics instead of returning an HTTP 500.
Replaced it with `.map_err(log_and_500)?` to match the rest of the file.
---------
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Upgrades the rust SDK's prometheus dependency from 0.13 to 0.14.
Fixes https://github.com/clockworklabs/SpacetimeDB/issues/4597
# API and ABI breaking changes
[The prometheus
changelog](https://github.com/tikv/rust-prometheus/blob/master/CHANGELOG.md#0140)
claims that the MSRV for the new version is 1.82, but this project
doesn't seem to have an official MSRV, so I don't think that's an ABI
change.
I don't think depending on a different prometheus version is itself a
breaking change. Prometheus is exposed through the rust SDK, but in an
explicitly [unstable
module](https://github.com/clockworklabs/SpacetimeDB/blob/3f58b5951bf3c49971c51aecb526439597b9c044/sdks/rust/src/lib.rs#L69-L76)
which "may change incompatibly without a major version bump". Prometheus
structs are also exposed from several crates, but with the same
disclaimers about unstable interfaces.
# Expected complexity level and risk
1. This is a module bump with simple-looking changes.
# Testing
- [x] Just confirmed everything still compiles and the tests still pass
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
Fixes a bug in client disconnect logic that would mark a client's views
as dropped(unsubscribed). However it was marking the identity's views as
dropped, not the connection. So if an identity had multiple connections
open, each subscribing to different views, and one of them disconnected,
the subscriptions for the other connections would break. The observed
behavior would be that they would stop receiving subscription updates.
This could potentially lead to their client cache getting into a
corrupted state.
Now, instead of dropping all of the views for a particular identity on
disconnect, we drop only the views for that particular connection. And
when I say drop, what I really mean is decrement. A view is not dropped
completely unless it no longer had any subscribers.
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
Regression smoketest was added
## Summary
When hitting `/v1/schema` while a database is still loading (replaying
the log, running init reducers, etc.), the endpoint returned a 500 error
because the module host was not yet available.
## Changes
- Add `Host::wait_for_module(timeout)` in `crates/client-api/src/lib.rs`
-- polls `get_module_host` with exponential backoff (100ms, 200ms,
400ms, 800ms, 1s, 1s, ...) up to the given timeout
- Update the `/v1/schema` route to use `wait_for_module(10s)` instead of
the immediate `module()` call
If the database finishes loading within 10 seconds, the schema is
returned normally. If it does not load in time, the existing 500 error
is returned (same behavior as before, just delayed).
No other routes are changed -- this is scoped to the schema endpoint per
the issue description. Other routes (SQL, call, etc.) could adopt the
same pattern if needed.
Fixesclockworklabs/SpacetimeDBPrivate#2748
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
# Description of Changes
Add `?version=10` as an option to
`/v1/database/:name-or-identity/schema`, where previously only
`?version=9` was supported. This seems to have been forgotten when we
introduced `RawModuleDefV10`.
Also, in an unrelated minor fixup, fix a copy-paste error in a doc
comment in the V2 WebSocket format definition.
# API and ABI breaking changes
Additive extension to HTTP API.
# Expected complexity level and risk
1
# Testing
- [x] Did a local get against this route and got a JSON-ified
`RawModuleDefV10`:
```bash
$ curl http://localhost:3000/v1/database/chat-console-rs/schema?version=10
{"sections":[{"Typespace":{"types":[{"Product":{"elements":[{"name":{"some":"sender"},"algebraic_type":{"Product":{"elements":[{"name":{"some":"__identity__"},"algebraic_type":{"U256":[]}}]}}},{"name":{"some":"sent"},"algebraic_type":{"Product":{"elements":[{"name":{"some":"__timestamp_micros_since_unix_epoch__"},"algebraic_type":{"I64":[]}}]}}},{"name":{"some":"text"},"algebraic_type":{"String":[]}}]}},{"Product":{"elements":[{"name":{"some":"identity"},"algebraic_type":{"Product":{"elements":[{"name":{"some":"__identity__"},"algebraic_type":{"U256":[]}}]}}},{"name":{"some":"name"},"algebraic_type":{"Sum":{"variants":[{"name":{"some":"some"},"algebraic_type":{"String":[]}},{"name":{"some":"none"},"algebraic_type":{"Product":{"elements":[]}}}]}}},{"name":{"some":"online"},"algebraic_type":{"Bool":[]}}]}}]}},{"Types":[{"source_name":{"scope":[],"source_name":"Message"},"ty":0,"custom_ordering":true},{"source_name":{"scope":[],"source_name":"User"},"ty":1,"custom_ordering":true}]},{"Tables":[{"source_name":"message","product_type_ref":0,"primary_key":[],"indexes":[],"constraints":[],"sequences":[],"table_type":{"User":[]},"table_access":{"Public":[]},"default_values":[],"is_event":false},{"source_name":"user","product_type_ref":1,"primary_key":[0],"indexes":[{"source_name":{"some":"user_identity_idx_btree"},"accessor_name":{"some":"identity"},"algorithm":{"BTree":[0]}}],"constraints":[{"source_name":{"some":"user_identity_key"},"data":{"Unique":{"columns":[0]}}}],"sequences":[],"table_type":{"User":[]},"table_access":{"Public":[]},"default_values":[],"is_event":false}]},{"Reducers":[{"source_name":"identity_connected","params":{"elements":[]},"visibility":{"Private":[]},"ok_return_type":{"Product":{"elements":[]}},"err_return_type":{"String":[]}},{"source_name":"identity_disconnected","params":{"elements":[]},"visibility":{"Private":[]},"ok_return_type":{"Product":{"elements":[]}},"err_return_type":{"String":[]}},{"source_name":"init","params":{"elements":[]},"visibility":{"Private":[]},"ok_return_type":{"Product":{"elements":[]}},"err_return_type":{"String":[]}},{"source_name":"send_message","params":{"elements":[{"name":{"some":"text"},"algebraic_type":{"String":[]}}]},"visibility":{"ClientCallable":[]},"ok_return_type":{"Product":{"elements":[]}},"err_return_type":{"String":[]}},{"source_name":"set_name","params":{"elements":[{"name":{"some":"name"},"algebraic_type":{"String":[]}}]},"visibility":{"ClientCallable":[]},"ok_return_type":{"Product":{"elements":[]}},"err_return_type":{"String":[]}}]},{"LifeCycleReducers":[{"lifecycle_spec":{"Init":[]},"function_name":"init"},{"lifecycle_spec":{"OnConnect":[]},"function_name":"identity_connected"},{"lifecycle_spec":{"OnDisconnect":[]},"function_name":"identity_disconnected"}]},{"ExplicitNames":{"entries":[{"Table":{"source_name":"message","canonical_name":"message"}},{"Table":{"source_name":"user","canonical_name":"user"}},{"Function":{"source_name":"identity_connected","canonical_name":"identity_connected"}},{"Function":{"source_name":"identity_disconnected","canonical_name":"identity_disconnected"}},{"Function":{"source_name":"init","canonical_name":"init"}},{"Function":{"source_name":"send_message","canonical_name":"send_message"}},{"Function":{"source_name":"set_name","canonical_name":"set_name"}}]}}]}
```
# Description of Changes
Reducing scope of https://github.com/clockworklabs/SpacetimeDB/pull/4390
to only apply to V2 clients.
# API and ABI breaking changes
I think this is an API change?
# Expected complexity level and risk
1
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [ ] <!-- maybe a test you want to do -->
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
## Summary
Enable confirmed reads by default for all WebSocket subscriptions and
SQL queries. This is a 2.0 breaking change that improves data integrity.
### What changed
Previously, subscription updates and SQL results were sent to clients
immediately, before the transaction was confirmed durable. A server
crash could cause clients to have observed data that was lost.
Now the server defaults to `confirmed=true`. Clients receive updates
only after durability is confirmed. This adds a small latency cost but
guarantees that any data a client receives will survive a server
restart.
### Changes
**Server (2 files, 2 lines each):**
- `subscribe.rs`: `SubscribeQueryParams.confirmed` defaults to `true`
- `database.rs`: `SqlQueryParams.confirmed` defaults to `true`
**Documentation:**
- Migration guide updated with "Confirmed Reads Enabled by Default"
section
- Added to overview list and quick migration checklist
### Opt-out
Clients can opt out by explicitly passing `?confirmed=false` in the
WebSocket URL or using `.withConfirmedReads(false)` /
`.WithConfirmedReads(false)` / `.with_confirmed_reads(false)` in SDKs.
### Smoketest impact
Smoketests that don't explicitly pass `--confirmed` will now get
confirmed reads via the server default. This should not cause failures
-- confirmed reads only add a small wait for durability confirmation
before sending results. The `confirmed_reads.py` smoketest explicitly
passes `--confirmed` and continues to work as before.
### SDK impact
No SDK changes needed. SDKs only send the `confirmed` query parameter
when explicitly set by the user. When not set, the server default
applies -- which is now `true`.
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <bot@clockworklabs.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
This PR fixes the bug where WASM procedures could commit a transaction
without refreshing affected materialized views, which caused view-backed
subscriptions to miss updates from procedure writes.
The equivalent changes for V8 will be made in a separate patch.
# API and ABI breaking changes
None
# Expected complexity level and risk
3
This was not a simple translation of the reducer code path, because
reducers are run in a single transaction whereas procedures can have
multiple transactions via `with_tx` and views must be refreshed with
each transaction instead of once at the end of the procedure. This
required refresh to be inserted into the syscall itself which required
some extra plumbing. Mainly that we had to attach the validated
`ModuleDef` to the `WasmInstanceEnv`. This should not affect hotswapping
because we instantiate an entirely new `WasmInstanceEnv` in that case.
# Testing
- [x] Added a regression test in the form of a smoketest that subscribes
to a view and calls a procedure
# Description of Changes
This adds the v2 websocket protocol and adds support on the server side.
For context on many of the changes/decisions, you can look at the
discussion on https://github.com/clockworklabs/SpacetimeDB/pull/4023.
To restate some of the key changes:
- The reducer event information is no longer sent with transaction
updates (because we don't want to broadcast reducer call information
anymore).
- If a client calls a reducer, they are sent a `ReducerResult` which
includes the outcome of the reducer call and and related row updates for
queries that the client is subscribed to.
- We no longer dedupe queries that appear in multiple query sets for the
same client. This is because we are moving toward per-query storage.
- Related to that, Unsubscribe requests have an option to send the
related rows. We need this for now, since clients don't have per-query
storage implemented yet.
- We don't have the json format in v2.
Notes for reviewers:
- This moves around the messages in
`crates/client-api-messages/src/websocket` (into `common`, `v1`, and
`v2`), and this renaming of existing messages adds a lot of noise to the
PR.
- In many places, I chose to duplicate a lot of code to have a v1
version and a v2 version. I went with this to make it easier to remove
the v1 version in the future (hopefully we can just fully delete most of
the v1 functions).
- `module_subscription_manager.rs` has probably has the biggest changes,
since we now track queries by query_set_id, and we get to remove some
complexity of v1's FormatSwitch.
<!-- Please describe your change, mention any related tickets, and so on
here. -->
# API and ABI breaking changes
The v1 protocol still works, though we won't send the reducer event info
for v10 modules.
# Expected complexity level and risk
4. This touches a lot of places.
# Testing
Unit testing is pretty minimal for the new code paths. I've done some
manual e2e testing with the typescript quickstart, and this has been
tested with a different branch implementing the v2 rust client.
---------
Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
# Description of Changes
Adds a warning prompt for 1.0 -> 2.0 module upgrade path.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
This patch checks a wasm module binary compiled with pre-2.0 bindings
into source control. A smoketest was added that first publishes the the
pre-compiled module and then publishes a new module using the 2.0
bindings in its place.
Implements the client-api changes for the organizations feature.
The interesting part of this PR are the `teams` smoketests.
There is one subtlety with deletion.
Note that I haven't (yet) covered the interaction between collaborators
and organization existing for the same database.
# Description of Changes
This PR shrinks `JsWorkerRequest` so that it is (almost) as small as the
call reducer request.
To do that, a bunch of trivial changes had to be done to auth code, that
mostly revolves around `String` -> `Box<str>`.
This should help the auth code, but that is incidental.
The main goal was to improve throughput through the request tx/rx
channel for V8, which is taking quite a bit of time in flamegraphs.
I also noticed while making this change that the wrong hash map was
being used in a bunch of places, so I fixed all of those.
A follow up PR will shrink the reply side to fit within a cache line.
Yet another follow up PR will change the channel to replace flume with
`fibre::spsc`.
# API and ABI breaking changes
None
# Expected complexity level and risk
2, fairly trivial changes.
# Testing
Covered by existing tests.
# Description of Changes
See tin.
This helps out with throughput for the V8 reply rx/tx channel.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Covered by existing tests.
This is the second step to make in-memory-only databases not touch the
disk at all.
While at it, also make it so file-backed module logs are streamed in
constant memory where possible.
Depends-on: #3912
# Expected complexity level and risk
2
# Testing
Added some unit-level tests.
---------
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
# Description of Changes
Closes#3659
# API and ABI breaking changes
Remove route and alter the semantics of the `call` route on both server
and `cli`
# Expected complexity level and risk
1
# Testing
- [x] Publish module with `procedures` and observe calling the `cli` the
result is print.
---------
Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
# Description of Changes
Fixes a deadlock in the subscription code and HTTP SQL handler that was
caused by calling view methods on the module while holding the
transaction lock.
I tried a couple of approaches to make the closures `Send` for all code
paths that need to hold the transaction while working with views, but
that didn’t work out well. The V8 module communicates with the host
through channels, which would require dynamic dispatch.
In the current approach, all existing methods that were calling views
from the host are now invoked from inside the module itself. In future,
It will be better to move these methods to common place rather than
being scattrered.
# Description of Changes
Fixes https://github.com/clockworklabs/SpacetimeDB/issues/2824.
Defines a global pool `BsatnRowListBuilderPool` which reclaims the
buffers of a `ServerMessage<BsatnFormat>` and which is then used when
building new `ServerMessage<BsatnFormat>`s.
Notes:
1. The new pool `BsatnRowListBuilderPool` reports the same kind of
metrics to prometheus as `PagePool` does.
2. `BsatnRowListBuilder` now works in terms of `BytesMut`.
3. The trait method `fn to_bsatn_extend` is redefined to be capable of
dealing with `BytesMut` as well as `Vec<u8>`.
4. A trait `ConsumeEachBuffer` is defined from
`ServerMessage<BsatnFormat>` and down to extract buffers.
`<ServerMessage<_> as ConsumeEachBuffer>::consume_each_buffer(...)` is
then called in `messages::serialize(...)` just after bsatn-encoding the
entire message and before any compression is done. This is the place
where the pool reclaims buffers.
# Benchmarks
Benchmark numbers vs. master using `cargo bench --bench subscription --
--baseline subs` on i7-7700K, 64GB RAM:
```
footprint-scan time: [21.607 ms 21.873 ms 22.187 ms]
change: [-62.090% -61.438% -60.787%] (p = 0.00 < 0.05)
Performance has improved.
full-scan time: [22.185 ms 22.245 ms 22.324 ms]
change: [-36.884% -36.497% -36.166%] (p = 0.00 < 0.05)
Performance has improved.
```
The improvements in `footprint-scan` are mostly thanks to
https://github.com/clockworklabs/SpacetimeDB/pull/2918, but 7 ms of the
improvements here are thanks to the pool. The improvements to
`full-scan` should be only thanks to the pool.
# API and ABI breaking changes
None
# Expected complexity level and risk
2?
# Testing
- Tests for `Pool<T>` also apply to `BsatnRowListBuilderPool`.
Using `#[tokio::test(start_paused = true)]` pauses time, yet tokio will
still advance it when encountering `sleep`s while it has no other work
to do.
This makes the tests that rely on timeouts deterministic and should
prevent those tests from becoming flaky on busy machines.
# Expected complexity level and risk
2
# Testing
This modifies tests.
It does appear to work as described, but it can't hurt if the reviewers
convince themselves that it does indeed.
Mainly a smoketest to exercise the intended behaviour. Also return an
error if we end up delegating to the reset database endpoint, which
itself doesn't accept a `parent` parameter.
RFC 6455, Section 5.4 describes message fragmentation, and we can do
that with tungstenite.
It does seem to help getting control messages (ping, pong, close)
through without head-of-line blocking.
# Expected complexity level and risk
2 - Need to test with clients
# Testing
TBD - some more abstraction is needed due to the difficulty of
synthetically producing a large outgoing message.
# Description of Changes
Title.
Closes#3644 .
# API and ABI breaking changes
Moves an unreleased HTTP route to make it explicitly unstable.
# Expected complexity level and risk
1
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Used `curl` to hit the previous, not-unstable-looking route, got a
404.
- [x] Used `curl` to hit the new, obviously-unstable route, got a proper
response.
# Description of Changes
This modifies the environment variable that we pass in for requiring
SpacetimeAuth to publish so that we can put different issuer values for
live and staging.
# API and ABI breaking changes
This requires a new environment variable or it skips this requirement.
# Expected complexity level and risk
1
# Testing
I have not tested this change locally.
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
# Description of Changes
Due to a limitation around passing headers to a WebSocket connection,
The typescript SDK rely on the endpoint `/v1/identity/websocket-token`
to get a new, short-lived token.
Currently, this endpoint strips all the other claims from the token and
only returns the following claims:
- `hex_identity`
- `sub`
- `iss`
- `aud`
- `iat`
- `exp`
This PR aims to fix this issue by introducing a new member field `extra`
to `SpacetimeIdentityClaims` and `TokenClaims` and letting serde do its
job.
# API and ABI breaking changes
None
# Expected complexity level and risk
2 - The change is trivial (1) but I'm not 100% familiar with all the
places where we would be signing a token (1).
# Testing
1. `curl` the endpoint and checking that the token returned contains all
the expected claims
2. Check that that the endpoint `v1/identity` still correctly issues and
identity and token
---------
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
# Description of Changes
Adds `ProcedureContext::{with_tx, try_with_tx}`.
Fixes https://github.com/clockworklabs/SpacetimeDB/issues/3515.
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
An integration test `test_calling_with_tx` is added.
# Description of Changes
This PR modifies the `--delete-data` flag on `spacetime publish` and
adds the `--delete-data` flag on `spacetime dev`.
In particular instead of `--delete-data` being a boolean, it is now a an
enum:
- `always` -> corresponds to the old value of `true`
- `never` -> corresponds to the old value of `false`
- `on-conflict` -> clears the database, but only if publishing would
have required a manual migration
This flag does NOT change any behavior about prompting users to confirm
if they want to delete the data. Users will still be prompted to confirm
UNLESS they pass the separate `--yes` flag.
`spacetime dev` gets the same `--delete-data` flag. The default value of
`never` is equivalent to the existing behavior. `spacetime dev`
continues to publish with `--yes` just as before. This behavior is
unchanged.
# API and ABI breaking changes
Adds the flags specified above. This is NOT a breaking change to the
CLI. Passing `--delete-data` is the equivalent of
`--delete-data=always`.
This IS technically a breaking change to the `pre_publish` route. As far
as I'm aware this is *only* used by our CLI however.
> IMPORTANT SIDE NOTE: I would argue that `--break-clients` should
really be renamed to `--yes-break-clients` because it actually behaves
like the `--yes` force flag, but only for a subset of the user prompts.
I have not made this change because it would be a breaking change, but
if the reviewers agree, I will make this change.
# Expected complexity level and risk
2, Very small change, but if we get it wrong users could accidentally
lose data. I would ask reviewers to think about ways that users might
accidentally pass `--delete-data --yes`.
# Testing
- [ ] I have not yet tested manually.
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
Historically, controldb reads could be treated as just a datastructure,
but that became a lie when reconnections were introduced.
Some tricks were employed to enter the async context when needed, but
those always bear the risk of introducing a deadlock somewhere.
So, just make it async.
# Expected complexity level and risk
2
It may or may not be safe to use sled in an async context. We did
already for
the write path, but if it's a problem it'll show now.
# Testing
Not a functional change.
Introduces a "routes struct" for the `/identity` endpoints, much like
the `DatabaseRoutes`.
This is useful for overriding individual handlers.
See companion for motivation.
Depends-on: #3525
Permissions for evaluating SQL/DML are not generally "actions", but more
a set of permissions that are checked during evaluation.
To make this work with the teams feature, this patch extends `AuthCtx`
to allow checking a set of permissions as mandated by the spec. This set
is a bit more fine-grained than "is owner", so as to avoid baking in the
concept of teams/collaborators, or assumptions about what a role might
entail. Both are likely to evolve in the future, so evaluation of
permissions / capabilities should be confined to the impl of the
`Authorization` trait.
Unlike "actions", the `AuthCtx` must be able to evaluate permission
checks quickly and without side-effects, nor can it enter an `async`
context. In that sense, it is precomputed (if you will), and stored as a
closure in the `AuthCtx` for external authorization.
A challenge posed is how to thread through the constructed `AuthCtx` for
subscriptions.
A tempting approach would have been to equip the `HostController` with
the ability to summon an `AuthCtx`. That, however, would have created a
gnarly circular dependency, because the `HostController` also controls
the controldb, which itself demands an `AuthCtx`.
Instead, the `AuthCtx` is obtained in the endpoint handler and passed to
each method call that requires one. That's less pretty, but more
effective.
---------
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>