# Description of Changes
While testing #4663, I discovered the server would crash from a V8 out
of memory error after processing many requests. Before #4663, this would
not happen. I theorized that because we now have a single JS worker that
can process an unbounded number of reducer calls over its lifetime, any
V8 heap retention that would previously have been spread across several
pooled isolates now accumulates in one isolate.
This patch now periodically collects heap statistics and forces GC or
replaces the isolate if memory cannot be reclaimed. This greatly reduces
the risk of hitting the V8 heap limit and crashing the server. It
doesn't remove the risk entirely however. But this risk was still
present before we switched to a single worker model in #4663. In order
to remove the risk of crashing the server entirely, we would need to run
V8 in a separate process.
# API and ABI breaking changes
None
# Expected complexity level and risk
3
# Testing
TBD
# Description of Changes
Update the Default casing policy to `snake_case` for `RawModuleDefV10`.
Messy PR contains changes at different places, so that CI can pass:
Here are the main changes as follows:
- `bindings-macro` & `bindings` crate: `name` macro in Indexes for
canonical name and supply it to `RawModuleDefV10` via `ExplicitNames`.
- `bindings-typescript`:
- Changes has been reviewed through this PR -
https://github.com/clockworklabs/SpacetimeDB/pull/4308.
- `binding-csharp`: a single line change to pass `sourceName` of index
instead of null.
- `codegen`:
- Changes has been merged from branch -
https://github.com/clockworklabs/SpacetimeDB/pull/4337.
- Except a fix in rust codegen to use canonical name in Query buillder
instead of accessor.
- `lib/db/raw_def`: Extends `RawDefModuleV10` structure to support case
conversion.
- `schema` crate:
- `validate/v9` - Nothing itself should change or changes in v9
validation logic but the file contains a `CoreValidator` which is shared
with `validate/v10`. No test have t be updated to `validate/v9` which
ensures we aren't regressing it.
- `validate/v10`: This is the main meat, look at the new tests added in
bottom to understand what it does.
- Rest of the files are either test updates or module bindings.
## Testing:
1. Extensive unit tests have been added to verify generated `ModuleDef`
is correct.
2. I have done some e2e testing to verify rust codegen with rust and
typescript modules.
3. I would have like to do more testing for other codegens , I am
continue doing .
I have removed `sql.py` smoketest, as that seems to be already migated
in new framework and was headache to update.
## Expected complexity level and risk
4, It could have side-effect which aren't easily visible.
- - -
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: rekhoff <r.ekhoff@clockworklabs.io>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <bot@clockworklabs.com>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
Co-authored-by: Noa <coolreader18@gmail.com>
Co-authored-by: = <cloutiertyler@gmail.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@clockworklabs.io>
Co-authored-by: Jason Larabie <jason@clockworklabs.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
# Description of Changes
This PR implements support for the `spacetime.json` configuration file
that can be used to set up common `generate` and `publish` targets. An
example of `spacetime.json` could look like this:
```
{
"dev_run": "pnpm dev",
"generate": [
{ "out-dir": "./foobar", "module-path": "region-module", "language": "c-sharp" },
{ "out-dir": "./global", "module-path": "global-module", "language": "c-sharp" },
],
"publish": {
"database": "bitcraft",
"module-path": "spacetimedb",
"server": "local",
"children": [
{ "database": "region-1", "module-path": "region-module", server: "local" },
{ "database": "region-2", "module-path": "region-module", server: "local" }
]
}
}
```
With this config, running `spacetime generate` without any arguments
would generate bindings for two targets: `region-module` and
`global-module`. `spacetime publish` without any arguments would publish
three modules, starting from the parent: `bitcraft`, `region-1`, and
`region-2`. On top of that, the command `pnpm dev` would be executed
when using `spacetime dev`.
It is also possible to pass additional command line arguments when
calling the `publish` and `generate` commands, but there are certain
limitations. There is a special case when passing either a module path
to generate or a module name to publish. Doing that will filter out
entries in the config file that do not match. For example, running:
```
spacetime generate --project-path global-module
```
would only generate bindings for the second entry in the `generate`
list.
In a similar fashion, running:
```
spacetime publish region-1
```
would only publish the child database with the name `region-1`
Passing other existing arguments is also possible, but not all of the
arguments are available for multiple configs. For example, when running
`spacetime publish --server maincloud`, the publish command would be
applied to all of the modules listed in the config file, but the
`server` value from the command line arguments would take precedence.
Running with arguments like `--bin-path` would, however, would throw an
error as `--bin-path` makes sense only in a context of a specific
module, thus this wouldn't work: `spacetime publish --bin-path
spacetimedb/target/debug/bitcraft.wasm`. I will throw an error unless
there is only one entry to process, thus `spacetime publish --bin-path
spacetimedb/target/debug/bitcraft.wasm bitcraft` would work, as it
filters the publish targets to one entry.
# API and ABI breaking changes
None
# Expected complexity level and risk
3
The config file in itself is not overly complex, but when coupled with
the CLI it is somewhat tricky to get right. There are also some changes
that I had to make to how clap arguments are validated - because the
values can now come from both the config file and the clap config, we
can't use some of the built-in validations like `required`, or at least
I haven't found a clean way to do so.
# Testing
I've added some automated tests, but more tests and manual testing is
coming.
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: bradleyshep <148254416+bradleyshep@users.noreply.github.com>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: = <cloutiertyler@gmail.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
# Description of Changes
Update the Rust client SDK to use the new V2 WebSocket format, and
present the V2 user-facing API.
## Reducer events
### Remove on-reducer callbacks
It's no longer possible to observe reducers called by other clients by
registering callbacks with `ctx.reducers.on_{my_reducer}`. We no longer
code-generate those methods, or the associated
`ctx.reducers.remove_on_{my_reducer}`. Internal plumbing for storing and
invoking those callbacks is also removed.
### Add specific reducer invocation callbacks
In addition to the previous way to invoke reducers,
`ctx.reducers.{my_reducer}(args...)`, we add a method that registers a
callback to run after the reducer is finished. This method has the
suffix `_then`, as in `ctx.reducers.{my_reducer}_then(args...,
callback)`.
The callback will accept two arguments:
- `ctx: &ReducerEventContext`, the same context as was previously passed
to on-reducer callbacks.
- `status: Result<Result<(), String>, InternalError>`, denoting the
outcome of the reducer.
- `Ok(Ok(())` means the reducer committed. This corresponds to
`ReducerOutcome::Ok` or `ReducerOutcome::Okmpty` in the new WS format.
- `Ok(Err(message))` means the reducer returned an "expected" or "user"
error. This corresponds to `ReducerOutcome::Err` in the new WS format.
- `Err(internal_error)` means something went wrong with host execution.
This corresponds to `ReducerOutcome::InternalError` in the new WS
format.
Internally, the SDK stores the callbacks in its `ReducerCallbacks` map.
This is keyed on `request_id: u32`, a number that is generated for each
reducer call (from an `AtomicU32` that we increment each time), and
included in the `ClientMessage::CallReducer` request. The
`ServerMessage::ReducerResult` includes the same `request_id`, so the
SDK pops out of the `ReducerCallbacks` and invokes the appropriate
callback when processing that message.
These new callbacks are very similar to the existing procedure
callbacks.
### The `Event` exposed to row callbacks
Row callbacks caused by a reducer invoked by this client will see
`Event::Reducer`, the same as they would prior to this PR. These
callbacks will be the result of a `ServerMessage::ReducerResult` with
`ReducerOutcome::Ok`. In order to expose the reducer name and arguments
to this event, the client stores them in its `ReducerCallbacks` map,
alongside the callback for when the reducer is complete.
Row callbacks caused by any other reducer, or any non-reducer
transaction, are now indistinguishable to the client. These will see
`Event::Transaction`, which is renamed from the old
`Event::UnknownTransaction`.
### Less metadata in `ReducerEvent`
Some metadata is removed from `ReducerEvent`, as the V2 WebSocket format
no longer publishes it, even to the caller.
## `CallReducerFlags` are removed
All machinery for setting, storing and applying call reducer flags is
removed from the SDK, as the new WS format does not have any non-default
flags.
## Requesting rows in unsubscribe
When sending a `ClientMessage::Unsubscribe`, we always request that the
server include the matching rows in its response
`ServerMessage::UnsubscribeApplied`. This saves us having to update the
SDK to store query sets separately, at least for now. (We'll do that
later.)
## Handling rows
The new SDK does some additional parsing to wrangle rows in the new
WebSocket format into the same internal data structures as before,
rather than re-writing the client cache. (We'll do that later.)
Specifically, parsing of `DbUpdate` is changed so that:
- We parse raw `TransactionUpdate` into the generated `DbUpdate` type,
which requires an additional loop compared to the previous version, to
cope with the new WS format's dividing updates by query set. We define a
function `transaction_update_iter_table_updates` which encapsulates this
nested loop in an iterator.
- We have two new functions for parsing raw `QueryRows` into the
generated `DbUpdate` type, one for when they come from a
`SubscribeApplied`, and the other when they come from an
`UnsubscribeApplied`. `QueryRows` from `SubscribeApplied` translate to a
`DbUpdate` of all inserts, while one from `UnsubscribeApplied` will be
all deletes.
## Legacy subscriptions
"Legacy subscriptions" are removed. These were only used for
`subscribe_to_all_tables`, which as of now is stubbed. I will follow up
with a change to re-implement `subscribe_to_all_tables` by
code-generating a list of all known tables, and having it subscribe to
`select * from {table}` for every table in that list.
## `subscribe_to_all_tables` via a list
Previously, `subscribe_to_all_tables` worked by sending a legacy
subscription with the query `SELECT * FROM *`, which the host had
special handling to expand to subscribing to all tables. As legacy
subscriptions are no longer usable in V2 clients, this can't work.
Instead, we code-generate `SpacetimeModule::ALL_TABLE_NAMES`, a list of
all the known table names. `subscribe_to_all_tables` then maps across
this list to construct a list of queries in the form `SELECT * FROM
{table_name}`, and subscribes to all of those queries. This has the
upside that defining a new table in the module without regenerating
client bindings will no longer result in the client seeing rows of
tables it does not know about and cannot parse.
## Light mode removed
Light mode is no longer meaningful in the V2 WS format, so all code
related to it is removed.
## Internal changes
### Renamed WS messages
The SDK's internal code is updated to account for various renames:
- `QueryId` -> `QuerySetId`, `query_id` -> `query_set_id`.
- `SubscribeMulti` -> `Subscribe`, `UnsubscribeMulti` -> `Unsubscribe`.
## Incidental changes in this PR, not necessary for other client SDKs
### Don't filter out empty ranges in `RowSizeHint`
The Rust implementation of `RowSizeHint` in `BsatnRowList` got regressed
in the base branch to not work with zero-sized rows. This change fixes
that.
# API and ABI breaking changes
Boy howdy is it!
# Expected complexity level and risk
3? Changes ended up being less complicated than I feared, but we do have
some fiddly code here, and we have internal dependencies on the SDK.
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Updated automated test suite.
- Known failures:
- [ ] `subscribe_all_select_star`, which is currently broken because
it's trying to subscribe to rows from private tables. #4241 will fix
this.
---------
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
Co-authored-by: = <cloutiertyler@gmail.com>
# Description of Changes
This adds the v2 websocket protocol and adds support on the server side.
For context on many of the changes/decisions, you can look at the
discussion on https://github.com/clockworklabs/SpacetimeDB/pull/4023.
To restate some of the key changes:
- The reducer event information is no longer sent with transaction
updates (because we don't want to broadcast reducer call information
anymore).
- If a client calls a reducer, they are sent a `ReducerResult` which
includes the outcome of the reducer call and and related row updates for
queries that the client is subscribed to.
- We no longer dedupe queries that appear in multiple query sets for the
same client. This is because we are moving toward per-query storage.
- Related to that, Unsubscribe requests have an option to send the
related rows. We need this for now, since clients don't have per-query
storage implemented yet.
- We don't have the json format in v2.
Notes for reviewers:
- This moves around the messages in
`crates/client-api-messages/src/websocket` (into `common`, `v1`, and
`v2`), and this renaming of existing messages adds a lot of noise to the
PR.
- In many places, I chose to duplicate a lot of code to have a v1
version and a v2 version. I went with this to make it easier to remove
the v1 version in the future (hopefully we can just fully delete most of
the v1 functions).
- `module_subscription_manager.rs` has probably has the biggest changes,
since we now track queries by query_set_id, and we get to remove some
complexity of v1's FormatSwitch.
<!-- Please describe your change, mention any related tickets, and so on
here. -->
# API and ABI breaking changes
The v1 protocol still works, though we won't send the reducer event info
for v10 modules.
# Expected complexity level and risk
4. This touches a lot of places.
# Testing
Unit testing is pretty minimal for the new code paths. I've done some
manual e2e testing with the typescript quickstart, and this has been
tested with a different branch implementing the v2 rust client.
---------
Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
# Description of Changes
Blocks procedures from requesting private ip ranges after dns
resolution.
Adds a new cargo feature to `spacetimedb-standalone` permitting loopback
http requests in test environments only.
# API and ABI breaking changes
None
# Expected complexity level and risk
2. I may have missed a range.
# Testing
- [x] Unit tests for IP address matching
- [x] Smoketests for blocking a private IP address
# Description of Changes
This adds C++ server bindings (/crate/bindings-cpp) to allow writing C++
20 modules.
- Emscripten WASM build system integration with CMake
- Macro-based code generation (SPACETIMEDB_TABLE, SPACETIMEDB_REDUCER,
etc)
- All SpacetimeDB types supported (primitives, Timestamp, Identity,
Uuid, etc)
- Product types via SPACETIMEDB_STRUCT
- Sum types via SPACETIMEDB_ENUM
- Constraints marked with FIELD* macros
# API and ABI breaking changes
None
# Expected complexity level and risk
2 - Doesn't heavily impact any other areas but is complex macro C++
structure to support a similar developer experience, did have a small
impact on init command
# Testing
- [x] modules/module-test-cpp - heavily tested every reducer
- [x] modules/benchmarks-cpp - tested through the standalone (~6x faster
than C#, ~6x slower than Rust)
- [x] modules/sdk-test-cpp
- [x] modules/sdk-test-procedure-cpp
- [x] modules/sdk-test-view-cpp
- [x] Wrote several test modules myself
- [x] Quickstart smoketest [Currently in progress]
- [ ] Write Blackholio C++ server module
---------
Signed-off-by: Jason Larabie <jason@clockworklabs.io>
Co-authored-by: clockwork-labs-bot <clockwork-labs-bot@users.noreply.github.com>
Co-authored-by: Ryan <r.ekhoff@clockworklabs.io>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
This got a little bigger than I had hoped, but I think it's still pretty
manageable. This PR partially reverts back to before #3263: cores are
re-balanced with a `watch::Receiver<CoreId>` that each database thread
will listen for updates on in order to repin itself, and multiple OS
threads (each matched to a database) can be pinned to one core. As I
understand it, that second part is something Phoebe was trying to avoid,
but given that there's no way to asyncify a JS module, it's kind of
necessary.
JS is single-threaded, and uses cooperative rather than preemptive
multitasking (callbacks/async, not green threads). That means that if a
JS function has an infinite loop, no other event handlers would be able
to run unless that loop were to exit. Coupled with the fact that we
can't `Send` a v8 isolate across threads, it makes more sense to keep
the module host on one thread and repin that thread as needed. An
alternative option, as was brought up, would be to deconstruct and
reconstruct the module onto a different thread when needed, since
load-balancing won't be happening often anyway.
# Expected complexity level and risk
3 - reworks the threadpool that databases run on, and so could lead to
deadlocks or other concurrency bugs. However, that seems unlikely, since
this separates databases each onto their own thread, and as such
decreases the likelihood of them interacting poorly with each other.
# Testing
Not sure if there's anything specific I should do, since this doesn't
change behavior.
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
---------
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Implements the client-api changes for the organizations feature.
The interesting part of this PR are the `teams` smoketests.
There is one subtlety with deletion.
Note that I haven't (yet) covered the interaction between collaborators
and organization existing for the same database.
# Description of Changes
Client Query builder for rust, as per proposal -
https://github.com/clockworklabs/SpacetimeDBPrivate/pull/2356.
1. Pach moves query builder to its separate crate, so that it can be
shared between module and sdk.
2. Implements `TypedSubscriptionBuilder` in `sdks/rust` as mentioned in
proposal
3. Modify codegen to extend types to support query builder as mentioned
in proposal
4. a test
# API and ABI breaking changes
NA, additive changes.
# Expected complexity level and risk
2
# Testing
Added a test.
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
This is the second step to make in-memory-only databases not touch the
disk at all.
While at it, also make it so file-backed module logs are streamed in
constant memory where possible.
Depends-on: #3912
# Expected complexity level and risk
2
# Testing
Added some unit-level tests.
---------
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
# Description of Changes
Fixes https://github.com/clockworklabs/SpacetimeDB/issues/3729
I genuinely don't know what came over me.
# API and ABI breaking changes
None
# Expected complexity level and risk
1.5 very straightforward but not strictly trivial
# Testing
Adds automated integration tests (written in Rust and run with `cargo
test`, although note this comment from @matklad about integration tests
for the future
https://internals.rust-lang.org/t/running-test-crates-in-parallel/15639/2):
- [x] Can publish an updated module if no migration is required
- [x] Can publish an updated module if auto-migration is required (with
the yes-break flag true/false)
- [x] Cannot publish if a manual migration is required
- [x] Can publish if a manual migration is required but the user
specified `--delete-data`
- [x] Can publish if a manual migration is required by the user
specified `--delete-data=on-conflict`
- [x] No data deletion occurs if no migration is required and
`--delete-data=on-conflict` is specified
---------
Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
Adds `ProcedureContext::{with_tx, try_with_tx}`.
Fixes https://github.com/clockworklabs/SpacetimeDB/issues/3515.
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
An integration test `test_calling_with_tx` is added.
Historically, controldb reads could be treated as just a datastructure,
but that became a lie when reconnections were introduced.
Some tricks were employed to enter the async context when needed, but
those always bear the risk of introducing a deadlock somewhere.
So, just make it async.
# Expected complexity level and risk
2
It may or may not be safe to use sled in an async context. We did
already for
the write path, but if it's a problem it'll show now.
# Testing
Not a functional change.
So far, the `--clear-database` option to `publish` has simply dropped
and then re-created the database (if it did exist).
This will no longer work when databases can have "children": because
dropping and re-creating is not atomic, children would either become
orphans, or be dropped as well.
To solve this, `reset_database` is introduced as a separate action that:
- shuts down all replicas
- if a `program_bytes` is supplied, replaces the database's initial
program
- if a `host_type` is supplied, replaces the database's host type
- starts `num_replicas` or the previous number of replicas, which
initialize themselves as normal
As this could be its own CLI command, the action is provided as its own
API endpoint (undocumented).
However, since `publish` has no way of knowing whether the database it
operates on actually exists, the `publish_database` handler will just
invoke the `reset_database` handler if `clear=true` and the database
exists, and return its result. This is to avoid starting the transfer of
the program in the request body, only to receive a redirect.
Some refactoring was necessary to dissect and understand the flow.
# API and ABI breaking changes
Introduces a new, undocumented API endpoint.
We may want to nest it under `/unstable`.
# Expected complexity level and risk
2
# Testing
From the outside, the observed behavior should be as before,
so smoketests should cover it.
# Description of Changes
This reverts commit #3496.
# API and ABI breaking changes
Technically maybe yes? But definitely nothing is using the new code yet.
# Expected complexity level and risk
1
# Testing
CI only
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
So far, the `--clear-database` option to `publish` has simply dropped
and then re-created the database (if it did exist).
This will no longer work when databases can have "children": because
dropping and re-creating is not atomic, children would either become
orphans, or be dropped as well.
To solve this, `reset_database` is introduced as a separate action that:
- shuts down all replicas
- if a `program_bytes` is supplied, replaces the database's initial
program
- if a `host_type` is supplied, replaces the database's host type
- starts `num_replicas` or the previous number of replicas, which
initialize themselves as normal
As this could be its own CLI command, the action is provided as its own
API endpoint (undocumented).
However, since `publish` has no way of knowing whether the database it
operates on actually exists, the `publish_database` handler will just
invoke the `reset_database` handler if `clear=true` and the database
exists, and return its result. This is to avoid starting the transfer of
the program in the request body, only to receive a redirect.
Some refactoring was necessary to dissect and understand the flow.
# API and ABI breaking changes
Introduces a new, undocumented API endpoint.
We may want to nest it under `/unstable`.
# Expected complexity level and risk
2
# Testing
From the outside, the observed behavior should be as before,
so smoketests should cover it.
# Description of Changes
This commit builds support for executing procedures in WASM modules.
This includes an HTTP endpoint,
`/v1/database/:name_or_address/procedure/:name POST`, as well as an
extension to the WS protocol. These new APIs are not wired up to the CLI
or SDKs, but I have manually tested the HTTP endpoint via `curl`. The
new WS extensions are completely untested.
Several TODOs are scattered throughout the new code, most notably for
sensibly tracking procedure execution time in the metrics.
I also expect that we will want to remove the `procedure_sleep_until`
syscall and the `ProcedureContext::sleep_until` method prior to release.
# API and ABI breaking changes
Adds new APIs and ABIs.
# Expected complexity level and risk
3? 4? Unlikely to break existing stuff, 'cause it's mostly additive, but
adds plenty of potentially-fragile new stuff. Notably is the first time
we're doing anything actually `async`hronous on a database core Tokio
worker, and we don't yet have strong evidence of how that will affect
reducer execution.
# Testing
- [x] Manually published `modules/module-test` and executed procedures
with the following `curl` invocations:
- `curl -X POST -H "Content-Type:application/json" -d '[]'
http://localhost:3000/v1/database/module-test/procedure/sleep_one_second`
- `curl -X POST -H "Content-Type:application/json" -d '[1223]'
http://localhost:3000/v1/database/module-test/procedure/return_value`
- [ ] Need to write automated tests.
---------
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
# Description of Changes
Adds `modules/benchmarks-ts`.
# API and ABI breaking changes
None
# Expected complexity level and risk
2?
# Testing
This is a test.
---------
Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Noa <coolreader18@gmail.com>
# Description of Changes
This commit extends various schema and schema-adjacent structures to
describe procedures, a new kind of database function which are allowed
to perform side effects.
This includes extending `RawModuleDefV9` with a way to register
`RawProcedureDefV9`s in the `misc_exports`, preserving compatibility
with modules that predate procedures.
The module validation path is reorganized somewhat to validate various
properties related to procedures while preserving code clarity and
maintainability.
Additionally, the `ArgsTuple` machinery for ser/de-ing reducer arguments
using the argument type as a seed is extended to also support procedure
arguments.
All of this is currently unused.
# API and ABI breaking changes
Additive and backwards-compatible additions to `RawModuleDefV9` and
friends.
# Expected complexity level and risk
2 - some minor complexity in schema validation which may have gotten
borked in a merge at some point.
# Testing
Unsure what tests would be useful, open to suggestions from reviewers.
- [ ] <!-- maybe a test you want to do -->
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
# Description of Changes
Currently based on #3361
Implements most of the TS module API (not yet a function for type
aliases).
# Expected complexity level and risk
<!--
How complicated do you think these changes are? Grade on a scale from 1
to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex
change.
This complexity rating applies not only to the complexity apparent in
the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning
ways. -->
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Extremely basic module stuff works
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
---------
Signed-off-by: Noa <coolreader18@gmail.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@aol.com>
Co-authored-by: = <cloutiertyler@gmail.com>
Co-authored-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
In service of adding procedures, which will need to execute WASM code in
an `async` environment so as to suspend execution while making HTTP
requests and whatnot.
Prior to this PR, `JobCores` worked by spawning an OS thread for each
database. These threads would then each be pinned to a specific core,
and in multi-tenant deployments multiple threads could be pinned to the
same core. Now, instead, we spawn one thread per available core at
startup. Each of these threads runs a single-threaded Tokio executor.
Each database is assigned to one of these executors, and runs tasks on
it via `tokio::spawn`.
When we run without core pinning (usually due to having too few hardware
cores), we won't spawn any additional threads or Tokio runtimes at all;
instead we will run database jobs on the "global" Tokio executor. These
jobs may block Tokio worker threads, which might be an issue if a very
core-constrained device runs multiple databases with very long-running
reducers. If this is an issue, we could in this case instead build a
second Tokio runtime only for running database jobs, and let the OS
scheduler figure things out like it did previously.
Previously, we implemented load-balancing among the database cores by
occasionally instructing per-database threads to re-pin themselves. Now,
we instead periodically send the database a new
`wasmtime::runtime::Handle`, which they will `spawn` future jobs into.
Previously, it was possible for a database thread to become canceled,
most likely as a result of `ModuleHost::exit`, after which calls would
fail with `NoSuchModule`. Cancellation is no longer meaningful, as the
database holds a `Handle` to a long-lived `tokio::runtime::Runtime`,
which should always outlive the `ModuleHost`. I have added an
`AtomicBool` flag to `ModuleHost` which is flipped by `exit` and checked
by calls to maintain the previous functionality.
Within this PR, the jobs run on the database-execution Tokio tasks are
not actually asynchronous; they will never yield. This is important
because these jobs may (will) hold a transaction open, and attempting to
swap execution to another job which wants a transaction on the same
database would be undesirable.
Note that this may regress our multi-tenant performance / fairness:
previously, in multi-tenant environments, the OS scheduler would divide
the database cores' time between the per-database threads, potentially
causing one high-load database to be interrupted in the middle of a
reducer in order to run other databases pinned to the same core. Now, a
high-load database will instead run its entire reducer to completion
before any other database gets to run.
We could, in the future, change this by instructing Wasmtime to yield
periodically, either via [epoch
interruption](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.epoch_deadline_async_yield_and_update)
or
[fuel](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.fuel_async_yield_interval),
both of which we're already configuring Wasmtime to track. We'd need (or
at least want) to (re-)introduce a queue s.t. we only attempt to run one
job for each database at a time. I have chosen not to do so within this
patch because I felt the changeset was complex enough already, and we
have so far not treated fairness in multi-tenant environments as a high
priority.
I have also reworked our module host machinery to no longer use dynamic
dispatch and trait polymorphism to manage modules and their instances,
and instead introduced `enum Module` and `enum Instance`, each of which
has a variant for Wasm and another for V8.
During this rewrite, I reworked `AutoReplacingModuleInstance`, which
previously used type-erased trait generics in a way that was brittle and
hard to re-use in the new `async` context. (Specifically, the module
instance no longer lives on the job thread, rather, the database grabs
the instance and sends it to the job thread, then gets it back when the
job exits. This is necessary to allow the re-worked load balancing
described above, as we can't have a single long-lived async task.) While
refactoring, I replaced it with `ModuleInstanceManager`, which can now
maintain multiple instances of the same module. This is not yet useful,
but will become necessary with procedures, as each concurrent procedure
will need its own instance. Relatedly, I changed
`ModuleHost::on_module_thread` (used by one-off and initial subscription
queries) to no longer acquire the/an instance. I discussed this with the
team, and consensus was that "locking" the module instance in that path
was not a useful behavior, just an artifact of the previous
implementation.
I have also switched our Wasmtime configuration to set
`async_support(true)`. This causes a variety of methods, notably
`InstancePre::instantiate` and `TypedFunc::call`, to panic, and requires
that we instead call their `_async` variants. As mentioned above, I have
not yet introduced any actual asynchronicity or concurrency, so these
methods should never yield. Rather than `.await`ing their futures, I
have defined a degenerate `async` executor, `poll_once_executor`, which
polls a future exactly once, failing if it does not return
`Poll::Ready`. This means that we will panic if one of these futures
returns `Poll::Pending` unexpectedly.
The previous `trait Module` had a method `initial_instances`. `Module`
is now a concrete type, and I gave it this method, but it appears to be
unused. This is causing lints to fail. I am unsure what, if anything,
that method was for.
The previous `AutoReplacingModuleInstance` called `create_instance` on
the job thread. I am unsure if this was intentional, or just an artifact
of the previous implementation, where the `AutoReplacingModuleInstance`
lived on the job thread. I have written the new `ModuleInstanceManager`
to call `create_instance` on the calling thread, but it would be easy to
move that call into the job executor if that behavior is desired.
# API and ABI breaking changes
None user-facing
# Expected complexity level and risk
4: significant rewrite of performance-sensitive fiddly concurrency code.
Note specifically in above description:
- Running database jobs on the global Tokio runtime when not using core
pinning.
- Multi-tenant fairness issue: no longer possible to interrupt a
performance-intensive database mid-reducer to run another database
pinned to the same core.
- Unused method `module_instances`.
- Running `create_instance` on the calling thread rather than the
database thread.
# Testing
<!-- Describe any testing you've done, and any testing you'd like your
reviewers to do,
so that you're confident that all the changes work as expected! -->
- [x] Will arrange for a bot test.
- [ ] Determine to what extent we can run with real or synthetic
multi-tenant load in a test or staging environment.
---------
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
# Description of Changes
Closes#3219
This adds the Unreal SDK, the new Unreal test cases, updates the test
runner to handle Unreal, codegen updates for Unreal, and a QuickStart
Chat.
# API and ABI breaking changes
No breaking changes.
# Expected complexity level and risk
2 - This impacts the subcommand generate.rs to include unrealcpp and
crates/testing to expand for Unreal
# Testing
- [x] Run the new Unreal tests
- [x] Run any previous automation testing - with all the changes to
generate/testing I'm uncertain if there is an impact
- [x] Review the new CLI generate documentation changes
---------
Co-authored-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
Co-authored-by: John Detter <4099508+jdetter@users.noreply.github.com>
# Description of Changes
This adds a new system table to store the jwt payloads of connected
clients. I'm planning to use this system table to expose client claims
to modules in subsequent PRs.
The new table is called `st_connection_credentials`. It is a **private**
system table which stores a mapping from `connection_id` to
`jwt_payload`. Note that a jwt payload is a json representation of the
clients claims, not a fully signed token.
The times when we need to insert and delete these rows closely mirrors
that of the existing `st_client` table, with 1.5 exceptions:
1. We weren't previously inserting to `st_client` until after the
`OnConnect` reducer ran (even though it was in the same transaction). We
want `st_connection_credentials` to be populated before calling the
reducer, so that the reducer can use it get the credentials, so I made a
change to insert to `st_client` and `st_connection_credentials` before
calling the reducer.
2. This difference has not actualized, but when clients start sending
refresh tokens, we will probably need to update the credentials stored
in this table.
This also enforces uniqueness of connection ids. A duplicate connection
id will now make the on-connect reducer fail (since it will violate
uniqueness when trying to insert to `st_connection_credentials`).
# Expected complexity level and risk
2.5
Adding a system table is a bit risky. This is almost rollback safe, with
one annoying case that is worth calling out:
If a database is created with this system table, opening it with an
older version of spacetimedb will only work if there is a snapshot of
the database. If we try to load a table without a snapshot, replaying
will fail on the first row for that table. This is because we don't
write the table schema information to the commit log when creating a
database. In practice, this is unlikely to be an issue, because new
databases asynchronously trigger a snapshot immediately after creation.
Migrating existing databases will be fine. On startup this will detect
that there is a missing system table, and add it in a way that writes it
to the commit log. Since it is in the commit log, we can open the
database with an older version and still understand the data for that
table.
# Testing
There are unit tests that cover opening a database created with an older
version (which doesn't have this table).
I manually tested opening a migrated database with an older version of
spacetimedb.
# Description of Changes
- Adds endpoint for for pretty printing migration plan.
- It also changes current `publish` endpoint to optionally provide
`MigrationToken` and `MigrationPolicy` to allow migration with breaking
clients.
# API and ABI breaking changes
Backward compatible change to existing API and new Api
# Expected complexity level and risk
2
# Testing
- Existing smoketest should cover changes for `publish` endpoint.
- For pretty print endpoint, smoketests can be written only after cli
changes.
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: James Gilles <jameshgilles@gmail.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
# Description of Changes
We recently merged several repos together. This PR clarifies the license
terms for several subdirectories, as well as the relationship between
the licenses.
The licenses in our subdirectories have become symbolic links to
licenses in our toplevel `licenses` directory. For any particular
subdirectory's license file in the diff, you can click `... -> View
file` and then click on the text that says "Symbolic Link" on that page.
This will take you to the license file that it links to.
I have also updated the `tools/upgrade-version` script to update the
change date in the new `licenses/BSL.txt` file.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
None. Only changes to license files.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>