# Description of Changes
We've run into a problem on Maincloud caused by a database that was
writing a relatively small number of very large transactions. This was
accruing many commitlog segments consuming hundreds of gigabytes of
disk, but had not ever taken a snapshot, or compressed or archived any
data, as the database had not progressed past one million transactions.
With this PR, we take a snapshot every time the commitlog segment
rotates. We still also snapshot every million transactions.
One BitCraft database we looked at had 2.5 million transactions per
commitlog segment, meaning that this change will not meaningfully affect
the frequency of snapshots. The offending Maincloud database, however,
had only 50 transactions per segment!
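Here is a minimal sketch of the resulting trigger condition. It assumes a hypothetical `should_snapshot` hook that receives the latest transaction offset and whether this commit rotated the commitlog onto a new segment. The actual snapshot code is structured differently.

```rust
/// Transaction-count interval described above (one million).
const SNAPSHOT_EVERY_N_TXS: u64 = 1_000_000;

/// Hypothetical hook: decide whether to snapshot after committing `tx_offset`.
/// `segment_rotated` is true when this commit started a new commitlog segment.
fn should_snapshot(tx_offset: u64, segment_rotated: bool) -> bool {
    // New: always snapshot on segment rotation, so databases with few, very
    // large transactions still get snapshots (and thus compression/archival).
    // Existing: also snapshot every million transactions.
    segment_rotated || (tx_offset > 0 && tx_offset % SNAPSHOT_EVERY_N_TXS == 0)
}
```

The offending database has about 50 transactions per segment, so the rotation condition would fire roughly every 50 transactions, long before the million-transaction interval is reached.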
# API and ABI breaking changes
N/a
# Expected complexity level and risk
3: Hastily made changes to finicky code across several crates.
# Testing
I am unsure how to test these changes.
# Description of Changes
See the inline comments for the motivation. This was originally
introduced to our Windows CI in #3351. This PR moves it from CI to
general Windows target builds, since it seems like Windows builds are
now generally having this issue.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] Windows CI still passes
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Removes the use of the Derived Data Cache during CI, will in
# API and ABI breaking changes
None
# Expected complexity level and risk
1 - Small change for CI
# Testing
- [x] Re-ran tests on both Linux + Windows with the change
# Description of Changes
- Fixed a logic issue where `Option<Vec<Option<>>>` applied the wrong
types for primitives and dropped the `Optional`
- Fixed an issue where enums vs. enum variants wrapped in `Option<>`
produced the incorrect Unreal type
- Removed unnecessary and incorrect header bindings
- Type fix in the tests around the optional `Int32`
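The fix comes down to recursing through every layer of the type. Below is a simplified sketch. It uses a hypothetical `Ty` enum and illustrative Unreal-style names (`TArray`, `TOptional`), not the wrapper types the codegen actually emits.

```rust
/// Hypothetical, simplified model of a module column type.
enum Ty {
    I32,
    Str,
    Vec(Box<Ty>),
    Option(Box<Ty>),
}

/// Recursively map a type to an illustrative Unreal-style type name.
/// The names are placeholders; the real codegen emits its own wrappers.
fn unreal_type(ty: &Ty) -> String {
    match ty {
        Ty::I32 => "int32".to_string(),
        Ty::Str => "FString".to_string(),
        // Recurse into the element type so nested optionals are kept.
        Ty::Vec(elem) => format!("TArray<{}>", unreal_type(elem)),
        // An inner `Option` must be preserved, not dropped.
        Ty::Option(inner) => format!("TOptional<{}>", unreal_type(inner)),
    }
}
```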
# API and ABI breaking changes
No breaking changes
# Expected complexity level and risk
2 - Reworked incorrect optional lookups which can happen recursively
# Testing
I built out a few simple and complex objects in a Rust module to
triple-check possible cases beyond what the test framework calls out.
- [x] Tested many combinations from a Rust module to an Unreal project
- [x] Ran and updated Unreal test cases as necessary
# Description of Changes
`console.log` debugging statements accidentally made it into the
release.
# API and ABI breaking changes
None
# Expected complexity level and risk
1, trivial
# Testing
- [x] Automated testing only
# Description of Changes
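Python is funny: if a file named `token.py` exists in a directory and
another script is run from that same directory, Python imports it instead
of the standard-library `token` module, which can cause the script to block: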
```python
python3-3.13.7/lib/python3.13/tokenize.py", line 35, in <module>
from token import *
File "/Users/mamcx/token.py", line 3, in <module>
text = sys.stdin.read()
```
By coincidence, the `pg wire` docs use this name. I changed it to one
that doesn't cause this problem.
# Expected complexity level and risk
1
# Testing
- [x] Created another script, ran it, and saw it block because of this.
# Description of Changes
Host-side changes extracted from #3327
I added AUTO_INC_OVERFLOW even though we don't currently ever return it,
in order to future-proof so it's already there when we start emitting
it.
Prepublish was failing because it unconditionally expected a Wasm
module, so it now takes `?host_type`.
I tweaked JS deserialization to accept `null`/`undefined` when the unit
type or an option type is expected.
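Here is a minimal sketch of that rule. It uses hypothetical `JsValue`, `Expected`, and `Value` types in place of the real V8 and SATS types. How a present option value is encoded is an assumption in this sketch.

```rust
/// Hypothetical, simplified view of a JS value.
#[derive(Debug, PartialEq)]
enum JsValue {
    Undefined,
    Null,
    Number(f64),
}

/// Hypothetical expected types.
enum Expected {
    Unit,
    OptionF64,
}

#[derive(Debug, PartialEq)]
enum Value {
    Unit,
    None,
    Some(f64),
}

fn deserialize(expected: &Expected, js: &JsValue) -> Result<Value, String> {
    match (expected, js) {
        // Unit: accept either `null` or `undefined`.
        (Expected::Unit, JsValue::Null | JsValue::Undefined) => Ok(Value::Unit),
        // Option: `null` and `undefined` both mean `None`.
        (Expected::OptionF64, JsValue::Null | JsValue::Undefined) => Ok(Value::None),
        // Assumption for this sketch: a present value is passed as-is.
        (Expected::OptionF64, JsValue::Number(n)) => Ok(Value::Some(*n)),
        (_, other) => Err(format!("unexpected JS value: {other:?}")),
    }
}
```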
I switched to BSATN because the native SATS-to-JS translator wasn't
producing what JS expected.
I renamed the sys module: my thinking is that `spacetime:` as a scheme
will help disambiguate it, and maybe it could also be used for IMC in
the future or something? And I believe we had discussed wanting this to
be versioned, similar to wasm imports.
Trying to get a borrowed str from deserialize_js doesn't work, because
v8 strings don't store utf8.
# Testing
- [x] All this was done in the course of getting an actual typescript
module to successfully publish.
# Description of Changes
* Make sure the user provides at least one of `--rust-and-cli`,
`--typescript`, or `--csharp`, since providing none of them is a no-op
as of #3308
* Do a semver-parsing of the arg before doing anything, and use that
parsed version everywhere
* Consolidate some version strings that we were computing in a few
places
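Here is a std-only sketch of validating up front and then parsing once, using a hypothetical `validate` helper. The real tool may use a dedicated semver library and may accept pre-release versions, which this sketch rejects.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse a plain `MAJOR.MINOR.PATCH` string (no pre-release/build metadata).
fn parse_version(s: &str) -> Result<Version, String> {
    let parts: Vec<&str> = s.split('.').collect();
    if parts.len() != 3 {
        return Err(format!("expected MAJOR.MINOR.PATCH, got {s:?}"));
    }
    let num = |p: &str| p.parse::<u64>().map_err(|e| format!("bad component {p:?}: {e}"));
    Ok(Version { major: num(parts[0])?, minor: num(parts[1])?, patch: num(parts[2])? })
}

/// Hypothetical: reject a no-op invocation, then parse once and reuse the result.
fn validate(version: &str, rust_and_cli: bool, typescript: bool, csharp: bool) -> Result<Version, String> {
    if !(rust_and_cli || typescript || csharp) {
        return Err("pass at least one of --rust-and-cli, --typescript, --csharp".to_string());
    }
    parse_version(version)
}
```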
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] Running `cargo bump-versions 1.5.0 --typescript --rust-and-cli
--csharp` only shows a diff in the change dates
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Check that our generated C# files are up-to-date in our CI.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] CI all passes
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
* Small cleanups in `tools/check-diff.sh`
* Use `tools/check-diff.sh` wherever appropriate
* Simplify the `sdks/csharp/tools~/gen-*.sh` files after the repo merge
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] CI still passes
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Necessary for pulling in rolldown.
# API and ABI breaking changes
None
# Expected complexity level and risk
1, with the caveat that this updates the Rust version and therefore
touches all the code.
# Testing
- [ ] Just the automated testing
# Description of Changes
Tweaks V8 module support to use JS modules for `spacetimedb_sys` and the
user module rather than using scripts and the global object.
To this end, the code is also made more modular so that it cares less
about globals vs. modules. For example, an `Object` with the functions
is enough for the lowest-level `call_call_reducer`.
Some `.unwrap()`s are also removed.
Also, `run_timeout_and_cb_every` is disabled, as it currently leads to
UB due to bugs in the `v8` crate.
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
Future work.
# Description of Changes
In the React Integration tutorial
- Replace the staging URL with the production one
- Add `useAutoSignin` to automatically redirect users to the login page
- Add `post_logout_redirect_uri` to redirect users back to the application
after logout
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
Local test
# Description of Changes
I changed some variables related to caching in the TypeScript test CI,
since it was failing on master due to suspected cache issues.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] CI passes on this PR when it didn't on master
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Pin the `temporal_rs` and `timezone_provider` versions to `0.0.11`,
because future versions such as `0.0.16` are incompatible, but their
version constraints are not correct so we keep getting
auto-rolled-forward to build-breaking versions.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] CI passes
- [x] If I `rm Cargo.lock && cargo clippy`, the build still passes
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
In service of adding procedures, which will need to execute WASM code in
an `async` environment so as to suspend execution while making HTTP
requests and whatnot.
Prior to this PR, `JobCores` worked by spawning an OS thread for each
database. These threads would then each be pinned to a specific core,
and in multi-tenant deployments multiple threads could be pinned to the
same core. Now, instead, we spawn one thread per available core at
startup. Each of these threads runs a single-threaded Tokio executor.
Each database is assigned to one of these executors, and runs tasks on
it via `tokio::spawn`.
When we run without core pinning (usually due to having too few hardware
cores), we won't spawn any additional threads or Tokio runtimes at all;
instead we will run database jobs on the "global" Tokio executor. These
jobs may block Tokio worker threads, which might be an issue if a very
core-constrained device runs multiple databases with very long-running
reducers. If this is an issue, we could in this case instead build a
second Tokio runtime only for running database jobs, and let the OS
scheduler figure things out like it did previously.
Previously, we implemented load-balancing among the database cores by
occasionally instructing per-database threads to re-pin themselves. Now,
we instead periodically send the database a new
`tokio::runtime::Handle`, which they will `spawn` future jobs into.
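To make the structure concrete, here is a std-only analogue that uses OS threads and channels in place of Tokio runtimes. `Executors`, `Database`, and `rebalance` are hypothetical names, and core pinning is omitted. In the real code, the handle is a Tokio runtime `Handle`, not a channel sender.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// A unit of database work; it runs to completion once started.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical analogue of the per-core executors: one long-lived worker
/// thread per core, each draining its own queue in order.
struct Executors {
    senders: Vec<Sender<Job>>,
    threads: Vec<JoinHandle<()>>,
}

impl Executors {
    fn new(num_cores: usize) -> Self {
        let mut senders = Vec::new();
        let mut threads = Vec::new();
        for _ in 0..num_cores {
            let (tx, rx) = channel::<Job>();
            threads.push(thread::spawn(move || {
                // Exits once every sender for this queue has been dropped.
                for job in rx {
                    job();
                }
            }));
            senders.push(tx);
        }
        Executors { senders, threads }
    }

    /// Stand-in for handing out a runtime `Handle` for the given core.
    fn handle(&self, core: usize) -> Sender<Job> {
        self.senders[core].clone()
    }

    /// Close our ends of the queues and wait for the workers. All database
    /// handles must be dropped first, or the join blocks.
    fn shutdown(self) {
        drop(self.senders);
        for t in self.threads {
            t.join().expect("worker thread panicked");
        }
    }
}

/// A database holds only a handle; rebalancing swaps it for another core's.
struct Database {
    handle: Sender<Job>,
}

impl Database {
    fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.handle.send(Box::new(job)).expect("executor thread exited");
    }

    fn rebalance(&mut self, new_handle: Sender<Job>) {
        self.handle = new_handle;
    }
}
```

Each worker drains its queue one job at a time. As a result, a long-running job delays every other database assigned to that core. This is the fairness trade-off discussed further down.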
Previously, it was possible for a database thread to become canceled,
most likely as a result of `ModuleHost::exit`, after which calls would
fail with `NoSuchModule`. Cancellation is no longer meaningful, as the
database holds a `Handle` to a long-lived `tokio::runtime::Runtime`,
which should always outlive the `ModuleHost`. I have added an
`AtomicBool` flag to `ModuleHost` which is flipped by `exit` and checked
by calls to maintain the previous functionality.
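Here is a minimal sketch of the flag. It uses a hypothetical `ModuleHost` that keeps only the field needed to show it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum CallError {
    NoSuchModule,
}

/// Hypothetical stand-in for `ModuleHost`, showing only the exit flag.
struct ModuleHost {
    exited: AtomicBool,
}

impl ModuleHost {
    fn new() -> Self {
        ModuleHost { exited: AtomicBool::new(false) }
    }

    /// Flip the flag; later calls fail as they did after thread cancellation.
    fn exit(&self) {
        self.exited.store(true, Ordering::Release);
    }

    /// Check the flag before running the call.
    fn call<R>(&self, f: impl FnOnce() -> R) -> Result<R, CallError> {
        if self.exited.load(Ordering::Acquire) {
            return Err(CallError::NoSuchModule);
        }
        Ok(f())
    }
}
```

This sketch does not prevent a race: a call that has already passed the check when `exit` runs will still complete.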
Within this PR, the jobs run on the database-execution Tokio tasks are
not actually asynchronous; they will never yield. This is important
because these jobs may (will) hold a transaction open, and attempting to
swap execution to another job which wants a transaction on the same
database would be undesirable.
Note that this may regress our multi-tenant performance / fairness:
previously, in multi-tenant environments, the OS scheduler would divide
the database cores' time between the per-database threads, potentially
causing one high-load database to be interrupted in the middle of a
reducer in order to run other databases pinned to the same core. Now, a
high-load database will instead run its entire reducer to completion
before any other database gets to run.
We could, in the future, change this by instructing Wasmtime to yield
periodically, either via [epoch
interruption](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.epoch_deadline_async_yield_and_update)
or
[fuel](https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#method.fuel_async_yield_interval),
both of which we're already configuring Wasmtime to track. We'd need (or
at least want) to (re-)introduce a queue so that we only attempt to run one
job for each database at a time. I have chosen not to do so within this
patch because I felt the changeset was complex enough already, and we
have so far not treated fairness in multi-tenant environments as a high
priority.
I have also reworked our module host machinery to no longer use dynamic
dispatch and trait polymorphism to manage modules and their instances,
and instead introduced `enum Module` and `enum Instance`, each of which
has a variant for Wasm and another for V8.
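Here is a sketch of the enum-dispatch pattern, using hypothetical, stripped-down module and instance types. With a closed set of runtimes, a `match` can replace `Box<dyn Trait>`.

```rust
/// Hypothetical per-runtime types.
struct WasmModule;
struct V8Module;
struct WasmInstance { name: String }
struct V8Instance { name: String }

/// Closed set of runtimes: static dispatch via `match`.
enum Module {
    Wasm(WasmModule),
    V8(V8Module),
}

enum Instance {
    Wasm(WasmInstance),
    V8(V8Instance),
}

impl Module {
    fn create_instance(&self, name: &str) -> Instance {
        match self {
            Module::Wasm(_) => Instance::Wasm(WasmInstance { name: name.to_string() }),
            Module::V8(_) => Instance::V8(V8Instance { name: name.to_string() }),
        }
    }
}

impl Instance {
    /// Placeholder "reducer call" that just reports which runtime handled it.
    fn call_reducer(&mut self, reducer: &str) -> String {
        match self {
            Instance::Wasm(inst) => format!("wasm:{}:{}", inst.name, reducer),
            Instance::V8(inst) => format!("v8:{}:{}", inst.name, reducer),
        }
    }
}
```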
During this rewrite, I reworked `AutoReplacingModuleInstance`, which
previously used type-erased trait generics in a way that was brittle and
hard to re-use in the new `async` context. (Specifically, the module
instance no longer lives on the job thread, rather, the database grabs
the instance and sends it to the job thread, then gets it back when the
job exits. This is necessary to allow the re-worked load balancing
described above, as we can't have a single long-lived async task.) While
refactoring, I replaced it with `ModuleInstanceManager`, which can now
maintain multiple instances of the same module. This is not yet useful,
but will become necessary with procedures, as each concurrent procedure
will need its own instance. Relatedly, I changed
`ModuleHost::on_module_thread` (used by one-off and initial subscription
queries) to no longer acquire the/an instance. I discussed this with the
team, and consensus was that "locking" the module instance in that path
was not a useful behavior, just an artifact of the previous
implementation.
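Here is a minimal sketch of the take-and-return pattern. It assumes a hypothetical pool API (`get_instance`, `return_instance`); the real `ModuleInstanceManager` interface may differ.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Hypothetical instance; `id` identifies which instance served a job.
struct Instance {
    id: u32,
}

/// Pool of idle instances for one module. Callers take an instance out,
/// hand it to a job, and return it when the job finishes.
struct ModuleInstanceManager {
    idle: Mutex<Vec<Instance>>,
    next_id: AtomicU32,
}

impl ModuleInstanceManager {
    fn new() -> Self {
        Self { idle: Mutex::new(Vec::new()), next_id: AtomicU32::new(1) }
    }

    /// Reuse an idle instance, or create one on the calling thread.
    fn get_instance(&self) -> Instance {
        // The lock guard is dropped at the end of this statement.
        let reused = self.idle.lock().unwrap().pop();
        reused.unwrap_or_else(|| Instance { id: self.next_id.fetch_add(1, Ordering::Relaxed) })
    }

    fn return_instance(&self, inst: Instance) {
        self.idle.lock().unwrap().push(inst);
    }
}
```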
I have also switched our Wasmtime configuration to set
`async_support(true)`. This causes a variety of methods, notably
`InstancePre::instantiate` and `TypedFunc::call`, to panic, and requires
that we instead call their `_async` variants. As mentioned above, I have
not yet introduced any actual asynchronicity or concurrency, so these
methods should never yield. Rather than `.await`ing their futures, I
have defined a degenerate `async` executor, `poll_once_executor`, which
polls a future exactly once, failing if it does not return
`Poll::Ready`. This means that we will panic if one of these futures
returns `Poll::Pending` unexpectedly.
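Here is a sketch of such an executor in plain `std`. The name matches the description, but the body is an illustration, not the PR's code.

```rust
use std::future::Future;
use std::pin::pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A waker that does nothing. Nothing should ever need waking, because we
/// never expect a `Poll::Pending`.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}

/// Poll `fut` exactly once; panic if it is not immediately ready.
fn poll_once_executor<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("future unexpectedly returned Poll::Pending"),
    }
}
```

Recent Rust versions also provide `std::task::Waker::noop`, which could replace the hand-rolled waker.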
The previous `trait Module` had a method `initial_instances`. `Module`
is now a concrete type, and I gave it this method, but it appears to be
unused. This is causing lints to fail. I am unsure what, if anything,
that method was for.
The previous `AutoReplacingModuleInstance` called `create_instance` on
the job thread. I am unsure if this was intentional, or just an artifact
of the previous implementation, where the `AutoReplacingModuleInstance`
lived on the job thread. I have written the new `ModuleInstanceManager`
to call `create_instance` on the calling thread, but it would be easy to
move that call into the job executor if that behavior is desired.
# API and ABI breaking changes
None user-facing
# Expected complexity level and risk
4: significant rewrite of performance-sensitive fiddly concurrency code.
Note specifically in above description:
- Running database jobs on the global Tokio runtime when not using core
pinning.
- Multi-tenant fairness issue: no longer possible to interrupt a
performance-intensive database mid-reducer to run another database
pinned to the same core.
- Unused method `module_instances`.
- Running `create_instance` on the calling thread rather than the
database thread.
# Testing
- [x] Will arrange for a bot test.
- [ ] Determine to what extent we can run with real or synthetic
multi-tenant load in a test or staging environment.
---------
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: joshua-spacetime <josh@clockworklabs.io>
# Description of Changes
The `#[table]` macro now generates read-only table and index handles.
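Here is a sketch of the idea behind read-only handles, in the spirit of `ReducerContext::as_read_only()`. The `TableHandle`, `ReadOnlyTableHandle`, and `read_only` names are hypothetical. The read-only type simply has no mutating methods, so attempted writes fail to compile.

```rust
use std::cell::RefCell;

/// Hypothetical in-memory table standing in for generated table storage.
struct Table {
    rows: RefCell<Vec<u32>>,
}

/// Read-write handle: can insert and read.
struct TableHandle<'a> {
    table: &'a Table,
}

/// Read-only handle: exposes reads only.
struct ReadOnlyTableHandle<'a> {
    table: &'a Table,
}

impl<'a> TableHandle<'a> {
    fn insert(&self, row: u32) {
        self.table.rows.borrow_mut().push(row);
    }
    fn count(&self) -> usize {
        self.table.rows.borrow().len()
    }
    fn read_only(&self) -> ReadOnlyTableHandle<'a> {
        ReadOnlyTableHandle { table: self.table }
    }
}

impl<'a> ReadOnlyTableHandle<'a> {
    fn count(&self) -> usize {
        self.table.rows.borrow().len()
    }
    // No `insert`: `ro.insert(1)` would be a compile error.
}
```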
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
- [x] positive and negative test cases using
`ReducerContext::as_read_only()`
# Description of Changes
It turns out that cargo automatically uses the latest semver-compatible
versions of dependencies, which is not what we expected. tl;dr
specifying `1.5.0` actually means `>=1.5.0 <2.0.0`, but we actually
intend `1.5.*`.
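To make the difference concrete, here is a sketch of the two matching rules as the Cargo book describes them. Pre-release versions are ignored.

```rust
/// (major, minor, patch)
type V = (u64, u64, u64);

/// Cargo's default ("caret") requirement: `x.y.z` means `^x.y.z`.
/// The upper bound is set by the leftmost non-zero component.
fn caret_matches(req: V, v: V) -> bool {
    let upper = match req {
        (0, 0, p) => (0, 0, p + 1),
        (0, m, _) => (0, m + 1, 0),
        (maj, _, _) => (maj + 1, 0, 0),
    };
    // Tuples compare lexicographically, matching version ordering.
    v >= req && v < upper
}

/// `x.y.*` wildcard: same major and minor, any patch.
fn wildcard_minor_matches(req: (u64, u64), v: V) -> bool {
    v.0 == req.0 && v.1 == req.1
}
```

In a manifest, the intended behavior can be written as `1.5.*` or `~1.5.0`, both of which mean `>=1.5.0, <1.6.0`.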
This PR updates our `upgrade-version` tool, and re-runs it to fix the
dep versions.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
- [x] I ran `cargo bump-versions 1.5.0 --rust-and-cli` to regenerate the
other committed files.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
This resolves an issue reported in Discord:
https://discordapp.com/channels/1037340874172014652/1398209084699709492/1423784670402842766
> note the name is Counter. in the typescript bindings, my useTable only
gives me the option for counter => `useTable<DbConnection,
Counter>('counter');`
> I switched my name in the c# module to `[SpacetimeDB.Table(Public =
true, Name = "counter")]` and it just works.
> Capitalization and whatnot strikes again
The diff seems large because of the mechanical changes to codegen. You
only really need to review the codegen'd tables and `index.ts`; all the
other files just have a new `__TableHandle` type import.
# API and ABI breaking changes
Technically API-breaking, but it fixes a bug where the old API doesn't
work.
# Expected complexity level and risk
2, relatively straightforward changes to the TS SDK and TS codegen
# Testing
- [x] I have tested with a Rust module that has an `UpperCase` table and
confirmed that the name is passed correctly through the codegen to the
TypeScript client and that it works in `useTable`
---------
Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
# Description of Changes
Update:
This PR did all of the below but was split. Now it just does:
1. Exposes V8/JS modules via the `unstable` feature flag on the host. To
publish a JS module, `--js-path path/to/module.js` needs to be used.
This PR:
1. Exposes V8/JS modules via the `unstable` feature flag on the host. To
publish a JS module, `--js-path path/to/module.js` needs to be used.
2. Bumps V8 to 140.2.
3. Shares more logic with WASM and makes some minor refactorings to
energy/budget logic.
4. Moves logic from `WasmInstanceEnv` to `InstanceEnv` and friends.
5. Makes JS modules actually work in terms of `create_instance`,
`make_actor`,
6. Fleshes out `call_reducer` with timeouts and long-running logs added
as well.
7. Adds all the syscalls with associated documentation as well.
# API and ABI breaking changes
None
# Expected complexity level and risk
2? It's only available on unstable and mostly touches V8 stuff.
# Testing
Follow-up PRs will add unit tests for parts.
We'll also need to add integration tests for whole modules.
# Description of Changes
This is the implementation of issue #3191. This adds a Default attribute
to C# module fields.
**Note**: In C#, attribute arguments must be compile-time constants,
which means you can't directly use non-constant expressions like new
expressions, method calls, or dynamic values in attribute constructors.
(Ref:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/attributes#2324-attribute-parameter-types)
For this reason, these default values are limited to primitive types,
enums, and strings.
This includes (shown as `C# type` (`BSATN type`)):
* `bool` (`Bool`)
* `sbyte` (`I8`)
* `byte` (`U8`)
* `short` (`I16`)
* `ushort` (`U16`)
* `int` (`I32`)
* `uint` (`U32`)
* `long` (`I64`)
* `ulong` (`U64`)
* `float` (`F32`)
* `double` (`F64`)
* `enum` (`Enum`)
* `string` (`String`)
* `null` (`RefOption`) <- Nullable type
Because of these C# limitations, nullable and complex data types, such
as structs, can use `[Default(null)]` to populate values with null
defaults. This lets complex types like structs work around the
restriction on non-constant expressions in attribute constructors, so
they can still be added as new table columns.
The `int` type can also be in the form of Hex or Binary literals, such
as `[Default(0x2A)]` or `[Default(0b00101010)]`
Both Decimal (like `[Default(3.14m)]`) and Char (like `[Default('A')]`)
are unsupported types in BSATN and will still return `BSATN0001` errors.
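The supported keyword types can be summarized as a lookup. `bsatn_type_for` is a hypothetical helper, not the generator's code. It covers only the keyword types (enums and `null` are omitted), and `None` stands for the unsupported case that produces `BSATN0001`.

```rust
/// Map a C# type keyword to its BSATN type, per the list above.
/// `None` means unsupported (reported as `BSATN0001`).
fn bsatn_type_for(csharp_type: &str) -> Option<&'static str> {
    Some(match csharp_type {
        "bool" => "Bool",
        "sbyte" => "I8",
        "byte" => "U8",
        "short" => "I16",
        "ushort" => "U16",
        "int" => "I32",
        "uint" => "U32",
        "long" => "I64",
        "ulong" => "U64",
        "float" => "F32",
        "double" => "F64",
        "string" => "String",
        // `decimal`, `char`, and anything else have no BSATN equivalent here.
        _ => return None,
    })
}
```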
# API and ABI breaking changes
Not API breaking.
This change only adds the `[Default(value)]` attribute logic.
Using the `[Default(value)]` attribute with older versions of SpacetimeDB
C# modules will result in an error.
# Expected complexity level and risk
2
# Testing
Local testing of this requires use of CLI changes in #3278
- [x] Regression test of functionality added.
# Description of Changes
v8: use fast static strings for known strings
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Covered by existing.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Added support for wss:// protocol in Unreal SDK. The SDK previously
forced all connections to use ws://, preventing connections to servers
behind SSL/TLS proxies. The fix detects if the URI already includes a
protocol (ws:// or wss://) and preserves it; otherwise defaults to ws://
for backward compatibility. This enables connections to SpacetimeDB
servers hosted with Cloudflare Tunnels or other secure proxies.
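The scheme check amounts to the following. It is sketched in Rust with a hypothetical `websocket_url` helper; the SDK itself is C++.

```rust
/// Keep an explicit `ws://` or `wss://` scheme; otherwise default to `ws://`.
fn websocket_url(uri: &str) -> String {
    if uri.starts_with("ws://") || uri.starts_with("wss://") {
        uri.to_string()
    } else {
        format!("ws://{uri}")
    }
}
```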
# API and ABI breaking changes
This change is fully backwards compatible. URIs without a protocol
prefix will continue to work as before, with ws:// being automatically
prepended. No API signatures or existing behavior have changed.
# Expected complexity level and risk
Simple string prefix check before URL construction. The core WebSocket
connection logic remains unchanged, and the default behavior is
preserved when no protocol is specified.
# Testing
Successfully tested with a wss:// connection to my self-hosted
SpacetimeDB server behind Cloudflare Tunnel. Verified backward
compatibility with protocol-less URIs when directly connecting to the
server. Currently running without issues.
---------
Signed-off-by: Jason Larabie <jason@clockworklabs.io>
Co-authored-by: Jason Larabie <jason@clockworklabs.io>
# Description of Changes
Tiny fix to update the size scripts for the `spacetimedb` TypeScript
package
# API and ABI breaking changes
none
# Expected complexity level and risk
1
# Testing
- [ ] I verified that the size script works.
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
- Shares more logic with WASM and makes some minor refactorings to
energy/budget logic.
- Moves logic from `WasmInstanceEnv` to `InstanceEnv` and friends.
- Makes JS modules actually work in terms of `create_instance`,
`make_actor`,
- Fleshes out `call_reducer` with timeouts and long-running logs added
as well.
- Adds all the syscalls with associated documentation as well.
# API and ABI breaking changes
None
# Expected complexity level and risk
2
# Testing
Future work.
# Description of Changes
- Extract from `wasm_instance_env.rs`
- `console_timer_end`: use Noop backtrace.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
No semantic changes.
# Description of Changes
Fix bindgen tests (due to crate `timezone_provider`)
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Fixes CI tests.
# Description of Changes
Extract `InstanceEnv::console_timer_end`.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Just code motion.
# Description of Changes
Prefix of https://github.com/clockworklabs/SpacetimeDB/pull/3276 to
bisect a problem in smoketests.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Future work.
Make it so the `SnapshotWorker` can be re-configured with a new
committed state. This allows event subscriptions to remain valid while a
replica transitions from leader to follower and vice versa.
This is considerably simpler than keeping the lifetimes of database and
persistence services strictly in-sync, at the expense of an idle task
per replica.
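Here is a minimal sketch of the arrangement, using hypothetical types. Subscribers attach to the worker, not to a particular committed state, so their receivers stay valid across `reconfigure` calls. The worker idles while no state is attached.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a replica's committed state.
struct CommittedState;

struct SnapshotWorker {
    state: Mutex<Option<Arc<CommittedState>>>,
    subscribers: Mutex<Vec<Sender<u64>>>,
}

impl SnapshotWorker {
    fn new() -> Self {
        Self { state: Mutex::new(None), subscribers: Mutex::new(Vec::new()) }
    }

    /// Subscribe to snapshot offsets; survives state swaps.
    fn subscribe(&self) -> Receiver<u64> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Called on leader/follower transitions; `None` leaves the worker idle.
    fn reconfigure(&self, state: Option<Arc<CommittedState>>) {
        *self.state.lock().unwrap() = state;
    }

    /// Snapshot at `offset` if a state is attached, then notify subscribers.
    fn snapshot(&self, offset: u64) -> bool {
        if self.state.lock().unwrap().is_none() {
            return false; // idle: nothing to snapshot
        }
        // Drop subscribers whose receiver has gone away.
        self.subscribers.lock().unwrap().retain(|tx| tx.send(offset).is_ok());
        true
    }
}
```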
# Expected complexity level and risk
1.5
---------
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
# Description of Changes
Fix various issues on the React Integration page:
- Invalid links (Closes #3332)
- Fix broken syntax highlight
- Update the page title (React -> `React integration`)
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
Tested using a local version of the website
# Description of Changes
Fixes two problems introduced in #3185:
1. The `--directories` option on `find-publish-list.py` was not printing
directories
2. The `publish-crates.sh` script was using an undefined variable.
# API and ABI breaking changes
No breaking changes
# Expected complexity level and risk
1
# Testing
This was used to publish crates in the release yesterday.
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
The instructions were misleading about which directory to generate into.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
None
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
Two variants were missing from `intoMapKey`, causing the infinite loop
reported in https://github.com/clockworklabs/SpacetimeDB/issues/3299
# API and ABI breaking changes
No breaking changes
# Expected complexity level and risk
1
# Testing
- [x] I added a regression test for this case
# Description of Changes
Updates smoketests to check the following:
- A normal auto-migration should not disconnect subscribers.
- An add-table-columns migration should disconnect subscribers.
Solves #1957.
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
PR contains:
* CLI changes for the `pre_publish` endpoint when publishing a module.
* The regular `--yes` flag no longer bypasses the *break clients* warning
prompt; an extra confirmation is now required. For CI, a hidden flag
`--break-clients` is added.
* Added smoketest.
* Some trivial naming changes in `client-api-*` crates for consistency
reasons.
* The `pre_publish` route now accepts a body size limit similar to the
`publish` route's.
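Here is one reading of the new prompt rules as a hypothetical decision function. How `--yes` and `--break-clients` combine in the real CLI is an assumption here.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Proceed,
    Prompt,
}

/// Hypothetical: `--yes` still skips ordinary prompts, but a client-breaking
/// publish needs `--break-clients` (or an interactive confirmation).
fn confirm_publish(breaks_clients: bool, yes: bool, break_clients_flag: bool) -> Decision {
    if breaks_clients {
        if break_clients_flag { Decision::Proceed } else { Decision::Prompt }
    } else if yes {
        Decision::Proceed
    } else {
        Decision::Prompt
    }
}
```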
# API and ABI breaking changes
An additive API change; does not break anything.
# Expected complexity level and risk
2
# Testing
- Existing smoketests passing for backward compatibility.
- New smoketest for add columns
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
# Description of Changes
Closes: https://github.com/clockworklabs/SpacetimeDBPrivate/issues/2058
- Updated the generation code to set up basic property initialization
for Unreal Blueprints
- Added new Blueprint library to interact with FContextBase to allow
access to inherited properties from all contexts in Blueprint
# API and ABI breaking changes
No breaking changes
# Expected complexity level and risk
2 - Updates the generation code
# Testing
- [x] Ran Unreal tests to confirm no breaking changes
# Description of Changes
Add documentation for SpacetimeAuth
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
Checked if everything was working using a local instance of the website
# Description of Changes
As the title says
# Expected complexity level and risk
1
# Testing
- [x] Manually created the projects shown as examples and ran them
# Description of Changes
The `AutoMigrateStep::DisconnectAllUsers` step is implemented as
follows:
1. The `spacetimedb::db::update::update_database` function returns a
response of type
`UpdateDatabaseResult::UpdatePerformedWithClientDisconnect`.
2. Upon receiving this response, the `host_controller::update_module`
proceeds to drop the `watch::Sender<ModuleHost>` field within the
`core::host_controller::Host` and disconnect clients.
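Here is a simplified sketch of that control flow. `UpdatePerformed`, `ModuleSender`, and the `Host` fields are hypothetical stand-ins; only `UpdatePerformedWithClientDisconnect` comes from the description.

```rust
/// Hypothetical stand-in for the `watch::Sender<ModuleHost>`.
struct ModuleSender;

enum UpdateDatabaseResult {
    UpdatePerformed,
    UpdatePerformedWithClientDisconnect,
}

struct Host {
    module_sender: Option<ModuleSender>,
    connected_clients: Vec<u32>,
}

fn handle_update(host: &mut Host, result: UpdateDatabaseResult) {
    match result {
        UpdateDatabaseResult::UpdatePerformed => {}
        UpdateDatabaseResult::UpdatePerformedWithClientDisconnect => {
            // Step 2 above: drop the module sender...
            host.module_sender = None;
            // ...and disconnect all clients.
            host.connected_clients.clear();
        }
    }
}
```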
# API and ABI breaking changes
NA
# Expected complexity level and risk
3.
Diff code is simple, but it depends on the subscription logic behaving
correctly.
# Testing
Manually.
---------
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Report more metrics about snapshot compression, namely:
- time to compress a single snapshot (histogram)
- for each compression pass:
- number of snapshots found to be already compressed (gauge)
- number of snapshots compressed (gauge)
- cumulative number of objects compressed (gauge)
- cumulative number of objects hardlinked (gauge)
Those metrics are collected from the `spacetimedb-snapshot` crate
without imposing a prometheus dependency on it, i.e. they can be
observed by the caller as ordinary Rust types.
This is exploited to avoid scanning the entire snapshot repository on
each pass -- only the range `(last_compressed + 1)..newest_snapshot` is
visited (note that the `compress_snapshots` method now short-circuits on
errors).
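Here is a sketch of the range-limited pass that returns plain-Rust statistics. `compress_snapshots` here is a hypothetical signature, and the histogram and cumulative object counts are omitted.

```rust
/// Plain Rust stats returned to the caller, so no Prometheus dependency is
/// needed; field names are hypothetical.
#[derive(Debug, Default, PartialEq)]
struct CompressStats {
    already_compressed: u64,
    compressed: u64,
}

/// Visit only snapshots in `(last_compressed + 1)..newest`, stopping at the
/// first error. `compress_one` returns `Ok(true)` if it compressed the
/// snapshot and `Ok(false)` if it was already compressed.
fn compress_snapshots(
    snapshots: &[u64],
    last_compressed: u64,
    newest: u64,
    mut compress_one: impl FnMut(u64) -> Result<bool, String>,
) -> Result<CompressStats, String> {
    let range = (last_compressed + 1)..newest;
    let mut stats = CompressStats::default();
    for &offset in snapshots.iter().filter(|&&o| range.contains(&o)) {
        // `?` short-circuits on the first error.
        if compress_one(offset)? {
            stats.compressed += 1;
        } else {
            stats.already_compressed += 1;
        }
    }
    Ok(stats)
}
```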
Lastly, the snapshot worker can now be configured to disable
compression. This greatly simplifies implementation of alternative
post-processing strategies, e.g. involving archival, for which a more
coarse-grained compression strategy may be more appropriate.
Subscribers are notified of a new snapshot _after_ compression, such that
any filesystem locks should be released.
# Expected complexity level and risk
2
# Testing
May need some, I'm pondering.
The `DurabilityProvider` trait was introduced to enable the
`HostController` to procure an alternative `Durability` impl from an
external source.
It is also useful to be able to instantiate a `SnapshotWorker`
externally, in order to subscribe to snapshot creation events without
access to the `RelationalDB` instance it is operating on.
At a later stage, we may also use it to control the snapshot frequency
externally.
This patch thus reframes the trait as `PersistenceProvider`, whose job
is to provide persistence-related services.
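Here is a rough sketch of the reframed trait, with hypothetical, heavily simplified `Durability` and `SnapshotWorker` types. The real trait's method names and signatures are not shown in this description.

```rust
use std::sync::Arc;

/// Hypothetical, simplified services.
trait Durability: Send + Sync {
    fn durable_offset(&self) -> Option<u64>;
}
struct SnapshotWorker;

/// Persistence services for one database, procured outside the `HostController`.
struct Persistence {
    durability: Arc<dyn Durability>,
    snapshots: Option<Arc<SnapshotWorker>>,
}

trait PersistenceProvider: Send + Sync {
    fn persistence(&self, database_id: u64) -> Result<Persistence, String>;
}

/// Example provider: trivial durability plus a snapshot worker the caller
/// can subscribe to without access to the database itself.
struct NoDurability;
impl Durability for NoDurability {
    fn durable_offset(&self) -> Option<u64> {
        None
    }
}

struct LocalProvider;
impl PersistenceProvider for LocalProvider {
    fn persistence(&self, _database_id: u64) -> Result<Persistence, String> {
        Ok(Persistence {
            durability: Arc::new(NoDurability),
            snapshots: Some(Arc::new(SnapshotWorker)),
        })
    }
}
```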
Also separates snapshot creation and compression of older snapshots, and
adds instrumentation to gather timing information for both.
# Description of Changes
Re-submit of #3281 (reverted by #3293), with only the intended changes.
# Expected complexity level and risk
1.5
# Testing
No functional changes.