This is the first step towards making in-memory-only databases not touch
the disk at all. Still pending is an in-memory-only sink for module logs.
Responsibility for the lock file is transferred to `Durability`, which
means that only persistent databases opened for writing acquire the
lock.
As a consequence, the `Durability` trait gains a `close` method that
prevents further writes and drains the internal buffers, even when
multiple `Arc`-pointers to the `Durability` exist.
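
To illustrate the `close` semantics described above, here is a minimal sketch. The trait shape, the `Closed` error and the `Buffered` type are assumptions made up for illustration and are not the actual `Durability` API; the point is only that closing through one handle takes effect for every `Arc` clone.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical error returned when writing to a closed instance.
#[derive(Debug, PartialEq)]
struct Closed;

/// Hypothetical, simplified stand-in for the real `Durability` trait.
trait Durability {
    /// Queue a transaction; fails once the instance has been closed.
    fn append(&self, tx: u64) -> Result<(), Closed>;
    /// Stop accepting writes and flush whatever is buffered.
    /// Returns the number of transactions flushed by this call.
    fn close(&self) -> usize;
}

/// Toy implementation: a `None` buffer means "closed".
struct Buffered {
    buf: Mutex<Option<Vec<u64>>>,
    flushed: Mutex<Vec<u64>>,
}

impl Buffered {
    fn new() -> Self {
        Self {
            buf: Mutex::new(Some(Vec::new())),
            flushed: Mutex::new(Vec::new()),
        }
    }
}

impl Durability for Buffered {
    fn append(&self, tx: u64) -> Result<(), Closed> {
        match self.buf.lock().unwrap().as_mut() {
            Some(buf) => {
                buf.push(tx);
                Ok(())
            }
            None => Err(Closed),
        }
    }

    fn close(&self) -> usize {
        // Taking the buffer out marks the instance as closed for all handles.
        let drained = self.buf.lock().unwrap().take().unwrap_or_default();
        let n = drained.len();
        self.flushed.lock().unwrap().extend(drained);
        n
    }
}

/// Two handles to the same instance: closing via one closes it for both.
fn close_via_shared_handles() -> (usize, Result<(), Closed>) {
    let a: Arc<dyn Durability> = Arc::new(Buffered::new());
    let b = Arc::clone(&a);
    a.append(1).unwrap();
    b.append(2).unwrap();
    (a.close(), b.append(3))
}
```

Because `close` takes `&self` rather than `self`, it does not depend on the caller holding the last reference.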
# Expected complexity level and risk
2
# Testing
Covered by existing tests.
Controlled shutdown of a database should drain the outstanding
transaction queue(s) and flush them to the durability layer.
With the introduction of another queueing layer in #3868, it became
harder to observe when or if this process has completed.
This patch thus introduces an explicit (async) shutdown method for
`RelationalDB` and below, which waits until all submitted transactions
are either reported durable, or an error occurs in the durability layer.
`RelationalDB` is made `!Clone`, such that shutdown can be initiated in
the `Drop` impl. Note that this requires access to a tokio runtime,
which we thread through via the `Persistence` services in order to
allow control over which of the various runtimes is used for
durability-related tasks.
Also moves `RelationalDB::open` to a blocking thread when a
persistence-enabled database is constructed by the `HostController`.
This process performs heavy I/O and can take a substantial amount of
time, during which we don't want to block a worker thread.
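
The shutdown behavior can be sketched as follows. This is not the code from the patch: it swaps tokio for plain threads and channels to stay dependency-free, and `Db`, `commit` and the counter are hypothetical names. It shows why a non-`Clone` handle makes it possible to drain the queue from `Drop`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical handle; deliberately not `Clone`, so `Drop` runs exactly once.
struct Db {
    queue: Option<Sender<u64>>,
    worker: Option<JoinHandle<()>>,
    durable: Arc<AtomicU64>,
}

impl Db {
    fn open() -> Self {
        let (queue, rx) = mpsc::channel::<u64>();
        let durable = Arc::new(AtomicU64::new(0));
        let counter = Arc::clone(&durable);
        // Stand-in for the durability layer: "persists" each transaction.
        let worker = thread::spawn(move || {
            for _tx in rx {
                counter.fetch_add(1, Ordering::SeqCst);
            }
        });
        Self { queue: Some(queue), worker: Some(worker), durable }
    }

    fn commit(&self, tx: u64) {
        if let Some(queue) = &self.queue {
            let _ = queue.send(tx);
        }
    }

    /// Close the queue and wait for the worker to drain it.
    /// Returns the number of transactions reported durable.
    fn shutdown(&mut self) -> u64 {
        // Dropping the sender closes the channel, which ends the worker loop
        // once all queued transactions have been processed.
        drop(self.queue.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
        self.durable.load(Ordering::SeqCst)
    }
}

impl Drop for Db {
    fn drop(&mut self) {
        self.shutdown();
    }
}
```

Calling `shutdown` explicitly lets the caller observe completion; the `Drop` impl is only a safety net and blocks the dropping thread while it waits.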
# API and ABI breaking changes
None
# Expected complexity level and risk
3
# Testing
- [ ] some testing added
- [ ] existing tests still pass
- [ ] `impl Drop for RelationalDB` is difficult to test, extra eyeballs
  needed
---------
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
When a new commitlog segment is created, allocate disk space for it up
to the maximum segment size. Also do this when resuming writes to an
existing segment, such that segments created without preallocation will
allocate as well when the database is opened.
Preallocation is gated behind the feature "fallocate", because it is not
always desirable to preallocate, e.g. for local `standalone` users.
The feature can only be enabled on Linux targets, because allocation is
done using the Linux-specific `fallocate(2)` system call.
Unlike `ftruncate(2)` or the portable `posix_fallocate(3)`,
`fallocate(2)` supports allocating disk space without zeroing. This is
currently required, because the commitlog format does not handle
padding bytes.
If not enough space can be allocated, the commitlog refuses writes. For
commitlogs that were created without preallocation, this means that the
commitlog cannot even be opened in this situation.
The local durability impl will crash if it detects that the commitlog is
unable to allocate enough space.
This means that a database will eventually crash and be unable to start
in an out-of-space situation.
Allocated space is not included in the reported size of the commitlog.
Instead, allocated blocks are reported separately.
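
For monitoring purposes, the difference between reported size and allocated blocks can be observed with the standard library alone. The sketch below is an assumption about how one might inspect a segment file from the outside; `SegmentSize` and `segment_size` are hypothetical names and not part of the commitlog API. It is Unix-only.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical report type; the field names are not taken from the patch.
#[derive(Debug)]
struct SegmentSize {
    /// Logical file length in bytes (what `ls -l` shows).
    len_bytes: u64,
    /// Bytes backed by allocated blocks (what `du` shows).
    allocated_bytes: u64,
}

fn segment_size(path: &Path) -> io::Result<SegmentSize> {
    let meta = fs::metadata(path)?;
    Ok(SegmentSize {
        len_bytes: meta.len(),
        // `blocks()` is counted in 512-byte units, independent of the
        // filesystem's own block size.
        allocated_bytes: meta.blocks() * 512,
    })
}

/// Write `n` bytes to a fresh file, for demonstration purposes only.
fn write_segment(path: &Path, n: usize) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(&vec![0xAB; n])?;
    file.sync_all()
}
```

With preallocation enabled, `allocated_bytes` of the segment being written can be much larger than `len_bytes`, which is why disk size monitoring may need to look at both.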
# Expected complexity level and risk
3 - Disk size monitoring may need to be adjusted.
# Testing
- [x] Adds a test that demonstrates the crash behavior of
  [`spacetimedb_durability::Local`] when there is insufficient space.
  The test performs I/O against a loop device.
- [x] Modified the `repo::Memory` impl so that it can run out of space.
  No test currently utilizes this, but existing tests assuming infinite
  space still pass.
Make it so the `SnapshotWorker` can be re-configured with a new
committed state. This allows event subscriptions to remain valid while a
replica transitions from leader to follower and vice versa.
This is considerably simpler than keeping the lifetimes of database and
persistence services strictly in-sync, at the expense of an idle task
per replica.
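
A rough sketch of the idea, using standard-library channels instead of the real async machinery. All types and method names here (`CommittedState`, `set_state`, `snapshot`) are simplified assumptions for illustration: the worker outlives any particular state, so subscriptions survive a reconfiguration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the committed state of a database.
struct CommittedState {
    tx_offset: u64,
}

/// Toy worker: it is not tied to one state, so subscribers stay attached.
#[derive(Default)]
struct SnapshotWorker {
    state: Mutex<Option<Arc<CommittedState>>>,
    subscribers: Mutex<Vec<Sender<u64>>>,
}

impl SnapshotWorker {
    /// Register interest in snapshot events.
    fn subscribe(&self) -> Receiver<u64> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Swap in a new state (or `None` to idle) without touching subscribers.
    fn set_state(&self, state: Option<Arc<CommittedState>>) {
        *self.state.lock().unwrap() = state;
    }

    /// "Take a snapshot" of the current state, if any, and notify subscribers.
    fn snapshot(&self) -> Option<u64> {
        let offset = self.state.lock().unwrap().as_ref()?.tx_offset;
        // Forget subscribers whose receiving end has gone away.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|s| s.send(offset).is_ok());
        Some(offset)
    }
}
```

While no state is configured, the worker simply idles, which corresponds to the "idle task per replica" cost mentioned above.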
# Expected complexity level and risk
1.5
---------
Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Report more metrics about snapshot compression, namely:
- time to compress a single snapshot (histogram)
- for each compression pass:
  - number of snapshots found to be already compressed (gauge)
  - number of snapshots compressed (gauge)
  - cumulative number of objects compressed (gauge)
  - cumulative number of objects hardlinked (gauge)
Those metrics are collected from the `spacetimedb-snapshot` crate
without imposing a prometheus dependency on it, i.e. they can be
observed by the caller as ordinary Rust types.
This is exploited to avoid scanning the entire snapshot repository on
each pass -- only the range `(last_compressed + 1)..newest_snapshot` is
visited (note that the `compress_snapshots` method now short-circuits on
errors).
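
As a sketch of how plain-data statistics and the restricted range fit together: the names below (`CompressionStats`, `Outcome`, `pending`, `compress_pass`) are hypothetical and do not mirror the `spacetimedb-snapshot` API. The range is taken literally from the description above, so the newest snapshot is excluded.

```rust
use std::ops::Range;

/// Hypothetical per-pass statistics: plain data, no metrics dependency.
#[derive(Debug, Default, PartialEq)]
struct CompressionStats {
    already_compressed: u64,
    compressed: u64,
    last_compressed: Option<u64>,
}

/// Outcome of handling one snapshot in this sketch.
enum Outcome {
    AlreadyCompressed,
    Compressed,
}

/// The range of snapshot offsets a pass needs to visit.
fn pending(last_compressed: Option<u64>, newest_snapshot: u64) -> Range<u64> {
    last_compressed.map_or(0, |n| n + 1)..newest_snapshot
}

/// Visit only the snapshots inside `range`, stopping at the first error.
fn compress_pass<E>(
    snapshots: &[u64],
    range: Range<u64>,
    mut compress_one: impl FnMut(u64) -> Result<Outcome, E>,
    stats: &mut CompressionStats,
) -> Result<(), E> {
    for offset in snapshots.iter().copied().filter(|o| range.contains(o)) {
        // `?` short-circuits the pass; `stats` keeps what was recorded so far.
        match compress_one(offset)? {
            Outcome::AlreadyCompressed => stats.already_compressed += 1,
            Outcome::Compressed => stats.compressed += 1,
        }
        stats.last_compressed = Some(offset);
    }
    Ok(())
}
```

The caller can feed `stats.last_compressed` into the next call to `pending`, and export the counters to whatever metrics system it uses.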
Lastly, the snapshot worker can now be configured to disable
compression. This greatly simplifies implementation of alternative
post-processing strategies, e.g. involving archival, for which a more
coarse-grained compression strategy may be more appropriate.
Subscribers are notified of a new snapshot _after_ compression, such that
any filesystem locks should be released.
# Expected complexity level and risk
2
# Testing
May need some, I'm pondering.
The `DurabilityProvider` trait was introduced to enable the
`HostController` to procure an alternative `Durability` impl from an
external source.
It is also useful to be able to instantiate a `SnapshotWorker`
externally, in order to subscribe to snapshot creation events without
access to the `RelationalDB` instance it is operating on.
At a later stage, we may also use it to control the snapshot frequency
externally.
This patch thus reframes the trait as `PersistenceProvider`, whose job
is to provide persistence-related services.
Also separates snapshot creation and compression of older snapshots, and
adds instrumentation to gather timing information for both.
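
The reframing can be pictured roughly like this. The sketch is an assumption about the general shape only: the method name `persistence`, the fields of `Persistence` and the `LocalProvider` type are invented for illustration and do not reflect the real signatures.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the real service types.
trait Durability: Send + Sync {
    fn durable_tx_offset(&self) -> Option<u64>;
}

struct SnapshotWorker {
    database: String,
}

/// Everything persistence-related that a database needs, bundled together.
struct Persistence {
    durability: Arc<dyn Durability>,
    /// Created outside of the database, so callers can subscribe to it
    /// without access to the database instance.
    snapshots: Option<Arc<SnapshotWorker>>,
}

/// Sketch of the reframed trait: one entry point for all persistence
/// services instead of a durability-only provider.
trait PersistenceProvider {
    fn persistence(&self, database: &str) -> Persistence;
}

struct NoopDurability;

impl Durability for NoopDurability {
    fn durable_tx_offset(&self) -> Option<u64> {
        None
    }
}

/// Hypothetical provider used only to exercise the trait.
struct LocalProvider {
    snapshots_enabled: bool,
}

impl PersistenceProvider for LocalProvider {
    fn persistence(&self, database: &str) -> Persistence {
        Persistence {
            durability: Arc::new(NoopDurability),
            snapshots: self.snapshots_enabled.then(|| {
                Arc::new(SnapshotWorker { database: database.to_owned() })
            }),
        }
    }
}
```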
# Description of Changes
Re-submit of #3281 (reverted by #3293), with only the intended changes.
# Expected complexity level and risk
1.5
# Testing
No functional changes.
This reverts commit 2b61190d4d.
An accident happened, and the patch contains changes that were intended
for a separate PR.
Perhaps better to start over.
The `DurabilityProvider` trait was introduced to enable the
`HostController` to procure an alternative `Durability` impl from an
external source.
It is also useful to be able to instantiate a `SnapshotWorker`
externally, in order to subscribe to snapshot creation events without
access to the `RelationalDB` instance it is operating on.
At a later stage, we may also use it to control the snapshot frequency
externally.
This patch thus reframes the trait as `PersistenceProvider`, whose job
is to provide persistence-related services.
Also separates snapshot creation and compression of older snapshots, and
adds instrumentation to gather timing information for both.
# Expected complexity level and risk
1.5
# Testing
Not a functional change, existing tests should cover that.
# Description of Changes
Adds utilities for marking and deleting snapshot directories that have
been archived.
# API and ABI breaking changes
None
# Expected complexity level and risk
1
# Testing
Testing will be handled by the patch that adds archival.
# Description of Changes
We recently merged several repos together. This PR clarifies the license
terms for several subdirectories, as well as the relationship between
the licenses.
The licenses in our subdirectories have become symbolic links to
licenses in our toplevel `licenses` directory. For any particular
subdirectory's license file in the diff, you can click `... -> View
file` and then click on the text that says "Symbolic Link" on that page.
This will take you to the license file that it links to.
I have also updated the `tools/upgrade-version` script to update the
change date in the new `licenses/BSL.txt` file.
# API and ABI breaking changes
None.
# Expected complexity level and risk
1
# Testing
None. Only changes to license files.
---------
Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
# Description of Changes
On databases with many already-compressed snapshots, this was leading to
log spam without providing any useful information.
# API and ABI breaking changes
N/a
# Expected complexity level and risk
1
# Testing
N/a
* Add the `snapshot` crate, which implements snapshotting at a low level
  - Requires making `BlobHash` be `Serialize` and `Deserialize`.
    For arcane macro-ology reasons, this requires writing `BlobHash::SIZE`
    instead of `Self::SIZE` (it gets embedded in a visitor struct or something).
  - Requires adding two new operators to `BlobStore`.
  - Adds a return value to `Page::save_content_hash`, for convenience.
  - Impls `DerefMut` for `Pages`.
  - **Scary change:** adds `Table::pages_mut`.
    I think possibly this operator should be `unsafe`,
    since write access to the `Pages` allows an undisciplined caller
    to violate the `Table`'s assumptions by corrupting a `Page`.
    It seems like an anti-pattern to mark a method `unsafe` on the grounds that
    misusing its return value can cause UB,
    but I don't see a plausible alternative
    without making most methods on `Page` unsafe.
    Open to feedback on this one!
* Nix `Table::pages_mut`
* Address Mazdak's feedback
* Use `thiserror` rather than `anyhow` for better error hygiene