Also use `pretty_assertions` in some tests,
because I was having trouble debugging small differences in large structures.
Notably does not use `pretty_assertions` in our whole test suite,
only in the tests broken by the previous commits in this PR.
Prior to this commit, we pre-allocated 4096 values for each sequence during bootstrap.
For user sequences, these were 0..=4096; for system sequences, they were 4097..=8192.
This did not play nicely with restoring after a restart;
we would either incorrectly re-use values starting from 4097 after restart,
or would spuriously allocate after bootstrap without a restart
and begin with values starting from 8193.
With this commit, we do not pre-allocate sequence values during bootstrap.
Each user table sequence starts with `value: 1, allocation: 0`,
and each system sequence with `value: 4097, allocation: 4096`.
This means that the first access to a sequence, either after bootstrap or after a restart,
will perform an allocation.
Previously, by contrast, accesses after a restart would allocate,
but accesses after bootstrap would not.
Additionally, the logic for determining whether an allocation was necessary
in `MutTxId::get_next_sequence_value` contained an off-by-one error
which caused the last value before an allocation to be skipped.
This commit fixes that off-by-one error, making it so that yielding value `4096`
when `allocation == 4096` is possible, though it does result in a new allocation.
Previously, we would yield 4095 without allocation, skip 4096,
then allocate and yield 4097.
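The allocation rule described above can be modeled in a few lines. This is a simplified sketch, not the code in `MutTxId::get_next_sequence_value`: the `Sequence` struct, its `next_value` method, and the `ALLOCATION_STEP` constant are hypothetical, and persisting the new bound is reduced to bumping a field.

```rust
/// Hypothetical batch size, matching the 4096 values per allocation described above.
pub const ALLOCATION_STEP: i128 = 4096;

/// A simplified, in-memory model of a sequence (hypothetical; not the real type).
#[derive(Debug)]
pub struct Sequence {
    /// The next value to yield.
    pub value: i128,
    /// The persisted allocation bound.
    pub allocation: i128,
}

impl Sequence {
    /// Returns the next value, and whether a new allocation had to be made first.
    pub fn next_value(&mut self) -> (i128, bool) {
        // Allocate once `value` reaches the bound, so no value is skipped.
        let needs_allocation = self.value >= self.allocation;
        if needs_allocation {
            // A real datastore would persist the new bound before yielding.
            self.allocation += ALLOCATION_STEP;
        }
        let value = self.value;
        self.value += 1;
        (value, needs_allocation)
    }
}
```

Starting from `value: 1, allocation: 0`, the first call allocates, values up to 4095 are yielded without allocating, and 4096 is yielded together with a new allocation. A system sequence starting at `value: 4097, allocation: 4096` likewise allocates on first access.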
* Add the `snapshot` crate, which implements snapshotting at a low level
- Requires making `BlobHash` be `Serialize` and `Deserialize`.
For arcane macro-ology reasons, this requires writing `BlobHash::SIZE`
instead of `Self::SIZE` (it gets embedded in a visitor struct or something).
- Requires adding two new operators to `BlobStore`.
- Adds a return value to `Page::save_content_hash`, for convenience.
- Impls `DerefMut` for `Pages`.
- **Scary change:** adds `Table::pages_mut`.
I think possibly this operator should be `unsafe`,
since write access to the `Pages` allows an undisciplined caller
to violate the `Table`'s assumptions by corrupting a `Page`.
It seems like an anti-pattern to mark a method `unsafe` on the grounds that
misusing its return value can cause UB,
but I don't see a plausible alternative
without making most methods on `Page` unsafe.
Open to feedback on this one!
* Nix `Table::pages_mut`
* Address Mazdak's feedback
* Use `thiserror` rather than `anyhow` for better error hygiene
* Create new crate `fs-utils`; move `Lockfile` and `create_parent_dir`
The snapshot crate will need to create lockfiles.
Rather than duplicating code to do so, we choose to move our definition of `Lockfile`
into a crate that can be depended on by both `cli` and `snapshot`.
No existing crate seems like an obvious choice for this
-- a `Lockfile` is not really a data structure, so `data-structures` seems wrong --
so we add a new crate, `fs-utils`.
Currently this contains only `Lockfile` and `create_parent_dir`,
but a follow-up PR will add `DirTrie`, a Git-like on-disk object store.
* Deduplicate `map_err` closure
* Zeke's nit: simplify control flow
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
* Impl `Serialize`, `Deserialize` for `Page`
Snapshotting needs to write `Page`s to files and read them back again.
To that effect, this commit implements `Serialize` and `Deserialize` for `Page`.
* Address Mazdak's review
- Fix soundness in `FixedBitSet` by moving an assert.
- Add commentary to test.
- Add commentary to `spacetimedb-lib` dependency.
* commitlog: Panic on fsync failure
Errors returned by `fsync(2)` are particularly nefarious, as the state of
the page cache after such an error is largely undefined.
Since the log is synced asynchronously and not after every write, it is
impossible to know up to which commit data can be considered durable --
except by reading the most recent segment from disk.
Therefore, the reasonable thing to do is to prevent any further use of
the log, and force users to re-load it from disk.
Note that this is only half of the solution: an application restart may
still read data from the page cache, which could be gone after a system
restart.
To fix this, we would need to employ direct I/O (i.e. `O_DIRECT`), which,
however, is beyond the scope of this patch, as it precludes the use of
most of `std::io`.
* commitlog: Handle duplicate commits when iterating
We cannot exclude the possibility of spurious failures in I/O operations.
In particular, `EIO` errors are difficult to attribute to a particular
write, as they happen asynchronously during a flush of the page cache.
Because we do not bypass the page cache, a particular commit may be
considered lost when it isn't, or considered durable when it isn't.
The former could lead to duplicate commits appearing in the log, while
the latter could lead to a matching offset number, but with a different
commit payload.
This patch thus ignores duplicates, and introduces a new error variant
in the event the offset matches but the checksum doesn't.
* durability: Manage the flush-and-sync task in this crate
Since syncing the commitlog may now panic, it is more natural to manage
all async tasks here, so that the panic cases can be handled.
Namely, if the `FlushAndSyncTask` panics, the `PersisterTask` is
aborted. This will lead to the channel receiver being dropped, which in
turn will cause the next `append_tx` call to panic.
* commitlog: Remove async flush-and-sync
Because syncing may now panic, it is preferable to manage periodic sync at
the use site of the commitlog crate.
Hence, remove the `flush_and_sync_every` method, and with it the dependency
on tokio.
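The fsync and duplicate-commit rules from the commitlog changes above can be sketched with the standard library alone. `SegmentWriter`, `Commit`, `TraversalError`, and `skip_duplicates` are hypothetical names, not the commitlog crate's API, and a real reader works on encoded segments rather than a `Vec` of already-decoded commits.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Hypothetical decoded commit: its offset, a checksum, and the payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub offset: u64,
    pub checksum: u32,
    pub payload: Vec<u8>,
}

/// Hypothetical error for an offset that reappears with a different checksum.
#[derive(Debug, PartialEq)]
pub enum TraversalError {
    ChecksumMismatch { offset: u64 },
}

/// Hypothetical writer for the active segment.
pub struct SegmentWriter {
    inner: BufWriter<File>,
}

impl SegmentWriter {
    pub fn new(file: File) -> Self {
        Self { inner: BufWriter::new(file) }
    }

    pub fn append(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.inner.write_all(bytes)
    }

    pub fn flush_and_sync(&mut self) -> io::Result<()> {
        // Errors moving buffered bytes into the kernel are reported to the caller.
        self.inner.flush()?;
        // An fsync error leaves the durable prefix of the log unknown, so refuse
        // to continue; the log must be re-opened from disk.
        if let Err(e) = self.inner.get_ref().sync_all() {
            panic!("fsync failed, durable prefix of the log is unknown: {e}");
        }
        Ok(())
    }
}

/// Drops exact duplicates of the preceding commit; rejects same-offset commits
/// whose checksum differs.
pub fn skip_duplicates(commits: Vec<Commit>) -> Result<Vec<Commit>, TraversalError> {
    let mut out: Vec<Commit> = Vec::with_capacity(commits.len());
    for commit in commits {
        if let Some(prev) = out.last() {
            if prev.offset == commit.offset {
                if prev.checksum == commit.checksum {
                    continue; // a commit we wrongly believed lost, written again
                }
                return Err(TraversalError::ChecksumMismatch { offset: commit.offset });
            }
        }
        out.push(commit);
    }
    Ok(out)
}
```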
Make `HostController` manage both the module host (wasm
machinery) and the database (`RelationalDB` / `DatabaseInstanceContext`)
of spacetime databases deployed to a server.
The `DatabaseInstanceContextController` (DBIC) is removed in the
process.
This makes database accesses panic-safe, in that uncaught
panics will cause all resources to be released and the database to be
restarted on subsequent access. This is a prerequisite for #985.
It also allows moving towards storing the module binary directly in
the database / commitlog. This patch, however, makes some contortions in
order to **not** introduce a breaking change just yet.
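A minimal sketch of the panic-safety property, assuming a synchronous controller built on `std::panic::catch_unwind`. `HostController::with_database` and `Database` here are hypothetical stand-ins; how the real controller detects panics and what else it owns is not described in this log.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Stand-in for the database and module host a real controller would own (hypothetical).
#[derive(Debug)]
pub struct Database {
    pub generation: u32,
}

/// Hypothetical controller that lazily (re)launches the database and drops it on panic.
pub struct HostController {
    db: Option<Database>,
    launches: u32,
}

impl HostController {
    pub fn new() -> Self {
        Self { db: None, launches: 0 }
    }

    pub fn with_database<R>(&mut self, f: impl FnOnce(&mut Database) -> R) -> Option<R> {
        if self.db.is_none() {
            self.launches += 1;
            self.db = Some(Database { generation: self.launches });
        }
        let db = self.db.as_mut().expect("database was just launched");
        match panic::catch_unwind(AssertUnwindSafe(|| f(db))) {
            Ok(result) => Some(result),
            Err(_) => {
                // Release the database and its resources; the next call relaunches it.
                self.db = None;
                None
            }
        }
    }
}
```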
* Make `Page` always fully init
Per discussion on the snapshotting proposal,
this PR changes the type of `Page.row_data` to `[u8; _]`,
where previously it was `[MaybeUninit<u8>; _]`.
This turns out to be shockingly easy,
as our serialization codepaths never write padding bytes into a page.
The only place pages ever became `poison` was the initial allocation;
changing this to `alloc_zeroed` causes the `row_data` to always be valid as `[u8; _]`.
The majority of this diff is replacing `MaybeUninit`-specific operators
with their initialized equivalents,
and updating comments and documentation to reflect the new requirements.
This change also revealed a bug in the benchmarks
introduced when we swapped the order of sum tags and payloads
(https://github.com/clockworklabs/SpacetimeDB/pull/1063),
where benchmarks used a hardcoded offset for the tag which had not been updated.
* Update blake3
Blake3 only supports running under Miri as of 1.15.1, the latest version.
Prior versions hard-depended on SIMD intrinsics which Miri doesn't support.
* Address Mazdak's review.
Still pending his agreeing with me that `poison` is a better name than `uninit`.
* "Poison" -> "uninit"
Against my best wishes, for consistency with the broader Rust community's poor choices.
* Remove unnecessary `unsafe` blocks
* More unnecessary `unsafe`; remove forgotten SAFETY comments
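The zeroed allocation from the "Make `Page` always fully init" change can be sketched as follows, assuming a hypothetical `Page` that holds only `row_data` and a hypothetical 64 KiB `PAGE_DATA_SIZE`. Because all-zero bytes are a valid `[u8; N]`, memory returned by `alloc_zeroed` is already a fully initialized `Page`, with no `MaybeUninit` involved.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical page data size.
pub const PAGE_DATA_SIZE: usize = 64 * 1024;

/// Hypothetical page holding only its row data, now plain bytes.
pub struct Page {
    pub row_data: [u8; PAGE_DATA_SIZE],
}

impl Page {
    pub fn new_zeroed() -> Box<Page> {
        let layout = Layout::new::<Page>();
        // SAFETY: `Page` has non-zero size, so `layout` is valid for `alloc_zeroed`.
        let ptr = unsafe { alloc_zeroed(layout) } as *mut Page;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `ptr` comes from the global allocator with `Page`'s layout, and
        // all-zero bytes are a valid `[u8; N]`, so this is an initialized `Page`.
        unsafe { Box::from_raw(ptr) }
    }
}
```

Allocating zeroed memory directly on the heap also avoids building a large array on the stack first, which `Box::new([0u8; N])` may do.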
2. Make `RowRef::row_hash` use the above.
3. Make `Table::insert` return a `RowRef`.
4. Use less `unsafe`, thanks to 1-3.
5. Use `second-stack` to reuse temporary allocations in hashing and serialization.
While working on the new C# codegen, I accidentally noticed that those tests were passing even when they clearly should've been failing due to changed output.
After running with `--nocapture`, I found out it's because the tests are silently skipped and reported as successful when `rust_wasm_test.wasm` isn't built.
This further led to finding that `rust_wasm_test.wasm` is never built - the relevant module results in `rust_wasm_test_module.wasm` instead - so these tests have been incorrectly passing for ages.
This PR changes them to actually build the module as part of testing and updates the snapshots to latest master.
This patch attempts to integrate the new commitlog with minimal
changes.
Most of the diff comes from deletions of the legacy log and the need to
adjust tests due to the requirement for a tokio runtime when a durable
database is used in tests.
The "meat" of the patch lies in the `RelationalDB` constructors,
`RelationalDB::commit_tx`, and the replay logic in
`locking_tx_datastore`.
While `DataKey` is gone, there is still some redundant data being passed
around, which will be addressed in the follow-up patch.
Defines traits intended to abstract over the kind of persistence a
database utilizes. The only implementation is (host-)local durability in
terms of the new commitlog crate.
The trait definitions should not be considered stable yet, but are needed
in their tentative form for further integration of the new commitlog.
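An illustrative shape for such a trait, together with a trivial in-memory implementation that treats every appended transaction as immediately durable. The trait name, method names, and `TxOffset` alias are assumptions made for this sketch; as noted above, the real definitions were still tentative.

```rust
use std::sync::Mutex;

/// Hypothetical transaction offset type.
pub type TxOffset = u64;

/// Hypothetical abstraction over how a database persists transactions.
pub trait Durability {
    type TxData;
    /// Hands a transaction to the persistence layer (may persist asynchronously).
    fn append_tx(&self, tx: Self::TxData);
    /// The highest offset known to be durable, if any.
    fn durable_tx_offset(&self) -> Option<TxOffset>;
}

/// Test-only stand-in: every appended transaction counts as durable at once.
pub struct InMemory<T> {
    log: Mutex<Vec<T>>,
}

impl<T> InMemory<T> {
    pub fn new() -> Self {
        Self { log: Mutex::new(Vec::new()) }
    }
}

impl<T> Durability for InMemory<T> {
    type TxData = T;

    fn append_tx(&self, tx: T) {
        self.log.lock().unwrap().push(tx);
    }

    fn durable_tx_offset(&self) -> Option<TxOffset> {
        let len = self.log.lock().unwrap().len() as TxOffset;
        len.checked_sub(1)
    }
}
```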
* Detect unsatisfiable range queries; warn and short-circuit.
This commit fixes a panic caused by unsatisfiable range bounds on an index query,
e.g. `WHERE x < 5 AND x > 5`.
These unsatisfiable bounds made Rust's `BTreeMap` angry enough to panic
(see https://doc.rust-lang.org/src/alloc/collections/btree/search.rs.html#106-124).
They also represent probable bugs,
as it's silly to write a query which statically will return no rows.
With this commit, we detect statically unsatisfiable bounds in two places:
- When compiling queries, we log a message at `WARN` containing the offending query.
- When evaluating queries, we silently construct an `EmptyRelOps`
rather than a real query iterator.
This commit also adds a test that the offending queries can be compiled and executed
without panicking, and select no rows.
* Per Joshua's review, add comments that this is a suboptimal solution
* Fix typo
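The check for statically unsatisfiable bounds can be sketched with `std::ops::Bound`. `BTreeMap::range` panics when the start bound is greater than the end bound, or when they are equal and both `Excluded`, which is exactly the `WHERE x < 5 AND x > 5` case. `is_unsatisfiable` and `checked_range` are hypothetical helpers, not the functions added by this commit; they also flag ranges such as `[5, 5)`, which are empty without panicking.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// True if no value can satisfy both bounds, independent of the key type.
pub fn is_unsatisfiable<T: Ord>(lower: &Bound<T>, upper: &Bound<T>) -> bool {
    match (lower, upper) {
        (Bound::Included(l), Bound::Included(u)) => l > u,
        (Bound::Included(l), Bound::Excluded(u))
        | (Bound::Excluded(l), Bound::Included(u))
        | (Bound::Excluded(l), Bound::Excluded(u)) => l >= u,
        // An unbounded side can always be satisfied.
        _ => false,
    }
}

/// Short-circuits to an empty iterator instead of letting `BTreeMap::range` panic.
pub fn checked_range<K: Ord, V>(
    map: &BTreeMap<K, V>,
    lower: Bound<K>,
    upper: Bound<K>,
) -> impl Iterator<Item = (&K, &V)> {
    let range = if is_unsatisfiable(&lower, &upper) {
        None
    } else {
        Some(map.range((lower, upper)))
    };
    range.into_iter().flatten()
}
```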
---------
Co-authored-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
Defines the canonical commitlog payload, and how to encode / decode it.
Also exposes folds alongside iterators, which allows the common case of
replaying the commitlog onto a database to be further optimized (the
`Txdata` does not have to be constructed in this case). This
optimization is, however, left for a future patch.
First in a series of patches to implement the new commitlog format.
This patch implements the base format, leaving the transaction payload
generic. Segment handling, writing, and reading are implemented based on
an in-memory backend, which greatly simplifies testing.
As a notable deviation from the previous implementation, segments are
never implicitly trimmed. Instead, faulty commits are ignored if and
only if the next commit in the log sequence is valid and has the right
offset. On the write path, this entails closing the active segment when
an (I/O) error occurs, but retaining the commit in memory such that it
is written to the next segment.
Note that this patch does not define the final public API.
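The skipping rule can be sketched over a sequence of already-decoded records in log order. `Record`, `UnrecoverableCorruption`, and `read_offsets` are hypothetical; a real reader would detect corruption via checksums while decoding segments.

```rust
/// A decoded record: a commit with its offset, or bytes that failed to decode (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Record {
    Valid { offset: u64 },
    Corrupt,
}

#[derive(Debug, PartialEq)]
pub struct UnrecoverableCorruption {
    pub expected_offset: u64,
}

/// Returns the offsets of valid commits, starting the expectation at `expected`.
pub fn read_offsets(records: &[Record], mut expected: u64) -> Result<Vec<u64>, UnrecoverableCorruption> {
    let mut offsets = Vec::new();
    for (i, record) in records.iter().enumerate() {
        match record {
            Record::Valid { offset } => {
                offsets.push(*offset);
                expected = offset + 1;
            }
            // Skip a faulty commit iff the next record is valid and has the
            // expected offset, i.e. the writer retried it in a new segment.
            Record::Corrupt => match records.get(i + 1) {
                Some(Record::Valid { offset }) if *offset == expected => {}
                _ => return Err(UnrecoverableCorruption { expected_offset: expected }),
            },
        }
    }
    Ok(offsets)
}
```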
* Implement (but do not use) a fast path for BFLATN -> BSATN conversion
* fmt and clippy
* `u16` offset rather than `usize`
* Address Joshua's review
* Define methods on `RowRef` and `RelValue` which use the new serializer
* Comment in `align_to` about div-by-zero
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Signed-off-by: Phoebe Goldman <phoebe@goldman-tribe.org>
* Add benchmark comparing BFLATN -> BSATN with and without the fast path
* Add benchmark on `u64_u64_u32`, which has less interior padding than `u32_u64_u64`
* Remove `to_len` from `to_bsatn_extend`
It turns out to be slower than just eating the `realloc`s.
* Remove unused `to_bsatn_slice`
I thought I would need it, but it ended up not being useful.
* Expand comment with example; `Box<[...]>` to reduce memory footprint
* Comments from Mazdak's review
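One plausible shape for such a fast path, assuming rows whose fields are all fixed-size primitives stored little-endian (as BSATN encodes them) at aligned offsets. BSATN is then the row's bytes with the padding removed, so conversion reduces to a precomputed list of `memcpy` runs. `align_to` guards against a zero alignment in the spirit of the div-by-zero comment mentioned above; `MemcpyRun`, `static_layout`, and `to_bsatn_fast` are hypothetical names.

```rust
/// Rounds `offset` up to a multiple of `align`.
pub const fn align_to(offset: usize, align: usize) -> usize {
    // `align == 0` would make `%` divide by zero; treat it as "no alignment".
    if align == 0 {
        return offset;
    }
    let misalignment = offset % align;
    if misalignment == 0 { offset } else { offset + (align - misalignment) }
}

/// One contiguous run of field bytes in the padded row.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MemcpyRun {
    pub row_offset: usize,
    pub len: usize,
}

/// Computes copy runs for fields given as `(size, align)`, merging adjacent fields.
pub fn static_layout(fields: &[(usize, usize)]) -> Box<[MemcpyRun]> {
    let mut runs: Vec<MemcpyRun> = Vec::new();
    let mut offset = 0;
    for &(size, align) in fields {
        offset = align_to(offset, align);
        if let Some(last) = runs.last_mut() {
            if last.row_offset + last.len == offset {
                // No padding in between: extend the previous run.
                last.len += size;
                offset += size;
                continue;
            }
        }
        runs.push(MemcpyRun { row_offset: offset, len: size });
        offset += size;
    }
    runs.into_boxed_slice()
}

/// Appends the row's non-padding bytes to `out`.
pub fn to_bsatn_fast(row: &[u8], runs: &[MemcpyRun], out: &mut Vec<u8>) {
    for run in runs {
        out.extend_from_slice(&row[run.row_offset..run.row_offset + run.len]);
    }
}
```

For `u32_u64_u64`, the 4 bytes of padding after the `u32` split the row into two runs, while `u64_u64_u32` needs a single 20-byte copy, which illustrates why interior padding matters for the benchmarks above.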
It turns out that the changes introduced in #734 do not result in more
reliable detection of incompatible schema updates. This is because the
data structures involved can be converted into each other, but that
conversion is not bijective.
Fix this by manually adjusting the schema of the existing table to be
comparable to the proposed table.
Also log details about a schema mismatch to the user-retrievable database log,
in unified diff format.
Closes #747.
Before this change,
we would evaluate each and every query,
for each and every subscription,
on each and every row update.
If N subscriptions had a query Q in common,
it would be evaluated N different times.
With this change,
distinct queries are evaluated once,
and the results copied for each client.
So in the example above, Q would be evaluated once,
with the results transmitted to N different clients.
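A minimal sketch of the deduplication, assuming queries can be hashed and compared for equality. `eval_deduplicated`, `ClientId`, and the `eval` callback are hypothetical; the real subscription code evaluates updates against each transaction, which is not modeled here.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical client identifier.
pub type ClientId = u32;

/// Evaluates each distinct query once and fans the result out to its subscribers.
pub fn eval_deduplicated<Q, R, F>(subscriptions: &[(ClientId, Q)], mut eval: F) -> Vec<(ClientId, R)>
where
    Q: Eq + Hash,
    R: Clone,
    F: FnMut(&Q) -> R,
{
    // Group subscribers by query so each distinct query appears once.
    let mut subscribers: HashMap<&Q, Vec<ClientId>> = HashMap::new();
    for (client, query) in subscriptions {
        subscribers.entry(query).or_default().push(*client);
    }
    let mut updates = Vec::new();
    for (query, clients) in subscribers {
        // Evaluate the query exactly once...
        let result = eval(query);
        // ...and copy the result for every client subscribed to it.
        for client in clients {
            updates.push((client, result.clone()));
        }
    }
    updates
}
```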