Commit Graph

15 Commits

Author SHA1 Message Date
Kim Altintop cfd0d4b712 commitlog,durability: Support preallocation of disk space (#3437)
When a new commitlog segment is created, allocate disk space for it up
to the maximum segment size. Also do this when resuming writes to an
existing segment, such that segments created without preallocation will
allocate as well when the database is opened.

Preallocation is gated behind the feature "fallocate", because it is not
always desirable to preallocate, e.g. for local `standalone` users.

The feature can only be enabled on Linux targets, because allocation is
done using the Linux-specific `fallocate(2)` system call.

Unlike `ftruncate(2)` or the portable `posix_fallocate(3)`,
`fallocate(2)`
supports allocating disk space without zeroing. This is currently
required, because the commitlog format does not handle padding bytes.

If not enough space can be allocated, the commitlog refuses writes. For
commitlogs that were created without preallocation, this means that the
commitlog cannot even be opened in this situation.

The local durability impl will crash if it detects that the commitlog is
unable to allocate enough space.

This means that a database will eventually crash and be unable to start
in
an out-of-space situation.

Allocated space is not included in the reported size of the commitlog.
Instead, allocated blocks are reported separately.


# Expected complexity level and risk

3 - Disk size monitoring may need to be adjusted.

# Testing

- [x] Adds a test that demonstrates the crash behavior of
[`spacetimedb_durability::Local`]
when there is insufficient space. The test performs I/O against a loop
device.
- [x] Modified the `repo::Memory` impl so that it can run out of space.
No test currently
utilizes this, but existing tests assuming infinite space still pass.
2025-11-10 16:55:55 +00:00
Phoebe Goldman e77b62f475 Also capture a snapshot every new commitlog segment (#3405)
# Description of Changes

We've run into a problem on Maincloud caused by a database that was
writing a relatively small number of very large transactions. This was
accruing many commitlog segments consuming hundreds of gigabytes of
disk, but had not ever taken a snapshot, or compressed or archived any
data, as the database had not progressed past one million transactions.

With this PR, we take a snapshot every time the commitlog segment
rotates. We still also snapshot every million transactions.

One BitCraft database we looked at had 2.5 million transactions per
commitlog segment, meaning that this change will not meaningfully affect
the frequency of snapshots. The offending Maincloud database, however,
had only 50 transactions per segment!

# API and ABI breaking changes

N/a

# Expected complexity level and risk

3: Hastily made changes to finnicky code across several crates.

# Testing

I am unsure how to test these changes.

- [ ] <!-- maybe a test you want to do -->
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
2025-10-15 15:18:15 +00:00
Noa 742303ca49 Bump rust-toolchain to rust 1.88 (#2749)
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2025-07-15 17:39:41 +00:00
Shubham Mishra f2a9657a72 Commitlog: handle empty offset index lookup (#2771) 2025-05-22 12:59:02 +00:00
Noa a5212a5f75 Commitlog compression (#2504) 2025-03-31 22:00:52 +00:00
Kim Altintop 5063bd8759 commitlog: Streaming (#2492) 2025-03-26 07:40:23 +00:00
Phoebe Goldman d171b44a89 Don't create indexes during bootstrapping; wait until after replay (#2161) 2025-01-23 19:41:39 +00:00
Kim Altintop 125ab58388 commitlog: Fix set_epoch (#2005) 2024-11-21 13:34:10 +00:00
Noa f136670420 Directory structure impl (#1879)
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
2024-11-12 04:24:43 +00:00
Kim Altintop e4fcb72432 commitlog: Small tweaks (#1978) 2024-11-11 13:24:21 +00:00
Shubham Mishra eeaa00a05f Commitlog offset index (#1671)
Signed-off-by: Shubham Mishra <shubham@clockworklabs.io>
Co-authored-by: Kim Altintop <kim@eagain.io>
2024-09-24 16:06:49 +00:00
Kim Altintop 2c3fc66f21 Commitlog: panic on fsync failure (#985)
* commitlog: Panic on fsync failure

Errors returned by `fsync(2)` are particularly nefarious, as it is
mostly undefined what the state of the page cache is in this case.

Since the log is synced asynchronously and not after every write, it is
impossible to know up to which commit data can be considered durable --
except by reading the most recent segment from disk.

Therefore, the reasonable thing to do is to prevent any further use of
the log, and force users to re-load it from disk.

Note that this is only half of the solution: an application restart may
still read data from the page cache, which could be gone after a system
restart.

To fix this, we would need to employ direct I/O (i.e. `O_DIRECT`), which
however is beyond the scope of this patch as it invalidates the use of
most of `std::io`.

* commitlog: Handle duplicate commits when iterating

We cannot exclude the possibility of a false failure in I/O operations.
In particular, `EIO` errors are difficult to attribute to a particular
write, as they happen asynchronously during flush of the page cache.

Because we do not bypass the page cache, the possibility exists that a
particular commit is lost when it isn't, or that it is considered
durable when it isn't. The former could lead to duplicate commits
appearing in the log, while the latter could lead to a matching offset
number, but with different commit payload.

This patch thus ignores duplicates, and introduces a new error variant
in the event the offset matches but the checksum doesn't.

* durability: Manage the flush-and-sync task in this crate

Since syncing the commitlog may now panic, it is more obvious to handle
all async tasks here, so as to be able to handle the panic cases.

Namely, if the `FlushAndSyncTask` panics, the `PersisterTask` is
aborted. This will lead to the channel receiver being dropped, which in
turn will cause the next `append_tx` call to panic.

* commitlog: Remove async flush-and-sync

Due to panic behaviour, it is now preferable to manage periodic sync at
the use site of the commitlog crate.

Hence remove `flush_and_sync_every` method, and with it the dependency
on tokio.
2024-05-28 18:22:38 +00:00
Kim Altintop 1d316d991e Commitlog: Add canonical txdata payload (#921)
Defines the canonical commitlog payload, and how to encode / decode it.

Also exposes folds alongside iterators, which allows the common case of
replaying the commitlog onto a database to be further optimized (the
`Txdata` does not have to be constructed in this case). This
optimization is, however, left for a future patch.
2024-04-02 09:54:19 +00:00
Kim Altintop 73cd78231e Commitlog: Add I/O based on regular files (#920)
Provides a commitlog backing store based on files, and defines the
exported `Commitlog` type which fixes the store to the file-based one.
2024-04-02 09:10:21 +00:00
Kim Altintop 3b343e4eb1 Commitlog: Base implementation "sans I/O" (#919)
First in a series of patches to implement the new commitlog format.

This patch implements the base format, leaving the transaction payload
generic. Segment handling, writing and reading is implemented based on
an in-memory backend, which greatly simplifies testing.

As a notable deviation from the previous implementation, segments are
never implicitly trimmed. Instead, faulty commits are ignored if and
only if the next commit in the log sequence is valid and has the right
offset. On the write path, this entails closing the active segment when
an (I/O) error occurs, but retaining the commit in memory such that it
is written to the next segment.

Note that this patch does not define the final public API.
2024-04-02 06:18:30 +00:00