
25 Commits

Author SHA1 Message Date
Kim Altintop 727f51620b durability: Notify waiters after lockfile is dropped (#4051)
Notifications were sent shortly before dropping the actor (and thus, the
lockfile), which could cause the `close` future to return while the lock
is still held.

This can lead to a race if the database is re-opened immediately, such
as in `TestDB::reopen()`, causing test flakes.
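The fix is purely about ordering: the lock must be released before anyone awaiting `close` is woken up. The following std-only sketch illustrates that ordering under simplifying assumptions. `LockFile`, `Actor` and the `mpsc` waiter channels are hypothetical stand-ins (the actual durability actor is async and locks a real file), so it shows the drop order, not the real code.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `db.lock`: a shared flag records whether the
/// lock is currently held.
struct LockFile {
    held: Arc<Mutex<bool>>,
}

impl LockFile {
    fn acquire(held: Arc<Mutex<bool>>) -> Result<Self, &'static str> {
        {
            let mut guard = held.lock().unwrap();
            if *guard {
                return Err("Database is already opened");
            }
            *guard = true;
        } // guard dropped here, before `held` is moved
        Ok(LockFile { held })
    }
}

impl Drop for LockFile {
    fn drop(&mut self) {
        // The lock is released when its owner is dropped.
        *self.held.lock().unwrap() = false;
    }
}

/// Hypothetical actor that owns the lock and the `close` waiters.
struct Actor {
    lock: LockFile,
    waiters: Vec<mpsc::Sender<()>>,
}

impl Actor {
    fn shutdown(self) {
        let Actor { lock, waiters } = self;
        // Release the lock first...
        drop(lock);
        // ...and only then notify waiters, so a `close` that has returned
        // can immediately be followed by a successful re-open.
        for waiter in waiters {
            let _ = waiter.send(());
        }
    }
}
```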

# Expected complexity level and risk

1

# Testing

Should make the "Database is already opened" test flakes go away.
2026-01-16 07:07:26 +00:00
Kim Altintop bcf6d77fca Time out database shutdown (#4019)
It is possible that, under pathological conditions, a database has a
huge transaction backlog, or that there is some bug that prevents
progress on draining this backlog upon shutdown.

In order to avoid piling up `exit_module_host` tasks (which we would not
notice), impose a configurable timeout after which `exit_module_host`
will drop resources without waiting for the shutdown to complete
gracefully.
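Below is a minimal sketch of "wait for a graceful shutdown, but give up after a deadline", using the standard library's `Receiver::recv_timeout`. It is an illustration only. The real `exit_module_host` is async, and `await_shutdown` and `ShutdownOutcome` are hypothetical names. On `TimedOut`, the caller would go on to drop resources without waiting further.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ShutdownOutcome {
    /// The shutdown task signalled completion in time.
    Graceful,
    /// The deadline passed; resources should be dropped anyway.
    TimedOut,
    /// The shutdown task went away without signalling completion.
    TaskGone,
}

/// Hypothetical helper: wait at most `timeout` for the shutdown task to
/// signal completion on `done`.
fn await_shutdown(done: &Receiver<()>, timeout: Duration) -> ShutdownOutcome {
    match done.recv_timeout(timeout) {
        Ok(()) => ShutdownOutcome::Graceful,
        Err(RecvTimeoutError::Timeout) => ShutdownOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => ShutdownOutcome::TaskGone,
    }
}
```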
2026-01-13 15:07:16 +00:00
Kim Altintop 05d4874918 Create db.lock file only for persistent databases (#3912)
This is the first step towards making in-memory-only databases not touch
the disk at all. Still pending is an in-memory-only sink for module logs.

Responsibility for the lock file is transferred to `Durability`, which
means that only persistent databases opened for writing acquire the
lock.

As a consequence, the `Durability` trait gains a `close` method that
prevents further writes and drains the internal buffers, even when
multiple `Arc`-pointers to the `Durability` exist.
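The key property of `close` is that it works through a shared reference, so any of several `Arc` handles can call it. The sketch below is a simplification that assumes a single mutex guards both the closed flag and the buffer. `ToyDurability` is hypothetical and is not the real `Durability` trait.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified durability sink (not the real trait).
struct ToyDurability {
    inner: Mutex<Inner>,
}

struct Inner {
    closed: bool,
    buffered: Vec<u64>,   // tx offsets appended but not yet written
    durable: Option<u64>, // highest offset considered written
}

impl ToyDurability {
    fn new() -> Self {
        ToyDurability {
            inner: Mutex::new(Inner { closed: false, buffered: Vec::new(), durable: None }),
        }
    }

    fn append_tx(&self, offset: u64) -> Result<(), &'static str> {
        let mut inner = self.inner.lock().unwrap();
        if inner.closed {
            return Err("durability is closed");
        }
        inner.buffered.push(offset);
        Ok(())
    }

    /// Takes `&self`, so it can be called through any `Arc` handle.
    /// Prevents further appends, drains the buffer and returns the
    /// highest durable offset.
    fn close(&self) -> Option<u64> {
        let mut inner = self.inner.lock().unwrap();
        inner.closed = true;
        let last_drained = inner.buffered.drain(..).last();
        if let Some(last) = last_drained {
            inner.durable = Some(last);
        }
        inner.durable
    }
}
```

Checking the flag and draining under the same lock prevents an append from slipping in between the two steps.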


# Expected complexity level and risk

2

# Testing

Covered by existing tests.
2026-01-08 08:22:37 +00:00
Kim Altintop e2b4113ffb Async shutdown for database / durability (#3880)
Controlled shutdown of a database should drain the outstanding
transactions queue(s) and flush them to the durability layer.

With the introduction of another queueing layer in #3868, it became
harder to observe when, or whether, this process has completed.

This patch thus introduces an explicit (async) shutdown method for
`RelationalDB` and below, which will wait until all submitted
transactions are either reported durable, or an error occurs in the
durability layer.
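The wait described above ("until all submitted transactions are either reported durable, or an error occurs") can be sketched with a `Mutex` plus `Condvar`. The real implementation is async. `DurabilityMonitor`, `report_durable`, `report_error` and `wait_for` are hypothetical names used only for this illustration.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical state shared with a background durability worker.
#[derive(Default)]
struct DurableState {
    durable_offset: Option<u64>,
    error: Option<String>,
}

#[derive(Default)]
struct DurabilityMonitor {
    state: Mutex<DurableState>,
    changed: Condvar,
}

impl DurabilityMonitor {
    /// Called by the worker when `offset` has been reported durable.
    fn report_durable(&self, offset: u64) {
        self.state.lock().unwrap().durable_offset = Some(offset);
        self.changed.notify_all();
    }

    /// Called by the worker when the durability layer fails.
    fn report_error(&self, err: &str) {
        self.state.lock().unwrap().error = Some(err.to_owned());
        self.changed.notify_all();
    }

    /// Blocks until `last_submitted` is durable or an error was reported.
    fn wait_for(&self, last_submitted: u64) -> Result<u64, String> {
        let state = self
            .changed
            .wait_while(self.state.lock().unwrap(), |s| {
                s.error.is_none() && s.durable_offset.map_or(true, |d| d < last_submitted)
            })
            .unwrap();
        match &state.error {
            Some(e) => Err(e.clone()),
            None => Ok(state.durable_offset.unwrap()),
        }
    }
}
```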

`RelationalDB` is made `!Clone`, such that shutdown can be initiated in
the `Drop` impl. Note that this requires access to a tokio runtime,
which we thread through via the `Persistence` services in order to allow
control over which of the various runtimes is being used for
durability-related tasks.

Also moves `RelationalDB::open` to a blocking thread when a
persistence-enabled database is constructed by the `HostController`.
Opening performs heavy I/O and can take a substantial amount of time,
during which we don't want to block a worker thread.

# API and ABI breaking changes

None

# Expected complexity level and risk

3

# Testing

- [ ] some testing added
- [ ] existing tests still pass
- [ ] `impl Drop for RelationalDB` is difficult to test; extra eyeballs needed

---------

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2025-12-17 18:28:42 +00:00
Kim Altintop cfd0d4b712 commitlog,durability: Support preallocation of disk space (#3437)
When a new commitlog segment is created, allocate disk space for it up
to the maximum segment size. Also do this when resuming writes to an
existing segment, such that segments created without preallocation will
allocate as well when the database is opened.

Preallocation is gated behind the feature "fallocate", because it is not
always desirable to preallocate, e.g. for local `standalone` users.

The feature can only be enabled on Linux targets, because allocation is
done using the Linux-specific `fallocate(2)` system call.

Unlike `ftruncate(2)` or the portable `posix_fallocate(3)`, `fallocate(2)`
supports allocating disk space without zeroing. This is currently
required, because the commitlog format does not handle padding bytes.

If not enough space can be allocated, the commitlog refuses writes. For
commitlogs that were created without preallocation, this means that the
commitlog cannot even be opened in this situation.
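To illustrate what refusing writes means, here is a hypothetical in-memory segment with a fixed space budget, in the spirit of a memory-backed repo that can run out of space. This sketch refuses the whole write rather than writing a prefix. That choice is an assumption made so that no partially written commit is left behind.

```rust
use std::io::{self, Write};

/// Hypothetical in-memory segment with a fixed space budget.
struct BoundedSegment {
    buf: Vec<u8>,
    capacity: usize,
}

impl BoundedSegment {
    fn with_capacity(capacity: usize) -> Self {
        BoundedSegment { buf: Vec::new(), capacity }
    }
}

impl Write for BoundedSegment {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Refuse the entire write if it would exceed the budget.
        if self.buf.len() + data.len() > self.capacity {
            return Err(io::Error::new(io::ErrorKind::Other, "out of space"));
        }
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```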

The local durability impl will crash if it detects that the commitlog is
unable to allocate enough space.

This means that a database will eventually crash and be unable to start
in an out-of-space situation.

Allocated space is not included in the reported size of the commitlog.
Instead, allocated blocks are reported separately.


# Expected complexity level and risk

3 - Disk size monitoring may need to be adjusted.

# Testing

- [x] Adds a test that demonstrates the crash behavior of
  [`spacetimedb_durability::Local`] when there is insufficient space.
  The test performs I/O against a loop device.
- [x] Modified the `repo::Memory` impl so that it can run out of space.
  No test currently utilizes this, but existing tests assuming infinite
  space still pass.
2025-11-10 16:55:55 +00:00
Phoebe Goldman e77b62f475 Also capture a snapshot every new commitlog segment (#3405)
# Description of Changes

We've run into a problem on Maincloud caused by a database that was
writing a relatively small number of very large transactions. This was
accruing many commitlog segments consuming hundreds of gigabytes of
disk, but had not ever taken a snapshot, or compressed or archived any
data, as the database had not progressed past one million transactions.

With this PR, we take a snapshot every time the commitlog segment
rotates. We still also snapshot every million transactions.
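The resulting policy can be sketched as a single predicate. The name `should_snapshot` and the exact modulo check are assumptions. The source only states that snapshots are taken on segment rotation and every million transactions.

```rust
/// Hypothetical constant: one snapshot per million transactions.
const SNAPSHOT_INTERVAL: u64 = 1_000_000;

/// Sketch: snapshot whenever the commitlog segment has just rotated,
/// and additionally every `SNAPSHOT_INTERVAL` transactions.
fn should_snapshot(tx_offset: u64, segment_rotated: bool) -> bool {
    segment_rotated || (tx_offset > 0 && tx_offset % SNAPSHOT_INTERVAL == 0)
}
```

For a database with 50 transactions per segment, the `segment_rotated` branch dominates. It yields a snapshot roughly every 50 transactions, where the interval check alone would not have produced any before the millionth transaction.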

One BitCraft database we looked at had 2.5 million transactions per
commitlog segment, meaning that this change will not meaningfully affect
the frequency of snapshots. The offending Maincloud database, however,
had only 50 transactions per segment!

# API and ABI breaking changes

N/a

# Expected complexity level and risk

3: Hastily made changes to finicky code across several crates.

# Testing

I am unsure how to test these changes.

2025-10-15 15:18:15 +00:00
Kim Altintop a675cb36d2 Expand scope of DurabilityProvider to include snapshotting (#3295)
The `DurabilityProvider` trait was introduced to enable the
`HostController` to procure an alternative `Durability` impl from an
external source.

It is also useful to be able to instantiate a `SnapshotWorker`
externally, in order to subscribe to snapshot creation events without
access to the `RelationalDB` instance it is operating on.
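Subscribing to snapshot events without holding the database can be sketched as a broadcast over channels. `SnapshotWorker` here is a hypothetical std-only stand-in, not the real type. Identifying each snapshot by its tx offset is also an assumption.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical worker that broadcasts the tx offset of each snapshot
/// it creates to any number of subscribers.
#[derive(Default)]
struct SnapshotWorker {
    subscribers: Mutex<Vec<Sender<u64>>>,
}

impl SnapshotWorker {
    /// Subscribe without needing access to the database itself.
    fn subscribe(&self) -> Receiver<u64> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Called after a snapshot at `offset` was written; subscribers whose
    /// receiver has been dropped are pruned.
    fn snapshot_created(&self, offset: u64) {
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send(offset).is_ok());
    }
}
```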

At a later stage, we may also use it to control the snapshot frequency
externally.

This patch thus reframes the trait as `PersistenceProvider`, whose job
is to provide persistence-related services.

Also separates snapshot creation and compression of older snapshots, and
adds instrumentation to gather timing information for both.

# Description of Changes

Re-submit of #3281 (reverted by #3293), with only the intended changes.

# Expected complexity level and risk

1.5

# Testing

No functional changes.
2025-10-01 05:50:37 +00:00
Kim Altintop 311462760a Revert "Expand scope of DurabilityProvider to include snapshotting (#3281)" (#3293)
This reverts commit 2b61190d4d.

An accident happened, and the patch contains changes that were intended
for a separate PR.
Perhaps better to start over.
2025-09-25 14:30:59 +00:00
Kim Altintop 2b61190d4d Expand scope of DurabilityProvider to include snapshotting (#3281)
The `DurabilityProvider` trait was introduced to enable the
`HostController` to procure an alternative `Durability` impl from an
external source.

It is also useful to be able to instantiate a `SnapshotWorker`
externally, in order to subscribe to snapshot creation events without
access to the `RelationalDB` instance it is operating on.

At a later stage, we may also use it to control the snapshot frequency
externally.

This patch thus reframes the trait as `PersistenceProvider`, whose job
is to provide persistence-related services.

Also separates snapshot creation and compression of older snapshots, and
adds instrumentation to gather timing information for both.

# Expected complexity level and risk

1.5

# Testing

Not a functional change, existing tests should cover that.
2025-09-25 12:53:53 +00:00
Kim Altintop f394de32d9 Confirmed reads (#3133)
# Description of Changes

Implements [subscribing to durable
commits](https://github.com/clockworklabs/SpacetimeDBPrivate/issues/1594).

The setting works on a per-connection level, and essentially just delays
sending transaction updates until the transaction is reported as durable
by the database.

For connectionless SQL operations, the setting works per-request. No SQL
syntax is provided by this patch to toggle the configuration.

After some deliberation, I opted to obtain the offset when a transaction
commits (as opposed to when it starts). This creates some mild
inconvenience, because we prevent the transaction from committing until
the corresponding subscription updates are enqueued.
The strategy is, however, more correct should we ever support weaker
isolation levels, and it is easier to document.

Follow-ups include:

- Provide SQL syntax (`SET synchronous_commit = ON` or something)
- C# and TypeScript SDKs
- Reference docs?
 

# API and ABI breaking changes

Not breaking, but adds a parameter to the subscribe and sql endpoints.


# Expected complexity level and risk

4

To the author's understanding, ordering of outbound messages is not
changed by this patch, even if there are messages that don't have a
transaction offset (such as error messages). I.e. while waiting for the
transaction offset of a message to become durable, no message enqueued
after that message will be delivered. This may not be desirable in some
cases.
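The ordering described above amounts to a FIFO queue drained only up to the first message whose transaction offset is not yet durable. The sketch below is hypothetical: `ConfirmedQueue` and `drain_ready` are illustrative names, not the actual `ClientConnectionReceiver` API.

```rust
use std::collections::VecDeque;

/// Hypothetical outbound queue for a connection with confirmed reads.
/// Each message optionally carries the tx offset it depends on.
#[derive(Default)]
struct ConfirmedQueue {
    pending: VecDeque<(Option<u64>, String)>,
}

impl ConfirmedQueue {
    fn push(&mut self, tx_offset: Option<u64>, msg: impl Into<String>) {
        self.pending.push_back((tx_offset, msg.into()));
    }

    /// Releases messages in FIFO order, stopping at the first message
    /// whose offset is not yet durable (given the durable offset so far).
    fn drain_ready(&mut self, durable: Option<u64>) -> Vec<String> {
        let mut ready = Vec::new();
        loop {
            let releasable = match self.pending.front() {
                None => break,
                // No offset: only waits for the messages before it.
                Some((None, _)) => true,
                Some((Some(offset), _)) => durable.map_or(false, |d| *offset <= d),
            };
            if !releasable {
                break;
            }
            if let Some((_, msg)) = self.pending.pop_front() {
                ready.push(msg);
            }
        }
        ready
    }
}
```

A message without an offset, such as an error, that is queued behind a pending update is therefore held back until that update becomes durable. This is the head-of-line blocking described above.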

The patch may contain concurrency bugs, e.g. awaiting futures that may
never resolve.


# Testing

- [x] Implemented a new test in the `module_subscription_actor` module
- [x] Added unit tests for the core logic in `ClientConnectionReceiver` 

It would be desirable to also have integration-level tests, but I'm
currently unsure how to write those without being able to control if and
when the database reports an offset as durable.

---------

Signed-off-by: Kim Altintop <kim@eagain.io>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
2025-09-08 17:51:04 +00:00
Zeke Foppa f6f0909ea4 Update all licenses (#3002)
# Description of Changes

We recently merged several repos together. This PR clarifies the license
terms for several subdirectories, as well as the relationship between
the licenses.

The licenses in our subdirectories have become symbolic links to
licenses in our toplevel `licenses` directory. For any particular
subdirectory's license file in the diff, you can click `... -> View
file` and then click on the text that says "Symbolic Link" on that page.
This will take you to the license file that it links to.

I have also updated the `tools/upgrade-version` script to update the
change date in the new `licenses/BSL.txt` file.

# API and ABI breaking changes

None.

# Expected complexity level and risk

1

# Testing

None. Only changes to license files.

---------

Co-authored-by: Zeke Foppa <bfops@users.noreply.github.com>
2025-08-12 18:20:58 +00:00
Noa 742303ca49 Bump rust-toolchain to rust 1.88 (#2749)
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
2025-07-15 17:39:41 +00:00
Kim Altintop 47fd1684fd durability: Fix task leak (#2875) 2025-06-27 12:55:22 +00:00
Kim Altintop 3d1a91c25c Handle snapshot restore more robustly (#2735)
Signed-off-by: Kim Altintop <kim@eagain.io>
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Shubham Mishra <shubham@clockworklabs.io>
2025-05-15 14:35:09 +00:00
Mario Montoya 3fd78203c4 Compress the snapshot (#2034) 2025-04-11 15:18:17 +00:00
Noa a5212a5f75 Commitlog compression (#2504) 2025-03-31 22:00:52 +00:00
Mario Montoya f9f38543c8 Add readmes to all implementation crates specifying that they do not offer stable interfaces (#2320) 2025-03-06 19:50:17 +00:00
Noa f136670420 Directory structure impl (#1879)
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
2024-11-12 04:24:43 +00:00
Kim Altintop afeb3421ae commitlog: Yield StoredCommit in iterators (#1791) 2024-10-08 08:53:25 +00:00
Kim Altintop 0fa94cedc1 durability: Skip fsync if nothing changed (#1360) 2024-06-11 18:10:33 +00:00
Kim Altintop 1044ebc652 core: Store address, owner and program bytes in st_module (#1305) 2024-06-10 10:25:16 +00:00
Kim Altintop 2c3fc66f21 Commitlog: panic on fsync failure (#985)
* commitlog: Panic on fsync failure

Errors returned by `fsync(2)` are particularly nefarious, as it is
mostly undefined what the state of the page cache is in this case.

Since the log is synced asynchronously and not after every write, it is
impossible to know up to which commit data can be considered durable,
except by reading the most recent segment from disk.

Therefore, the reasonable thing to do is to prevent any further use of
the log, and force users to re-load it from disk.
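The "stop, don't retry" policy can be sketched as follows: once a sync reports an error, the handle panics instead of continuing. `Log` and its injected `sync` closure are hypothetical. They are introduced only so the failure path can be exercised without real I/O.

```rust
use std::io;

/// Hypothetical commitlog handle; `sync` stands in for `fsync(2)`.
struct Log<F: FnMut() -> io::Result<()>> {
    sync: F,
}

impl<F: FnMut() -> io::Result<()>> Log<F> {
    fn flush_and_sync(&mut self) {
        if let Err(e) = (self.sync)() {
            // After a failed fsync the page-cache state is unknown, so we
            // cannot tell which commits are durable: refuse further use.
            panic!("fsync failed, commitlog must be reloaded from disk: {e}");
        }
    }
}
```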

Note that this is only half of the solution: an application restart may
still read data from the page cache, which could be gone after a system
restart.

To fix this, we would need to employ direct I/O (i.e. `O_DIRECT`), which
however is beyond the scope of this patch as it invalidates the use of
most of `std::io`.

* commitlog: Handle duplicate commits when iterating

We cannot exclude the possibility of a false failure in I/O operations.
In particular, `EIO` errors are difficult to attribute to a particular
write, as they happen asynchronously during flush of the page cache.

Because we do not bypass the page cache, it is possible that a
particular commit is considered lost when it isn't, or considered
durable when it isn't. The former could lead to duplicate commits
appearing in the log, while the latter could lead to a commit with a
matching offset but a different payload.

This patch thus ignores duplicates, and introduces a new error variant
in the event the offset matches but the checksum doesn't.
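The iteration rule can be sketched as follows. `Commit`, its `checksum` field and `TraversalError::ChecksumMismatch` are hypothetical simplifications. The real iterator is lazy, whereas this sketch collects into a `Vec` for brevity.

```rust
/// Hypothetical minimal commit: tx offset plus payload checksum.
#[derive(Debug, Clone, PartialEq)]
struct Commit {
    offset: u64,
    checksum: u32,
}

#[derive(Debug, PartialEq)]
enum TraversalError {
    /// Same offset as the previous commit, but a different checksum.
    ChecksumMismatch { offset: u64 },
}

/// Skips exact duplicates of the previous commit and fails if a repeated
/// offset carries a different payload.
fn dedup_commits(commits: Vec<Commit>) -> Result<Vec<Commit>, TraversalError> {
    let mut out: Vec<Commit> = Vec::new();
    for commit in commits {
        if let Some(prev) = out.last() {
            if prev.offset == commit.offset {
                if prev.checksum == commit.checksum {
                    continue; // duplicate written after a false I/O failure
                }
                return Err(TraversalError::ChecksumMismatch { offset: commit.offset });
            }
        }
        out.push(commit);
    }
    Ok(out)
}
```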

* durability: Manage the flush-and-sync task in this crate

Since syncing the commitlog may now panic, it is more natural to manage
all async tasks here, so that the panic cases can be handled.

Namely, if the `FlushAndSyncTask` panics, the `PersisterTask` is
aborted. This will lead to the channel receiver being dropped, which in
turn will cause the next `append_tx` call to panic.

* commitlog: Remove async flush-and-sync

Due to panic behaviour, it is now preferable to manage periodic sync at
the use site of the commitlog crate.

Hence remove `flush_and_sync_every` method, and with it the dependency
on tokio.
2024-05-28 18:22:38 +00:00
Kim Altintop bee6ae188d Re-instantiate replay progress reporting (#1096)
* durability: Introduce a method to obtain the max tx offset of a history

Useful for reporting replay progress.
Includes a note that it is somewhat similar to `std::iter::Iterator::size_hint`.

* core: Re-instantiate replay progress reporting

The percentage is calculated as starting from the zero offset, although
that may change in the future.
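The percentage computation can be sketched as follows. The sketch assumes progress is measured from offset zero and treats the maximum offset as an estimate, like `size_hint`, hence the clamp. `replay_progress` is a hypothetical name.

```rust
/// Sketch: replay progress in percent, from offset zero up to the
/// (estimated) maximum tx offset of the history. `None` if unknown.
fn replay_progress(current_offset: u64, max_tx_offset: Option<u64>) -> Option<u8> {
    let max = max_tx_offset?;
    if max == 0 {
        return Some(100);
    }
    let pct = current_offset.saturating_mul(100) / max;
    // The maximum is only an estimate, so never report more than 100%.
    Some(pct.min(100) as u8)
}
```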
2024-04-19 09:38:52 +00:00
Kim Altintop 838e8696ec core,commitlog: Re-instantiate commitlog disk usage reporting (#955)
Disk usage reporting was left unimplemented in previous patches of the
series, as its semantics are slightly different from before.

Namely, inspecting the size of the commitlog now requires calling
`stat(2)` on the segment files, and is thus fallible.

Also, a size reporting function is only defined for local durability
(i.e. the commitlog). The behaviour when the database is in a follower
state is left unspecified.
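A std-only sketch of fallible size reporting follows. Every `fs::metadata` call is a `stat(2)`, so the function returns `io::Result`. `commitlog_size_on_disk` is a hypothetical name, and treating every regular file in the directory as a segment is a simplifying assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the sizes of all regular files in `dir`, propagating any error
/// from listing the directory or from `stat`-ing an entry.
fn commitlog_size_on_disk(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let meta = fs::metadata(entry?.path())?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}
```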
2024-04-12 08:49:34 +00:00
Kim Altintop 02be002416 Durability: Traits and implementation in terms of commitlog (#922)
Defines traits intended to abstract over the kind of persistence a
database utilizes. The only implementation is (host-)local durability in
terms of the new commitlog crate.

The trait definitions are not yet considered stable, but are needed in
their tentative form for further integration of the new commitlog.
2024-04-11 09:44:58 +00:00