Commit Graph

25 Commits

Author SHA1 Message Date
Phoebe Goldman e77b62f475 Also capture a snapshot every new commitlog segment (#3405)
# Description of Changes

We've run into a problem on Maincloud caused by a database that was
writing a relatively small number of very large transactions. This was
accruing many commitlog segments consuming hundreds of gigabytes of
disk, but had not ever taken a snapshot, or compressed or archived any
data, as the database had not progressed past one million transactions.

With this PR, we take a snapshot every time the commitlog segment
rotates. We still also snapshot every million transactions.

One BitCraft database we looked at had 2.5 million transactions per
commitlog segment, meaning that this change will not meaningfully affect
the frequency of snapshots. The offending Maincloud database, however,
had only 50 transactions per segment!

# API and ABI breaking changes

N/a

# Expected complexity level and risk

3: Hastily made changes to finnicky code across several crates.

# Testing

I am unsure how to test these changes.

- [ ] <!-- maybe a test you want to do -->
- [ ] <!-- maybe a test you want a reviewer to do, so they can check it
off when they're satisfied. -->
2025-10-15 15:18:15 +00:00
Kim Altintop 37c64c787b commitlog: Provide folding over a range of tx offsets (#3129)
Adds methods and free-standing functions to allow folds to stop at an
upper
bound, by passing a range instead of only a start offset.

# Expected complexity level and risk

1

# Testing
2025-08-08 11:55:27 +00:00
Kim Altintop 7709f3cf1e commitlog: Set up options for toml configuration (#2942) 2025-07-17 08:34:35 +00:00
Kim Altintop 3d1a91c25c Handle snapshot restore more robustly (#2735)
Signed-off-by: Kim Altintop <kim@eagain.io>
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Co-authored-by: Shubham Mishra <shubham@clockworklabs.io>
2025-05-15 14:35:09 +00:00
Mario Montoya 3fd78203c4 Compress the snapshot (#2034) 2025-04-11 15:18:17 +00:00
Noa a5212a5f75 Commitlog compression (#2504) 2025-03-31 22:00:52 +00:00
Kim Altintop 5063bd8759 commitlog: Streaming (#2492) 2025-03-26 07:40:23 +00:00
Noa 293aebaef9 Bump to Rust 1.84 (#2001) 2025-01-28 23:11:29 +00:00
Phoebe Goldman d171b44a89 Don't create indexes during bootstrapping; wait until after replay (#2161) 2025-01-23 19:41:39 +00:00
Kim Altintop c5f4c8bc5c commitlog: Make offset index usable externally (#2108) 2025-01-14 18:56:08 +00:00
Shubham Mishra f04d2817d0 create commitlog dir in fs::New (#2006) 2024-11-21 15:47:40 +00:00
Noa f136670420 Directory structure impl (#1879)
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>
2024-11-12 04:24:43 +00:00
Kim Altintop f22b163c0a commitlog: Introduce epoch (#1851) 2024-11-05 10:10:30 +00:00
Kim Altintop afeb3421ae commitlog: Yield StoredCommit in iterators (#1791) 2024-10-08 08:53:25 +00:00
Shubham Mishra eeaa00a05f Commitlog offset index (#1671)
Signed-off-by: Shubham Mishra <shubham@clockworklabs.io>
Co-authored-by: Kim Altintop <kim@eagain.io>
2024-09-24 16:06:49 +00:00
Kim Altintop 0029ca5648 commitlog: Make commit module public, and allow access to header fields (#1685) 2024-09-10 08:16:32 +00:00
Jeremie Pelletier f91dcda283 Make some commitlog helpers public (#1390) 2024-07-09 18:02:58 +00:00
Kim Altintop 2c3fc66f21 Commitlog: panic on fsync failure (#985)
* commitlog: Panic on fsync failure

Errors returned by `fsync(2)` are particularly nefarious, as it is
mostly undefined what the state of the page cache is in this case.

Since the log is synced asynchronously and not after every write, it is
impossible to know up to which commit data can be considered durable --
except by reading the most recent segment from disk.

Therefore, the reasonable thing to do is to prevent any further use of
the log, and force users to re-load it from disk.

Note that this is only half of the solution: an application restart may
still read data from the page cache, which could be gone after a system
restart.

To fix this, we would need to employ direct I/O (i.e. `O_DIRECT`), which
however is beyond the scope of this patch as it invalidates the use of
most of `std::io`.

* commitlog: Handle duplicate commits when iterating

We cannot exclude the possibility of a false failure in I/O operations.
In particular, `EIO` errors are difficult to attribute to a particular
write, as they happen asynchronously during flush of the page cache.

Because we do not bypass the page cache, the possibility exists that a
particular commit is lost when it isn't, or that it is considered
durable when it isn't. The former could lead to duplicate commits
appearing in the log, while the latter could lead to a matching offset
number, but with different commit payload.

This patch thus ignores duplicates, and introduces a new error variant
in the event the offset matches but the checksum doesn't.

* durability: Manage the flush-and-sync task in this crate

Since syncing the commitlog may now panic, it is more obvious to handle
all async tasks here, so as to be able to handle the panic cases.

Namely, if the `FlushAndSyncTask` panics, the `PersisterTask` is
aborted. This will lead to the channel receiver being dropped, which in
turn will cause the next `append_tx` call to panic.

* commitlog: Remove async flush-and-sync

Due to panic behaviour, it is now preferable to manage periodic sync at
the use site of the commitlog crate.

Hence remove `flush_and_sync_every` method, and with it the dependency
on tokio.
2024-05-28 18:22:38 +00:00
Kim Altintop 06d5481dfb commitlog: Support traversal without opening the log (#1103)
Traversing the commitlog without also making it available for writing
would still require upfront I/O imposed by the `open` constructor.

Avoid that by introducing free-standing functions which start traversal
right away.
2024-04-19 18:08:41 +00:00
Kim Altintop 838e8696ec core,commitlog: Re-instantiate commitlog disk usage reporting (#955)
Disk usage reporting was left unimplemented in previous patches of the
series, as its semantics are slightly different from before.

Namely, inspecting the size of the commitlog now requires to `stat(2)`
the segment files, and is thus fallible.

Also, a size reporting function is only defined for local durability
(i.e. the commitlog). The behaviour when the database is in a follower
state is left unspecified.
2024-04-12 08:49:34 +00:00
Kim Altintop 02be002416 Durability: Traits and implementation in terms of commitlog (#922)
Defines traits intended to abstract over the kind of persistence a
database utilizes. The only implementation is (host-)local durability in
terms of the new commitlog crate.

The trait definitions may not be considered stable yet, but are in their
tentative form needed for further integration of the new commitlog.
2024-04-11 09:44:58 +00:00
Phoebe Goldman 8902b08bfc Drop commitlog logging to trace to avoid spamming host logs (#1073) 2024-04-10 15:26:54 +00:00
Kim Altintop 1d316d991e Commitlog: Add canonical txdata payload (#921)
Defines the canonical commitlog payload, and how to encode / decode it.

Also exposes folds alongside iterators, which allows the common case of
replaying the commitlog onto a database to be further optimized (the
`Txdata` does not have to be constructed in this case). This
optimization is, however, left for a future patch.
2024-04-02 09:54:19 +00:00
Kim Altintop 73cd78231e Commitlog: Add I/O based on regular files (#920)
Provides a commitlog backing store based on files, and defines the
exported `Commitlog` type which fixes the store to the file-based one.
2024-04-02 09:10:21 +00:00
Kim Altintop 3b343e4eb1 Commitlog: Base implementation "sans I/O" (#919)
First in a series of patches to implement the new commitlog format.

This patch implements the base format, leaving the transaction payload
generic. Segment handling, writing and reading is implemented based on
an in-memory backend, which greatly simplifies testing.

As a notable deviation from the previous implementation, segments are
never implicitly trimmed. Instead, faulty commits are ignored if and
only if the next commit in the log sequence is valid and has the right
offset. On the write path, this entails closing the active segment when
an (I/O) error occurs, but retaining the commit in memory such that it
is written to the next segment.

Note that this patch does not define the final public API.
2024-04-02 06:18:30 +00:00