18 Commits

Author SHA1 Message Date
Shubham Mishra eeaa00a05f Commitlog offset index (#1671)
Signed-off-by: Shubham Mishra <shubham@clockworklabs.io>
Co-authored-by: Kim Altintop <kim@eagain.io>
2024-09-24 16:06:49 +00:00
Kim Altintop 0029ca5648 commitlog: Make commit module public, and allow access to header fields (#1685) 2024-09-10 08:16:32 +00:00
Kim Altintop 8338b53b8f commitlog: Fix single-commit bitflip test (#1528) 2024-07-19 05:57:53 +00:00
Jeremie Pelletier f91dcda283 Make some commitlog helpers public (#1390) 2024-07-09 18:02:58 +00:00
Kim Altintop ff851ae5fa commitlog: Make bitflip test a proptest (#1333)
* commitlog: Make bitflip test a proptest

The test sometimes fails. As a proptest, we'll be able to seed it with
failing inputs.

Fixes: #1167

* commitlog: Fix the bitflip test

Turns out we sometimes flipped a bit in the CRC32 itself, which makes
things go wrong, but not in the expected way.
2024-06-05 05:53:41 +00:00
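
To illustrate the fix described in #1333 above, here is a minimal sketch of a bit-flip helper that only touches the payload and never the trailing checksum. The frame layout (payload followed by a 4-byte little-endian checksum), the function names, and the choice of the IEEE CRC-32 polynomial are all assumptions made for illustration; the actual commitlog format and test may differ.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// Assumption: used here only for illustration, not necessarily
/// the checksum variant the commitlog uses.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask if the lowest bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical frame: payload followed by its checksum (little-endian).
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut frame = payload.to_vec();
    frame.extend_from_slice(&crc32(payload).to_le_bytes());
    frame
}

/// Returns true if the stored checksum matches the payload.
fn verify(frame: &[u8]) -> bool {
    if frame.len() < 4 {
        return false;
    }
    let (payload, tail) = frame.split_at(frame.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    crc32(payload) == stored
}

/// Flip exactly one bit, restricted to the payload region so that the
/// checksum bytes themselves are never modified.
fn flip_payload_bit(frame: &mut [u8], bit: usize) {
    let payload_bits = (frame.len() - 4) * 8;
    assert!(payload_bits > 0, "frame has no payload to corrupt");
    let bit = bit % payload_bits;
    frame[bit / 8] ^= 1 << (bit % 8);
}
```

A property test could then draw `bit` from an arbitrary `usize` and assert that `verify` fails after `flip_payload_bit`, since a CRC detects every single-bit error in the data it covers.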
Phoebe Goldman 18aa1d4299 Fix commitlog fold_transactions_from ignoring requested offset (#1330)
* Fix commitlog `fold_transactions_from` ignoring requested offset

Prior to this commit, `fold_transactions_from` on a durability backed by a commitlog
would discard the requested offset and unconditionally yield all txes in the relevant segments.

This commit changes that behavior so that `fold_transactions_from`
skips commitlog commits (which contain many txes) less than the requested offset,
and skips txes using `consume_record`.

* Add `Decoder::skip_record`

Lucky I asked Kim whether I was using `consume_record` and `decode_record` correctly,
because I wasn't.

This commit adds methods to `Decoder` and `Visitor` for skipping records and rows,
causing them to be extracted from the reader but not folded.

* Fix test

Add the new methods to the `Decoder` and `Visitor` implementations hidden away in a test I missed.
2024-06-03 22:37:43 +00:00
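
The offset handling described in #1330 above can be sketched roughly as follows. The `Commit` struct and the `fold_from` function are hypothetical stand-ins, not the crate's API: whole commits that end below the requested offset are skipped, and within the first relevant commit individual records are skipped until the offset is reached.

```rust
/// Hypothetical commit: a run of consecutive transactions starting at
/// `min_tx_offset`, each represented by its raw record bytes.
struct Commit {
    min_tx_offset: u64,
    records: Vec<Vec<u8>>,
}

/// Visit every transaction with offset >= `from`, in log order.
fn fold_from<F: FnMut(u64, &[u8])>(commits: &[Commit], from: u64, mut visit: F) {
    for commit in commits {
        // One past the offset of the last transaction in this commit.
        let end = commit.min_tx_offset + commit.records.len() as u64;
        if end <= from {
            // The whole commit lies below the requested offset.
            continue;
        }
        for (i, record) in commit.records.iter().enumerate() {
            let offset = commit.min_tx_offset + i as u64;
            if offset < from {
                // Skip (but still step over) records below the offset.
                continue;
            }
            visit(offset, record);
        }
    }
}
```

In a real decoder the skipped records still have to be read from the underlying reader so that the position stays correct, which is presumably what the `skip_record` methods mentioned above are for.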
Kim Altintop 2c3fc66f21 Commitlog: panic on fsync failure (#985)
* commitlog: Panic on fsync failure

Errors returned by `fsync(2)` are particularly nefarious, as it is
mostly undefined what the state of the page cache is in this case.

Since the log is synced asynchronously and not after every write, it is
impossible to know up to which commit data can be considered durable --
except by reading the most recent segment from disk.

Therefore, the reasonable thing to do is to prevent any further use of
the log, and force users to re-load it from disk.

Note that this is only half of the solution: an application restart may
still read data from the page cache, which could be gone after a system
restart.

To fix this, we would need to employ direct I/O (i.e. `O_DIRECT`), which
however is beyond the scope of this patch as it invalidates the use of
most of `std::io`.

* commitlog: Handle duplicate commits when iterating

We cannot exclude the possibility of a false failure in I/O operations.
In particular, `EIO` errors are difficult to attribute to a particular
write, as they happen asynchronously during flush of the page cache.

Because we do not bypass the page cache, the possibility exists that a
particular commit is lost when it isn't, or that it is considered
durable when it isn't. The former could lead to duplicate commits
appearing in the log, while the latter could lead to a matching offset
number, but with different commit payload.

This patch thus ignores duplicates, and introduces a new error variant
in the event the offset matches but the checksum doesn't.

* durability: Manage the flush-and-sync task in this crate

Since syncing the commitlog may now panic, it is more obvious to handle
all async tasks here, so as to be able to handle the panic cases.

Namely, if the `FlushAndSyncTask` panics, the `PersisterTask` is
aborted. This will lead to the channel receiver being dropped, which in
turn will cause the next `append_tx` call to panic.

* commitlog: Remove async flush-and-sync

Due to panic behaviour, it is now preferable to manage periodic sync at
the use site of the commitlog crate.

Hence remove `flush_and_sync_every` method, and with it the dependency
on tokio.
2024-05-28 18:22:38 +00:00
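
The duplicate handling described in #985 above ("ignores duplicates, and introduces a new error variant in the event the offset matches but the checksum doesn't") might look roughly like this. `CommitMeta`, `ChecksumMismatch`, and `dedup_commits` are illustrative names invented for this sketch, and the real iterator works on decoded commits rather than a slice of metadata.

```rust
/// Hypothetical per-commit metadata: starting offset and stored checksum.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CommitMeta {
    min_tx_offset: u64,
    checksum: u32,
}

/// Hypothetical error: same offset as the previous commit, different payload.
#[derive(Debug, PartialEq)]
struct ChecksumMismatch {
    offset: u64,
}

/// Drop exact duplicates of the preceding commit; fail if a commit repeats
/// the preceding offset with a different checksum.
fn dedup_commits(commits: &[CommitMeta]) -> Result<Vec<CommitMeta>, ChecksumMismatch> {
    let mut out: Vec<CommitMeta> = Vec::new();
    for &commit in commits {
        let duplicate = match out.last() {
            Some(prev) if prev.min_tx_offset == commit.min_tx_offset => {
                if prev.checksum != commit.checksum {
                    return Err(ChecksumMismatch {
                        offset: commit.min_tx_offset,
                    });
                }
                true
            }
            _ => false,
        };
        if !duplicate {
            out.push(commit);
        }
    }
    Ok(out)
}
```

Comparing checksums rather than full payloads keeps the check cheap, under the assumption that the checksum was already verified against the payload when the commit was decoded.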
Kim Altintop 61613ca7a8 commitlog: Allow folds to not allocate Mutations values (#1215)
The documentation promised to not collect payload values during folds
(i.e. replaying), but the code did so anyway. This patch makes it so
only values required to satisfy the `Visitor` trait are allocated when
folding.
2024-05-13 09:35:09 +00:00
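
As a rough illustration of the idea in #1215 above, a fold can hand borrowed slices to a visitor instead of collecting owned payload values. The `RowVisitor` trait, the one-byte length prefix, and `fold_rows` below are assumptions made for this sketch and do not mirror the crate's actual `Visitor` trait or wire format.

```rust
/// Hypothetical visitor: receives each row as a borrowed slice and decides
/// for itself what (if anything) to allocate.
trait RowVisitor {
    fn visit_row(&mut self, row: &[u8]);
}

/// Example visitor that only keeps running totals.
struct CountBytes {
    rows: usize,
    bytes: usize,
}

impl RowVisitor for CountBytes {
    fn visit_row(&mut self, row: &[u8]) {
        self.rows += 1;
        self.bytes += row.len();
    }
}

/// Fold over a buffer of rows, each encoded as a one-byte length prefix
/// followed by that many bytes (an assumed toy encoding).
/// No row is copied: the visitor sees slices into `buf`.
fn fold_rows<V: RowVisitor>(mut buf: &[u8], visitor: &mut V) -> Result<(), &'static str> {
    while let Some((&len, rest)) = buf.split_first() {
        let len = len as usize;
        if rest.len() < len {
            return Err("truncated row");
        }
        let (row, tail) = rest.split_at(len);
        visitor.visit_row(row);
        buf = tail;
    }
    Ok(())
}
```

A visitor that applies rows directly to a database could be driven by the same loop, so replay never needs an intermediate collection of decoded values.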
Kim Altintop 06d5481dfb commitlog: Support traversal without opening the log (#1103)
Traversing the commitlog without also making it available for writing
would still require upfront I/O imposed by the `open` constructor.

Avoid that by introducing free-standing functions which start traversal
right away.
2024-04-19 18:08:41 +00:00
Kim Altintop 2894d364fb core: Store inputs (reducer info + args) in commitlog (#1091)
Prerequisite for auto-disconnect after a database crash, requested for
analytics purposes.
2024-04-18 20:16:52 +00:00
Kim Altintop 4cd17d7e00 core: Don't persist empty transactions (#1086)
Fix a minor bug where completely empty transactions would still be
written to the commitlog. The bug is minor because, once we start
logging inputs, all transactions will be non-empty.

The check is done in relational DB rather than the durability crate,
because in principle empty transactions are permissible, and may be used
in the future (e.g. to confirm a certain offset).
2024-04-12 16:45:21 +00:00
Kim Altintop 838e8696ec core,commitlog: Re-instantiate commitlog disk usage reporting (#955)
Disk usage reporting was left unimplemented in previous patches of the
series, as its semantics are slightly different from before.

Namely, inspecting the size of the commitlog now requires calling `stat(2)`
on the segment files, and is thus fallible.

Also, a size reporting function is only defined for local durability
(i.e. the commitlog). The behaviour when the database is in a follower
state is left unspecified.
2024-04-12 08:49:34 +00:00
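
To show why size reporting becomes fallible, as #955 above explains, here is a small sketch that sums file sizes in a directory using only the standard library. The function name and the assumption that every regular file in the directory is a segment are illustrative; the actual implementation may select files differently.

```rust
use std::{fs, io, path::Path};

/// Sum the sizes of all regular files directly inside `dir`.
/// Each `metadata()` call is a file status lookup and may fail, as may
/// reading the directory itself, hence the `io::Result`.
fn segments_size_on_disk(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(dir)? {
        let meta = entry?.metadata()?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}
```

Callers have to decide how to surface the error, for example by logging it and skipping the metric update rather than failing the surrounding operation.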
Kim Altintop 47048559b4 core: Integrate new commitlog + durability (#926)
This patch attempts to integrate the new commitlog with the minimum
changes.

Most of the diff comes from deletions of the legacy log and the need to
adjust tests due to the requirement for a tokio runtime when a durable
database is used in tests.

The "meat" of the patch consists of the `RelationalDB` constructors,
`RelationalDB::commit_tx`, and the replay logic in
`locking_tx_datastore`.

While `DataKey` is gone, there is still some redundant data being passed
around, which will be addressed in the follow-up patch.
2024-04-11 22:46:31 +00:00
Kim Altintop 02be002416 Durability: Traits and implementation in terms of commitlog (#922)
Defines traits intended to abstract over the kind of persistence a
database utilizes. The only implementation is (host-)local durability in
terms of the new commitlog crate.

The trait definitions may not be considered stable yet, but are in their
tentative form needed for further integration of the new commitlog.
2024-04-11 09:44:58 +00:00
Phoebe Goldman 8902b08bfc Drop commitlog logging to trace to avoid spamming host logs (#1073) 2024-04-10 15:26:54 +00:00
Kim Altintop 1d316d991e Commitlog: Add canonical txdata payload (#921)
Defines the canonical commitlog payload, and how to encode / decode it.

Also exposes folds alongside iterators, which allows the common case of
replaying the commitlog onto a database to be further optimized (the
`Txdata` does not have to be constructed in this case). This
optimization is, however, left for a future patch.
2024-04-02 09:54:19 +00:00
Kim Altintop 73cd78231e Commitlog: Add I/O based on regular files (#920)
Provides a commitlog backing store based on files, and defines the
exported `Commitlog` type which fixes the store to the file-based one.
2024-04-02 09:10:21 +00:00
Kim Altintop 3b343e4eb1 Commitlog: Base implementation "sans I/O" (#919)
First in a series of patches to implement the new commitlog format.

This patch implements the base format, leaving the transaction payload
generic. Segment handling, writing and reading are implemented based on
an in-memory backend, which greatly simplifies testing.

As a notable deviation from the previous implementation, segments are
never implicitly trimmed. Instead, faulty commits are ignored if and
only if the next commit in the log sequence is valid and has the right
offset. On the write path, this entails closing the active segment when
an (I/O) error occurs, but retaining the commit in memory such that it
is written to the next segment.

Note that this patch does not define the final public API.
2024-04-02 06:18:30 +00:00
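
The read-side rule from #919 above (a faulty commit is ignored if and only if the next commit is valid and has the right offset) can be sketched as follows. The `Entry` type and `traverse` function are hypothetical, and "right offset" is interpreted here as the offset the reader expected at the position of the faulty commit, which is an assumption based on the write path retrying the commit in the next segment.

```rust
/// Hypothetical result of decoding one commit from the log.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    /// A commit that decoded and verified correctly, covering `n`
    /// transactions starting at `offset`.
    Valid { offset: u64, n: u64 },
    /// A commit that failed to decode or verify.
    Faulty,
}

/// Walk the entries starting at transaction offset `expected` and return the
/// starting offsets of the accepted commits.
fn traverse(entries: &[Entry], mut expected: u64) -> Result<Vec<u64>, String> {
    let mut accepted = Vec::new();
    let mut iter = entries.iter().peekable();
    while let Some(entry) = iter.next() {
        match entry {
            Entry::Valid { offset, n } if *offset == expected => {
                accepted.push(*offset);
                expected += *n;
            }
            Entry::Valid { offset, .. } => {
                return Err(format!("out of order: expected {}, got {}", expected, offset));
            }
            Entry::Faulty => match iter.peek() {
                // Ignore the faulty commit only if a valid commit with the
                // expected offset follows immediately.
                Some(Entry::Valid { offset, .. }) if *offset == expected => continue,
                _ => return Err(format!("faulty commit at offset {}", expected)),
            },
        }
    }
    Ok(accepted)
}
```

In this sketch a faulty commit at the very end of the log is reported as an error, because no valid successor exists to justify ignoring it.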