64231 Commits

Author SHA1 Message Date
Alexander Korotkov 5cdec42319 Fix WAIT FOR LSN cleanup on subtransaction abort
WAIT FOR LSN registers the current backend in shared memory before entering an
interruptible wait loop.  Top-level abort and backend exit already call
WaitLSNCleanup(), but subtransaction abort did not.  If an interrupt, such as
statement_timeout, occurred while waiting inside a savepoint, rolling back to
the savepoint left the backend marked as present in the WAIT FOR LSN heap.

Clean up WAIT FOR LSN state from AbortSubTransaction() as well, and add
a TAP test covering reuse of WAIT FOR LSN after a savepoint rollback.

Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWXDRwo-RVRaQgwxVcXgURVFeX8BKnijQrPiPcSCkDDX9A%40mail.gmail.com
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-06 13:56:38 +03:00
Daniel Gustafsson 486b9a9b9e Fix regex searching for page verification failures in tests
The test for finding page verification failures in the logfiles
was missing the /m modifier to make sure it anchors to every
newline in the search space buffer, and not just the last one.
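
As a rough illustration of the anchoring difference (not the actual test
code, which is Perl), the sketch below uses Python's re.MULTILINE as an
analogue of Perl's /m.  The log text and the pattern are made up for this
example.

```python
import re

# Hypothetical log excerpt; the message text is illustrative only.
log = (
    "LOG:  checkpoint starting\n"
    "WARNING:  page verification failed\n"
    "LOG:  checkpoint complete\n"
)

pattern = r"^WARNING:  page verification failed$"

# Without MULTILINE, ^ and $ anchor only at the ends of the whole buffer,
# so a matching line in the middle of the log is not found.
without_m = re.search(pattern, log)

# With MULTILINE (the analogue of Perl's /m), ^ and $ anchor at every
# line boundary, so the line in the middle is found.
with_m = re.search(pattern, log, re.MULTILINE)
```

Here without_m is None while with_m matches the second line, which is the
kind of miss the missing modifier caused.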

Spotted while adding a test for the recently reported issue with
excessive WAL for unlogged relations.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeGrpZbNZdLjd_T4b43xKEEXZN0HGhkFm-1bkBdyzK7AQ@mail.gmail.com
2026-05-06 12:38:15 +02:00
Daniel Gustafsson 9a39056c41 Apply data-checksum worker throttling parameters
The DataChecksumsWorker accepts cost_delay and cost_limit parameters
from pg_enable_data_checksums() so users can throttle the I/O caused
by enabling checksums.  Because the API for setting the cost parameters
changed between when the code was written and when it was committed,
the new cost-update function call was omitted, and thus the parameters
were silently ignored.

Fix by calling VacuumUpdateCosts() after assigning the parameters
(both during worker startup and on the runtime cost-update path), and
by leaving the page-cost weights at their GUC-controlled defaults.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeevH6aTyWdXYBJW0wOmfoZy66gDi5TfinK_dXeCrHQLg@mail.gmail.com
2026-05-06 12:38:12 +02:00
Daniel Gustafsson 2018bd6167 Skip WAL for unlogged main fork during online checksum enable
ProcessSingleRelationFork() unconditionally generated an FPI WAL
record for every page of every relation when enabling checksums.
Unlogged relations, which by definition never generate WAL for
data changes, were not exempt, which caused excessive WAL to
be emitted.

Fix by guarding the FPI WAL record call with RelationNeedsWAL()
to avoid emitting WAL for unlogged main forks.  Unlogged pages
are still dirtied to ensure the checksum is written to disk at
the next checkpoint.  The init fork remains WAL-logged even for
unlogged relations, as it's needed on the standby to materialize
the relation after promotion (see ResetUnloggedRelations()).
Skipping init-fork WAL would leave the standby with a stale init
fork that, once copied to the main fork on promotion, would fail
checksum verification on every read of the unlogged relation.

A test which creates an unlogged table with an index, enables
checksums, promotes the standby, and verifies that the unlogged
relation and its indexes are still readable post-promotion has
been added.

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeGrpZbNZdLjd_T4b43xKEEXZN0HGhkFm-1bkBdyzK7AQ@mail.gmail.com
2026-05-06 12:38:01 +02:00
Peter Eisentraut 43dc21f76f Document deprecated --wal-directory option for pg_verifybackup
Commit b3cf461b3c renamed --wal-directory to --wal-path but retained
the former as a silent alias.  Per project policy, all options,
including deprecated ones, should be documented to assist users
transitioning between versions.

This patch restores --wal-directory to the documentation and --help
output.

Author: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/E1w3fZp-000gje-31%40gemulon.postgresql.org
2026-05-06 10:45:42 +02:00
Álvaro Herrera a0a0c0c20e Skip other sessions' temp tables in REPACK, CLUSTER, and VACUUM FULL
get_tables_to_repack() and get_all_vacuum_rels() were including other
sessions' temporary tables in their output work list, causing REPACK,
CLUSTER and VACUUM FULL (when executed without a table list) to attempt
to acquire AccessExclusiveLock on them, potentially blocking for an
extended time.  Fix by skipping other-session temp tables early, before
they are added to the list.

This issue is ancient, but there have been no complaints about it that I
know of, so I'm opting for not backpatching at present.

Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: https://postgr.es/m/0b555318-2bf2-46df-9377-09629a2a59db@uni-muenster.de
2026-05-05 16:20:26 +02:00
John Naylor 6766264262 Add missing guard for __builtin_constant_p
Oversight in commit e2809e3a1. While at it, use pg_integer_constant_p
in master.

Discussion: https://postgr.es/m/CANWCAZbOha-x5MCreQn3TRA56VdKWNMAKMy3fAV1kJSw9Vp4pw@mail.gmail.com
Backpatch-through: 18
2026-05-05 18:51:07 +07:00
Etsuro Fujita 648818ba38 postgres_fdw: Fix handling of abort-cleanup-failed connections.
As connections that failed abort cleanup can't safely be further used,
if a remote query tries to get such a connection, we reject it.
Previously, this rejection involved dropping the connection if it was
open, without accounting for the possibility of open cursors using it,
causing a server crash when such an open cursor tried to use an
already-dropped connection, as a cursor-handling function
(create_cursor, fetch_more_data, or close_cursor) was called on a freed
PGconn.  To fix, delay dropping failed connections until abort cleanup
of the main transaction, to ensure open cursors using such a connection
can safely refer to the PGconn for it.

Oversight in commit 8bf58c0d9.

Reported-by: Zhibai Song <songzhibai1234@gmail.com>
Diagnosed-by: Zhibai Song <songzhibai1234@gmail.com>
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAPmGK176y6JP017-Cn%2BhS9CEJx_6iVhRoYbAqzuLU4d8-XPPNg%40mail.gmail.com
Backpatch-through: 14
2026-05-05 18:55:00 +09:00
Peter Eisentraut d0ed9ad8b0 doc: Clean up title case use 2026-05-05 11:24:16 +02:00
Peter Eisentraut 22f9207aaa Message style improvements (oauth related) 2026-05-05 10:39:13 +02:00
Álvaro Herrera eb2e2eb4d4 Don't lose column values on REPACK
Commit 28d534e2ae introduced reform_tuple() with a fast path that
returns the source tuple verbatim when no dropped columns require fixing
up.  I (Álvaro) failed to realize that this broke handling of columns
with a 'missingval' defined: after a VACUUM FULL, CLUSTER, or REPACK
operation, the catalogued missingval is thrown away, so the tuples are
no longer correct.

Fix by forcing the rewrite when the tuple is shorter than the tuple
descriptor.

Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeoccU5CudrJpmSKZfKZ1gRMNY=5BxSC=JpHgkonzgcOw@mail.gmail.com
2026-05-05 10:24:49 +02:00
Peter Eisentraut d0eac3cafb Make spelling consistent
"vertexes" -> "vertices"

Reported-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAJTYsWXFy1j_T82%2BM_S9kFxU414tQYnZQD-b82%3DoL_LbG_5fPQ%40mail.gmail.com
2026-05-05 09:36:54 +02:00
Peter Eisentraut 1190f858ea doc: Small synopsis wording change for consistency 2026-05-05 09:27:32 +02:00
Richard Guo 574581b50a Consider collation when proving subquery uniqueness
rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation.  query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.
As with the recent fix in relation_has_unique_index_for(), this is
unsound for nondeterministic collations and yields wrong query results
in any optimization that consumes the proof.

Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.

Back-patch to all supported branches.  query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a
new collation-aware entry point; external callers continue to receive
the historical collation-blind answer.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
2026-05-05 10:23:31 +09:00
Richard Guo 5a55ea507a Consider collation when proving uniqueness from unique indexes
relation_has_unique_index_for() has long had an XXX noting that it
doesn't check collations when matching a unique index's columns
against equality clauses.  This was benign as long as all collations
in play reduced to the same notion of equality, but has been incorrect
since nondeterministic collations were introduced in PG 12: a unique
index under a deterministic collation does not prove uniqueness under
a nondeterministic collation, nor vice versa.

The consequence is wrong query results for any planner optimization
that consumes the faulty proof, including inner-unique join execution
(which stops the inner search after the first match per outer row),
useless-left-join removal, semijoin-to-innerjoin reduction, and
self-join elimination.

Fix by requiring the index's collation to agree on equality with the
clause's input collation.  Two collations agree on equality if either
is InvalidOid (denoting a non-collation-sensitive operation, which
cannot conflict with the other side), if they have the same OID, or if
both are deterministic: by definition a deterministic collation treats
two strings as equal iff they are byte-wise equal (see CREATE
COLLATION), so any two deterministic collations share the same
equality relation and the uniqueness proof carries over.  Any mismatch
involving a nondeterministic collation is rejected.
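
The rule above can be sketched as a small predicate.  This is only an
illustration in Python, not the planner's C code: the function name, the
catalog dictionary, and the OIDs 100, 101, and 200 are hypothetical, while
InvalidOid being zero matches PostgreSQL.

```python
INVALID_OID = 0  # PostgreSQL's InvalidOid

# Hypothetical catalog: collation OID -> is the collation deterministic?
CATALOG = {100: True, 101: True, 200: False}


def collations_agree_on_equality(index_coll: int, clause_coll: int) -> bool:
    """Return True if a uniqueness proof may carry over between collations."""
    # A non-collation-sensitive side cannot conflict with the other side.
    if index_coll == INVALID_OID or clause_coll == INVALID_OID:
        return True
    # The same collation trivially agrees with itself.
    if index_coll == clause_coll:
        return True
    # Two deterministic collations both mean byte-wise equality.
    # Any mismatch involving a nondeterministic collation is rejected.
    return CATALOG[index_coll] and CATALOG[clause_coll]


same_oid = collations_agree_on_equality(200, 200)
both_deterministic = collations_agree_on_equality(100, 101)
mixed = collations_agree_on_equality(100, 200)
no_collation = collations_agree_on_equality(INVALID_OID, 200)
```

Only the mixed case is rejected; the other three calls return True.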

Back-patch to all supported branches; the bug has existed since
nondeterministic collations were introduced in PG 12.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com
Backpatch-through: 14
2026-05-05 10:22:53 +09:00
Tom Lane 93da297366 Declare load_hosts() as returning HostsFileLoadResult.
This function returns some value of enum HostsFileLoadResult,
but for reasons lost in the development process was declared to
return "int".  Fix that, for clarity and so that our typedefs
collection tooling sees the typedef as used.  Also fix the
variable that the sole call assigns into.  Move the typedef
to the header file that declares load_hosts() to avoid creating
header dependency problems.

Discussion: https://postgr.es/m/359138.1777922557@sss.pgh.pa.us
2026-05-04 18:33:06 -04:00
Peter Eisentraut f6edd8ed70 Add ORDER BY to test query to stabilize test
for commit dc9e7c9ed9
2026-05-04 20:59:16 +02:00
Álvaro Herrera b5f92b8eb4 Fix off-by-one in repack index loop
A blunder of mine (Álvaro) in commit 28d534e2ae.

Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Xiaopeng Wang <wxp_728@163.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M9ytFufvD8Tm0rhpfxuC4XrpgQDBHxM7NJQYxv488JW7w@mail.gmail.com
2026-05-04 20:01:19 +02:00
Peter Eisentraut dc9e7c9ed9 Handle nodes that may appear in GraphPattern expression trees
expression_tree_mutator_impl() did not handle T_GraphPattern,
T_GraphElementPattern, and T_GraphPropertyRef.  The corresponding
expression_tree_walker_impl() already handles all three node types.
This caused an "unrecognized node type" error whenever a GRAPH_TABLE
appeared in an expression tree.

While at it, also update raw_expression_tree_walker() and
expression_tree_walker() to handle missing nodes that may appear in
GraphPattern expression trees.  When raw_expression_tree_walker() is
called, GraphElementPattern::labelexpr contains ColumnRefs instead of
GraphLabelRefs.  Hence those are not handled in
raw_expression_tree_walker().

Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDc97WFTSkXg%3Dg_ZAH8GnY2gJrvq72cs%2BYjqEAuZgXnkAQ%40mail.gmail.com
2026-05-04 17:34:32 +02:00
Peter Eisentraut 891a57c739 Do not define type for a property graph
Even though a property graph is defined in pg_class, it does not
contain any rows by itself and need not have a type defined. Avoid
creating a type for it.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ucu7ZTgYkO6rB_1ShJP3e%3DGAT2T3CP4XWN8rUVEsiJoA%40mail.gmail.com
2026-05-04 15:45:56 +02:00
Peter Eisentraut abff4492d0 Fix options listing of pg_restore --no-globals
The new pg_restore option --no-globals (commit 3c19983cc0) appeared
out of order in the documentation and help output.  Fix that.
2026-05-04 12:00:22 +02:00
Peter Eisentraut b83a94a73b Add missing serial commas 2026-05-04 11:53:04 +02:00
Peter Eisentraut 2fcc8aaeb2 doc: Fix up spacing around verbatim DocBook elements 2026-05-04 09:45:40 +02:00
Amit Kapila bf3ead6075 Simplify translatable messages for tuple value details in conflict.c.
append_tuple_value_detail() constructed user-visible messages using
separately translated fragments such as ": ", ", ", and ".".  This
makes correct translation difficult or impossible in some languages.

Refactor append_tuple_value_detail() to move all punctuation and
sentence construction to the callers, which now use a single
translatable string with a %s placeholder for the tuple data.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/227279.1775956328%40sss.pgh.pa.us#8f3a5f50543556c60cc5a13270cb7ba4
Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com
2026-05-04 12:06:41 +05:30
Alexander Korotkov c06d1a4ba6 Mark the modified FSM buffer as dirty during recovery
The XLogRecordPageWithFreeSpace function updates the freespace map (FSM) data
while replaying data-level WAL records during the recovery. If the FSM block
is updated, it needs to be marked as modified. Currently, this is done with
the MarkBufferDirtyHint call (as in all other cases for modifying FSM data).
However, in the recovery context, this function will actually do nothing if
checksums are enabled. It's assumed that the page should not be dirtied
during recovery while modifying hints to protect against torn pages, since no
new WAL data can be generated at this point to store FPI.

Such logic does not seem fully aligned with the FSM case, as its blocks could
be simply zeroed if a checksum mismatch is detected. Currently, changes to an
FSM block could be lost if each change to that block occurs infrequently
enough to allow it to be evicted from the cache. To persist the change, the
modification needs to be performed while the FSM block is still kept in
buffers and marked as dirty after receiving its FPI. If the block has already
been cleaned, the change won't be persisted, so stored FSM blocks may remain
in an obsolete state.

If a large number of discrepancies between the data in leaf FSM blocks and the
actual data blocks accumulate on the replica server, this could cause
significant delays in insert operations after switchover. Such an insert
operation may need to visit many data blocks marked as having sufficient
space in the FSM, only to discover that the information is incorrect and the
FSM records need to be corrected. In a heavily trafficked insert-only table
with many concurrent clients performing inserts, this has been observed to
cause several-second stalls, causing visible application malfunction. The
desire to avoid such cases was the reason behind the commit ab7dbd681, which
introduced an update of FSM data during the heap_xlog_visible invocation.
However, an update to the FSM data on the standby side could be lost due to a
missing 'dirty' flag, so there is still a possibility that a large number of
FSM records will contain incorrect data. Note that having a zeroed FSM page
in such a case (due to a checksum mismatch) is preferable, as a zero value
will be interpreted as an indication of full data blocks, and the inserter
will be routed to the next FSM block or to the end of the table.

Given that FSM is ready to handle torn page writes and
XLogRecordPageWithFreeSpace is called only during the recovery, there seems
to be no reason to use MarkBufferDirtyHint here instead of a regular
MarkBufferDirty call.

Discussion: https://postgr.es/m/596c4f1c-f966-4512-b9c9-dd8fbcaf0928%40postgrespro.ru
Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 20:23:50 +03:00
Alexander Korotkov 21d290161b Document that WAIT FOR LSN is timeline-blind
WAIT FOR LSN compares only the numeric LSN and has no notion of which
timeline a WAL record belongs to.  There are many possible scenarios in
which a timeline switch can break read-your-writes consistency.  Proper
analysis and timeline support may be possible in the next major release.
For now, just document the current behaviour.

Reported-by: Xuneng Zhou <xunengzhou@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov cb096e6d69 Improve WAIT FOR LSN test coverage
Add regression coverage for several WAIT FOR LSN edge cases.

First, cover fresh walreceiver shared-memory initialization after a
standby restart.  Restart the standby while its upstream is down, so
RequestXLogStreaming() seeds writtenUpto/flushedUpto to the
segment-aligned receiveStart and the walreceiver cannot immediately
advance them.  Verify that the seeded flush position is segment-aligned,
that replay can be ahead of it, and that standby_write/standby_flush
still succeed for an already-replayed LSN via the replay-position floor
in GetCurrentLSNForWaitType().

Second, add fencepost checks for the target <= currentLSN predicate.
With replay paused and walreceiver stopped, verify exact boundaries for
standby_replay using pg_last_wal_replay_lsn(), and for standby_flush
using pg_last_wal_receive_lsn().  Also verify that a waiter for
current + 1 sleeps while replay is paused and wakes with success once
new WAL is delivered and replay advances.

Finally, add a cascading-standby timeline-switch test.  Start a waiter
on the downstream standby, promote its upstream, generate WAL on the new
timeline, and verify that the cascade follows the new timeline and the
wait completes successfully once replay reaches the target LSN.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov e7cd592174 Wake standby_write/standby_flush waiters from the WAL replay loop
The startup process only woke STANDBY_REPLAY waiters after replaying
each WAL record. STANDBY_WRITE and STANDBY_FLUSH waiters depended only
on walreceiver write/flush callbacks. As a result, replay progress alone
did not wake those waiters, and in pure archive recovery (where no
walreceiver exists) they could sleep until timeout.

Fix by also calling WaitLSNWakeup() for STANDBY_WRITE and
STANDBY_FLUSH after each replay. For the replay-floor semantics used by
GetCurrentLSNForWaitType(), replay progress is a valid lower bound for
both modes: WAL cannot be replayed unless it has already been written
and flushed locally.

This works together with the replay-position floor in
GetCurrentLSNForWaitType(). The getter ensures that a waiter woken by
replay can recheck successfully; the replay-side wakeups ensure that a
waiter already asleep is notified when replay reaches its target.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov cba67b5b87 Use replay position as floor for WAIT FOR LSN standby_(write|flush)
GetCurrentLSNForWaitType() for standby_write and standby_flush modes
returned only the walreceiver position, which may lag behind WAL
already present on the standby from a base backup, archive restore,
or prior streaming.  This could cause unnecessary blocking if the
target LSN falls between the walreceiver's tracked position and the
replay position.

Fix by returning the maximum of the walreceiver position and the
replay position.  WAL up to the replay point is physically on disk
regardless of its origin, so there is no reason to wait for the
walreceiver to re-receive it.
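
A minimal sketch of the floor, assuming LSNs are modelled as plain 64-bit
integers.  The function names and the sample LSN values are hypothetical;
the real logic lives in GetCurrentLSNForWaitType() in C.  The
"target <= current LSN" predicate is the one the WAIT FOR LSN tests check.

```python
def current_lsn_for_standby_wait(walreceiver_lsn: int, replay_lsn: int) -> int:
    """Use the replay position as a floor for the walreceiver position."""
    return max(walreceiver_lsn, replay_lsn)


def wait_is_satisfied(target: int, walreceiver_lsn: int, replay_lsn: int) -> bool:
    """A waiter is done once its target is at or below the current LSN."""
    return target <= current_lsn_for_standby_wait(walreceiver_lsn, replay_lsn)


# Hypothetical positions: the walreceiver position sits at a
# segment-aligned start point while replay is already further ahead.
walreceiver_lsn = 0x3000000
replay_lsn = 0x3000A28
target = 0x3000500

old_behaviour = target <= walreceiver_lsn  # walreceiver position only
new_behaviour = wait_is_satisfied(target, walreceiver_lsn, replay_lsn)
```

With these values the walreceiver-only check would keep waiting, while the
floored check is satisfied at once.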

This complements 29e7dbf5e4, which seeded writtenUpto to
receiveStart in RequestXLogStreaming() to fix the most common
hang scenario.  The getter-level floor handles the remaining edge
cases: targets between receiveStart and the replay position, and
standbys running with archive recovery only (no walreceiver).

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov df9f938ca2 Remove redundant WAIT FOR LSN caller-side pre-checks
All five wakeup call sites duplicate WaitLSNWakeup()'s internal
fast-path minWaitedLSN check and add an unnecessary NULL check
on waitLSNState.

Remove the inline pre-checks and call WaitLSNWakeup() directly.
The fast-path check inside WaitLSNWakeup() already returns early
when no waiter's target has been reached, so there is no
performance difference.

The waitLSNState NULL checks are also unnecessary: shared memory
is fully initialized before any backend or auxiliary process
starts, so waitLSNState is always non-NULL at these call sites.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/jzq5shdewncpxc35r3s2mcfsmo4bjovkza5mnqf5bdfumhfi3g%40bglckf7dxmw5
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov a80a593ab6 Fix memory ordering in WAIT FOR LSN wakeup mechanism
WAIT FOR LSN uses a Dekker-style handshake: the waker stores an LSN
position then reads minWaitedLSN; the waiter stores its target into
minWaitedLSN then reads the position.  Without a barrier between each
side's store and load, a CPU may satisfy the load before the store
becomes globally visible, causing either side to miss a concurrent
update.  The result is a missed wakeup: the waiter sleeps indefinitely
until the next unrelated event.

Fix by embedding the required barriers into the atomic operations on
minWaitedLSN:

- In updateMinWaitedLSN(), use pg_atomic_write_membarrier_u64() so the
  waiter's preceding heap update is visible before the new minWaitedLSN
  value is published.

- In WaitLSNWakeup(), use pg_atomic_read_membarrier_u64() in the
  fast-path check so the waker's preceding position store is globally
  visible before minWaitedLSN is read.

The waiter side is also covered by the barrier semantics already present
in GetCurrentLSNForWaitType(): GetWalRcvWriteRecPtr() uses an explicit
read barrier (from patch 0001), while the remaining getters acquire a
spinlock, which implies the same ordering.

Also call ResetLatch() unconditionally after WaitLatch(), following the
standard latch loop pattern.  WaitLatch() does not guarantee that all
simultaneously true wake conditions are reported in one return, so a
timeout can race with SetLatch().  If we skip ResetLatch() on a timeout
return, the code performs further asynchronous-state checks before
consuming the latch, violating the latch API's required wait/reset
pattern.  That can leave the latch set across loop exit and cause a
later unrelated WaitLatch() in the same backend to return immediately.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Alexander Korotkov dfb690dd52 Use barrier semantics when reading/writing writtenUpto
The walreceiver publishes its write position lock-free via writtenUpto.
On weakly-ordered architectures (ARM, PowerPC), both sides of this
handshake need explicit barriers so that the lock-less reader sees a
consistent state.

Use pg_atomic_write_membarrier_u64() at both write sites and
pg_atomic_read_membarrier_u64() in GetWalRcvWriteRecPtr().  This matches
the barrier semantics that GetWalRcvFlushRecPtr() and other LSN-position
functions get implicitly from their spinlock acquire/release, and
protects from bugs caused by expectations of similar barrier guarantees
from different LSN-position functions.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2026-05-03 16:22:02 +03:00
Andrew Dunstan c34a280c85 Add missing connection validation in ECPG
ECPGdeallocate_all(), ECPGprepared_statement(), ECPGget_desc(), and
ecpg_freeStmtCacheEntry() could crash with a SIGSEGV when called
without an established connection (for example, when EXEC SQL CONNECT
was forgotten or a non-existent connection name was used), because
they dereferenced the result of ecpg_get_connection() without first
checking it for NULL.

Each site is fixed in the style of the surrounding code.

New tests are added for these conditions.

Author: Shruthi Gowda <gowdashru@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Nishant Sharma <nishant.sharma@enterprisedb.com>
Discussion: https://postgr.es/m/3007317.1765210195@sss.pgh.pa.us
Backpatch-through: 14
2026-05-01 15:12:28 -04:00
Andrew Dunstan b772f3fcad Only show signal-sender PID/UID detail in server log
The errdetail() added in 55890a9194 (and reworked in 3e2a1496ba)
exposed the operating-system PID and UID of whoever sent the
termination signal directly to the affected client.

Discussion suggested this should not be sent to the client, but only
recorded in the server log where the admin can use it for diagnosis.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/E5CA274C-74BD-4067-8B73-A3AD8C080EFA@gmail.com
2026-05-01 13:20:08 -04:00
Amit Kapila f67dbd8398 Fix BF failure introduced in commit 2bf6c9ff71.
The sequence subscription test switches regress_seq_sub to connect to the
publisher as regress_seq_repl (a non-superuser) when checking behavior
with insufficient sequence privileges but forgot to set up pg_hba.conf to
allow connections from it. The special setup is only needed on Windows
machines that don't use UNIX sockets.

As per buildfarm.

Reported-by: Ajin Cherian <itsajin@gmail.com>
Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDad911HUMkHgD1KZk+WOvTopiBcYf4C_8Fqj1-sZk3xgw@mail.gmail.com
2026-05-01 14:35:26 +05:30
Michael Paquier 0916282a06 doc: Mention validation attempt during ALTER INDEX .. ATTACH PARTITION
Since 9d3e094f12, the command tries to validate the parent index of the
named index, if invalid.  The documentation did not mention this
behavior, which could be confusing.

Author: Mohamed ALi <moali.pg@gmail.com>
Discussion: https://postgr.es/m/CAGnOmWpHu25_LpT=zv7KtetQhqV1QEZzFYLd_TDyOLu1Od9fpw@mail.gmail.com
Backpatch-through: 14
2026-05-01 13:10:35 +09:00
Fujii Masao c0b24b32b0 Avoid blocking indefinitely while finishing walsender shutdown
When walsender finishes streaming during shutdown, it sends a
CommandComplete message to tell the receiver that WAL streaming is done.
Previously, that path used EndCommand() followed by pq_flush().

Those functions can block indefinitely waiting for the socket to become
writeable. As a result, even when wal_sender_shutdown_timeout is set,
walsender could remain stuck while sending the final completion message,
and the shutdown timeout would not be enforced.

Fix this by introducing EndCommandExtended(), which allows
CommandComplete to be queued with pq_putmessage_noblock(), and by
using the walsender nonblocking flush path instead of pq_flush(), so
the shutdown timeout continues to be checked while pending output is
flushed.

Per CI testing on FreeBSD.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/vwlugmsogfn36jhm56zwrgd7m6xe6ircltvfh3kzt6kldvbtht@f45dgow5uhnx
2026-05-01 12:12:44 +09:00
Richard Guo f76686ce7f Fix HAVING-to-WHERE pushdown with nondeterministic collations
When GROUP BY uses a nondeterministic collation, the planner's
optimization of moving HAVING clauses to WHERE can produce incorrect
query results.  The HAVING clause may apply a stricter collation that
distinguishes values the GROUP BY considers equal.  Pushing such a
clause to WHERE causes it to filter individual rows before grouping,
potentially eliminating group members and changing aggregate results.

Fix this by detecting collation conflicts before flatten_group_exprs,
while the HAVING clause still contains GROUP Vars (Vars referencing
RTE_GROUP).  At that point, each GROUP Var directly carries the GROUP
BY collation as its varcollid, making it straightforward to compare
against the operator's inputcollid.  A mismatch where the GROUP BY
collation is nondeterministic means the clause is unsafe to push down.
RowCompareExpr is treated specially, since it carries per-column
inputcollids[] rather than a single inputcollid.

The conflicting clause indices are recorded in a Bitmapset and
consulted during the existing HAVING-to-WHERE loop, so that only
affected clauses are kept in HAVING; other safe clauses in the same
query are still pushed.

Back-patch to v18 only.  The fix relies on the RTE_GROUP mechanism
introduced in v18 (commit 247dea89f), which is what lets us identify
grouping expressions and their resolved collations via GROUP Vars on
pre-flatten havingQual.  Pre-v18 branches lack that machinery, so a
back-patch there would need a different approach.  Given the absence
of field reports of this bug on back branches, the risk of carrying a
different fix on stable branches is not justified.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48Dn2wW6XM94GZsoyMiH42=KgMo+WcobPKuWvGYnWaPOQ@mail.gmail.com
Backpatch-through: 18
2026-05-01 11:13:50 +09:00
Amit Langote 410013d2a5 Use "concurrent delete" in serialization error for TM_Deleted cases
In ExecLockRows() and ri_LockPKTuple(), the TM_Deleted code path was
using the same "could not serialize access due to concurrent update"
message as the TM_Updated path.  Use "concurrent delete" instead, since
the tuple was deleted, not updated.  The ExecLockRows() instance was
likely a copy-paste error per Andres; the ri_LockPKTuple() instance
was carried over from the same pattern in commit 2da86c1ef9.

Update affected isolation test expected files accordingly and add
a new test to fk-concurrent-pk-upd.spec with concurrent delete of the
PK row.

The ExecLockRows() change is master-only for lack of user complaints
and to avoid breaking anything that might match on the error text.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CACJufxEG1JTCq4A1gnNAu-bGAq9Xn=Xkf7kC3TRWFz6iuUOuRA@mail.gmail.com
2026-05-01 10:00:29 +09:00
Richard Guo 8d829f5a02 Fix JSON_ARRAY(query) empty set handling and view deparsing
According to the SQL/JSON standard, JSON_ARRAY(query) must return an
empty JSON array ('[]') when the subquery returns zero rows.

Previously, the parser rewrote JSON_ARRAY(query) into a JSON_ARRAYAGG
aggregate function.  Because this aggregate evaluates to NULL over an
empty set without a GROUP BY clause, the constructor erroneously
returned NULL.  Additionally, this premature rewrite baked physical
implementation details into the catalog, preventing ruleutils.c from
deparsing the original syntax for views.

This patch resolves both issues by introducing a new
JSCTOR_JSON_ARRAY_QUERY constructor type.  The parser builds the
executable form --- a COALESCE-wrapped JSON_ARRAYAGG subquery --- from
raw parse nodes via transformExprRecurse, and stores it in the func
field.  The original transformed Query is kept in a new orig_query
field so that ruleutils.c can deparse the original syntax for views.
During planning, eval_const_expressions replaces the node with the
pre-built func expression.
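
The empty-set behaviour can be illustrated with a small sketch.  This is
not the parser code; json_arrayagg() and json_array_query() are
hypothetical stand-ins that model an aggregate returning NULL (None)
over zero rows and the COALESCE wrapper that substitutes '[]'.

```python
import json


def json_arrayagg(values):
    # Models the aggregate: NULL (None) when there are no input rows.
    if not values:
        return None
    return json.dumps(list(values))


def json_array_query(values):
    # Models COALESCE(JSON_ARRAYAGG(...), '[]').
    result = json_arrayagg(values)
    return result if result is not None else "[]"


empty_before_fix = json_arrayagg([])
empty_after_fix = json_array_query([])
non_empty = json_array_query([1, 2])
```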

The deparsing issue was reported by Tom Lane.

Bump catalog version.

Bug: #19418
Reported-by: Lukas Eder <lukas.eder@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19418-591ba1f29862ef5b@postgresql.org
2026-05-01 09:42:00 +09:00
Álvaro Herrera 6ca631b990 REPACK CONCURRENTLY: fix processing of toasted tuples
In order to process tuples inserted or updated while REPACK executes, we
write those tuples to disk and later restore them; however, some forms
of toasted tuples were not being processed correctly.  Fix that.

Also expand the tests a bit for better coverage.

Author: Satya Narlapuram <satyanarlapuram@gmail.com>
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDeXb9HM2VGKXQedyCp52GzajJK5KOUdNi6oLjsS0nerQw@mail.gmail.com
2026-04-30 23:32:57 +02:00
Álvaro Herrera 2fd787d0aa Remove working test that was supposed to fail
I evidently failed to review the expected output in commit 832e220d99
carefully enough.  Per complaint from Tom Lane.

Discussion: https://postgr.es/m/769631.1777575242@sss.pgh.pa.us
2026-04-30 22:57:24 +02:00
Andrew Dunstan 6cf49e804c Fix attnum remapping in generateClonedExtStatsStmt()
When cloning extended statistics via CREATE TABLE ... LIKE ... INCLUDING
STATISTICS, stxkeys holds attribute numbers from the source (parent)
table, but get_attname() was being called with the child relation's
OID.  If the parent has dropped columns, the child's attribute numbers
are renumbered sequentially and no longer match, so the lookup either
returns the wrong column name (silent corruption) or errors out when
the attnum does not exist in the child.

Fix it by remapping the parent attnum through attmap before the lookup,
consistent with how expression statistics are already handled a few
lines below.
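
A minimal sketch of why the remapping matters, assuming a simplified
model in which a table is a list of (name, dropped) pairs and the child
receives only the live columns, numbered from 1.  build_attmap() and
attname() are hypothetical helpers, not the backend functions.

```python
def build_attmap(parent_cols):
    """Return (attmap, child_names); attmap[i] is the child attnum for
    parent attnum i + 1, or 0 if that parent column was dropped."""
    attmap, child_names = [], []
    for name, dropped in parent_cols:
        if dropped:
            attmap.append(0)
        else:
            child_names.append(name)
            attmap.append(len(child_names))
    return attmap, child_names


def attname(names, attnum):
    # Look up a column name by 1-based attribute number.
    if not 1 <= attnum <= len(names):
        raise LookupError(f"no attribute {attnum}")
    return names[attnum - 1]


# Four-column parent with "b" dropped; the statistics object is on "c",
# which is parent attnum 3.
parent = [("a", False), ("b", True), ("c", False), ("d", False)]
attmap, child = build_attmap(parent)
parent_attnum = 3
stale_name = attname(child, parent_attnum)                 # wrong column
remapped_name = attname(child, attmap[parent_attnum - 1])  # intended column
```

With a three-column parent (a, dropped b, c) the same stale lookup has
no matching child column and fails instead of returning a wrong name.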

Add a regression test covering both manifestations: a 3-column parent
where the stale attnum refers to no child column (cache-lookup error),
and a 4-column parent where the stale attnum silently refers to the
wrong child column.

Author: Julien Tachoires <julmon@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://postgr.es/m/20260415105718.tomuncfbmlt67oel@poseidon.home.virt
Backpatch-through: 14
2026-04-30 11:04:57 -04:00
Andrew Dunstan 5642a0367c Avoid SIGSEGV in pg_get_database_ddl() on NULL tablespace
There is a narrow race in which a concurrent ALTER DATABASE ... SET
TABLESPACE moves the database off the tablespace and a DROP TABLESPACE
removes it between the syscache lookup and the catalog scan. If that
happens, output an error.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Jack Bonatakis <jack@bonatak.is>
Reviewed-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/573E45C1-31A4-4885-A00C-1A2171159A2A@gmail.com
2026-04-30 10:14:52 -04:00
Daniel Gustafsson 75152c5dc5 Fix data_checksum GUC show_hook
Commit f19c0eccae erroneously omitted the show_hook for the
data_checksum GUC.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:57 +02:00
Daniel Gustafsson 1df361e3d8 Improve database detection logic in datachecksumsworker
The worker needs to know whether a database that failed checksum
processing still exists or has been dropped.  This improves the
detection logic by checking whether the database is partially dropped.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:55 +02:00
Daniel Gustafsson bf25e5571b Improve handling of concurrent checksum requests
When pg_{enable|disable}_data_checksums is called while checksums are
being enabled or disabled, the already running launcher is detected
and the new desired state is recorded.  Processing will then pick up
the new state and change its operation to fulfill the new request.
If the same state is requested but with different cost values, the
new cost values will take effect on the next relation processed.

The previous coding used complex logic to start a new launcher for
this, which is now avoided by instead using the shared memory structure
to signal the processing already in progress.

This makes the logic more robust, and fixes a bug where the launcher
would erroneously revert back to the "off" state.

Access to the shared memory is also protected with LWLocks in all
cases.  Since the shmem structure is used for signalling between
the worker and the launcher, and there can be only one of each,
no concurrency issues were detected, but it's better to stick to
proper locking protocol should this ever be updated to handle
multiple workers.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:53 +02:00
Daniel Gustafsson 381d19da15 Typo and spelling fixups for online checksums
A collection of spelling, wording and punctuation fixups for the code
documentation from postcommit review.

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:50 +02:00
Daniel Gustafsson 25b922ec58 Fix invalid checksum state transition in checkpoints
Commit 78e950cb8 added checksum state handling to all XLOG_CHECKPOINT
records, which caused unnecessary state transitions and emission of
procsignal barriers.  Remove it, as only the _REDO record needs to
handle checksum state.  Barrier emission is also consistently made
after controlfile updates to avoid race conditions.

Additionally, interrupts are held between calling ProcSignalInit and
InitLocalDataChecksumState to remove a window where otherwise invalid
state transitions can happen.

Also remove a pointless assertion on Controlfile which will never hit.

Author: Tomas Vondra <tomas@vondra.me>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:48 +02:00
Daniel Gustafsson 8fb8ded889 Handle data_checksum state changes during launcher_exit
When erroring out from the datachecksums launcher during data checksum
enabling, before state has transitioned to "on", we revert back to the
"off" state.  Since checksums weren't enabled, there is no use staying
in an inprogress state since the checksum launcher currently doesn't
support restarting from where it left off.  Should restartability get
added in the future, this would need to be revisited.  This state
transition was, however, missing from the allowed transitions in the
state machine, causing an error.
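
The shape of the problem can be sketched as a transition table.  This is
an illustration only: the state names and the table contents below are
assumptions made for the example and may not match the actual
transition table in the source.

```python
# Hypothetical transition table before the fix: reverting from the
# in-progress state straight back to "off" is not listed.
ALLOWED_BEFORE = {
    ("off", "inprogress-on"),
    ("inprogress-on", "on"),
    ("on", "inprogress-off"),
    ("inprogress-off", "off"),
}

# After the fix, the revert taken on launcher error is permitted.
ALLOWED_AFTER = ALLOWED_BEFORE | {("inprogress-on", "off")}


def transition(current, new, allowed):
    """Return the new state, or raise if the transition is not allowed."""
    if (current, new) not in allowed:
        raise ValueError(f"invalid transition {current} -> {new}")
    return new


def revert_on_error(allowed):
    # Models the launcher erroring out before reaching "on".
    try:
        return transition("inprogress-on", "off", allowed)
    except ValueError as exc:
        return f"error: {exc}"


before = revert_on_error(ALLOWED_BEFORE)
after = revert_on_error(ALLOWED_AFTER)
```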

Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/9197F930-DDEB-4CAC-82A2-16FEC715CCE8@yesql.se
2026-04-30 13:41:46 +02:00