Commit 65108c7

x4mrobot-cloud-aw
authored and committed
Fix two bugs in archive_mode=shared on standby

1. Checkpoint on standby deletes WAL with .ready status.

   XLogArchiveCheckDone() treated archive_mode=shared like archive_mode=on
   during recovery, returning true unconditionally and allowing checkpoint
   to remove WAL segments that the primary had not yet archived.

   Fix: exclude shared mode from the early-return path, same as "always".

2. Walsender never sends archival status reports after archiving is
   restored.

   WalSndArchivalReport() calls pgstat_fetch_stat_archiver(), whose result
   is cached per-session (PGSTAT_FETCH_CONSISTENCY_CACHE by default). The
   walsender has no transaction boundaries that would clear the cache, so
   last_archived_wal remained "" forever, and strcmp() suppressed all
   reports.

   Fix: call pgstat_clear_snapshot() before fetching archiver stats.

Add TAP tests in 051_archive_shared_checkpoint.pl that reproduce both bugs,
and extend 050_archive_shared.pl with checkpoint/restore scenarios.

Reviewed-by: reshke <reshke@double.cloud>

1 parent f401062 commit 65108c7

4 files changed

Lines changed: 365 additions & 7 deletions

src/backend/access/transam/xlogarchive.c

Lines changed: 10 additions & 4 deletions
@@ -573,16 +573,22 @@ XLogArchiveCheckDone(const char *xlog)
 
 	/*
 	 * During archive recovery, the file is deletable if archive_mode is not
-	 * "always".
+	 * "always" or "shared".
+	 *
+	 * In "shared" mode the standby does not archive independently; instead it
+	 * waits for the primary to report successful archival, at which point the
+	 * walreceiver converts the .ready file to .done. We must therefore fall
+	 * through to the .done/.ready check below so that checkpoint cannot
+	 * delete a segment whose .ready file has not yet become .done.
 	 */
-	if (!XLogArchivingAlways() &&
+	if (!XLogArchivingAlways() && !EffectiveArchiveModeIsShared() &&
 		GetRecoveryState() == RECOVERY_STATE_ARCHIVE)
 		return true;
 
 	/*
 	 * At this point of the logic, note that we are either a primary with
-	 * archive_mode set to "on" or "always", or a standby with archive_mode
-	 * set to "always".
+	 * archive_mode set to "on" or "always", a standby with archive_mode set
+	 * to "always", or a standby with archive_mode set to "shared".
 	 */
 
 	/* First check for .done --- this means archiver is done with it */
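
Note: the effect of the changed condition can be illustrated with a small standalone model. This is only a sketch, not the PostgreSQL function: the enums, the `fixed` flag and `segment_is_deletable()` are hypothetical names introduced here, and the real XLogArchiveCheckDone() performs additional steps that the model omits.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the archive_mode settings named in the diff. */
typedef enum { MODE_OFF, MODE_ON, MODE_ALWAYS, MODE_SHARED } ArchiveMode;

/* Hypothetical stand-in for the status file found for a segment. */
typedef enum { STATUS_NONE, STATUS_READY, STATUS_DONE } SegStatus;

/*
 * Simplified model: may a checkpoint remove this WAL segment?
 * fixed = false models the pre-patch early return, fixed = true the
 * patched condition that also excludes "shared".
 */
static bool
segment_is_deletable(ArchiveMode mode, bool in_archive_recovery,
                     SegStatus status, bool fixed)
{
    /* Without archiving there is nothing to wait for. */
    if (mode == MODE_OFF)
        return true;

    /* Modes for which a standby must consult the status file. */
    bool standby_checks_status =
        (mode == MODE_ALWAYS) || (fixed && mode == MODE_SHARED);

    /* Early return: during archive recovery other modes never wait. */
    if (in_archive_recovery && !standby_checks_status)
        return true;

    /* Otherwise only a .done status makes the segment deletable. */
    return status == STATUS_DONE;
}
```

In this model, `segment_is_deletable(MODE_SHARED, true, STATUS_READY, false)` is true, which is the bug: a segment still marked .ready is treated as removable. With `fixed` set to true the same call is false, and it becomes true only once the status is `STATUS_DONE`.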

src/backend/replication/walsender.c

Lines changed: 6 additions & 3 deletions
@@ -2810,10 +2810,13 @@ WalSndArchivalReport(void)
 		return;
 	last_archival_report_timestamp = now;
 	/*
-	 * Get archiver statistics. We use non-blocking access to avoid delaying
-	 * replication if stats collector is slow. If stats are unavailable or
-	 * stale, we'll just try again at the next interval.
+	 * Get archiver statistics. The pgstat snapshot is cached per-session and
+	 * is only invalidated at transaction boundaries. The walsender runs
+	 * without transaction boundaries, so we must clear the snapshot explicitly
+	 * to avoid reading stale data (e.g. last_archived_wal stuck at its initial
+	 * empty value even after the archiver has archived new segments).
 	 */
+	pgstat_clear_snapshot();
 	archiver_stats = pgstat_fetch_stat_archiver();
 	if (archiver_stats == NULL)
 		return;
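
Note: the stale-snapshot behaviour described in the new comment can be reproduced with a toy cache. This is a sketch under stated assumptions: every name below (`fetch_stats`, `clear_snapshot`, `maybe_send_report` and so on) is hypothetical, and the comparison against the last reported value is inferred from the commit message's mention of strcmp(), not copied from walsender.c.

```c
#include <stdbool.h>
#include <string.h>

#define WAL_NAME_LEN 64

/* Hypothetical archiver statistics; only the field discussed above. */
typedef struct
{
    char last_archived_wal[WAL_NAME_LEN];
} ArchiverStats;

static ArchiverStats shared_stats;      /* live value, updated by the archiver */
static ArchiverStats snapshot;          /* per-session cached copy */
static bool snapshot_valid = false;
static char last_reported[WAL_NAME_LEN] = "";

/* Cached fetch: copy the live stats once, then keep returning the copy. */
static const ArchiverStats *
fetch_stats(void)
{
    if (!snapshot_valid)
    {
        snapshot = shared_stats;
        snapshot_valid = true;
    }
    return &snapshot;
}

/* Drop the cached copy so the next fetch reads the live value again. */
static void
clear_snapshot(void)
{
    snapshot_valid = false;
}

/* Simulate the archiver finishing a segment. */
static void
archiver_archives(const char *wal)
{
    strncpy(shared_stats.last_archived_wal, wal, WAL_NAME_LEN - 1);
    shared_stats.last_archived_wal[WAL_NAME_LEN - 1] = '\0';
}

/* Returns true if a report would be sent, false if it is suppressed. */
static bool
maybe_send_report(bool clear_first)
{
    if (clear_first)
        clear_snapshot();

    const ArchiverStats *stats = fetch_stats();

    /* Nothing new since the last report: suppress. */
    if (strcmp(stats->last_archived_wal, last_reported) == 0)
        return false;

    strcpy(last_reported, stats->last_archived_wal);
    return true;
}
```

If `maybe_send_report(false)` is called once before `archiver_archives(...)`, every later call with `clear_first` set to false keeps returning false, because the snapshot still holds the initial empty string. The first call with `clear_first` set to true after the archiver has run returns true.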

src/test/recovery/t/050_archive_shared.pl

Lines changed: 138 additions & 0 deletions
@@ -267,4 +267,142 @@
 ok($standby2_count >= 3500, "standby2 has all data (got $standby2_count rows)");
 ok($standby3_count >= 3500, "standby3 has all data (got $standby3_count rows)");
 
+###############################################################################
+# Test 5: checkpoint on standby must NOT delete WAL that has .ready status
+#
+# In archive_mode=shared, the standby relies on archival reports from the
+# primary to know when a segment is safe to delete. Segments not yet
+# confirmed as archived have .ready files. A checkpoint (CreateRestartPoint)
+# must not remove those WAL files because they may be needed for recovery
+# after a standby promotion if the primary never archived them.
+#
+# Root cause: XLogArchiveCheckDone() treats archive_mode=shared the same as
+# archive_mode=on during recovery, bypassing the .ready/.done check.
+###############################################################################
+
+note("Test 5: checkpoint must not delete WAL with .ready on standby");
+
+my $archive_dir5 = PostgreSQL::Test::Utils::tempdir();
+my $primary5 = PostgreSQL::Test::Cluster->new('primary5');
+$primary5->init(has_archiving => 1, allows_streaming => 1);
+$primary5->append_conf(
+	'postgresql.conf', qq{
+archive_mode = shared
+archive_command = 'cp %p "$archive_dir5/%f"'
+});
+$primary5->start;
+$primary5->safe_psql('postgres', 'CREATE TABLE t5 (i int);');
+
+# Ensure WAL activity exists in the current segment before switching.
+# pg_switch_wal() is a no-op when called at the very start of a segment,
+# so we write a row first to guarantee there is WAL to switch away from.
+$primary5->safe_psql('postgres', 'INSERT INTO t5 VALUES (0);');
+$primary5->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait for archiver to archive the switched segment
+$primary5->poll_query_until('postgres',
+	'SELECT archived_count > 0 FROM pg_stat_archiver')
+  or die "primary5: archiver did not start";
+
+# Create standby without wal_keep_size so checkpoint is free to recycle segments
+# backup() returns an empty list (bare "return"), so the backup name must be
+# stored separately before passing it to init_from_backup.
+$primary5->backup('backup5');
+my $standby5 = PostgreSQL::Test::Cluster->new('standby5');
+$standby5->init_from_backup($primary5, 'backup5', has_streaming => 1);
+$standby5->append_conf(
+	'postgresql.conf', qq{
+archive_mode = shared
+archive_command = 'cp %p "$archive_dir5/%f"'
+wal_receiver_status_interval = 1s
+});
+$standby5->start;
+$primary5->wait_for_catchup($standby5);
+
+# Break archiving on primary: new segments received by standby will get .ready
+$primary5->adjust_conf('postgresql.conf', 'archive_command', "'/bin/false'");
+$primary5->reload;
+
+# Generate several complete WAL segments. After the standby replays all of
+# them its redo pointer is well past the first few, making those candidates
+# for checkpoint removal.
+for (1 .. 6)
+{
+	$primary5->safe_psql('postgres',
+		'INSERT INTO t5 SELECT generate_series(1,1000);');
+	$primary5->safe_psql('postgres', 'SELECT pg_switch_wal();');
+}
+$primary5->wait_for_catchup($standby5);
+
+# Collect every WAL segment that has a .ready file on the standby
+my $status_dir5 = $standby5->data_dir . '/pg_wal/archive_status';
+my @ready5;
+if (opendir(my $dh, $status_dir5))
+{
+	@ready5 = map { s/\.ready$//r } grep { /\.ready$/ } readdir($dh);
+	closedir($dh);
+}
+my $n_ready5 = scalar @ready5;
+note("Before checkpoint: $n_ready5 WAL files with .ready");
+cmp_ok($n_ready5, '>', 0, "standby has .ready WAL files before checkpoint");
+
+# Trigger CreateRestartPoint (the standby equivalent of CHECKPOINT).
+# It must not remove WAL files that carry a .ready status.
+$standby5->safe_psql('postgres', 'CHECKPOINT');
+
+my $wal_dir5 = $standby5->data_dir . '/pg_wal';
+my $deleted5 = 0;
+for my $f (@ready5)
+{
+	unless (-f "$wal_dir5/$f")
+	{
+		$deleted5++;
+		diag("BUG: $f had .ready but checkpoint deleted it from standby");
+	}
+}
+is($deleted5, 0,
+	"checkpoint does not delete WAL with .ready (not yet archived by primary)");
+
+###############################################################################
+# Test 6: after archiving is restored on primary, standby .ready -> .done
+#
+# When archive_command is broken for a while and then fixed, the primary will
+# archive the previously-failed segments. The walsender sends an archival
+# status report to the standby which then converts .ready to .done.
+# This verifies the end-to-end recovery of the mechanism after an outage.
+###############################################################################
+
+note("Test 6: .ready files become .done after archiving restored on primary");
+
+# Capture archived_count before restoring so we can detect new archival
+my $archived_before5 =
+  $primary5->safe_psql('postgres', 'SELECT archived_count FROM pg_stat_archiver');
+
+# Restore archiving
+$primary5->adjust_conf('postgresql.conf', 'archive_command',
+	qq{'cp %p "$archive_dir5/%f"'});
+$primary5->reload;
+
+# Wait for primary to archive the segments that failed during the outage
+$primary5->poll_query_until('postgres',
+	"SELECT archived_count > $archived_before5 FROM pg_stat_archiver")
+  or die "primary5: archiver did not catch up after archive_command restored";
+
+# The walsender sends archival status reports every ~10 s. Wait up to
+# timeout_default seconds for every .ready file to transition to .done.
+my $remaining5 = $n_ready5;
+for (my $i = 0; $i < $PostgreSQL::Test::Utils::timeout_default; $i++)
+{
+	$remaining5 = 0;
+	if (opendir(my $dh, $status_dir5))
+	{
+		$remaining5 = scalar(grep { /\.ready$/ } readdir($dh));
+		closedir($dh);
+	}
+	last if $remaining5 == 0;
+	sleep(1);
+}
+is($remaining5, 0,
+	"all .ready files become .done after archiving restored on primary");
+
 done_testing();
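
Note: Tests 5 and 6 both rely on scanning archive_status for names ending in .ready. For readers more at home in the backend's language, the same scan can be sketched in C with POSIX directory calls. The helper names `has_ready_suffix()` and `count_ready_files()` are hypothetical and are not part of this commit.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the file name ends in ".ready", like the Perl /\.ready$/ match. */
static bool
has_ready_suffix(const char *name)
{
    static const char suffix[] = ".ready";
    size_t      name_len = strlen(name);
    size_t      suffix_len = sizeof(suffix) - 1;

    return name_len >= suffix_len &&
        strcmp(name + name_len - suffix_len, suffix) == 0;
}

/*
 * Count status files ending in ".ready" in the given directory.
 * Returns -1 if the directory cannot be opened.
 */
static int
count_ready_files(const char *status_dir)
{
    DIR        *dir = opendir(status_dir);
    struct dirent *entry;
    int         count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL)
    {
        if (has_ready_suffix(entry->d_name))
            count++;
    }

    closedir(dir);
    return count;
}
```

Returning -1 when the directory cannot be opened is a deliberate difference from the Perl polling loop in Test 6, which leaves the count at 0 in that case and would therefore treat an unreadable directory as "no .ready files remain".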

051_archive_shared_checkpoint.pl

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Tests for archive_mode=shared correctness on standbys:
+#
+# 1. Checkpoint on standby must NOT remove WAL segments that have a .ready
+#    status file (i.e. not yet archived by the primary). With the bug,
+#    XLogArchiveCheckDone() returns true unconditionally during recovery for
+#    any mode that is not "always", so checkpoint deletes these segments.
+#
+# 2. After archiving is broken on the primary and then restored, .ready files
+#    on the standby must eventually transition to .done (primary sends archival
+#    status reports to the standby via the walsender).
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Use 1 MB WAL segments so we can generate many segments cheaply.
+my $wal_segsize = 1;
+
+# An archive command that always fails (but is recognized by the archiver as a
+# real failure, not a missing command). Mirrors the approach in
+# 020_archive_status.pl to stay portable.
+my $broken_command =
+  $PostgreSQL::Test::Utils::windows_os
+  ? q{copy "%p_does_not_exist" "%f_does_not_exist"}
+  : q{cp "%p_does_not_exist" "%f_does_not_exist"};
+
+my $archive_dir = PostgreSQL::Test::Utils::tempdir();
+my $good_command =
+  $PostgreSQL::Test::Utils::windows_os
+  ? qq{copy "%p" "$archive_dir\\%f"}
+  : qq{cp %p "$archive_dir/%f"};
+
+###############################################################################
+# Set up primary with archive_mode=shared and BROKEN archiving so that every
+# WAL segment received by the standby gets a .ready file.
+###############################################################################
+
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(
+	has_archiving => 1,
+	allows_streaming => 1,
+	extra => [ '--wal-segsize' => $wal_segsize ]);
+$primary->append_conf('postgresql.conf', qq{
+archive_mode = shared
+archive_command = '$broken_command'
+wal_keep_size = 0
+});
+$primary->start;
+
+my $backup_name = 'standby_backup';
+$primary->backup($backup_name);
+
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->append_conf('postgresql.conf', qq{
+archive_mode = shared
+archive_command = '$good_command'
+wal_receiver_status_interval = 1s
+wal_keep_size = 0
+});
+$standby->start;
+
+$primary->wait_for_catchup($standby);
+
+###############################################################################
+# Generate WAL while archiving is broken.
+# The walreceiver will create .ready files for every received segment.
+###############################################################################
+
+$primary->safe_psql('postgres', 'CREATE TABLE t (x int)');
+
+# Switch WAL several times to create clearly-identifiable old segments.
+# We capture the name of the first switched-away segment; it is the primary
+# candidate that checkpoint would delete.
+my $target_seg = $primary->safe_psql('postgres',
+	q{SELECT pg_walfile_name(pg_current_wal_lsn())});
+
+for my $i (1..5)
+{
+	$primary->safe_psql('postgres',
+		"INSERT INTO t SELECT generate_series(1,500)");
+	$primary->safe_psql('postgres', 'SELECT pg_switch_wal()');
+}
+
+# Wait for the archiver to register failures so we are sure archiving is
+# truly broken (not just slow).
+$primary->poll_query_until('postgres',
+	q{SELECT failed_count > 0 FROM pg_stat_archiver})
+  or die "Timed out waiting for archiver to fail";
+
+# Issue a CHECKPOINT on the primary so that the standby can form a
+# restartpoint whose redo LSN is past $target_seg.
+$primary->safe_psql('postgres', 'CHECKPOINT');
+
+# Wait for the standby to replay everything up to that checkpoint.
+$primary->wait_for_catchup($standby);
+
+my $standby_wal_dir = $standby->data_dir . '/pg_wal';
+my $standby_status_dir = "$standby_wal_dir/archive_status";
+
+# The target segment must already be visible on the standby as .ready.
+my $target_ready = "$standby_status_dir/$target_seg.ready";
+ok(-f $target_ready,
+	"standby has .ready file for segment $target_seg (not archived by primary)");
+
+# The WAL file itself must also be present.
+ok(-f "$standby_wal_dir/$target_seg",
+	"WAL segment $target_seg exists in standby pg_wal before CHECKPOINT");
+
+###############################################################################
+# Test 1: CHECKPOINT (restartpoint) on standby must not remove .ready segments
+###############################################################################
+
+# This triggers CreateRestartPoint, which calls RemoveOldXlogFiles.
+# With the bug, XLogArchiveCheckDone returns true for every segment in
+# archive_mode=shared during recovery, so $target_seg would be deleted.
+$standby->safe_psql('postgres', 'CHECKPOINT');
+
+ok(-f "$standby_wal_dir/$target_seg",
+	"WAL segment $target_seg still exists after CHECKPOINT on standby "
+	  . "(not deleted despite .ready status)");
+
+ok(-f $target_ready,
+	".ready file for $target_seg still present after CHECKPOINT on standby");
+
+###############################################################################
+# Test 2: Restoring archiving on primary causes .ready -> .done on standby
+#
+# This part is independent of Test 1: we generate fresh WAL (with archiving
+# still broken) so the standby accumulates new .ready files, then restore
+# archiving and verify those files become .done.
+###############################################################################
+
+# Generate a few more segments so the standby definitely has fresh .ready files
+# regardless of what checkpoint may have done above.
+for my $i (1..3)
+{
+	$primary->safe_psql('postgres',
+		"INSERT INTO t SELECT generate_series(1,200)");
+	$primary->safe_psql('postgres', 'SELECT pg_switch_wal()');
+}
+$primary->wait_for_catchup($standby);
+
+# Collect all current .ready files on the standby.
+my @ready_segs;
+if (opendir(my $dh, $standby_status_dir))
+{
+	@ready_segs =
+	  map { (my $s = $_) =~ s/\.ready$//; $s }
+	  grep { /\.ready$/ } readdir($dh);
+	closedir($dh);
+}
+note("Standby has "
+	  . scalar(@ready_segs)
+	  . " .ready segments before archiving is restored");
+cmp_ok(scalar(@ready_segs), '>', 0,
+	"standby has fresh .ready files for newly received unarchived segments");
+
+# Restore archiving on the primary.
+$primary->safe_psql('postgres', qq{
+	ALTER SYSTEM SET archive_command TO '$good_command';
+	SELECT pg_reload_conf();
+});
+
+# Wait until primary has archived at least one segment.
+$primary->poll_query_until('postgres',
+	q{SELECT archived_count > 0 FROM pg_stat_archiver})
+  or die "Timed out waiting for primary to start archiving after restore";
+
+# Generate one more WAL switch so the walsender picks up the updated
+# last_archived_wal and sends a fresh archival report to the standby.
+# (The walsender only sends when last_archived_wal changes and every
+# ARCHIVAL_REPORT_INTERVAL = 10 s at most.)
+$primary->safe_psql('postgres', 'SELECT pg_switch_wal()');
+$primary->wait_for_catchup($standby);
+
+# Poll until all previously-.ready segments have become .done.
+# Allow up to the framework default timeout (usually 120 s); the walsender
+# reports every 10 s so convergence should happen well within that.
+my $remaining_ready = scalar(@ready_segs);
+for my $i (1 .. $PostgreSQL::Test::Utils::timeout_default)
+{
+	$remaining_ready = 0;
+	if (opendir(my $dh, $standby_status_dir))
+	{
+		# Count only the segments that were .ready before archiving was restored
+		for my $seg (@ready_segs)
+		{
+			$remaining_ready++ if -f "$standby_status_dir/$seg.ready";
+		}
+		closedir($dh);
+	}
+	last if $remaining_ready == 0;
+	sleep(1);
+}
+
+is($remaining_ready, 0,
+	"all .ready files on standby transitioned to .done "
+	  . "after archiving restored on primary");
+
+# Sanity-check: the WAL files are still present (they weren't deleted by
+# checkpoint while .ready, nor disappeared otherwise).
+my @still_missing = grep { !-f "$standby_wal_dir/$_" } @ready_segs;
+is(scalar(@still_missing), 0,
+	"WAL segments were not lost while waiting for archival reports");
+
+done_testing();
