Skip to content

Postgres Restore failed #4129

Open
Open
@highgo-sunrk

Description

@highgo-sunrk

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 1.23.7
  • PGO Image Tag:ubi8-5.7.3
  • Postgres Version 14
  • Storage: nfs

Steps to Reproduce

I performed two backups using pgBackRest. The first backup was automatically completed after the cluster was created. I executed the restore operation by modifying the CR file.
pgbackrest info

stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (14): 000000010000000000000001/00000002000000000000000E

        full backup: 20250310-023907F
            timestamp start/stop: 2025-03-10 02:39:07+00 / 2025-03-10 02:40:15+00
            wal start/stop: 000000010000000000000003 / 000000010000000000000004
            database size: 52.3MB, database backup size: 52.3MB
            repo1: backup set size: 6.7MB, backup size: 6.7MB

        full backup: 20250310-030103F
            timestamp start/stop: 2025-03-10 03:01:03+00 / 2025-03-10 03:01:46+00
            wal start/stop: 000000010000000000000006 / 000000010000000000000007
            database size: 52.4MB, database backup size: 52.4MB
            repo1: backup set size: 6.7MB, backup size: 6.7MB

        incr backup: 20250310-030103F_20250310-030518I
            timestamp start/stop: 2025-03-10 03:05:18+00 / 2025-03-10 03:05:25+00
            wal start/stop: 00000001000000000000000A / 00000001000000000000000B
            database size: 52.4MB, database backup size: 4.0MB
            repo1: backup set size: 6.7MB, backup size: 447.0KB
            backup reference list: 20250310-030103F

The first restore was to the backup set 20250310-030103F_20250310-030518I, which completed normally and the data was intact. I then restored again to the first backup set 20250310-023907F, which also restored normally. However, when I tried to restore to the backup set 20250310-030103F_20250310-030518I once more, an error occurred. Upon checking the logs of the restore pod, I found the following error message.

 kg logs pod/v9test-pgbackrest-restore-g6gq4 -f 
+ pgbackrest restore --type=time '--target=2025-03-10 07:04:46' --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --target-action=promote --link-map=pg_wal=/pgdata/pg14_wal
2025-03-10 07:29:53.448 GMT [20] LOG:  starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.4.0, 64-bit
2025-03-10 07:29:53.463 GMT [20] LOG:  listening on IPv6 address "::1", port 5866
2025-03-10 07:29:53.463 GMT [20] LOG:  listening on IPv4 address "127.0.0.1", port 5866
2025-03-10 07:29:53.470 GMT [20] LOG:  listening on IPv6 address "::1", port 1521
2025-03-10 07:29:53.470 GMT [20] LOG:  listening on IPv4 address "127.0.0.1", port 1521
2025-03-10 07:29:53.472 GMT [20] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5866"
2025-03-10 07:29:53.485 GMT [20] LOG:  listening on Unix socket "/tmp/.s.PGSQL.1521"
2025-03-10 07:29:53.530 GMT [20] LOG:  This is an trial version of hgdb-enterprise-9.0.1, valid until 2099年10月25日. After-sales service hotline: 400-708-8006. Online technical support platform:https://support.highgo.com/.
2025-03-10 07:29:53.541 GMT [21] LOG:  database system was interrupted; last known up at 2025-03-10 07:03:31 GMT
2025-03-10 07:30:07.674 GMT [21] LOG:  restored log file "00000003.history" from archive
2025-03-10 07:30:07.742 GMT [21] LOG:  restored log file "00000004.history" from archive
2025-03-10 07:30:07.820 GMT [21] LOG:  restored log file "00000005.history" from archive
2025-03-10 07:30:07.893 GMT [21] LOG:  starting point-in-time recovery to 2025-03-10 07:04:46+00
2025-03-10 07:30:07.961 GMT [21] LOG:  restored log file "00000005.history" from archive
2025-03-10 07:30:08.333 GMT [21] LOG:  restored log file "00000004000000000000000D" from archive
2025-03-10 07:30:08.667 GMT [21] LOG:  restored log file "00000004000000000000000C" from archive
2025-03-10 07:30:08.709 GMT [21] FATAL:  requested timeline 5 is not a child of this server's history
2025-03-10 07:30:08.709 GMT [21] DETAIL:  Latest checkpoint is at 0/D000060 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at 0/F005068.
2025-03-10 07:30:08.712 GMT [20] LOG:  startup process (PID 21) exited with exit code 1
2025-03-10 07:30:08.712 GMT [20] LOG:  aborting startup due to startup process failure
2025-03-10 07:30:08.716 GMT [20] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.
2025-03-10 07:30:09.013 GMT [39] LOG:  starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.4.0, 64-bit
2025-03-10 07:30:09.014 GMT [39] LOG:  listening on IPv6 address "::1", port 5866
2025-03-10 07:30:09.014 GMT [39] LOG:  listening on IPv4 address "127.0.0.1", port 5866
2025-03-10 07:30:09.022 GMT [39] LOG:  listening on IPv6 address "::1", port 1521
2025-03-10 07:30:09.022 GMT [39] LOG:  listening on IPv4 address "127.0.0.1", port 1521
2025-03-10 07:30:09.024 GMT [39] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5866"
2025-03-10 07:30:09.032 GMT [39] LOG:  listening on Unix socket "/tmp/.s.PGSQL.1521"
2025-03-10 07:30:09.074 GMT [39] LOG:  This is an trial version of hgdb-enterprise-9.0.1, valid until 2099年10月25日. After-sales service hotline: 400-708-8006. Online technical support platform:https://support.highgo.com/.
2025-03-10 07:30:09.086 GMT [40] LOG:  database system was interrupted; last known up at 2025-03-10 07:03:31 GMT
2025-03-10 07:30:29.056 GMT [40] LOG:  restored log file "00000003.history" from archive
2025-03-10 07:30:29.114 GMT [40] LOG:  restored log file "00000004.history" from archive
2025-03-10 07:30:29.172 GMT [40] LOG:  restored log file "00000005.history" from archive
2025-03-10 07:30:29.257 GMT [40] LOG:  starting point-in-time recovery to 2025-03-10 07:04:46+00
2025-03-10 07:30:29.453 GMT [40] LOG:  restored log file "00000005.history" from archive
2025-03-10 07:30:29.849 GMT [40] LOG:  restored log file "00000004000000000000000D" from archive
2025-03-10 07:30:30.218 GMT [40] LOG:  restored log file "00000004000000000000000C" from archive
2025-03-10 07:30:30.255 GMT [40] FATAL:  requested timeline 5 is not a child of this server's history
2025-03-10 07:30:30.255 GMT [40] DETAIL:  Latest checkpoint is at 0/D000060 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at 0/F005068.
2025-03-10 07:30:30.258 GMT [39] LOG:  startup process (PID 40) exited with exit code 1
2025-03-10 07:30:30.258 GMT [39] LOG:  aborting startup due to startup process failure
2025-03-10 07:30:30.262 GMT [39] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.

I modified the operator's code and added the following lines:

if foundTarget {
	opts = append(opts, "--target-action=promote", "--target-timeline=current")
}

With the modified operator, the primary node can start normally, but the secondary node encounters the following error.

2025-03-10 07:42:35,011 WARNING: Postgresql is not running.
2025-03-10 07:42:35,012 INFO: Lock owner: v9test-highgo-vgwk-0; I am v9test-highgo-d4nx-0
2025-03-10 07:42:35,018 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202107181
  Database system identifier: 7480068508330737760
  Database cluster state: shut down in recovery
  pg_control last modified: Mon Mar 10 07:21:41 2025
  Latest checkpoint location: 0/D000060
  Latest checkpoint's REDO location: 0/C000550
  Latest checkpoint's REDO WAL file: 00000004000000000000000C
  Latest checkpoint's TimeLineID: 4
  Latest checkpoint's PrevTimeLineID: 4
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:818
  Latest checkpoint's NextOID: 40959
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 784
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 818
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Mon Mar 10 07:20:41 2025
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/D000110
  Min recovery ending loc's timeline: 4
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_prepared_foreign_transactions setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 0bb02f977b2417c0296e95e2684c288621ab04cd0ed34e291a08ac247de78475
  database mode: 1
  case conversion mode: 1

2025-03-10 07:42:35,019 INFO: Lock owner: v9test-highgo-vgwk-0; I am v9test-highgo-d4nx-0
2025-03-10 07:42:35,024 INFO: starting as a secondary
2025-03-10 07:42:36,053 INFO: postmaster pid=114
/tmp/postgres:5866 - no response
2025-03-10 07:42:36.174 UTC [114] LOG:  pgaudit extension initialized
2025-03-10 07:42:36.364 UTC [114] LOG:  starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.4.0, 64-bit
2025-03-10 07:42:36.367 UTC [114] LOG:  listening on IPv4 address "0.0.0.0", port 5866
2025-03-10 07:42:36.367 UTC [114] LOG:  listening on IPv6 address "::", port 5866
2025-03-10 07:42:36.425 UTC [114] LOG:  listening on IPv6 address "::1", port 1521
2025-03-10 07:42:36.425 UTC [114] LOG:  listening on IPv4 address "127.0.0.1", port 1521
2025-03-10 07:42:36.445 UTC [114] LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.5866"
2025-03-10 07:42:36.483 UTC [114] LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.1521"
2025-03-10 07:42:36.788 UTC [114] LOG:  This is an trial version of hgdb-enterprise-9.0.1, valid until 2099年10月25日. After-sales service hotline: 400-708-8006. Online technical support platform:https://support.highgo.com/.
2025-03-10 07:42:36.852 UTC [117] LOG:  database system was shut down in recovery at 2025-03-10 07:21:41 UTC
2025-03-10 07:42:37.245 UTC [120] FATAL:  the database system is starting up
/tmp/postgres:5866 - rejecting connections
2025-03-10 07:42:37.258 UTC [122] FATAL:  the database system is starting up
/tmp/postgres:5866 - rejecting connections
2025-03-10 07:42:37.354 UTC [117] LOG:  restored log file "00000005.history" from archive
2025-03-10 07:42:37.652 UTC [117] LOG:  restored log file "00000006.history" from archive
2025-03-10 07:42:37.894 UTC [117] LOG:  entering standby mode
2025-03-10 07:42:38.053 UTC [117] LOG:  restored log file "00000006.history" from archive
2025-03-10 07:42:38.304 UTC [129] FATAL:  the database system is starting up
/tmp/postgres:5866 - rejecting connections
2025-03-10 07:42:38.618 UTC [117] LOG:  restored log file "00000002000000000000000D" from archive
2025-03-10 07:42:38.852 UTC [117] FATAL:  requested timeline 6 does not contain minimum recovery point 0/D000110 on timeline 4
2025-03-10 07:42:38.865 UTC [114] LOG:  startup process (PID 117) exited with exit code 1
2025-03-10 07:42:38.865 UTC [114] LOG:  aborting startup due to startup process failure
2025-03-10 07:42:38.885 UTC [114] LOG:  database system is shut down
/tmp/postgres:5866 - no response

Has anyone encountered this issue before? Is my code modification correct? Please help me with this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions