K8SPG-680: add ReadyForBackup condition to the pg-cluster #1133
base: main
func (f *fakeClient) Patch(ctx context.Context, obj client.Object, patch client.Patch, options ...client.PatchOption) error {
	err := f.Client.Patch(ctx, obj, patch, options...)
	if !k8serrors.IsNotFound(err) {
		return err
	}
	if err := f.Create(ctx, obj); err != nil {
		return err
	}
	return f.Client.Patch(ctx, obj, patch, options...)
}
Do we need this? Nothing fails in the controller tests when it is removed.
@@ -505,7 +470,7 @@ func updatePGBackrestInfo(ctx context.Context, c client.Client, pod *corev1.Pod,
 }

 func finishBackup(ctx context.Context, c client.Client, pgBackup *v2.PerconaPGBackup, job *batchv1.Job) (*reconcile.Result, error) {
-	if checkBackupJob(job) == v2.BackupSucceeded {
+	if job != nil && checkBackupJob(job) == v2.BackupSucceeded {
Should we maybe validate the input job once at the top of the function and avoid repeating the same check across different places?
e.g.
func finishBackup(ctx context.Context, c client.Client, pgBackup *v2.PerconaPGBackup, job *batchv1.Job) (*reconcile.Result, error) {
	if job == nil {
		// do something
	}
https://perconadev.atlassian.net/browse/K8SPG-680
DESCRIPTION
Problem:
After a failed PVC resize on cluster1-repo1, scheduled backups cannot be created successfully. Although the pg-backup object is created, it gets stuck in the Starting state.
Cause:
When a PVC resize fails, Crunchy's PostgresCluster resource gets an Unknown status for the PGBackRestReplicaRepoReady condition. This condition is required to create a backup job in the reconcileManualBackup method:
percona-postgresql-operator/internal/controller/postgrescluster/pgbackrest.go
Lines 2394 to 2407 in bd5bac1
As a result, the operator waits indefinitely for the backup job to appear:
percona-postgresql-operator/percona/controller/pgbackup/controller.go
Lines 190 to 195 in bd5bac1
Solution:
- Add a .status.conditions field to the PerconaPGCluster resource.
- When the required conditions of the PostgresCluster resource (PGBackRestRepoHostReady and PGBackRestReplicaCreate) are not True, a new ReadyForBackup condition is added to PerconaPGCluster with the False status.
- When ReadyForBackup is False, the operator will skip the scheduled backup creation and log a message instead.
- When a PerconaPGBackup resource is created and the operator is waiting for its backup job to appear, it will check the ReadyForBackup condition. If it was set to False more than 2 minutes ago, the backup will be marked as Failed.
CHECKLIST
Jira
[...] (Needs Doc) and QA (Needs QA)?
Tests
Config/Logging/Testability