Percona Operator ignores Restore CRs after Error in Previous Restore #2154

@Henrik-St

Report

After a previous restore fails with an error, the Percona operator ends up in a faulty state:
when another restore custom resource is applied,
the operator does not react to it in any way, and the restore resource never gets a state.
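
To make the symptom easier to spot, here is a minimal Python sketch that lists restore objects with no state set. It assumes the input is shaped like the JSON from `kubectl get <restore-kind> -o json` and that the operator reports progress in `status.state`; both the field name and the sample object names are assumptions for illustration and are not taken from the attached files.

```python
import json


def restores_without_state(restore_list: dict) -> list[str]:
    """Return the names of restore objects whose status has no state set."""
    stuck = []
    for item in restore_list.get("items", []):
        # A restore the operator never picked up has no status block at all,
        # or a status block without a state (assumed field: status.state).
        status = item.get("status") or {}
        if not status.get("state"):
            stuck.append(item["metadata"]["name"])
    return stuck


# Hypothetical sample data, shaped like a Kubernetes list response.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "older-restore"}, "status": {"state": "ready"}},
    {"metadata": {"name": "second-restore"}}
  ]
}
""")

print(restores_without_state(sample))
```

With the sample above, only `second-restore` is reported, because it is the only object without a state.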

More about the problem

The custom resource never reaches the requested state.
The operator logs show no reaction to the new restore object, even at DEBUG level.
The only way I could find to clear the faulty state was to restart the operator pod.
After that, the operator recognizes restore objects again.
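
For reference, one way to restart the operator pod is a rollout restart of its Deployment. The Python sketch below only builds the `kubectl` command line and does not run it. The namespace `psmdb` and the Deployment name `percona-server-mongodb-operator` are assumptions and have to be replaced with the values from the actual installation.

```python
import shlex


def restart_operator_command(namespace: str, deployment: str) -> list[str]:
    """Build the kubectl command that restarts the operator Deployment."""
    return [
        "kubectl", "rollout", "restart",
        f"deployment/{deployment}",
        "--namespace", namespace,
    ]


# Hypothetical namespace and Deployment name, adjust to your cluster.
cmd = restart_operator_command("psmdb", "percona-server-mongodb-operator")
print(shlex.join(cmd))
```

Deleting the operator pod directly and letting the Deployment recreate it should have the same effect.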

Steps to reproduce

Reproduction:
1. Create a restore YAML that references a backup in another AWS account (first-restore.yaml).
2. Set the role policies and add the bucket policies, but omit the KMS key policy statement that allows access to the relevant S3 bucket.
3. Apply the restore custom resource.
4. The restore logs an error in the Percona operator, and its state does not change (log-first-restore.txt).
5. Delete the restore custom resource.
6. Set the correct KMS key policy so that the role can decrypt the S3 bucket.
7. Apply another restore custom resource, which can be valid or not. The operator does not react to it, and its state does not change either (second-restore.yaml, log-second-restore.txt).
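
To illustrate steps 2 and 6, the following Python sketch builds the kind of KMS key policy statement that is left out at first and added later. It is an assumption about what such a statement typically looks like, not the policy used in this report: the role ARN is a placeholder, the `Sid` is invented, and only `kms:Decrypt` is granted because decryption is the permission the steps mention. A real setup may need further actions.

```python
import json


def kms_decrypt_statement(role_arn: str) -> dict:
    """Build a key policy statement that lets a role decrypt with the key."""
    return {
        "Sid": "AllowRestoreRoleToDecrypt",  # hypothetical identifier
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt"],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Placeholder ARN, replace with the role used by the restore.
statement = kms_decrypt_statement(
    "arn:aws:iam::111122223333:role/example-restore-role"
)
print(json.dumps(statement, indent=2))
```

Omitting this statement reproduces the failing first restore; adding it corresponds to step 6.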

Versions

1. Kubernetes Version: 1.32
2. Percona Operator: 1.19.1
3. Database Version: v7.0.15-9

Anything else?

No response