Merged
3 changes: 3 additions & 0 deletions internal/common/constant/constant.go
@@ -93,6 +93,9 @@ const JSONTagString = "json"
// CommaString defines a constant for the comma string
const CommaString = ","

// WildcardString defines the wildcard character used in Velero resource filters
const WildcardString = "*"

// MaximumNacObjectNameLength represents Generated Non Admin Object Name and
// must be below 63 characters, because it's used within object Label Value
const MaximumNacObjectNameLength = validation.DNS1123LabelMaxLength
5 changes: 5 additions & 0 deletions internal/common/function/function_test.go
@@ -378,6 +378,11 @@ func TestValidateBackupSpecEnforcedFields(t *testing.T) {
ParallelFilesUpload: 32, //nolint:revive // just test
},
},
{
name: "VolumeGroupSnapshotLabelKey",
enforcedValue: "enforced-label-key",
overrideValue: "user-label-key",
},
Comment on lines +381 to +385

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify VolumeGroupSnapshotLabelKey field exists in Velero BackupSpec

# Check go.mod for Velero version
echo "=== Velero version in go.mod ==="
rg -n 'github.com/vmware-tanzu/velero' go.mod

# Search for VolumeGroupSnapshotLabelKey usage in the codebase
echo -e "\n=== VolumeGroupSnapshotLabelKey references ==="
rg -n 'VolumeGroupSnapshotLabelKey'

# Verify the field exists in imported Velero types
echo -e "\n=== BackupSpec struct fields (if vendored) ==="
fd -t f 'backup_types.go' --exec grep -A 100 'type BackupSpec struct' {} \; 2>/dev/null | head -150

Repository: migtools/oadp-non-admin

Length of output: 736


🏁 Script executed:

#!/bin/bash
# Check the BackupSpec struct definition in the project
echo "=== Looking for BackupSpec struct in project imports ==="
rg -n 'type BackupSpec struct' --type go

# Check the imports in function_test.go to see which Velero version is being used
echo -e "\n=== Imports in function_test.go ==="
head -50 internal/common/function/function_test.go | grep -A 20 'import'

# Check the completeness check logic at lines 460-475
echo -e "\n=== Completeness check logic ==="
sed -n '455,480p' internal/common/function/function_test.go

Repository: migtools/oadp-non-admin

Length of output: 1531


🏁 Script executed:

#!/bin/bash
# Check when VolumeGroupSnapshotLabelKey was added to Velero
echo "=== Searching Velero upstream for VolumeGroupSnapshotLabelKey ==="

# Try to find any Velero types file that might be in vendor or cached
fd -t f 'velero' | grep -E '(types|backup)' | head -20

# Check if there's a vendor directory with Velero types
echo -e "\n=== Checking for vendored Velero types ==="
find . -path '*/velero/pkg/apis/velero/v1/*' -name '*.go' 2>/dev/null | head -10

# Look for any BackupSpec references that show actual field names
echo -e "\n=== Looking for BackupSpec field references in YAML/config ==="
rg -A 50 'type.*BackupSpec' --type go 2>/dev/null | head -100

Repository: migtools/oadp-non-admin

Length of output: 4242


🌐 Web query:

Velero VolumeGroupSnapshotLabelKey release version when added

💡 Result:

The VolumeGroupSnapshotLabelKey feature, which allows customizing the label key used by Velero to group PVCs for VolumeGroupSnapshots (default: velero.io/volume-group), was added in Velero v1.17.0. It is configurable via the Backup spec field volumeGroupSnapshotLabelKey, the server flag --volume-group-snapshot-label-key, or defaults to velero.io/volume-group.


CRITICAL: Test case references a non-existent BackupSpec field in current Velero version.

The VolumeGroupSnapshotLabelKey field was added in Velero v1.17.0, but this project uses v1.14.0. The test case at lines 381-385 will fail when the completeness check (lines 460-475) attempts to validate the field against the actual velerov1.BackupSpec struct via reflection. The field does not exist in v1.14.0's BackupSpec.

Either upgrade Velero to v1.17.0 or later, or remove this test case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `internal/common/function/function_test.go` around lines 381-385, the test
includes a case named "VolumeGroupSnapshotLabelKey" that references
BackupSpec.VolumeGroupSnapshotLabelKey, which does not exist in the vendored
Velero v1.14.0. Remove that test case (or replace it with a field present in
v1.14.0) so the reflection-based completeness check in function_test.go
(around lines 460-475) no longer expects the nonexistent field. Alternatively,
if you intend to keep the test, upgrade the Velero dependency to v1.17.0+ so
that BackupSpec contains VolumeGroupSnapshotLabelKey and the reflection check
passes.
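The reflection-based completeness check the review refers to can be sketched in isolation. Note that `backupSpec` and `fieldExists` below are hypothetical stand-ins for illustration, not the real `velerov1.BackupSpec` type or the project's test helper:

```go
package main

import (
	"fmt"
	"reflect"
)

// backupSpec is a hypothetical stand-in for velerov1.BackupSpec as shipped
// in Velero v1.14.0, which lacks VolumeGroupSnapshotLabelKey.
type backupSpec struct {
	ParallelFilesUpload int
}

// fieldExists reports whether the named field is declared on the struct's
// type — the same kind of lookup a reflection-based completeness check does.
func fieldExists(v any, name string) bool {
	_, ok := reflect.TypeOf(v).FieldByName(name)
	return ok
}

func main() {
	fmt.Println(fieldExists(backupSpec{}, "ParallelFilesUpload"))         // true
	fmt.Println(fieldExists(backupSpec{}, "VolumeGroupSnapshotLabelKey")) // false
}
```

A test case naming a field for which `fieldExists` returns false is exactly the failure mode flagged above.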

}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
18 changes: 13 additions & 5 deletions internal/controller/nonadminbackup_controller.go
@@ -21,6 +21,7 @@ import (
"context"
"errors"
"reflect"
"slices"

"github.com/go-logr/logr"
velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
@@ -705,11 +706,18 @@ func (r *NonAdminBackupReconciler) createVeleroBackupAndSyncWithNonAdminBackup(c

if haveNewResourceFilterParameters {
// Use the new-style exclusion list
backupSpec.ExcludedNamespaceScopedResources = append(backupSpec.ExcludedNamespaceScopedResources,
alwaysExcludedNamespacedResources...)
backupSpec.ExcludedClusterScopedResources = append(backupSpec.ExcludedClusterScopedResources,
alwaysExcludedClusterResources...)
} else {
// Skip appending when wildcard '*' is already present, as it already
// excludes everything and mixing '*' with specific items causes Velero
// FailedValidation.
if !slices.Contains(backupSpec.ExcludedNamespaceScopedResources, constant.WildcardString) {
backupSpec.ExcludedNamespaceScopedResources = append(backupSpec.ExcludedNamespaceScopedResources,
alwaysExcludedNamespacedResources...)
}
if !slices.Contains(backupSpec.ExcludedClusterScopedResources, constant.WildcardString) {
backupSpec.ExcludedClusterScopedResources = append(backupSpec.ExcludedClusterScopedResources,
alwaysExcludedClusterResources...)
}
} else if !slices.Contains(backupSpec.ExcludedResources, constant.WildcardString) {
// Fallback to the old-style exclusion list
backupSpec.ExcludedResources = append(backupSpec.ExcludedResources,
alwaysExcludedNamespacedResources...)
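The wildcard guard introduced in this hunk can be illustrated standalone. `wildcard` and `appendUnlessWildcard` are illustrative names standing in for `constant.WildcardString` and the controller's in-place appends:

```go
package main

import (
	"fmt"
	"slices"
)

const wildcard = "*" // mirrors constant.WildcardString

// appendUnlessWildcard mirrors the guard above: when the user already
// excluded everything with "*", appending specific items would trigger
// Velero FailedValidation, so the list is returned unchanged.
func appendUnlessWildcard(excluded, defaults []string) []string {
	if slices.Contains(excluded, wildcard) {
		return excluded
	}
	return append(excluded, defaults...)
}

func main() {
	fmt.Println(appendUnlessWildcard([]string{"*"}, []string{"clusterroles"}))    // [*]
	fmt.Println(appendUnlessWildcard([]string{"pods"}, []string{"clusterroles"})) // [pods clusterroles]
}
```

The same guard is applied separately to the namespace-scoped, cluster-scoped, and legacy exclusion lists, since any one of them may contain the wildcard independently.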
80 changes: 80 additions & 0 deletions internal/controller/nonadminbackup_controller_test.go
@@ -1546,6 +1546,86 @@ var _ = ginkgo.Describe("Test full reconcile loop of NonAdminBackup Controller",
},
addSyncLabel: true,
}),
ginkgo.Entry("Should not append default excluded resources when wildcard is used in excludedClusterScopedResources", nonAdminBackupFullReconcileScenario{
spec: nacv1alpha1.NonAdminBackupSpec{
BackupSpec: &velerov1.BackupSpec{
ExcludedClusterScopedResources: []string{"*"},
IncludedNamespaceScopedResources: []string{"pvc"},
},
},
status: nacv1alpha1.NonAdminBackupStatus{
Phase: nacv1alpha1.NonAdminPhaseCreated,
VeleroBackup: &nacv1alpha1.VeleroBackup{
Namespace: oadpNamespace,
Status: nil,
Spec: &velerov1.BackupSpec{
ExcludedClusterScopedResources: []string{"*"},
IncludedNamespaceScopedResources: []string{"pvc"},
ExcludedNamespaceScopedResources: []string{
nacv1alpha1.NonAdminBackups,
nacv1alpha1.NonAdminRestores,
nacv1alpha1.NonAdminBackupStorageLocations,
},
},
},
Conditions: []metav1.Condition{
{
Type: "Accepted",
Status: metav1.ConditionTrue,
Reason: "BackupAccepted",
Message: "backup accepted",
},
{
Type: "Queued",
Status: metav1.ConditionTrue,
Reason: "BackupScheduled",
Message: "Created Velero Backup object",
},
},
},
}),
ginkgo.Entry("Should not append default excluded resources when wildcard is used in excludedNamespaceScopedResources", nonAdminBackupFullReconcileScenario{
spec: nacv1alpha1.NonAdminBackupSpec{
BackupSpec: &velerov1.BackupSpec{
ExcludedNamespaceScopedResources: []string{"*"},
IncludedClusterScopedResources: []string{"persistentvolumes"},
},
},
status: nacv1alpha1.NonAdminBackupStatus{
Phase: nacv1alpha1.NonAdminPhaseCreated,
VeleroBackup: &nacv1alpha1.VeleroBackup{
Namespace: oadpNamespace,
Status: nil,
Spec: &velerov1.BackupSpec{
ExcludedNamespaceScopedResources: []string{"*"},
IncludedClusterScopedResources: []string{"persistentvolumes"},
ExcludedClusterScopedResources: []string{
"securitycontextconstraints",
"clusterroles",
"clusterrolebindings",
"priorityclasses",
"customresourcedefinitions",
"virtualmachineclusterinstancetypes",
"virtualmachineclusterpreferences",
},
},
},
Conditions: []metav1.Condition{
{
Type: "Accepted",
Status: metav1.ConditionTrue,
Reason: "BackupAccepted",
Message: "backup accepted",
},
{
Type: "Queued",
Status: metav1.ConditionTrue,
Reason: "BackupScheduled",
Message: "Created Velero Backup object",
},
},
},
}),
)

ginkgo.DescribeTable("Reconcile triggered by NonAdminBackup sync event",
100 changes: 52 additions & 48 deletions internal/controller/nonadminbackupstoragelocation_controller.go
@@ -12,42 +12,43 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

RESOURCE CONFLICT RESOLUTION ENHANCEMENTS:
This file has been enhanced to resolve resource conflict issues that occurred when
multiple controllers or processes attempted to update the same NonAdminBackupStorageLocationRequest
objects simultaneously. The following changes were made:

1. RETRY LOGIC FRAMEWORK (updateStatusWithRetry function):
- Uses standard Kubernetes client-go retry.RetryOnConflict with DefaultRetry settings
- Handles "object has been modified" errors gracefully
- Fetches fresh object copies to avoid stale ResourceVersion conflicts
- Leverages proven Kubernetes retry patterns (5 attempts, 10ms+jitter)

2. NIL SAFETY CHECKS (ensureNonAdminRequest function):
- Prevents panic when SourceNonAdminBSL is nil during initialization
- Converts terminal errors to requeue conditions for uninitialized status
- Allows proper status initialization timing in high-concurrency environments

3. OPTIMIZED STATUS UPDATES (createNonAdminRequest function):
- Uses fast-path direct updates for new objects
- Falls back to retry logic only when conflicts are detected
- Preserves computed status values while ensuring conflict resilience

4. TEST ENVIRONMENT ADAPTATIONS:
- Increased timeouts to accommodate retry logic execution time
- Reduced polling frequency to handle Kubernetes client rate limiting
- Added delays to prevent overwhelming API server during test runs

These enhancements ensure that OADP non-admin backup operations complete successfully
even under high concurrency or when multiple reconciliation events occur simultaneously.
*/

// RESOURCE CONFLICT RESOLUTION ENHANCEMENTS:
// This file has been enhanced to resolve resource conflict issues that occurred when
// multiple controllers or processes attempted to update the same NonAdminBackupStorageLocationRequest
// objects simultaneously. The following changes were made:
//
// 1. RETRY LOGIC FRAMEWORK (updateStatusWithRetry function):
// - Uses standard Kubernetes client-go retry.RetryOnConflict with DefaultRetry settings
// - Handles "object has been modified" errors gracefully
// - Fetches fresh object copies to avoid stale ResourceVersion conflicts
// - Leverages proven Kubernetes retry patterns (5 attempts, 10ms+jitter)
//
// 2. NIL SAFETY CHECKS (ensureNonAdminRequest function):
// - Prevents panic when SourceNonAdminBSL is nil during initialization
// - Converts terminal errors to requeue conditions for uninitialized status
// - Allows proper status initialization timing in high-concurrency environments
//
// 3. OPTIMIZED STATUS UPDATES (createNonAdminRequest function):
// - Uses fast-path direct updates for new objects
// - Falls back to retry logic only when conflicts are detected
// - Preserves computed status values while ensuring conflict resilience
//
// 4. TEST ENVIRONMENT ADAPTATIONS:
// - Increased timeouts to accommodate retry logic execution time
// - Reduced polling frequency to handle Kubernetes client rate limiting
// - Added delays to prevent overwhelming API server during test runs
//
// These enhancements ensure that OADP non-admin backup operations complete successfully
// even under high concurrency or when multiple reconciliation events occur simultaneously.

package controller

import (
"context"
"errors"
"fmt"
"reflect"
"time"

@@ -125,7 +126,10 @@ func (r *NonAdminBackupStorageLocationReconciler) updateStatusWithRetry(ctx cont
// Get the latest version of the object from the API server to ensure we have
// the most recent ResourceVersion and avoid stale object conflicts
key := client.ObjectKeyFromObject(obj)
fresh := obj.DeepCopyObject().(client.Object)
fresh, ok := obj.DeepCopyObject().(client.Object)
if !ok {
return errors.New("failed to convert deep copy to client.Object")
}
if err := r.Get(ctx, key, fresh); err != nil {
return err // RetryOnConflict will handle conflict vs non-conflict errors
}
@@ -643,7 +647,11 @@ func (r *NonAdminBackupStorageLocationReconciler) createNonAdminRequest(ctx cont
// - Event-driven reconciliation causing concurrent status updates
logger.V(1).Info("NonAdminBackupStorageLocationRequest already exists")
if updateErr := r.updateStatusWithRetry(ctx, logger, nabslRequest, func(obj client.Object) bool {
req := obj.(*nacv1alpha1.NonAdminBackupStorageLocationRequest)
req, ok := obj.(*nacv1alpha1.NonAdminBackupStorageLocationRequest)
if !ok {
logger.Error(fmt.Errorf("expected *NonAdminBackupStorageLocationRequest, got %T", obj), "Unexpected type assertion failure")
return false
}
return updatePhaseIfNeeded(&req.Status.Phase, req.Spec.ApprovalDecision)
}); updateErr != nil {
logger.Error(updateErr, failedUpdateStatusError)
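The comma-ok assertion introduced in this hunk avoids the panic a bare `obj.(*T)` causes on an unexpected type. A minimal sketch, using hypothetical `object`/`request` types in place of `client.Object` and the concrete request struct:

```go
package main

import "fmt"

// object is a hypothetical stand-in for client.Object.
type object interface{ GetName() string }

type request struct{ name string }

func (r *request) GetName() string { return r.name }

type otherObject struct{}

func (otherObject) GetName() string { return "other" }

// phaseUpdated mirrors the callback pattern above: assert with the
// comma-ok form, log and bail out on mismatch instead of panicking.
func phaseUpdated(obj object) bool {
	req, ok := obj.(*request)
	if !ok {
		fmt.Printf("expected *request, got %T\n", obj)
		return false
	}
	_ = req // the real callback would update req.Status here
	return true
}

func main() {
	fmt.Println(phaseUpdated(&request{name: "nabsl-request"}))
	fmt.Println(phaseUpdated(otherObject{}))
}
```

In a reconciler this defensiveness is cheap insurance: the callback signature accepts any `client.Object`, so the compiler cannot rule out a wrong type at the call site.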
@@ -704,26 +712,22 @@ func (r *NonAdminBackupStorageLocationReconciler) createNonAdminRequest(ctx cont
// - Correctness: Proper status initialization even under load
if updated := updateNonAdminRequestStatus(&nonAdminBslRequest.Status, nabsl, approvalDecision); updated {
if updateErr := r.Status().Update(ctx, &nonAdminBslRequest); updateErr != nil {
if apierrors.IsConflict(updateErr) {
// CONFLICT DETECTED: Another process modified the request between create and status update
// This can happen when:
// - Admin approves/rejects the request immediately after creation
// - Multiple reconcile loops are triggered by related events
// - High concurrency in the test environment
logger.V(1).Info("Conflict on initial status update, retrying with fresh object...")
if retryErr := r.updateStatusWithRetry(ctx, logger, &nonAdminBslRequest, func(obj client.Object) bool {
req := obj.(*nacv1alpha1.NonAdminBackupStorageLocationRequest)
return updateNonAdminRequestStatus(&req.Status, nabsl, approvalDecision)
}); retryErr != nil {
logger.Error(retryErr, failedUpdateStatusError)
return false, retryErr
}
} else {
// NON-CONFLICT ERROR: Validation, permission, or other API server issue
// Don't retry these as they're likely to persist
if !apierrors.IsConflict(updateErr) {
logger.Error(updateErr, failedUpdateStatusError)
return false, updateErr
}
// CONFLICT DETECTED: Another process modified the request between create and status update
logger.V(1).Info("Conflict on initial status update, retrying with fresh object...")
if retryErr := r.updateStatusWithRetry(ctx, logger, &nonAdminBslRequest, func(obj client.Object) bool {
req, ok := obj.(*nacv1alpha1.NonAdminBackupStorageLocationRequest)
if !ok {
return false
}
return updateNonAdminRequestStatus(&req.Status, nabsl, approvalDecision)
}); retryErr != nil {
logger.Error(retryErr, failedUpdateStatusError)
return false, retryErr
}
}
}

7 changes: 5 additions & 2 deletions internal/controller/nonadminrestore_controller.go
@@ -20,6 +20,7 @@ import (
"context"
"errors"
"reflect"
"slices"

"github.com/go-logr/logr"
velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
@@ -348,8 +349,10 @@ func (r *NonAdminRestoreReconciler) createVeleroRestore(ctx context.Context, log
}
}

restoreSpec.ExcludedResources = append(restoreSpec.ExcludedResources,
"volumesnapshotclasses")
if !slices.Contains(restoreSpec.ExcludedResources, constant.WildcardString) {
restoreSpec.ExcludedResources = append(restoreSpec.ExcludedResources,
"volumesnapshotclasses")
}

veleroRestore = &velerov1.Restore{
ObjectMeta: metav1.ObjectMeta{
67 changes: 66 additions & 1 deletion internal/controller/nonadminrestore_controller_test.go
@@ -20,6 +20,7 @@ import (
"context"
"fmt"
"reflect"
"slices"
"strings"
"time"

@@ -366,7 +367,11 @@ var _ = ginkgo.Describe("Test full reconcile loop of NonAdminRestore Controller"
ginkgo.By("Validating Velero Restore Spec")
expectedSpec := scenario.enforcedRestoreSpec.DeepCopy()
expectedSpec.IncludedNamespaces = []string{nonAdminRestoreNamespace}
expectedSpec.ExcludedResources = []string{"volumesnapshotclasses"}
if !slices.Contains(scenario.spec.RestoreSpec.ExcludedResources, "*") {
expectedSpec.ExcludedResources = append(expectedSpec.ExcludedResources, "volumesnapshotclasses")
} else {
expectedSpec.ExcludedResources = scenario.spec.RestoreSpec.ExcludedResources
}
gomega.Expect(reflect.DeepEqual(veleroRestore.Spec, *expectedSpec)).To(gomega.BeTrue())
}

@@ -605,5 +610,65 @@ var _ = ginkgo.Describe("Test full reconcile loop of NonAdminRestore Controller"
QueueInfo: nil,
},
}),
ginkgo.Entry("Should not append default excluded resources when wildcard is used in excludedResources", nonAdminRestoreFullReconcileScenario{
spec: nacv1alpha1.NonAdminRestoreSpec{
RestoreSpec: &velerov1.RestoreSpec{
BackupName: "test",
ExcludedResources: []string{"*"},
},
},
backupStatus: nacv1alpha1.NonAdminBackupStatus{
Phase: nacv1alpha1.NonAdminPhaseCreated,
VeleroBackup: &nacv1alpha1.VeleroBackup{
Status: &velerov1.BackupStatus{
Phase: velerov1.BackupPhaseCompleted,
CompletionTimestamp: &metav1.Time{Time: time.Now()},
},
},
Conditions: []metav1.Condition{
{
Type: "Accepted",
Status: metav1.ConditionTrue,
Reason: "BackupAccepted",
Message: "backup accepted",
LastTransitionTime: metav1.NewTime(time.Now()),
},
{
Type: "Queued",
Status: metav1.ConditionTrue,
Reason: "BackupScheduled",
Message: "Created Velero Backup object",
LastTransitionTime: metav1.NewTime(time.Now()),
},
},
QueueInfo: &nacv1alpha1.QueueInfo{
EstimatedQueuePosition: 0,
},
},
status: nacv1alpha1.NonAdminRestoreStatus{
Phase: nacv1alpha1.NonAdminPhaseCreated,
VeleroRestore: &nacv1alpha1.VeleroRestore{
Status: nil,
},
Conditions: []metav1.Condition{
{
Type: "Accepted",
Status: metav1.ConditionTrue,
Reason: "RestoreAccepted",
Message: "restore accepted",
},
{
Type: "Queued",
Status: metav1.ConditionTrue,
Reason: "RestoreScheduled",
Message: "Created Velero Restore object",
},
},
QueueInfo: &nacv1alpha1.QueueInfo{
EstimatedQueuePosition: 1,
},
},
enforcedRestoreSpec: &velerov1.RestoreSpec{},
}),
)
})