[AV-127651] Add legacy backup/schedule + snapshot schedule acceptance tests; harden bucket setup#583
Conversation
…and cloud snapshot backup schedule These tests are added to validate the legacy backup endpoint and related flows. The legacy backup resource currently fails creation with HTTP 404 "Unable to find the specified bucket" against a bucket that the bucket endpoint resolves successfully. This will be investigated and fixed in this branch. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…etup If TF_VAR_bucket_id is set but does not resolve, fall through to discovery instead of trusting the value blindly. If no buckets exist on the cluster, create one and tear it down at the end. Buckets that pre-existed (or were supplied via env var) are not destroyed. This eliminates the silent 404 from bucket-scoped endpoints (legacy backup, backup schedule, etc.) when the env var points to a stale or typo'd id. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
Droid finished @panigrahisubhrajit's task —— View job |
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
There was a problem hiding this comment.
Pull request overview
Adds new acceptance test coverage for legacy backup resources and the cloud snapshot backup schedule data source, and hardens the acceptance-test setup to avoid stale/missing bucket IDs breaking bucket-scoped endpoints.
Changes:
- Added acceptance tests for
couchbase-capella_backup,couchbase-capella_backup_schedule, andcouchbase-capella_cloud_snapshot_backup_schedule(data source). - Updated acceptance-test harness bucket resolution to validate
TF_VAR_bucket_id, discover an existing bucket, or create/cleanup a test bucket. - Introduced bucket helper functions (
bucketExists,discoverFirstBucket,destroyBucket) and tracked whether setup created the bucket.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| acceptance_tests/setup_test.go | Implements bucket resolution ladder and conditional bucket cleanup. |
| acceptance_tests/globals.go | Adds globalBucketCreated flag to track bucket lifecycle ownership. |
| acceptance_tests/bucket.go | Adds bucket existence/discovery helpers and delete support for setup cleanup. |
| acceptance_tests/backup_acceptance_test.go | New acceptance tests for legacy backup resource + backups data source. |
| acceptance_tests/backup_schedule_acceptance_test.go | New acceptance tests for legacy backup schedule resource (incl. import + invalid cases). |
| acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go | New acceptance test for cloud snapshot backup schedule data source. |
There was a problem hiding this comment.
Validated all 5 review candidates for this PR revision. I approved 4 comments with concrete trigger paths and observable failures (bucket name mismatch on env-provided bucket ID, plus three parallel-test shared-state race conditions), and rejected 1 candidate as speculative due insufficient evidence of a guaranteed failure path.
- Replace err.(*api.Error) type assertions with errors.As to satisfy errorlint (api.Error may be wrapped on retry). - Extract the bucket-resolution ladder from setup() into resolveBucket() so the nestif complexity stays under threshold. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Adds optional TF_VAR_snapshot_cluster_id / TF_VAR_snapshot_bucket_id plumbing (with fallback to the primary cluster/bucket) and snapshotClusterId() / snapshotBucketId() helpers so cloud_snapshot_* tests can target a dedicated cluster in CI when the matching CAPELLA_SNAPSHOT_CLUSTER_ID / CAPELLA_SNAPSHOT_BUCKET_ID secrets are configured. Mirrors the change on PR #582 so this branch picks up the same wiring once merged. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
The snapshot cluster wiring is unused on this branch (no cloud_snapshot_* tests live here) and trips the unused linter. The plumbing is owned by PR #582; removing it here keeps the diff minimal and lint-clean. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
The previous version provisioned a fresh couchbase-capella_backup resource and waited for it to complete. Capella's legacy bucket backup endpoint serialises backups per bucket; when this test runs minutes after TestAccBackupResource on the same shared bucket, the second backup record gets stuck in a non-terminal state and the resource hits its 60 minute completion timeout, blowing out CI. Switch to a read-only data source config that just queries the existing backups list. This still exercises the data source schema, request plumbing, and response decoding without relying on Capella scheduling a new backup mid-test. Specific data.0.* assertions are dropped because the data source's contents depend on whether TestAccBackupResource ran first; the resource and bucket id echo checks remain to confirm the read returned successfully. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Capella's legacy bucket backup endpoint serialises manual backups per
bucket and enforces a per-bucket spacing window — back-to-back manual
backups against the same bucket get stuck in a non-terminal state and
trip the 60 minute resource wait.
Instead of running TestAccBackupResource and TestAccDatasourceBackups
back to back on the same bucket (which was timing out at 60m on CI),
collapse them into a single ParallelTest with three sequential steps:
1. Create the backup resource and verify its attributes.
2. Layer the couchbase-capella_backups data source on top of the
same resource (depends_on) and verify list contents.
3. Import-state verification.
A single backup is created and reused, so the spacing window never
trips. The data source still exercises its read path with concrete
data.0.* assertions.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
- bucket.go: replace bucketExists with fetchBucket that also reads the bucket name; resolveBucket now sets globalBucketName when adopting a TF_VAR_bucket_id-provided bucket, fixing assertion failures when the env bucket is not named "default" - backup_acceptance_test.go: switch to resource.Test (non-parallel) to eliminate racy "latest backup" conflicts between parallel test runs - backup_schedule_acceptance_test.go: switch to resource.Test to serialise access to the per-bucket singleton schedule - cloud_snapshot_backup_schedule_datasource_test.go: switch to resource.Test to avoid racing TestAccSnapshotBackupScheduleResource on the per-cluster singleton schedule Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
The backend intermittently returns HTTP 500 on POST /clusters while still creating the cluster. When this happens setup() returns early without setting globalClusterId, then cleanup() fails to delete the project because the cluster exists (400 "has Cluster resources"). Fix: after createCluster returns an error, scan the project's cluster list by name. If the cluster was actually created, adopt its ID so setup continues normally and cleanup can destroy it. Also reverts the ParallelTest→Test changes from the previous commit; tests should remain parallel per team convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Allows callers to supply an explicit bucket name when providing TF_VAR_bucket_id, as a faster alternative to the GET-bucket fetch that resolveBucket performs when adopting an env-provided bucket. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
- globals.go: restore globalBucketCreated flag - bucket.go: set globalBucketCreated=true after resolveBucket creates a new bucket; add destroyBucket for cleanup - setup_test.go: restrict the AV-129960 cluster-adoption fallback to 5xx errors only so 4xx (auth/validation) failures are not masked; destroy bucket in cleanup when globalBucketCreated is set Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
|
please address the review comments |
…d, not on 5xx adoption Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
…ns don't fail Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Replace TestCheckResourceAttrPair on data.0.id with a custom check that scans all items in the datasource result. The previous assertion only verified index 0, which could match an unrelated existing backup if the created one was not returned first. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
Snapshot backup schedule and legacy backup schedule are singletons (per-cluster and per-bucket respectively). Running them under ParallelTest races with other tests that mutate the same resource, causing nondeterministic failures. Serialize both with resource.Test. Also assert copy_to_regions.# = "0" in the snapshot schedule datasource test — the config sets copy_to_regions = null so zero regions is the deterministic expected value. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
…egions - Fix errcheck: replace fmt.Sscanf (unchecked return) with strconv.Atoi in testAccCheckDataSourceContainsBackup - Restore resource.ParallelTest on snapshot schedule datasource and backup schedule tests — team convention - Keep copy_to_regions.# = "0" assertion on snapshot schedule datasource (config sets copy_to_regions = null so zero is deterministic) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes. |
createBucket sets globalBucketId once the POST succeeds, so the bucket exists remotely from that point on. Previously globalBucketCreated was only flipped after bucketWait returned successfully, meaning a wait timeout (5 min default) would leave the flag false and skip cleanup — leaking a real bucket across flaky runs. Flip the flag immediately after createBucket succeeds so destroyBucket always runs, regardless of whether the wait step completes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The legacy bucket backup wait was hitting its 60-minute deadline on busy clusters (TestAccBackupResource failing at exactly 3600s). The existing comment "can change after discussion" invited the bump. Two changes: - Increase timeout 60 → 90 minutes to give realistic headroom for bucket backups under CI load. - Reduce poll cadence from every 1s to every 30s. The previous 1s cadence issued ~3600 GETs per wait and likely contributed to backend pressure; 30s still detects completion promptly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jira
Description
Adds acceptance tests for the legacy
couchbase-capella_backupandcouchbase-capella_backup_scheduleresources, plus thecouchbase-capella_cloud_snapshot_backup_scheduledata source. Also hardens the acceptance-test harness so a stale or missingTF_VAR_bucket_idno longer silently propagates into bucket-scoped endpoints (which producedcode:6008404s tracked in AV-127651).Bucket resolution ladder (new)
TF_VAR_bucket_idis set, verify with a GET. If it 404s, log and ignore the env value instead of trusting it.default).Tests added
backup_acceptance_test.go:backup_schedule_acceptance_test.go:cloud_snapshot_backup_schedule_datasource_test.go:Type of Change
Manual Testing Approach
How was this change tested and do you have evidence? (REQUIRED: Select at least 1)
Testing
Testing
All 13 tests pass against a real Capella tenant.
Further comments
Files changed:
acceptance_tests/backup_acceptance_test.go(new)acceptance_tests/backup_schedule_acceptance_test.go(new)acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go(new)acceptance_tests/bucket.go(modified)acceptance_tests/setup_test.go(modified)acceptance_tests/globals.go(modified)