Skip to content

[AV-127651] Add legacy backup/schedule + snapshot schedule acceptance tests; harden bucket setup#583

Merged
panigrahisubhrajit merged 21 commits into
mainfrom
AV-127651-fix-legacy-backup
May 14, 2026
Merged

[AV-127651] Add legacy backup/schedule + snapshot schedule acceptance tests; harden bucket setup#583
panigrahisubhrajit merged 21 commits into
mainfrom
AV-127651-fix-legacy-backup

Conversation

@panigrahisubhrajit
Copy link
Copy Markdown
Contributor

@panigrahisubhrajit panigrahisubhrajit commented May 7, 2026

Jira

Description

Adds acceptance tests for the legacy couchbase-capella_backup and couchbase-capella_backup_schedule resources, plus the couchbase-capella_cloud_snapshot_backup_schedule data source. Also hardens the acceptance-test harness so a stale or missing TF_VAR_bucket_id no longer silently propagates into bucket-scoped endpoints (which produced code:6008 404s tracked in AV-127651).

Bucket resolution ladder (new)

  1. If TF_VAR_bucket_id is set, verify with a GET. If it 404s, log and ignore the env value instead of trusting it.
  2. List buckets on the cluster and adopt the first one (preferring default).
  3. If the cluster has none, create a fresh test bucket and tear it down in cleanup. Pre-existing buckets are not destroyed.

Tests added

backup_acceptance_test.go:

  • TestAccBackupResource, TestAccBackupResourceInvalidBucket, TestAccBackupResourceInvalidProject, TestAccBackupResourceRestoreTimesOnCreate, TestAccDatasourceBackups

backup_schedule_acceptance_test.go:

  • TestAccBackupScheduleResource, TestAccBackupScheduleResourceInvalidDayOfWeek, TestAccBackupScheduleResourceInvalidRetention, TestAccBackupScheduleResourceInvalidType, TestAccBackupScheduleResourceInvalidStartAt, TestAccBackupScheduleResourceInvalidBucket

cloud_snapshot_backup_schedule_datasource_test.go:

  • TestAccDatasourceCloudSnapshotBackupSchedule, TestAccDatasourceCloudSnapshotBackupScheduleInvalidCluster

Type of Change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change updates the ci/cd workflow.
  • Documentation fix/enhancement.

Manual Testing Approach

How was this change tested and do you have evidence? (REQUIRED: Select at least 1)

  • Manually tested
  • Unit tested
  • Acceptance tested
  • Unable to test / will not test (Please provide comments in section below)

Testing

Testing

All 13 tests pass against a real Capella tenant.

Further comments

Files changed:

  • acceptance_tests/backup_acceptance_test.go (new)
  • acceptance_tests/backup_schedule_acceptance_test.go (new)
  • acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go (new)
  • acceptance_tests/bucket.go (modified)
  • acceptance_tests/setup_test.go (modified)
  • acceptance_tests/globals.go (modified)

panigrahisubhrajit and others added 2 commits May 6, 2026 21:26
…and cloud snapshot backup schedule

These tests are added to validate the legacy backup endpoint and related
flows. The legacy backup resource currently fails creation with HTTP 404
"Unable to find the specified bucket" against a bucket that the bucket
endpoint resolves successfully. This will be investigated and fixed in
this branch.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…etup

If TF_VAR_bucket_id is set but does not resolve, fall through to
discovery instead of trusting the value blindly. If no buckets exist on
the cluster, create one and tear it down at the end. Buckets that
pre-existed (or were supplied via env var) are not destroyed.

This eliminates the silent 404 from bucket-scoped endpoints (legacy
backup, backup schedule, etc.) when the env var points to a stale or
typo'd id.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 03:50
@panigrahisubhrajit panigrahisubhrajit requested a review from a team as a code owner May 7, 2026 03:50
@factory-droid
Copy link
Copy Markdown
Contributor

factory-droid Bot commented May 7, 2026

Droid finished @panigrahisubhrajit's task —— View job


@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new acceptance test coverage for legacy backup resources and the cloud snapshot backup schedule data source, and hardens the acceptance-test setup to avoid stale/missing bucket IDs breaking bucket-scoped endpoints.

Changes:

  • Added acceptance tests for couchbase-capella_backup, couchbase-capella_backup_schedule, and couchbase-capella_cloud_snapshot_backup_schedule (data source).
  • Updated acceptance-test harness bucket resolution to validate TF_VAR_bucket_id, discover an existing bucket, or create/cleanup a test bucket.
  • Introduced bucket helper functions (bucketExists, discoverFirstBucket, destroyBucket) and tracked whether setup created the bucket.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
acceptance_tests/setup_test.go Implements bucket resolution ladder and conditional bucket cleanup.
acceptance_tests/globals.go Adds globalBucketCreated flag to track bucket lifecycle ownership.
acceptance_tests/bucket.go Adds bucket existence/discovery helpers and delete support for setup cleanup.
acceptance_tests/backup_acceptance_test.go New acceptance tests for legacy backup resource + backups data source.
acceptance_tests/backup_schedule_acceptance_test.go New acceptance tests for legacy backup schedule resource (incl. import + invalid cases).
acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go New acceptance test for cloud snapshot backup schedule data source.

Comment thread acceptance_tests/setup_test.go Outdated
Comment thread acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go
Comment thread acceptance_tests/setup_test.go Outdated
Comment thread acceptance_tests/backup_schedule_acceptance_test.go
Comment thread acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go
Comment thread acceptance_tests/backup_acceptance_test.go
Copy link
Copy Markdown
Contributor

@factory-droid factory-droid Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated all 5 review candidates for this PR revision. I approved 4 comments with concrete trigger paths and observable failures (bucket name mismatch on env-provided bucket ID, plus three parallel-test shared-state race conditions), and rejected 1 candidate as speculative due insufficient evidence of a guaranteed failure path.

- Replace err.(*api.Error) type assertions with errors.As to satisfy
  errorlint (api.Error may be wrapped on retry).
- Extract the bucket-resolution ladder from setup() into resolveBucket()
  so the nestif complexity stays under threshold.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Adds optional TF_VAR_snapshot_cluster_id / TF_VAR_snapshot_bucket_id
plumbing (with fallback to the primary cluster/bucket) and snapshotClusterId() /
snapshotBucketId() helpers so cloud_snapshot_* tests can target a dedicated
cluster in CI when the matching CAPELLA_SNAPSHOT_CLUSTER_ID /
CAPELLA_SNAPSHOT_BUCKET_ID secrets are configured. Mirrors the change on
PR #582 so this branch picks up the same wiring once merged.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

The snapshot cluster wiring is unused on this branch (no cloud_snapshot_*
tests live here) and trips the unused linter. The plumbing is owned by
PR #582; removing it here keeps the diff minimal and lint-clean.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

The previous version provisioned a fresh couchbase-capella_backup
resource and waited for it to complete. Capella's legacy bucket backup
endpoint serialises backups per bucket; when this test runs minutes
after TestAccBackupResource on the same shared bucket, the second
backup record gets stuck in a non-terminal state and the resource
hits its 60 minute completion timeout, blowing out CI.

Switch to a read-only data source config that just queries the existing
backups list. This still exercises the data source schema, request
plumbing, and response decoding without relying on Capella scheduling a
new backup mid-test. Specific data.0.* assertions are dropped because
the data source's contents depend on whether TestAccBackupResource ran
first; the resource and bucket id echo checks remain to confirm the
read returned successfully.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Capella's legacy bucket backup endpoint serialises manual backups per
bucket and enforces a per-bucket spacing window — back-to-back manual
backups against the same bucket get stuck in a non-terminal state and
trip the 60 minute resource wait.

Instead of running TestAccBackupResource and TestAccDatasourceBackups
back to back on the same bucket (which was timing out at 60m on CI),
collapse them into a single ParallelTest with three sequential steps:

  1. Create the backup resource and verify its attributes.
  2. Layer the couchbase-capella_backups data source on top of the
     same resource (depends_on) and verify list contents.
  3. Import-state verification.

A single backup is created and reused, so the spacing window never
trips. The data source still exercises its read path with concrete
data.0.* assertions.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

- bucket.go: replace bucketExists with fetchBucket that also reads the
  bucket name; resolveBucket now sets globalBucketName when adopting a
  TF_VAR_bucket_id-provided bucket, fixing assertion failures when the
  env bucket is not named "default"
- backup_acceptance_test.go: switch to resource.Test (non-parallel) to
  eliminate racy "latest backup" conflicts between parallel test runs
- backup_schedule_acceptance_test.go: switch to resource.Test to
  serialise access to the per-bucket singleton schedule
- cloud_snapshot_backup_schedule_datasource_test.go: switch to
  resource.Test to avoid racing TestAccSnapshotBackupScheduleResource
  on the per-cluster singleton schedule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

The backend intermittently returns HTTP 500 on POST /clusters while
still creating the cluster. When this happens setup() returns early
without setting globalClusterId, then cleanup() fails to delete the
project because the cluster exists (400 "has Cluster resources").

Fix: after createCluster returns an error, scan the project's cluster
list by name. If the cluster was actually created, adopt its ID so
setup continues normally and cleanup can destroy it.

Also reverts the ParallelTest→Test changes from the previous commit;
tests should remain parallel per team convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Allows callers to supply an explicit bucket name when providing
TF_VAR_bucket_id, as a faster alternative to the GET-bucket fetch that
resolveBucket performs when adopting an env-provided bucket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread acceptance_tests/setup_test.go
Comment thread acceptance_tests/setup_test.go
Comment thread acceptance_tests/bucket.go
Comment thread acceptance_tests/backup_acceptance_test.go
- globals.go: restore globalBucketCreated flag
- bucket.go: set globalBucketCreated=true after resolveBucket creates a
  new bucket; add destroyBucket for cleanup
- setup_test.go: restrict the AV-129960 cluster-adoption fallback to
  5xx errors only so 4xx (auth/validation) failures are not masked;
  destroy bucket in cleanup when globalBucketCreated is set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread acceptance_tests/setup_test.go
Comment thread acceptance_tests/backup_acceptance_test.go
Comment thread acceptance_tests/cloud_snapshot_backup_schedule_datasource_test.go
@stanleefdz
Copy link
Copy Markdown
Contributor

please address the review comments

…d, not on 5xx adoption

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

…ns don't fail

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Replace TestCheckResourceAttrPair on data.0.id with a custom check that
scans all items in the datasource result. The previous assertion only
verified index 0, which could match an unrelated existing backup if the
created one was not returned first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Snapshot backup schedule and legacy backup schedule are singletons
(per-cluster and per-bucket respectively). Running them under
ParallelTest races with other tests that mutate the same resource,
causing nondeterministic failures. Serialize both with resource.Test.

Also assert copy_to_regions.# = "0" in the snapshot schedule datasource
test — the config sets copy_to_regions = null so zero regions is the
deterministic expected value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

…egions

- Fix errcheck: replace fmt.Sscanf (unchecked return) with strconv.Atoi
  in testAccCheckDataSourceContainsBackup
- Restore resource.ParallelTest on snapshot schedule datasource and
  backup schedule tests — team convention
- Keep copy_to_regions.# = "0" assertion on snapshot schedule datasource
  (config sets copy_to_regions = null so zero is deterministic)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

No PR template selections were detected. Please make sure to fill out the PR template properly by selecting the appropriate checkboxes.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread acceptance_tests/bucket.go Outdated
Comment thread acceptance_tests/backup_acceptance_test.go
createBucket sets globalBucketId once the POST succeeds, so the bucket
exists remotely from that point on. Previously globalBucketCreated was
only flipped after bucketWait returned successfully, meaning a wait
timeout (5 min default) would leave the flag false and skip cleanup —
leaking a real bucket across flaky runs.

Flip the flag immediately after createBucket succeeds so destroyBucket
always runs, regardless of whether the wait step completes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the bug Something isn't working label May 12, 2026
panigrahisubhrajit and others added 2 commits May 12, 2026 20:04
The legacy bucket backup wait was hitting its 60-minute deadline on
busy clusters (TestAccBackupResource failing at exactly 3600s). The
existing comment "can change after discussion" invited the bump.

Two changes:
- Increase timeout 60 → 90 minutes to give realistic headroom for
  bucket backups under CI load.
- Reduce poll cadence from every 1s to every 30s. The previous 1s
  cadence issued ~3600 GETs per wait and likely contributed to backend
  pressure; 30s still detects completion promptly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@panigrahisubhrajit panigrahisubhrajit merged commit a6ab9bc into main May 14, 2026
13 checks passed
@panigrahisubhrajit panigrahisubhrajit deleted the AV-127651-fix-legacy-backup branch May 14, 2026 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-triage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants