spike: OSD-on-GCP support in rhcs#1164
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds WIF resource and data source, GCP machine-pool and OSD cluster resources, a reusable wif-gcp module and root examples, generated docs/templates, provider registrations, and tests enabling OSD on GCP in the RHCS Terraform provider. ChangesAll changes (single review narrative)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @naeemarsalan. Thanks for your PR. I'm waiting for a terraform-redhat member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (8)
provider/wifconfig/resource_test.go (2)
53-82: 🏗️ Heavy liftConsider expanding data source test coverage.
Similar to the resource tests, these validate structure but not behavior. For production, consider adding tests that cover:
- Data source lookup logic (by ID and display_name per the data source design)
- OCM search/filter behavior
- State population from API responses
- Error cases (not found, ambiguous matches, etc.)
Deferring to a follow-up is acceptable for the current spike scope.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/wifconfig/resource_test.go` around lines 53 - 82, Add behavioral unit tests for WifConfigDataSource: extend provider/wifconfig/resource_test.go to include tests that exercise lookup logic (lookup by ID and display_name) against WifConfigDataSource methods (e.g., call NewDataSource(), then use its Read/ReadData or equivalent lookup entry points), simulate OCM search/filter responses to verify correct candidate selection and handling of ambiguous results, assert state population from mocked API responses into the datasource state, and add error-case tests for not-found and ambiguous matches to ensure diagnostics are produced; target the WifConfigDataSource, NewDataSource, Metadata, and Schema code paths when wiring mocks for OCM interactions.
20-51: 🏗️ Heavy liftConsider expanding test coverage beyond structural checks.
The current tests verify interface compliance and schema structure, which is valuable for compile-time safety. However, for production readiness, consider adding tests that validate:
- CRUD behavior and state mapping logic
- Error handling and validation rules
- ModifyPlan behavior (since
ResourceWithModifyPlanis implemented)- Edge cases and null/unknown value handling
Given this is a spike PR with acknowledged gaps, deferring comprehensive behavior tests to a follow-up is reasonable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/wifconfig/resource_test.go` around lines 20 - 51, Add behavior tests beyond structural checks: write unit tests that exercise wifconfig.New()'s resource implementation (WifConfigResource) for Create/Read/Update/Delete by invoking the resource methods (Create, Read, Update, Delete) with realistic resource.CreateRequest/ReadRequest/UpdateRequest/DeleteRequest and asserting state mapping in responses; add tests for ModifyPlan by calling ModifyPlan (ResourceWithModifyPlan) with plans containing unknown/null values and asserting the modified plan and diagnostics; add tests for ImportState via ImportStateRequest and for validation/error paths to assert diagnostics when invalid input is provided; ensure tests reference WifConfigResource, New(), Metadata, Schema, Create, Read, Update, Delete, ModifyPlan, and ImportState to locate code.provider/machinepool/gcp/machine_pool_resource.go (1)
372-450: ⚡ Quick winDocument the replicas default or remove it if it's an API default.
The
buildMachinePoolObjectmethod sets a default value of 3 forreplicaswhen not specified (line 394). However, the schema does not declare aDefaultattribute forreplicas, and it's unclear whether this is a provider-side default or simply mirroring an API default.As per coding guidelines, "Do not add a default that conflicts with the API default unless documented; adding an incompatible default is a breaking change." If the OCM API already defaults replicas to 3, remove the provider-side default. Otherwise, add a
Defaultattribute to the schema to make this behavior explicit.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/machinepool/gcp/machine_pool_resource.go` around lines 372 - 450, The function buildMachinePoolObject currently injects a provider-side default of 3 into the replicas field (local variable replicas and builder.Replicas(...)) when s.Replicas is unset, which may conflict with the OCM API default; either remove this inline default so replicas is omitted when s.Replicas is null/unknown, or add a Default: int64(3) to the Terraform schema for the replicas attribute in the MachinePoolState/schema so the behavior is explicit; update the logic around s.Replicas (and the local replicas variable) and the schema definition in the MachinePool resource accordingly to keep provider behavior consistent with the API and documented via the schema.subsystem/osdgcp/osdgcp_test.go (1)
54-58: 🏗️ Heavy liftExpand subsystem test coverage before production merge.
This integration test currently only validates that the provider loads and schemas are valid—a minimal smoke test appropriate for the spike phase. Before merging to production, add subsystem tests covering:
- Full CRUD lifecycle for
rhcs_wif_config,rhcs_cluster_osd_gcp, andrhcs_gcp_machine_poolresources- WIF config data source lookup scenarios (by id, by display_name)
- Error handling paths (invalid configurations, API failures, not-found cases)
- Idempotent delete and retry behaviors noted in the PR objectives
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@subsystem/osdgcp/osdgcp_test.go` around lines 54 - 58, The current test only calls Terraform.Validate() inside the Describe "OSD-GCP provider init"; extend this test file by adding new It/Describe blocks that implement full subsystem integration tests: add CRUD lifecycle tests for resources rhcs_wif_config, rhcs_cluster_osd_gcp, and rhcs_gcp_machine_pool (create, read/verify, update, delete and verify deletion/idempotent delete), add data source lookup tests for WIF config by id and by display_name, and add negative tests exercising invalid configs, API failure/not-found responses and retry behavior for delete operations; locate and extend the existing Describe("OSD-GCP provider init") block and create helper test functions or Ginkgo It blocks to perform apply/plan/assert sequences and cleanup so the suite exercises idempotency and retry paths.provider/clusterosdgcp/cluster_osd_gcp_resource_test.go (1)
40-65: 🏗️ Heavy liftAdd behavior-focused unit tests for new validator and lifecycle edge cases.
Current tests mostly assert schema/interface presence; they don’t verify key behavior changes (e.g.,
availability_zonesvalidator branches, delete idempotency on 404, and no-op update semantics).As per coding guidelines, "Ensure assertions cover behavior changes, not only happy paths."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/clusterosdgcp/cluster_osd_gcp_resource_test.go` around lines 40 - 65, Tests currently only assert schema presence and validators existence; add behavior-focused unit tests that exercise the new availability_zones validator branches and lifecycle edge cases: write tests that call the validator returned by clusterosdgcp.New().(resource.ResourceWithConfigValidators).ConfigValidators to assert valid/invalid cases for availability_zones, add tests for Delete handling where the provider's Delete function treats a 404 as success (idempotency), and add Update tests asserting no-op semantics when configured fields are unchanged; locate helpers around clusterosdgcp.New, r.ConfigValidators, and the resource lifecycle methods (Create/Read/Update/Delete) to hook in mocked responses and assert the expected behavior rather than only checking schema keys in resp.Schema.Attributes.examples/create_osd_gcp_cluster_full/modules/wif-gcp/main.tf (1)
15-30: ⚡ Quick winAdd an
attribute_conditionon the workload identity provider for defense-in-depth.Line 15-30 config trusts issuer/audience but leaves provider-level claim constraints open. Adding
attribute_condition(for expected claim shape) narrows misuse risk and addresses the current IaC security finding class.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/main.tf` around lines 15 - 30, The google_iam_workload_identity_pool_provider resource (resource "google_iam_workload_identity_pool_provider" "oidc") lacks an attribute_condition to restrict accepted token claims; add an attribute_condition argument to this resource that enforces the expected claim shape (e.g., require specific sub pattern and/or audience) using the mapped assertion fields (like assertion.sub and assertion.aud) so only tokens matching your expected claims are accepted; implement the condition as a Terraform string expression (or via a variable such as var.identity_provider.attribute_condition) and ensure it references the same claim names used in attribute_mapping.examples/create_osd_gcp_cluster_full/modules/wif-gcp/README.md (1)
56-76: ⚡ Quick winFix the usage snippet paths/names to match actual module and resource identifiers.
Line 57 and Line 75 appear outdated: the module source path and resource type in the snippet don’t match this PR’s module folder/resource naming, so users will copy a non-working example.
Proposed snippet correction
module "wif_gcp" { - source = "../../modules/osd-wif-gcp" + source = "../../modules/wif-gcp" ... -resource "osdgoogle_cluster" "cluster" { +resource "rhcs_cluster_osd_gcp" "cluster" {As per coding guidelines,
examples/**/*.tf: "Use provider/resource naming consistent with this repo (rhcs_*, realistic ROSA/OCM scenarios)."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/README.md` around lines 56 - 76, The example uses outdated identifiers: update the module source in module "wif_gcp" (replace "../../modules/osd-wif-gcp") to the actual module folder name used in this repo (use the rhcs_* module name, e.g. "../../modules/rhcs-wif-gcp"), and change the example resource type resource "osdgoogle_cluster" "cluster" to the repo-consistent resource identifier that starts with rhcs_ (replace with the actual rhcs_* resource name used in this codebase); ensure the variable references (rhcs_wif_config.*) remain consistent with the corrected module/resource names.examples/create_osd_gcp_cluster_full/variables.tf (1)
12-28: ⚡ Quick winAdd input validation for constrained variables to fail fast.
compute_nodes,product, andbilling_modelare documented with constraints but currently accept invalid values until provider/API errors.Suggested fix
variable "compute_nodes" { type = number default = 3 description = "Worker node count. Trial quota caps the cluster at 40 worker vCPUs." + validation { + condition = var.compute_nodes >= 1 && floor(var.compute_nodes) == var.compute_nodes + error_message = "compute_nodes must be a positive whole number." + } } @@ variable "product" { type = string default = "osd" description = "OSD product: 'osd' (paid) or 'osdtrial' (60-day evaluation)." + validation { + condition = contains(["osd", "osdtrial"], var.product) + error_message = "product must be either 'osd' or 'osdtrial'." + } } @@ variable "billing_model" { type = string default = null description = "Billing model. Allowed: 'standard' or 'marketplace-gcp'. Leave null to let OCM pick the default for the product ('standard' for osdtrial, 'marketplace-gcp' for osd)." + validation { + condition = var.billing_model == null || contains( + ["standard", "marketplace-gcp"], + var.billing_model + ) + error_message = "billing_model must be null, 'standard', or 'marketplace-gcp'." + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/create_osd_gcp_cluster_full/variables.tf` around lines 12 - 28, Add Terraform variable validation blocks to fail fast: for variable "compute_nodes" add a validation that ensures var.compute_nodes is an integer >= 1 (and optionally enforce an upper bound if you want to reflect the trial quota) with a clear error_message; for variable "product" add a validation that only allows the strings "osd" or "osdtrial" (use contains(["osd","osdtrial"], var.product)) and a descriptive error_message; for variable "billing_model" add a validation that allows null or only the allowed values "standard" or "marketplace-gcp" (e.g., var.billing_model == null || contains([...], var.billing_model)) with an error_message explaining allowed values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/resources/cluster_osd_gcp.md`:
- Line 77: Update the user-facing schema description for "availability_zones" in
docs/resources/cluster_osd_gcp.md by replacing the phrase "bare metal types"
with the hyphenated compound "bare-metal types" so the sentence reads
consistently (e.g., "...Ensure the compute machine type is available in each
zone (e.g. bare-metal types are zone-specific).") and preserve surrounding
punctuation and formatting.
In `@docs/resources/gcp_machine_pool.md`:
- Around line 22-37: The example resource rhcs_gcp_machine_pool "workers_large"
uses a reserved value for the name field ("workers-large") which triggers
machinePoolNameValidator; update the name property to a non-reserved value
(e.g., "large-nodes" or "custom-workers") so it no longer matches the
"workers-*" pattern, leaving the rest of the block (instance_type, replicas,
labels, gcp) unchanged and keeping the resource identifier rhcs_gcp_machine_pool
"workers_large" as-is.
In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tf`:
- Line 2: Update the module's Terraform required_version to at least 1.3.0
because the variables use Terraform type-constraint defaults (e.g.,
optional(list(object(...)), [])). Change the required_version value (the
required_version setting in versions.tf) from ">= 1.0" to ">= 1.3.0" (or a
suitable range like ">= 1.3.0, < 2.0") so the optional(...) syntax is supported.
In `@examples/create_osd_gcp_cluster_full/variables.tf`:
- Around line 1-4: Add an input validation block to the variable "cluster_name"
to enforce RFC1035-like constraints so derived service account IDs like
google_service_account.kms.account_id = "${var.cluster_name}-kms" are valid;
specifically ensure "cluster_name" matches allowed characters (lowercase
letters, digits, and hyphens), does not start or end with a hyphen, and meets
length limits (e.g., 1–63 chars), returning a clear validation message
referencing "cluster_name" when it fails; update the variable "cluster_name"
declaration accordingly so kms.tf's google_service_account.kms.account_id will
always produce a valid account_id.
In `@examples/resources/gcp_machine_pool/example_1.tf`:
- Around line 1-14: The example resource rhcs_gcp_machine_pool "workers_large"
uses name = "workers-large" which violates the machinePoolNameValidator that
reserves names matching "worker" or "workers-*"; update the name attribute on
the rhcs_gcp_machine_pool resource to a value that does not start with "worker"
or "workers-" (for example "large-workers" or "large-pool") so Terraform
validation passes and keep the rest of the resource (labels, gcp block,
instance_type, replicas) unchanged.
In `@provider/clusterosdgcp/cluster_osd_gcp_resource.go`:
- Line 590: In the Delete implementation of the ClusterOSDGCPResource (the
Delete method in cluster_osd_gcp_resource.go), remove the explicit call to
resp.State.RemoveResource(ctx); delete that line so the Delete handler does not
write state, and ensure the method simply returns without setting state or
diagnostics on success (letting the framework remove the resource from state
automatically).
- Around line 543-545: When buildClusterPatch returns nil, Update currently
writes the plan into state (using resp.State.Set with plan) which can persist
unsupported/immutable changes; instead do not write plan — set state from the
existing observed state (the current resource state variable used in Update,
e.g., "state" or "currentState") or simply return without modifying state.
Modify the nil-patch branch in Update so it calls resp.State.Set(ctx, &state)
(or leaves state unchanged) rather than resp.State.Set(ctx, &plan), and keep the
resp.Diagnostics.Append usage for any diagnostic reporting.
- Around line 574-577: In the Delete handler for the cluster resource, change
the error handling after calling
r.collection.Cluster(state.ID.ValueString()).Delete().SendContext(ctx) so that a
"not found"/404 result is treated as success (idempotent) instead of returning
an error; detect the 404 by checking the API client's error type or status code
(e.g., using errors.As / type assertion to the client's error type or inspecting
a StatusCode method) and only call resp.Diagnostics.AddError and return for
non-404 errors, otherwise silently succeed and return.
In `@provider/machinepool/gcp/machine_pool_resource.go`:
- Around line 345-362: In MachinePoolResource.Delete, treat a 404/“not found”
from the chain
r.collection.Cluster(...).MachinePools().MachinePool(...).Delete().SendContext(ctx)
as success instead of adding an error; detect the error (e.g., inspect returned
error or unwrap to check for status code 404 or a sentinel NotFound error) and
only call resp.Diagnostics.AddError for non-404 failures; also remove the
redundant resp.State.RemoveResource(ctx) call since the framework drops state
when no errors are present.
- Around line 452-484: populateState currently always writes state.Replicas from
obj.Replicas(), which causes drift when autoscaling is configured; change the
logic in populateState so that if obj.Autoscaling() != nil you do not overwrite
the plan's null for replicas — set state.Replicas to types.Int64Null() (or
otherwise leave it null) instead of types.Int64Value(int64(obj.Replicas())); if
obj.Autoscaling() == nil keep the existing behavior of setting state.Replicas
from obj.Replicas().
In `@provider/wifconfig/data_source.go`:
- Around line 288-304: The hasID/hasDisplayName checks currently call
config.ID.ValueString() / config.DisplayName.ValueString() after only checking
IsNull(), which misinterprets Unknown values as empty strings; update the logic
in the WIF data source lookup to first check IsUnknown() for config.ID and
config.DisplayName and add a clear resp.Diagnostics.AddError (or AddWarning)
when either field is unknown, and only call ValueString() when the value is
neither Null nor Unknown; reference the variables config.ID, config.DisplayName,
hasID, hasDisplayName and the diagnostic calls in data_source.go to locate and
change the validation flow.
In `@provider/wifconfig/resource.go`:
- Around line 421-427: The ImportState method in WifConfigResource currently
passthroughs the ID without validation; update WifConfigResource.ImportState to
first check req.ID is non-empty (and if your provider uses a specific format,
validate that pattern too), and if invalid add a diagnostic error to
resp.Diagnostics and return; only when the ID is valid call
resource.ImportStatePassthroughID(ctx, path.Root("id"), req, resp). Ensure you
reference req.ID for the check and use resp.Diagnostics to report errors.
- Around line 413-419: WifConfigResource.Delete currently treats any error from
r.wifConfigs.WifConfig(state.ID.ValueString()).Delete().SendContext(ctx) as a
failure and also calls resp.State.RemoveResource(ctx); change it to treat an
HTTP 404 ("not found"/already deleted) as a non-error (do not call
resp.Diagnostics.AddError for that case) and simply return without diagnostics
so the framework will clear state automatically, and remove the explicit
resp.State.RemoveResource(ctx) call; detect 404 by inspecting the error returned
from SendContext (HTTP status or API error type) and only add diagnostics for
other errors.
In `@subsystem/osdgcp/osdgcp_test.go`:
- Line 56: The test is asserting Terraform.Validate() with BeZero() which
requires RunOutput.out/err to be empty; instead check the exit code only. Change
the assertion from Expect(Terraform.Validate()).To(BeZero()) to inspect the
RunOutput returned by Terraform.Validate()/Terraform.Run() (or the RunOutput
variable produced by Terraform.Validate) and assert that RunOutput.ExitCode == 0
(e.g., Expect(runOutput.ExitCode).To(Equal(0))). Keep any stdout/stderr checks
optional and do not rely on them for pass/fail.
- Around line 37-47: The test currently asserts the whole RunOutput with
Expect(Terraform.Validate()).To(BeZero()), which is brittle because RunOutput
contains stdout; instead call Terraform.Validate(), capture the returned
RunOutput (e.g., result := Terraform.Validate()), and assert only on its
ExitCode field (e.g., Expect(result.ExitCode).To(Equal(0)) or
Expect(result.ExitCode).To(BeZero())), leaving stdout/unexported fields
untested; update the assertion around the Terraform.Validate() call to check
RunOutput.ExitCode rather than the whole struct.
---
Nitpick comments:
In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/main.tf`:
- Around line 15-30: The google_iam_workload_identity_pool_provider resource
(resource "google_iam_workload_identity_pool_provider" "oidc") lacks an
attribute_condition to restrict accepted token claims; add an
attribute_condition argument to this resource that enforces the expected claim
shape (e.g., require specific sub pattern and/or audience) using the mapped
assertion fields (like assertion.sub and assertion.aud) so only tokens matching
your expected claims are accepted; implement the condition as a Terraform string
expression (or via a variable such as var.identity_provider.attribute_condition)
and ensure it references the same claim names used in attribute_mapping.
In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/README.md`:
- Around line 56-76: The example uses outdated identifiers: update the module
source in module "wif_gcp" (replace "../../modules/osd-wif-gcp") to the actual
module folder name used in this repo (use the rhcs_* module name, e.g.
"../../modules/rhcs-wif-gcp"), and change the example resource type resource
"osdgoogle_cluster" "cluster" to the repo-consistent resource identifier that
starts with rhcs_ (replace with the actual rhcs_* resource name used in this
codebase); ensure the variable references (rhcs_wif_config.*) remain consistent
with the corrected module/resource names.
In `@examples/create_osd_gcp_cluster_full/variables.tf`:
- Around line 12-28: Add Terraform variable validation blocks to fail fast: for
variable "compute_nodes" add a validation that ensures var.compute_nodes is an
integer >= 1 (and optionally enforce an upper bound if you want to reflect the
trial quota) with a clear error_message; for variable "product" add a validation
that only allows the strings "osd" or "osdtrial" (use
contains(["osd","osdtrial"], var.product)) and a descriptive error_message; for
variable "billing_model" add a validation that allows null or only the allowed
values "standard" or "marketplace-gcp" (e.g., var.billing_model == null ||
contains([...], var.billing_model)) with an error_message explaining allowed
values.
In `@provider/clusterosdgcp/cluster_osd_gcp_resource_test.go`:
- Around line 40-65: Tests currently only assert schema presence and validators
existence; add behavior-focused unit tests that exercise the new
availability_zones validator branches and lifecycle edge cases: write tests that
call the validator returned by
clusterosdgcp.New().(resource.ResourceWithConfigValidators).ConfigValidators to
assert valid/invalid cases for availability_zones, add tests for Delete handling
where the provider's Delete function treats a 404 as success (idempotency), and
add Update tests asserting no-op semantics when configured fields are unchanged;
locate helpers around clusterosdgcp.New, r.ConfigValidators, and the resource
lifecycle methods (Create/Read/Update/Delete) to hook in mocked responses and
assert the expected behavior rather than only checking schema keys in
resp.Schema.Attributes.
In `@provider/machinepool/gcp/machine_pool_resource.go`:
- Around line 372-450: The function buildMachinePoolObject currently injects a
provider-side default of 3 into the replicas field (local variable replicas and
builder.Replicas(...)) when s.Replicas is unset, which may conflict with the OCM
API default; either remove this inline default so replicas is omitted when
s.Replicas is null/unknown, or add a Default: int64(3) to the Terraform schema
for the replicas attribute in the MachinePoolState/schema so the behavior is
explicit; update the logic around s.Replicas (and the local replicas variable)
and the schema definition in the MachinePool resource accordingly to keep
provider behavior consistent with the API and documented via the schema.
In `@provider/wifconfig/resource_test.go`:
- Around line 53-82: Add behavioral unit tests for WifConfigDataSource: extend
provider/wifconfig/resource_test.go to include tests that exercise lookup logic
(lookup by ID and display_name) against WifConfigDataSource methods (e.g., call
NewDataSource(), then use its Read/ReadData or equivalent lookup entry points),
simulate OCM search/filter responses to verify correct candidate selection and
handling of ambiguous results, assert state population from mocked API responses
into the datasource state, and add error-case tests for not-found and ambiguous
matches to ensure diagnostics are produced; target the WifConfigDataSource,
NewDataSource, Metadata, and Schema code paths when wiring mocks for OCM
interactions.
- Around line 20-51: Add behavior tests beyond structural checks: write unit
tests that exercise wifconfig.New()'s resource implementation
(WifConfigResource) for Create/Read/Update/Delete by invoking the resource
methods (Create, Read, Update, Delete) with realistic
resource.CreateRequest/ReadRequest/UpdateRequest/DeleteRequest and asserting
state mapping in responses; add tests for ModifyPlan by calling ModifyPlan
(ResourceWithModifyPlan) with plans containing unknown/null values and asserting
the modified plan and diagnostics; add tests for ImportState via
ImportStateRequest and for validation/error paths to assert diagnostics when
invalid input is provided; ensure tests reference WifConfigResource, New(),
Metadata, Schema, Create, Read, Update, Delete, ModifyPlan, and ImportState to
locate code.
In `@subsystem/osdgcp/osdgcp_test.go`:
- Around line 54-58: The current test only calls Terraform.Validate() inside the
Describe "OSD-GCP provider init"; extend this test file by adding new
It/Describe blocks that implement full subsystem integration tests: add CRUD
lifecycle tests for resources rhcs_wif_config, rhcs_cluster_osd_gcp, and
rhcs_gcp_machine_pool (create, read/verify, update, delete and verify
deletion/idempotent delete), add data source lookup tests for WIF config by id
and by display_name, and add negative tests exercising invalid configs, API
failure/not-found responses and retry behavior for delete operations; locate and
extend the existing Describe("OSD-GCP provider init") block and create helper
test functions or Ginkgo It blocks to perform apply/plan/assert sequences and
cleanup so the suite exercises idempotency and retry paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f9c1ac30-2ed6-4525-9a41-e38b6e52ba2a
📒 Files selected for processing (39)
docs/data-sources/wif_config.mddocs/resources/cluster_osd_gcp.mddocs/resources/gcp_machine_pool.mddocs/resources/wif_config.mdexamples/create_osd_gcp_cluster_full/cluster.tfexamples/create_osd_gcp_cluster_full/host_iam.tfexamples/create_osd_gcp_cluster_full/kms.tfexamples/create_osd_gcp_cluster_full/main.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/README.mdexamples/create_osd_gcp_cluster_full/modules/wif-gcp/iam.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/main.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/outputs.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/service_accounts.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/variables.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tfexamples/create_osd_gcp_cluster_full/outputs.tfexamples/create_osd_gcp_cluster_full/providers.tfexamples/create_osd_gcp_cluster_full/variables.tfexamples/create_osd_gcp_cluster_full/wif.tfexamples/data-sources/wif_config/example_1.tfexamples/resources/cluster_osd_gcp/example_1.tfexamples/resources/gcp_machine_pool/example_1.tfexamples/resources/wif_config/example_1.tfprovider/clusterosdgcp/cluster_osd_gcp_resource.goprovider/clusterosdgcp/cluster_osd_gcp_resource_test.goprovider/clusterosdgcp/cluster_osd_gcp_state.goprovider/machinepool/gcp/machine_pool_resource.goprovider/machinepool/gcp/machine_pool_resource_test.goprovider/machinepool/gcp/machine_pool_state.goprovider/provider.goprovider/wifconfig/data_source.goprovider/wifconfig/resource.goprovider/wifconfig/resource_test.goprovider/wifconfig/state.gosubsystem/osdgcp/osdgcp_test.gotemplates/data-sources/wif_config.md.tmpltemplates/resources/cluster_osd_gcp.md.tmpltemplates/resources/gcp_machine_pool.md.tmpltemplates/resources/wif_config.md.tmpl
| @@ -0,0 +1,10 @@ | |||
| terraform { | |||
| required_version = ">= 1.0" | |||
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the minimum Terraform version that fully supports optional object attributes with default values in type constraints (e.g., optional(list(object(...)), []))?
💡 Result:
Minimum Terraform version: 1.3.0. Terraform 1.3 adds support for optional object type attributes with inline default values in type constraints via optional(type, default) (including for complex nested types such as optional(list(object(...)), []), which uses the same optional(type, default) mechanism). HashiCorp’s v1.3.0 release notes describe “Optional attributes for object type constraints” and explicitly state that optional now accepts an optional second argument for the default value, and recommend requiring v1.3.0 or later for this new syntax [1]. The current Terraform documentation for type constraints also describes that optional attributes can be declared with a default value as the second argument to optional, and that this default is used when the caller omits the attribute [2].
Citations:
- 1: https://github.com/hashicorp/terraform/releases/tag/v1.3.0
- 2: https://developer.hashicorp.com/terraform/language/expressions/type-constraints
Increase Terraform required_version to match optional(..., default) usage
examples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tf sets required_version = ">= 1.0", but this module uses Terraform type-constraint defaults like optional(list(object(...)), []) in modules/wif-gcp/variables.tf, which requires Terraform >= 1.3.0.
Proposed fix
terraform {
- required_version = ">= 1.0"
+ required_version = ">= 1.3"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| required_version = ">= 1.0" | |
| terraform { | |
| required_version = ">= 1.3" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tf` at line 2,
Update the module's Terraform required_version to at least 1.3.0 because the
variables use Terraform type-constraint defaults (e.g.,
optional(list(object(...)), [])). Change the required_version value (the
required_version setting in versions.tf) from ">= 1.0" to ">= 1.3.0" (or a
suitable range like ">= 1.3.0, < 2.0") so the optional(...) syntax is supported.
| variable "cluster_name" { | ||
| type = string | ||
| description = "Cluster name; also used to derive WIF display name and KMS key/SA names." | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What are the current format and length constraints for GCP IAM service account account_id(including Terraformgoogle_service_account.account_id)?
💡 Result:
Current format + length constraints for GCP IAM service account account_id (and Terraform google_service_account.account_id): 1) Length - Must be between 6 and 30 characters [1][2]. 2) Allowed characters / format - Must consist of lowercase letters, digits, and dashes [1]. - Must match the regex requirement enforced by Terraform documentation: a-z with dashes allowed, consistent with RFC1035 [2]. - Practically, Terraform expects values that fit the RFC1035-style “lowercase + numbers/dashes” pattern (and rejects names that don’t match). For example, Terraform errors when the provided account_id contains characters/format that don’t satisfy its regexp validation [3]. 3) Email generation implication - The account_id is used to generate the service account email address (account_id@PROJECT_ID.iam.gserviceaccount.com) and a stable unique id [2]. Notes specific to Terraform - In the Terraform resource google_service_account, account_id is required, and changing it forces creation of a new service account [2].
Citations:
- 1: https://docs.cloud.google.com/iam/docs/service-accounts-create
- 2: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/google_service_account
- 3: google_servce_account account_id validation failing for old service account hashicorp/terraform-provider-google#21052
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la examples/create_osd_gcp_cluster_full || true
rg -n "variable \"cluster_name\"|cluster_name" examples/create_osd_gcp_cluster_full/variables.tf examples/create_osd_gcp_cluster_full/kms.tf || true
echo "---- variables.tf (first 120 lines) ----"
sed -n '1,120p' examples/create_osd_gcp_cluster_full/variables.tf
echo "---- kms.tf (first 240 lines) ----"
sed -n '1,240p' examples/create_osd_gcp_cluster_full/kms.tfRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 4946
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- variables.tf: lines 1-220 ----"
sed -n '1,220p' examples/create_osd_gcp_cluster_full/variables.tf
echo "---- variables.tf: validation blocks ----"
rg -n "^\s*validation\s*\{" examples/create_osd_gcp_cluster_full/variables.tf || true
echo "---- other cluster_name usages in this example ----"
rg -n "cluster_name" examples/create_osd_gcp_cluster_full -SRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 3388
🌐 Web query:
In Terraform google_service_account resource, what is the exact validation pattern/regex for the account_id argument?
💡 Result:
In the Terraform provider resource google_service_account, the account_id argument is validated to match the exact regular expression ^a-z?$. [1][2] Notes: the provider documentation states account_id “must be 6-30 characters long” and “match the regular expression a-z to comply with RFC1035”, while provider error messages show the regex as ^a-z?$ for minimal examples. [1][2]
Citations:
- 1: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/google_service_account.html
- 2: google_servce_account account_id validation failing for old service account hashicorp/terraform-provider-google#21052
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- variables.tf: lines 1-220 ----"
sed -n '1,220p' examples/create_osd_gcp_cluster_full/variables.tf
echo "---- variables.tf: validation blocks ----"
rg -n "^\s*validation\s*\{" examples/create_osd_gcp_cluster_full/variables.tf || true
echo "---- other cluster_name usages in this example ----"
rg -n "cluster_name" examples/create_osd_gcp_cluster_full -SRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 3388
🌐 Web query:
In Terraform google_service_account resource, what is the exact validation pattern/regex for the account_id argument?
💡 Result:
In the Terraform google_service_account resource, the account_id argument is validated to match the regular expression ^[a-z]$ (as reflected by the provider’s validation error messages and the documented expectation to match /a-z/ as RFC1035). [1][2] So the exact pattern (as enforced by the provider) is: ^[a-z]$. [1][3][4]
Citations:
- 1: google_servce_account account_id validation failing for old service account hashicorp/terraform-provider-google#21052
- 2: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/google_service_account
- 3: Service account account_id can start with a digit hashicorp/terraform-provider-google#9954
- 4: Unable to assign IAM to Google-managed service accounts that begin with a number hashicorp/terraform-provider-google#9027
Constrain cluster_name to avoid invalid derived GCP service account IDs.
examples/create_osd_gcp_cluster_full/kms.tf builds google_service_account.kms.account_id as ${var.cluster_name}-kms. Without validation, non-RFC1035-compliant cluster_name values fail late during apply.
Suggested fix
variable "cluster_name" {
type = string
description = "Cluster name; also used to derive WIF display name and KMS key/SA names."
+ validation {
+ condition = (
+ can(regex("^[a-z][a-z0-9-]*[a-z0-9]$", var.cluster_name)) &&
+ length(var.cluster_name) >= 2 &&
+ length(var.cluster_name) <= 26
+ )
+ error_message = "cluster_name must be lowercase alphanumeric/hyphen, start with a letter, end with alphanumeric, and be <= 26 chars (so derived ${var.cluster_name}-kms stays within service account account_id limits)."
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| variable "cluster_name" { | |
| type = string | |
| description = "Cluster name; also used to derive WIF display name and KMS key/SA names." | |
| } | |
| variable "cluster_name" { | |
| type = string | |
| description = "Cluster name; also used to derive WIF display name and KMS key/SA names." | |
| validation { | |
| condition = ( | |
| can(regex("^[a-z][a-z0-9-]*[a-z0-9]$", var.cluster_name)) && | |
| length(var.cluster_name) >= 2 && | |
| length(var.cluster_name) <= 26 | |
| ) | |
| error_message = "cluster_name must be lowercase alphanumeric/hyphen, start with a letter, end with alphanumeric, and be <= 26 chars (so derived ${var.cluster_name}-kms stays within service account account_id limits)." | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/create_osd_gcp_cluster_full/variables.tf` around lines 1 - 4, Add an
input validation block to the variable "cluster_name" to enforce RFC1035-like
constraints so derived service account IDs like
google_service_account.kms.account_id = "${var.cluster_name}-kms" are valid;
specifically ensure "cluster_name" matches allowed characters (lowercase
letters, digits, and hyphens), does not start or end with a hyphen, and meets
length limits (e.g., 1–63 chars), returning a clear validation message
referencing "cluster_name" when it fails; update the variable "cluster_name"
declaration accordingly so kms.tf's google_service_account.kms.account_id will
always produce a valid account_id.
| hasID := !config.ID.IsNull() && strings.TrimSpace(config.ID.ValueString()) != "" | ||
| hasDisplayName := !config.DisplayName.IsNull() && strings.TrimSpace(config.DisplayName.ValueString()) != "" | ||
|
|
||
| if hasID && hasDisplayName { | ||
| resp.Diagnostics.AddError( | ||
| "Invalid configuration", | ||
| "Cannot specify both id and display_name. Specify exactly one.", | ||
| ) | ||
| return | ||
| } | ||
| if !hasID && !hasDisplayName { | ||
| resp.Diagnostics.AddError( | ||
| "Invalid configuration", | ||
| "Either id or display_name must be specified.", | ||
| ) | ||
| return | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the file and print the relevant region with line numbers
FILE="provider/wifconfig/data_source.go"
if [ ! -f "$FILE" ]; then
echo "Missing $FILE" >&2
exit 1
fi
nl -ba "$FILE" | sed -n '240,340p'
# Search for IsUnknown usage patterns in the provider wifconfig package
rg -n "IsUnknown\\(" provider/wifconfig/data_source.go provider/wifconfig -S || true
# Search for patterns of handling null/unknown for config fields in other data sources
rg -n "ValueString\\(" provider/wifconfig -S
rg -n "Either id or display_name must be specified" -S || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 128
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the file and print the relevant region with line numbers
FILE="provider/wifconfig/data_source.go"
if [ ! -f "$FILE" ]; then
echo "Missing $FILE" >&2
exit 1
fi
nl -ba "$FILE" | sed -n '240,340p'
# Search for IsUnknown usage patterns in the provider wifconfig package
rg -n "IsUnknown\\(" provider/wifconfig/data_source.go provider/wifconfig -S || true
# Search for patterns of handling null/unknown for config fields in other data sources
rg -n "ValueString\\(" provider/wifconfig -S
rg -n "Either id or display_name must be specified" -S || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 128
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
if [ ! -f "$FILE" ]; then
echo "Missing $FILE" >&2
exit 1
fi
# Show requested region (approx) with line numbers via awk
awk 'NR>=240 && NR<=340 { printf "%d\t%s\n", NR, $0 }' "$FILE"
echo "---- rg IsUnknown() in provider/wifconfig ----"
rg -n "IsUnknown\\(" provider/wifconfig -S || true
echo "---- rg ValueString() in provider/wifconfig ----"
rg -n "ValueString\\(" provider/wifconfig -S || true
echo "---- rg the specific error message ----"
rg -n "Either id or display_name must be specified" provider/wifconfig -S || true
echo "---- show any surrounding handling for config.ID/config.DisplayName ----"
rg -n "config\\.(ID|DisplayName)" provider/wifconfig/data_source.go -SRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 5582
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
if [ ! -f "$FILE" ]; then
echo "Missing $FILE" >&2
exit 1
fi
# Show requested region (approx) with line numbers via awk
awk 'NR>=240 && NR<=340 { printf "%d\t%s\n", NR, $0 }' "$FILE"
echo "---- rg IsUnknown() in provider/wifconfig ----"
rg -n "IsUnknown\\(" provider/wifconfig -S || true
echo "---- rg ValueString() in provider/wifconfig ----"
rg -n "ValueString\\(" provider/wifconfig -S || true
echo "---- rg the specific error message ----"
rg -n "Either id or display_name must be specified" provider/wifconfig -S || true
echo "---- show any surrounding handling for config.ID/config.DisplayName ----"
rg -n "config\\.(ID|DisplayName)" provider/wifconfig/data_source.go -SRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 5582
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
awk 'NR>=1 && NR<=220 { printf "%d\t%s\n", NR, $0 }' "$FILE" | tail -n +1
echo "---- search for type WifConfigState / ID / DisplayName ----"
rg -n "type\\s+WifConfigState|WifConfigState" "$FILE"
rg -n "ID\\s+types\\.String|DisplayName\\s+types\\.String|ID\\s+.*Value|DisplayName\\s+.*Value" "$FILE" -S || true
echo "---- show schema attributes around id/display_name (first 220 already shown, but focus next region) ----"
rg -n "id\"|display_name" "$FILE" -n
# Print region around schema for the two attributes
# Try to locate first occurrence line numbers
IDLINE=$(rg -n "\"id\"" "$FILE" | head -n1 | cut -d: -f1 || true)
DLINE=$(rg -n "\"display_name\"" "$FILE" | head -n1 | cut -d: -f1 || true)
echo "IDLINE=$IDLINE DLINE=$DLINE"
if [ -n "${IDLINE:-}" ]; then awk -v s=$((IDLINE-30)) -v e=$((IDLINE+30)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"; fi
if [ -n "${DLINE:-}" ]; then awk -v s=$((DLINE-30)) -v e=$((DLINE+30)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"; fiRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 15583
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
echo "---- top of file (definitions/imports/types) ----"
awk 'NR>=1 && NR<=220 { printf "%d\t%s\n", NR, $0 }' "$FILE"
echo "---- locate WifConfigState ----"
rg -n "type\\s+WifConfigState|WifConfigState" "$FILE"
echo "---- locate ID/DisplayName field declarations ----"
rg -n "ID\\s+types\\.|DisplayName\\s+types\\.|\\bID\\b.*types\\.|\\bDisplayName\\b.*types\\." "$FILE" -S || true
echo "---- locate schema keys ----"
rg -n "\"id\"|\"display_name\"" "$FILE" -n
IDLINE=$(rg -n "\"id\"" "$FILE" | head -n1 | cut -d: -f1 || true)
DLINE=$(rg -n "\"display_name\"" "$FILE" | head -n1 | cut -d: -f1 || true)
echo "IDLINE=$IDLINE DLINE=$DLINE"
if [ -n "${IDLINE:-}" ]; then awk -v s=$((IDLINE-30)) -v e=$((IDLINE+30)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"; fi
if [ -n "${DLINE:-}" ]; then awk -v s=$((DLINE-30)) -v e=$((DLINE+30)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"; fiRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 14166
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- locate WifConfigState definition ----"
rg -n "type\\s+WifConfigState\\s+struct" provider/wifconfig -S || true
rg -n "WifConfigState" provider/wifconfig -S
echo "---- inspect state.go (or whichever file defines it) ----"
STATE_FILE=$(rg -n "type\\s+WifConfigState\\s+struct" provider/wifconfig -S | head -n1 | cut -d: -f1 || true)
if [ -n "${STATE_FILE:-}" ]; then
awk 'NR>=1 && NR<=220 {print NR"\t"$0}' "$STATE_FILE"
fi
echo "---- locate IsUnknown usage in other data sources ----"
rg -n "IsUnknown\\(" provider -S -g'*data_source*.go' || true
echo "---- locate 'IsUnknown()' + 'return' patterns in provider ----"
rg -n "IsUnknown\\(\\).*\\{[[:space:]]*return" provider -S -g'*data_source*.go' || true
echo "---- locate 'ValueString()' patterns with guards ----"
rg -n "IsNull\\(\\).*ValueString\\(|IsUnknown\\(\\).*ValueString\\(" provider -S -g'*data_source*.go' || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 5878
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- locate WifConfigState definition ----"
rg -n "type\\s+WifConfigState\\s+struct" provider/wifconfig -S || true
rg -n "WifConfigState" provider/wifconfig -S
echo "---- inspect state file defining WifConfigState ----"
STATE_FILE=$(rg -n "type\\s+WifConfigState\\s+struct" provider/wifconfig -S | head -n1 | cut -d: -f1 || true)
if [ -n "${STATE_FILE:-}" ] && [ -f "$STATE_FILE" ]; then
echo "STATE_FILE=$STATE_FILE"
awk 'NR>=1 && NR<=260 {print NR"\t"$0}' "$STATE_FILE"
fi
echo "---- locate IsUnknown usage in other data sources ----"
rg -n "IsUnknown\\(" provider -S -g'*data_source*.go' || true
echo "---- locate 'IsUnknown()' + 'return' patterns in provider ----"
rg -n "IsUnknown\\(\\).*\\{[[:space:]]*return" provider -S -g'*data_source*.go' || true
echo "---- locate ValueString() usage with null/unknown guards in data sources ----"
rg -n "IsNull\\(\\).*ValueString\\(|IsUnknown\\(\\).*ValueString\\(" provider -S -g'*data_source*.go' || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 5936
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '80,160p' provider/versions/versions_data_source.go
echo "----"
sed -n '80,160p' provider/cloudprovider/cloud_providers_data_source.goRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 4696
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '80,160p' provider/versions/versions_data_source.go
echo "----"
sed -n '80,160p' provider/cloudprovider/cloud_providers_data_source.goRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 4696
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
echo "---- Read method remainder (320-430) ----"
awk 'NR>=300 && NR<=460 {print NR"\t"$0}' "$FILE"
echo "---- locate resp.State.Set in Read ----"
rg -n "resp\\.State\\.Set" "$FILE"
echo "---- show around resp.State.Set ----"
LINE=$(rg -n "resp\\.State\\.Set" "$FILE" | head -n1 | cut -d: -f1 || true)
if [ -n "${LINE:-}" ]; then
awk -v s=$((LINE-20)) -v e=$((LINE+20)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"
fiRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 2952
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/data_source.go"
echo "---- Read method remainder (300-460) ----"
awk 'NR>=300 && NR<=460 {print NR"\t"$0}' "$FILE"
echo "---- locate resp.State.Set in Read ----"
rg -n "resp\\.State\\.Set" "$FILE"
echo "---- show around resp.State.Set ----"
LINE=$(rg -n "resp\\.State\\.Set" "$FILE" | head -n1 | cut -d: -f1 || true)
if [ -n "${LINE:-}" ]; then
awk -v s=$((LINE-20)) -v e=$((LINE+20)) 'NR>=s && NR<=e {print NR"\t"$0}' "$FILE"
fiRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 2952
🌐 Web query:
terraform-plugin-framework types.String ValueString IsUnknown panic
💡 Result:
In Terraform Plugin Framework, types.String has IsUnknown, and you should never call ValueString/ValueStringPointer when the value is unknown; instead, check IsUnknown first. If you do call ValueString (or any accessor that expects a known value) while the value is unknown, your provider will not have a “real” Go string to return; the correct pattern is to guard with IsUnknown (and IsNull) first. The framework docs show exactly this flow: check data.ExampleAttribute.IsNull / IsUnknown before accessing its known string with ValueString. [1] Key API details for types.String: - (types.String).IsUnknown bool returns true when the string is unknown. [1] - (types.String).ValueString returns the known string, or an empty string if null or unknown. [1] - (types.String).ValueStringPointer returns nil if null, or a pointer to an empty string if unknown. [1] - (types.String).String is reserved for debugging and returns “ ” for null and “ ” for unknown; use ValueString/ValueStringPointer for access. [1] So where does a “panic” typically come from? - The common mistake is treating “unknown” as if it were a known value (e.g., calling ValueString unconditionally and then dereferencing/using it as though it were a real value, or using the debug String result in places that assume a real value). The docs explicitly direct you to use IsUnknown/IsNull checks before accessing. [1] - If the panic you see is actually from Go (e.g., nil pointer dereference) it’s usually caused by downstream code that assumes a non-nil pointer/value. For example, ValueStringPointer is documented to return nil for null, and a pointer to an empty string for unknown; if your code assumes non-nil/non-empty without checking IsNull/IsUnknown, you can trigger your own panic. [1] Practical fix pattern (provider code): - if attr.IsNull {... } - else if attr.IsUnknown {... } - else { v:= attr.ValueString... } This is directly aligned with the framework’s example guidance. [1]
Citations:
Guard types.String.ValueString() against unknown values in WIF data source lookup.
provider/wifconfig/data_source.go computes hasID/hasDisplayName using config.ID.ValueString() and config.DisplayName.ValueString() after only checking IsNull(). With Terraform Plugin Framework, types.String can be Unknown, and ValueString() returns an empty string for unknown—leading to incorrect/misleading “Either id or display_name must be specified.” diagnostics. Add IsUnknown() handling before calling ValueString() (and produce a clear diagnostic when the value is unknown).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@provider/wifconfig/data_source.go` around lines 288 - 304, The
hasID/hasDisplayName checks currently call config.ID.ValueString() /
config.DisplayName.ValueString() after only checking IsNull(), which
misinterprets Unknown values as empty strings; update the logic in the WIF data
source lookup to first check IsUnknown() for config.ID and config.DisplayName
and add a clear resp.Diagnostics.AddError (or AddWarning) when either field is
unknown, and only call ValueString() when the value is neither Null nor Unknown;
reference the variables config.ID, config.DisplayName, hasID, hasDisplayName and
the diagnostic calls in data_source.go to locate and change the validation flow.
| _, err := r.wifConfigs.WifConfig(state.ID.ValueString()).Delete().SendContext(ctx) | ||
| if err != nil { | ||
| resp.Diagnostics.AddError("failed to delete WIF config", err.Error()) | ||
| return | ||
| } | ||
| resp.State.RemoveResource(ctx) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the Delete implementation around the reported lines
sed -n '360,480p' provider/wifconfig/resource.go | cat -n
# Extract the whole Delete method for context (best-effort)
rg -n "func \\(.*\\) Delete\\(" -S provider/wifconfig/resource.go
# Print from the Delete signature to just after its end (heuristic: next "}\n" at same indent)
python3 - <<'PY'
import re, itertools, sys, pathlib
p = pathlib.Path("provider/wifconfig/resource.go")
text = p.read_text()
m = re.search(r'func \([^)]*\) Delete\([^)]*\) [^{]*\{', text)
if not m:
print("Delete not found")
sys.exit(0)
start = m.start()
# naive bracket matching
i = start
depth = 0
in_str = False
escape = False
quote = None
for j,ch in enumerate(text[start:], start=start):
if in_str:
if escape:
escape=False
elif ch=='\\':
escape=True
elif ch==quote:
in_str=False
else:
if ch in ('"',"'","`"):
in_str=True; quote=ch
elif ch=='{':
depth += 1
elif ch=='}':
depth -= 1
if depth==0:
end=j+1
break
else:
end=None
print(text[start:end])
PY
# Check whether other resources handle Delete idempotently / use resp.State.RemoveResource
rg -n "RemoveResource\\(ctx\\)" provider --glob='*.go'
rg -n "StatusNotFound" provider --glob='*.go'
rg -n "Delete\\(" provider --glob='*.go' | head -n 50Repository: terraform-redhat/terraform-provider-rhcs
Length of output: 18989
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show full Delete method with nearby context
sed -n '380,460p' provider/wifconfig/resource.go | cat -n
# Look for existing "ignore 404" patterns related to Delete (status-not-found checks near delete)
rg -n "Status\\(\\) == http\\.StatusNotFound" provider --glob='*.go'
# Look for "StatusNotFound" usage where Delete is called (broader heuristic: NotFound + Delete in the same file)
rg -n "Delete\\(\\)\\.SendContext\\(ctx\\)" provider --glob='*.go' > /tmp/delete_sendctx.txt
python3 - <<'PY'
import re, pathlib
pairs=[]
for line in pathlib.Path('/tmp/delete_sendctx.txt').read_text().splitlines():
path, ln, _ = line.split(':',2)[0], None, None
PY
# Grep for NotFound handling around Delete implementations by searching for "failed to delete" messages
rg -n "failed to delete" provider --glob='*.go' -C 3Repository: terraform-redhat/terraform-provider-rhcs
Length of output: 11117
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect ImageMirrorResource around the NotFound checks
sed -n '280,360p' provider/imagemirror/image_mirror_resource.go | cat -n
sed -n '180,250p' provider/imagemirror/image_mirror_resource.go | cat -n
# Find any Delete() calls that include http.StatusNotFound checks (status check may be via resp.Status() or sdkErr.Status())
rg -n "Delete\\(\\).*SendContext\\(ctx\\)" provider --glob='*.go'
rg -n "Delete\\(.*SendContext\\(ctx\\)" provider --glob='*.go' -C 5 | rg -n "NotFound|Status\\(\\) == http\\.StatusNotFound|Status\\(\\)\\s*!?=\\s*http\\.StatusNotFound|not found"
# Check if resp.State.RemoveResource is actually used in Delete methods as a common convention
rg -n "func \\(.*\\) Delete\\(.*\\)\\s*\\{[\\s\\S]*RemoveResource\\(ctx\\)" provider --glob='*.go' | head -n 20Repository: terraform-redhat/terraform-provider-rhcs
Length of output: 7906
🌐 Web query:
Terraform Plugin Framework Delete resp.State.RemoveResource(ctx) guidance
💡 Result:
In Terraform Plugin Framework, you should generally NOT call resp.State.RemoveResource(ctx) inside Delete unless you have a specific reason; if Delete(ctx, req, resp) returns without adding error diagnostics, the framework automatically calls DeleteResponse.State.RemoveResource. [1][2] What to do in Delete - Use the prior state from req.State to do the remote destroy logic (req.State is populated from Terraform’s prior state; response diagnostics only report status). [1] - Perform your “destroy” side-effects. - If the remote object is already gone, “ignore errors that signify the resource is no longer existent” (i.e., don’t treat 404/not-found as an error; then return). [1] - If you reach the end of Delete without errors, simply return; the framework will remove the resource from state for you. [1][2] When you DO call resp.State.RemoveResource(ctx) - In the Read function to handle “resource no longer existent” (e.g., when the API returns 404) you can call resp.State.RemoveResource(ctx) and return early so Terraform can recreate on the next plan/apply. [3] - Calling RemoveResource explicitly is also warned against in Create/Update Read-like flows because it can lead to “provider produced inconsistent result after apply”; the framework guidance is to ensure removal only happens in the correct operation, and the Delete operation already performs automatic removal on success. [4] Direct guidance for your specific question (Delete + resp.State.RemoveResource(ctx)) - Preferred pattern: - Do not call resp.State.RemoveResource(ctx) in Delete; just ensure no error diagnostics are added on successful deletion. The framework will remove state automatically. [1][2] - Avoid explicitly calling it in Delete unless the framework behavior is not sufficient for your case; the official Delete docs explicitly say to skip calling RemoveResource and rely on automatic handling when there are no error diagnostics. [1] Notes on the API - State.RemoveResource(ctx) clears the entire resource from state by setting Raw to a nil value of the schema type. [2] - State.RemoveResource(ctx) is specifically documented as being automatically called on DeleteResourceResponse.State when the Delete method completes without error. [2]
Citations:
- 1: https://docs.hashicorp.com/terraform/plugin/framework/resources/delete
- 2: https://github.com/hashicorp/terraform-plugin-framework/blob/main/tfsdk/state.go
- 3: https://developer.hashicorp.com/terraform/plugin/framework/resources/read
- 4: https://github.com/hashicorp/web-unified-docs/blob/main/content/terraform-plugin-framework/v1.18.x/docs/plugin/framework/migrating/resources/crud.mdx
Make WifConfigResource.Delete idempotent on 404 and rely on the framework to clear state
- Current
Deletedoesn’t ignore “already deleted” (HTTP 404) and will add an error diagnostic + keep the resource in state. - Remove the explicit
resp.State.RemoveResource(ctx); whenDeletefinishes without diagnostics, the Plugin Framework clears state automatically.
🔧 Proposed fix
- _, err := r.wifConfigs.WifConfig(state.ID.ValueString()).Delete().SendContext(ctx)
+ delResp, err := r.wifConfigs.WifConfig(state.ID.ValueString()).Delete().SendContext(ctx)
if err != nil {
+ if delResp != nil && delResp.Status() == http.StatusNotFound {
+ return
+ }
resp.Diagnostics.AddError("failed to delete WIF config", err.Error())
return
}
- resp.State.RemoveResource(ctx)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@provider/wifconfig/resource.go` around lines 413 - 419,
WifConfigResource.Delete currently treats any error from
r.wifConfigs.WifConfig(state.ID.ValueString()).Delete().SendContext(ctx) as a
failure and also calls resp.State.RemoveResource(ctx); change it to treat an
HTTP 404 ("not found"/already deleted) as a non-error (do not call
resp.Diagnostics.AddError for that case) and simply return without diagnostics
so the framework will clear state automatically, and remove the explicit
resp.State.RemoveResource(ctx) call; detect 404 by inspecting the error returned
from SendContext (HTTP status or API error type) and only add diagnostics for
other errors.
| func (r *WifConfigResource) ImportState( | ||
| ctx context.Context, | ||
| req resource.ImportStateRequest, | ||
| resp *resource.ImportStateResponse, | ||
| ) { | ||
| resource.ImportStatePassthroughID(ctx, path.Root("id"), req, resp) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/resource.go"
# Show around the ImportState implementation
echo "---- ImportState function context ----"
nl -ba "$FILE" | sed -n '380,470p'
# Check whether WifConfigResource implements ResourceWithImportState anywhere
echo "---- interface implementation checks ----"
rg -n "type WifConfigResource|ResourceWithImportState|ImportState\\(" -S "$FILE"
# Search repository for WifConfigResource registration / Schema / ID expectations
echo "---- search for id format/validation helpers in provider/wifconfig ----"
rg -n "Root\\(\"id\"\\)|ImportStatePassthroughID|Invalid import ID|ID must|validate.*id|import.*id" -S provider/wifconfig || true
# Also search entire provider directory for other import implementations to match patterns
echo "---- search other resources' ImportState implementations ----"
rg -n "ImportState\\(\\s*\\n\\s*ctx context\\.Context" -S provider || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 184
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/resource.go"
echo "---- ImportState function context ----"
nl -ba "$FILE" | sed -n '380,470p'
echo "---- interface implementation checks ----"
rg -n "type WifConfigResource|ResourceWithImportState|ImportState\\(" -S "$FILE"
echo "---- search for id format/validation helpers in provider/wifconfig ----"
rg -n "Root\\(\"id\"\\)|ImportStatePassthroughID|Invalid import ID|ID must|validate.*id|import.*id" -S provider/wifconfig || true
echo "---- search other resources' ImportState implementations ----"
rg -n "ImportState\\(" -S provider || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 184
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="provider/wifconfig/resource.go"
echo "---- ImportState function context ----"
nl -ba "$FILE" | sed -n '380,470p'
echo "---- interface implementation checks ----"
rg -n "type WifConfigResource|ResourceWithImportState|ImportState\\(" -S "$FILE"
echo "---- search for id format/validation helpers in provider/wifconfig ----"
rg -n "Root\\(\"id\"\\)|ImportStatePassthroughID|Invalid import ID|ID must|validate.*id|import.*id" -S provider/wifconfig || true
echo "---- search other resources' ImportState implementations ----"
rg -n "ImportState\\(" -S provider || trueRepository: terraform-redhat/terraform-provider-rhcs
Length of output: 184
Add minimal import ID validation (or align with existing patterns).
The current ImportState only passthroughs req.ID and does not show any ID format validation in this resource. Add at least a non-empty guard (and validate the known ID format if the provider defines one) before calling resource.ImportStatePassthroughID.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@provider/wifconfig/resource.go` around lines 421 - 427, The ImportState
method in WifConfigResource currently passthroughs the ID without validation;
update WifConfigResource.ImportState to first check req.ID is non-empty (and if
your provider uses a specific format, validate that pattern too), and if invalid
add a diagnostic error to resp.Diagnostics and return; only when the ID is valid
call resource.ImportStatePassthroughID(ctx, path.Root("id"), req, resp). Ensure
you reference req.ID for the check and use resp.Diagnostics to report errors.
CodeRabbit caught three real bugs and a doc nit on PR terraform-redhat#1164: * examples/resources/gcp_machine_pool/example_1.tf was using name = "workers-large" — exactly the pattern the resource's own machinePoolNameValidator rejects ("workers-*" is reserved for the cluster's default worker pool). Renamed to "large". Doc auto-updates via make docs. * Cluster Update no-op path was writing plan back into state, which would persist plan-only edits to immutable attributes as if they were applied (false-success drift). Now: when buildClusterPatch returns nil, refresh from OCM and write the server view, not the plan. * Schema description: "bare metal types" -> "bare-metal types" (compound adjective). LanguageTool nit, regenerated into docs. Not addressed (intentionally): * CodeRabbit suggested removing resp.State.RemoveResource(ctx) from Delete handlers per generic Plugin Framework guidance, but rhcs's own code (dnsdomain, clusterrosa/classic, machinepool/classic) uses it. Staying consistent with repo precedent. * Idempotent-Delete-on-404 was already fixed in RFC-10. CodeRabbit reviewed an early snapshot of the PR. * Heavy-lift suggestions for full behavioral test coverage on Create/ Read/Update/Delete/ModifyPlan deferred — already tracked in workspace ISSUES.md and the PR body's "gaps remaining" section. Validated: go build, go test ./{clusterosdgcp,wifconfig,machinepool/gcp}, make docs, make fmt all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@naeemarsalan : thanks for leading this effort. Read appreciate your contributions. Regarding the first bug, it is accurate that the GCP Marketplace is not a supported billing model for the OSD Trial. OSD Trial is a legacy trial offering based on annual subscriptions, and there is no way to convert an OSD Trial cluster to a GCP Marketplace billing model. Hence, customers who want to natively transact through the Marketplace (and utilize GCP committed discounts) should transact through the GCP Marketplace from the beginning. |
rhcs has no OSD path today, and no Terraform module on top of rhcs
for OSD either. This adds OSD-on-GCP as a first-class set of resources
so rhcs is the single provider for every supported cluster type. OCM
endpoints are cluster-type-agnostic, so the uplift is schema + wiring
inside the provider, not OCM API changes.
New resources and data sources, all under provider/:
* wifconfig/ - rhcs_wif_config resource + data source.
* clusterosdgcp/ - rhcs_cluster_osd_gcp. Composes WIF + CMEK + PSC +
Shared VPC + private API. Cluster-admin htpasswd
user is folded inline via admin_credentials /
create_admin_user, mirroring rhcs_cluster_rosa_classic.
* machinepool/gcp/ - rhcs_gcp_machine_pool. Distinct TypeName mirroring
the rhcs_hcp_machine_pool precedent.
Existing cluster-agnostic resources reused as-is: rhcs_dns_domain,
rhcs_cluster_wait, rhcs_identity_provider (mobb already imports the
last one directly from rhcs).
Docs, examples, and tests:
* docs/resources/{cluster_osd_gcp,wif_config,gcp_machine_pool}.md +
docs/data-sources/wif_config.md, regenerated via make docs from
templates that match the ROSA convention.
* examples/resources/<name>/example_1.tf for each new endpoint.
* examples/create_osd_gcp_cluster_full/ runbook-style end-to-end
example with WIF + CMEK + Shared VPC + PSC + private API + inline
admin user, plus a bundled wif-gcp sub-module that drives GCP-side
IAM via the hashicorp/google provider directly.
* Unit tests for all three packages (interface assertions, schema,
TypeName) plus internal regression tests pinned to two bugs caught
during smoke testing.
* subsystem/osdgcp/ scaffold suite. Full CRUD coverage matching
subsystem/classic is the largest remaining gap before merge.
Bug fixes folded in, each caught during live smoke testing:
* billing_model is pure pass-through (OCM rejected marketplace-gcp on
osdtrial with CLUSTERS-MGMT-400).
* host-project IAM bindings added to runbook example for Shared VPC
pre-install validation.
* populateState preserves input-only fields when OCM omits them on Get
(was wiping gcp_encryption_key and gcp_network, tripping
plan-after-apply consistency).
* Update no-op path refreshes server state instead of writing the plan
back (otherwise plan-only edits to immutable attributes would persist
as if applied).
* Cluster, wif_config, and machine_pool Delete treat 404 as success
(idempotent). wif_config Delete additionally retries on OCM's
eventually-consistent WIF-in-use error during cluster->wif destroy,
with linear backoff up to 12 attempts.
End-to-end validation: applied against real OCM + real GCP. A
WIF-bound, CMEK-encrypted, Shared-VPC, private-PSC OSD-GCP cluster
reached OCM ready state. Inline admin htpasswd IDP and cluster-admins
group registration confirmed on the live cluster. Full destroy
completed without manual ocm delete intervention; the WIF in-use retry
path was exercised live and the [INFO] log shows the retry firing four
times with linear backoff before succeeding on the fifth attempt.
make pre-push-checks green on the branch tip.
ROSA Classic and HCP code paths are untouched; the new resources are
additive.
This PR was written in part with the assistance of generative AI; all
changes were validated against real OCM + GCP and pass the local
pre-push hook before submission.
Signed-off-by: Anaeem <naeemarsalan@gmail.com>
20e3310 to
8ca838b
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/create_osd_gcp_cluster_full/modules/wif-gcp/README.md`:
- Around line 75-79: The example uses the wrong resource type; update the
Terraform example to reference the new RHCS resource name `rhcs_cluster_osd_gcp`
instead of `osdgoogle_cluster` and ensure attributes (like depends_on, name, and
wif_config_id) remain attached to `rhcs_cluster_osd_gcp` so the end-to-end
example matches the PR-introduced resource; locate the block that defines
`resource "osdgoogle_cluster" "cluster"` and change the resource type to
`rhcs_cluster_osd_gcp` while leaving the resource local name (`cluster`) and
attributes intact.
- Around line 56-57: The README usage example references the wrong module source
path: inside the module block named module "wif_gcp" update the source string
from "../../modules/osd-wif-gcp" to the correct relative path for the new module
(e.g., "../../modules/wif-gcp") so a copy-paste usage example points at the
actual module added in this PR; edit the module "wif_gcp" block in the README.md
to use the correct source path.
In `@provider/clusterosdgcp/cluster_osd_gcp_internal_test.go`:
- Around line 11-14: Replace the machine-local absolute path in the test file
comment with a repo-relative reference: change
"/home/anaeem/rhcs-osd-gcp/ISSUES.md" to "ISSUES.md" (or an explicit issue ID)
in cluster_osd_gcp_internal_test.go so the comment points to a
repository-relative file; update the comment that references the bug for tests
exercising buildClusterObject and populateState to use the repo-relative path or
issue number.
In `@provider/wifconfig/resource.go`:
- Around line 502-529: populateState in WifConfigResource never sets
state.OpenshiftVersion, so the data source and imports return null; update
populateState to populate the OpenshiftVersion field on WifConfigState from the
API object (e.g., call obj.OpenshiftVersion() or the correct getter on
*cmv1.WifConfig) and assign it via types.StringValue(...) when present,
otherwise set an empty types.StringValue(""); ensure the assignment is added
alongside the existing state.ID/DisplayName logic so imports and the
rhcs_wif_config data source can read the value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 89399ae7-9e3e-4730-9d95-004502f565b8
📒 Files selected for processing (40)
docs/data-sources/wif_config.mddocs/resources/cluster_osd_gcp.mddocs/resources/gcp_machine_pool.mddocs/resources/wif_config.mdexamples/create_osd_gcp_cluster_full/cluster.tfexamples/create_osd_gcp_cluster_full/host_iam.tfexamples/create_osd_gcp_cluster_full/kms.tfexamples/create_osd_gcp_cluster_full/main.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/README.mdexamples/create_osd_gcp_cluster_full/modules/wif-gcp/iam.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/main.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/outputs.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/service_accounts.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/variables.tfexamples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tfexamples/create_osd_gcp_cluster_full/outputs.tfexamples/create_osd_gcp_cluster_full/providers.tfexamples/create_osd_gcp_cluster_full/variables.tfexamples/create_osd_gcp_cluster_full/wif.tfexamples/data-sources/wif_config/example_1.tfexamples/resources/cluster_osd_gcp/example_1.tfexamples/resources/gcp_machine_pool/example_1.tfexamples/resources/wif_config/example_1.tfprovider/clusterosdgcp/cluster_osd_gcp_internal_test.goprovider/clusterosdgcp/cluster_osd_gcp_resource.goprovider/clusterosdgcp/cluster_osd_gcp_resource_test.goprovider/clusterosdgcp/cluster_osd_gcp_state.goprovider/machinepool/gcp/machine_pool_resource.goprovider/machinepool/gcp/machine_pool_resource_test.goprovider/machinepool/gcp/machine_pool_state.goprovider/provider.goprovider/wifconfig/data_source.goprovider/wifconfig/resource.goprovider/wifconfig/resource_test.goprovider/wifconfig/state.gosubsystem/osdgcp/osdgcp_test.gotemplates/data-sources/wif_config.md.tmpltemplates/resources/cluster_osd_gcp.md.tmpltemplates/resources/gcp_machine_pool.md.tmpltemplates/resources/wif_config.md.tmpl
✅ Files skipped from review due to trivial changes (6)
- docs/data-sources/wif_config.md
- templates/data-sources/wif_config.md.tmpl
- templates/resources/gcp_machine_pool.md.tmpl
- templates/resources/wif_config.md.tmpl
- docs/resources/cluster_osd_gcp.md
- templates/resources/cluster_osd_gcp.md.tmpl
🚧 Files skipped from review as they are similar to previous changes (27)
- examples/resources/gcp_machine_pool/example_1.tf
- examples/create_osd_gcp_cluster_full/outputs.tf
- examples/create_osd_gcp_cluster_full/modules/wif-gcp/outputs.tf
- provider/wifconfig/state.go
- examples/create_osd_gcp_cluster_full/variables.tf
- examples/create_osd_gcp_cluster_full/providers.tf
- examples/create_osd_gcp_cluster_full/modules/wif-gcp/versions.tf
- subsystem/osdgcp/osdgcp_test.go
- examples/resources/wif_config/example_1.tf
- provider/clusterosdgcp/cluster_osd_gcp_resource_test.go
- examples/create_osd_gcp_cluster_full/modules/wif-gcp/variables.tf
- examples/data-sources/wif_config/example_1.tf
- examples/create_osd_gcp_cluster_full/main.tf
- provider/clusterosdgcp/cluster_osd_gcp_state.go
- examples/create_osd_gcp_cluster_full/kms.tf
- provider/provider.go
- examples/create_osd_gcp_cluster_full/wif.tf
- examples/resources/cluster_osd_gcp/example_1.tf
- examples/create_osd_gcp_cluster_full/modules/wif-gcp/service_accounts.tf
- provider/wifconfig/resource_test.go
- provider/machinepool/gcp/machine_pool_state.go
- examples/create_osd_gcp_cluster_full/cluster.tf
- examples/create_osd_gcp_cluster_full/modules/wif-gcp/iam.tf
- provider/machinepool/gcp/machine_pool_resource_test.go
- examples/create_osd_gcp_cluster_full/host_iam.tf
- provider/clusterosdgcp/cluster_osd_gcp_resource.go
- provider/machinepool/gcp/machine_pool_resource.go
Three nits from the PR review:
* examples/create_osd_gcp_cluster_full/modules/wif-gcp/README.md was
carrying stale references from the mobb provider it was ported from:
the embedded module source path pointed at "../../modules/osd-wif-gcp"
(which never existed in rhcs) and the example downstream cluster was
declared as osdgoogle_cluster rather than rhcs_cluster_osd_gcp.
Updated to the in-PR module path "./modules/wif-gcp" and the
rhcs_cluster_osd_gcp resource type so a copy-paste of the example
actually resolves.
* provider/clusterosdgcp/cluster_osd_gcp_internal_test.go was referring
to bugs by an absolute path to the workspace ISSUES.md file
(/home/anaeem/...). That path means nothing once the file leaves my
machine. Replaced with a repo-relative "smoke-test bug #N" phrasing
that matches the PR cover letter's enumeration so the link survives.
* provider/wifconfig/resource.go populateState was silently dropping
openshift_version when refreshing from OCM. Confirmed against the
ocm-sdk-go WifConfig type that there is no OpenshiftVersion()
getter; only WifTemplates() ([]string) is exposed, which is a lossy
transform of the user input (4.21.15 -> v4.21 strips the patch).
Since openshift_version is an Optional (not Computed) attribute,
writing a derived value would surface as a permanent plan diff on
every refresh. Documented this deliberately as a comment on
populateState; on import the attribute lands null and the user can
re-add it. Not writing a wrong value is the conservative behaviour
here.
Validated: go build, go test ./{clusterosdgcp,wifconfig}, make fmt
all clean.
Signed-off-by: Anaeem <naeemarsalan@gmail.com>
CodeRabbit re-flagged the populateState gap on second pass. Verified again at the pinned ocm-sdk-go v0.1.501 that there is no WifConfig.OpenshiftVersion() getter -- only WifTemplates() ([]string). The literal suggested fix does not compile. Instead, surface a best-effort value derived from WifTemplates -- but only when state.OpenshiftVersion is null or unknown (the import path, or the data source Read where state starts at its zero value). On a normal Read after Create the state already carries the user's value (e.g. "4.21.15") and we leave it alone, so the lossy round-trip (4.21.15 -> v4.21 -> 4.21) does not produce a permanent plan diff against the user's HCL. Net effect: * `terraform import rhcs_wif_config.wif <id>` now lands a populated openshift_version field (the WifTemplates "vX.Y" value, minus the leading "v"). Was null before. * `rhcs_wif_config` data source picks up openshift_version on lookup. Was null before. * Regular Read after Create: unchanged. State still carries the user-supplied value; no spurious diff. Validated: go build, go test ./provider/wifconfig/... clean. Signed-off-by: Anaeem <naeemarsalan@gmail.com>
Spike. Not for merge yet.
why
what's new
rhcs_wif_configresource + data source —provider/wifconfig/rhcs_cluster_osd_gcp—provider/clusterosdgcp/. Composes WIF + CMEK + PSC + Shared VPC + private API. Cluster-admin htpasswd user is folded inline likerhcs_cluster_rosa_classicrhcs_gcp_machine_pool—provider/machinepool/gcp/. Distinct TypeName mirroringrhcs_hcp_machine_poolexamples/create_osd_gcp_cluster_full/. GCP IAM ships as a bundled sub-module that could graduate into aterraform-rhcs-osd-gcprepoexample_1.tffor the docs Example Usage section, regenerated viamake docsprovider/imagemirror/. Internal tests pinned to the bugs caught during smoke testingwhat we tested
make pre-push-checksgreen (build, fmt, lint, docs-lint, coverage, all 4 subsystem suites: classic 236/236, hcp 196/196, osdgcp 1/1, framework 0)make unit-testandmake e2e-unit-testpass (Prowci/prow/unitequivalents)docker build .succeeds (Prowci/prow/e2e-imagesequivalent)dev_overrides,terraform validatecleanreadystate. API URL + console URL live; inline admin htpasswd IDP + cluster-admins group both confirmed via OCMocm deleteintervention needed[INFO] WIF delete retry pendinglog lines from the live destroy showing retry firing 4 times with linear backoff (15s / 30s / 45s / 60s) on OCMCLUSTERS-MGMT-400 ... is in use and cannot be deleted— then succeeded on attempt 5 once OCM released the cluster->wif referencebugs caught + fixed during smoke testing (now folded into the single commit)
marketplace-gcpbilling_model rejected by OCM for OSD Trialbilling_modelnow pure pass-through, OCM picks per-product defaulthost_iam.tfto the example with dynamic SA prefix filteradmin_credentialsoutputs (OCM doesn't return passwords on Read)try()populateStatewas nilling input-only fields (gcp_encryption_key,gcp_network) when OCM omitted them on GetBeZero()on the whole RunOutput structExitCodeonlygaps remaining before production
These are tracked in workspace docs (not in-tree).
subsystem/osdgcp/osdgcp_test.gois scaffold-only. Need to mirrorsubsystem/classic/cluster_resource_rosa_create_test.go(~479 lines, hand-rolled mock OCM JSON).WaitForClusterToBeReadyexceeds the wait_timeout (default 60min, hit twice in our smoke test), Terraform discards the resource but OCM keeps the cluster. Recovery needsterraform import. Fix: write minimal state with the cluster id before the wait.TestClusterDelete_IdempotentOn404,TestWifConfigDelete_IdempotentOn404+ retry,TestMachinePoolDelete_IdempotentOn404. Need a fake OCM HTTP server.OSDGCP-1placeholder in the commit subject.known limitations not in our control
cluster_nameafter destroy hits 409 on the keyring and 400 on the key when the cluster's install tries to encrypt VM disks. Workaround:terraform importthe keyring/key, thencryptoKeyVersions/<n>:restore+ PATCHstate:ENABLED. Will be addressed in the example withlifecycle.prevent_destroy = trueor a random suffix in a follow-up.not changed
ROSA Classic + HCP code paths untouched. New resources are additive.
AGENTS.mdsays rhcs is ROSA-only by policy. Putting this up early so the scope conversation can happen before more time goes in.Summary by CodeRabbit
New Features
Documentation