OCM-24713 | feat(machine-pool): add node_drain_grace_period attribute#145
Conversation
WalkthroughThis PR extends the Changesaws_node_pool schema extensions
CI/Build tooling
🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
variables.tf (2)
415-415:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument node_drain_grace_period range and requirements in the variable description.
The
machine_poolsdescription doesn't mention the valid range (0–10080 minutes) or provider requirements for the newnode_drain_grace_periodfield. Users will encounter validation errors without clear guidance from the variable documentation. As per coding guidelines, document minimum OpenShift version requirements in the variable description when a feature needs a specific minimum version.Consider updating the description to include the valid range and any version/provider requirements mentioned in the PR summary (RHCS provider >= 1.7.6).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@variables.tf` at line 415, Update the machine_pools variable description to document the new node_drain_grace_period field: state that node_drain_grace_period is specified in minutes and must be within the range 0–10080, indicate whether 0 disables graceful drain, and add provider/version requirements (RHCS provider >= 1.7.6) plus the minimum OpenShift version needed for this feature; reference the machine_pools variable and the node_drain_grace_period field in the description so users get validation guidance and required versions.
383-416: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd root-level validation for node_drain_grace_period to catch errors early.
The validation for
node_drain_grace_period(0–10080 range) exists only at the module level. Users won't see validation errors until module instantiation, which delays feedback. As per coding guidelines, add root validation blocks for cross-field map validation rules that users hit early; child modules may keep lifecycle precondition as a second line of defense.🛡️ Proposed validation block to add
default = {} description = "Provides a typed map to add multiple machine pools after cluster creation. Each key is an arbitrary label; each value aligns with the [machine-pool](./modules/machine-pool) submodule (required: name, subnet_id, openshift_version, aws_node_pool). Optional fields match that module's optional inputs; omit autoscaling to use a fixed replica count with autoscaling disabled." + + validation { + condition = alltrue([ + for _, mp in var.machine_pools : + mp.aws_node_pool.node_drain_grace_period == null ? true : ( + mp.aws_node_pool.node_drain_grace_period >= 0 && + mp.aws_node_pool.node_drain_grace_period <= 10080 + ) + ]) + error_message = "node_drain_grace_period must be between 0 and 10080 minutes (7 days)." + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@variables.tf` around lines 383 - 416, Add a root-level validation block to variable "machine_pools" that checks each map entry's aws_node_pool.node_drain_grace_period (when present) is between 0 and 10080; implement the condition using a for-expression over var.machine_pools and try(...) to allow missing values (e.g., require either try(entry.value.aws_node_pool.node_drain_grace_period, null) == null or the numeric range check), and provide a clear error_message mentioning machine_pools and aws_node_pool.node_drain_grace_period so users get immediate validation before module instantiation.
🧹 Nitpick comments (1)
modules/machine-pool/tests/aws_node_pool.tftest.hcl (1)
157-235: ⚡ Quick winConsider adding boundary value tests for comprehensive coverage.
The current tests cover invalid (>10080), valid (60), and null cases. Consider adding tests for the boundary values (0, 10080) and a negative value (-1) to ensure the validation correctly handles edge cases.
🧪 Suggested additional test cases
# Test minimum boundary value (0 minutes) run "node_drain_grace_period_zero_plan" { command = plan providers = { rhcs = rhcs.no_override } variables { cluster_id = "fake-cluster-123" name = "test-pool" subnet_id = "subnet-fake123" openshift_version = "4.15.0" aws_node_pool = { instance_type = "m5.xlarge" tags = {} node_drain_grace_period = 0 } } assert { condition = var.aws_node_pool.node_drain_grace_period == 0 error_message = "Expected node_drain_grace_period to be 0." } } # Test maximum boundary value (10080 minutes) run "node_drain_grace_period_max_plan" { command = plan providers = { rhcs = rhcs.no_override } variables { cluster_id = "fake-cluster-123" name = "test-pool" subnet_id = "subnet-fake123" openshift_version = "4.15.0" aws_node_pool = { instance_type = "m5.xlarge" tags = {} node_drain_grace_period = 10080 } } assert { condition = var.aws_node_pool.node_drain_grace_period == 10080 error_message = "Expected node_drain_grace_period to be 10080." } } # Test negative value fails validation run "negative_node_drain_grace_period_fails" { command = plan providers = { rhcs = rhcs.no_override } variables { cluster_id = "fake-cluster-123" name = "test-pool" subnet_id = "subnet-fake123" openshift_version = "4.15.0" aws_node_pool = { instance_type = "m5.xlarge" tags = {} node_drain_grace_period = -1 } } expect_failures = [ var.aws_node_pool, ] }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modules/machine-pool/tests/aws_node_pool.tftest.hcl` around lines 157 - 235, Add boundary and negative-value tests for aws_node_pool.node_drain_grace_period: create new tftest runs similar to existing ones using run names like "node_drain_grace_period_zero_plan", "node_drain_grace_period_max_plan", and "negative_node_drain_grace_period_fails"; for the zero and max cases set aws_node_pool.node_drain_grace_period to 0 and 10080 respectively and add asserts that the variable equals those values, and for the negative case set it to -1 and include expect_failures referencing var.aws_node_pool to ensure validation rejects negative values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@variables.tf`:
- Line 415: Update the machine_pools variable description to document the new
node_drain_grace_period field: state that node_drain_grace_period is specified
in minutes and must be within the range 0–10080, indicate whether 0 disables
graceful drain, and add provider/version requirements (RHCS provider >= 1.7.6)
plus the minimum OpenShift version needed for this feature; reference the
machine_pools variable and the node_drain_grace_period field in the description
so users get validation guidance and required versions.
- Around line 383-416: Add a root-level validation block to variable
"machine_pools" that checks each map entry's
aws_node_pool.node_drain_grace_period (when present) is between 0 and 10080;
implement the condition using a for-expression over var.machine_pools and
try(...) to allow missing values (e.g., require either
try(entry.value.aws_node_pool.node_drain_grace_period, null) == null or the
numeric range check), and provide a clear error_message mentioning machine_pools
and aws_node_pool.node_drain_grace_period so users get immediate validation
before module instantiation.
---
Nitpick comments:
In `@modules/machine-pool/tests/aws_node_pool.tftest.hcl`:
- Around line 157-235: Add boundary and negative-value tests for
aws_node_pool.node_drain_grace_period: create new tftest runs similar to
existing ones using run names like "node_drain_grace_period_zero_plan",
"node_drain_grace_period_max_plan", and
"negative_node_drain_grace_period_fails"; for the zero and max cases set
aws_node_pool.node_drain_grace_period to 0 and 10080 respectively and add
asserts that the variable equals those values, and for the negative case set it
to -1 and include expect_failures referencing var.aws_node_pool to ensure
validation rejects negative values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0e7a0434-1cf8-4b44-aa8b-195ec0e02860
📒 Files selected for processing (5)
README.mdmodules/machine-pool/README.mdmodules/machine-pool/tests/aws_node_pool.tftest.hclmodules/machine-pool/variables.tfvariables.tf
603866d to
b5bdcd0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@variables.tf`:
- Line 394: Remove the unsupported variable node_drain_grace_period from
variables.tf and any references to it in the module configuration and calls
(e.g., where it would be passed into rhcs_hcp_machine_pool); the
rhcs_hcp_machine_pool resource in terraform-redhat/rhcs v1.7.6 does not support
node_drain_grace_period, so delete the variable declaration and any usages to
match the upstream provider schema.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d1c1b21f-cf6e-46b9-8537-29fc7eeba296
📒 Files selected for processing (5)
README.mdmodules/machine-pool/README.mdmodules/machine-pool/tests/aws_node_pool.tftest.hclmodules/machine-pool/variables.tfvariables.tf
✅ Files skipped from review due to trivial changes (2)
- README.md
- modules/machine-pool/tests/aws_node_pool.tftest.hcl
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/machine-pool/variables.tf
- modules/machine-pool/README.md
b5bdcd0 to
34eadb4
Compare
34eadb4 to
22af8ce
Compare
|
@coderabbitai review the changes |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== Root variables.tf ==="
cat variables.tf
echo ""
echo "=== modules/machine-pool/variables.tf ==="
cat modules/machine-pool/variables.tf
echo ""
echo "=== modules/machine-pool/main.tf ==="
cat modules/machine-pool/main.tf
echo ""
echo "=== modules/machine-pool/tests/aws_node_pool.tftest.hcl ==="
cat modules/machine-pool/tests/aws_node_pool.tftest.hcl
echo ""
echo "=== modules/machine-pool/README.md (node_drain section) ==="
grep -n "node_drain" modules/machine-pool/README.md
echo ""
echo "=== README.md (node_drain section) ==="
grep -n "node_drain" README.mdLength of output: 36546
Review: OCM-24713 —
|
| Severity | Item |
|---|---|
number type allows floats; consider adding an integer guard in both validation blocks |
|
| 💡 Suggestion | Add boundary tests for 0 and 10080 |
| 💡 Suggestion | Add a negative-value rejection test |
| 💡 Suggestion | Add a comment explaining why with_override is used for the null test |
|
/retest-required |
|
|
The provider already models this as |
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Amanda Hager Lopes de Andrade Katz <amanda.katz@redhat.com>
22af8ce to
a7a0d4a
Compare
Good catch, I added one for integer and positive number |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
variables.tf (1)
389-422:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd root fail-fast validation for
machine_pools[*].aws_node_pool.node_drain_grace_period.Line 400 widens the public API, but this root variable block still has no validation for nullable-safe bounds (and whole-minute values). That pushes invalid inputs to downstream/module failures instead of failing at the root boundary.
Suggested patch
variable "machine_pools" { type = map(object({ @@ default = {} description = "Provides a typed map to add multiple machine pools after cluster creation. Each key is an arbitrary label; each value aligns with the [machine-pool](./modules/machine-pool) submodule (required: name, subnet_id, openshift_version, aws_node_pool). Optional fields match that module's optional inputs; omit autoscaling to use a fixed replica count with autoscaling disabled." + + validation { + condition = alltrue([ + for _, mp in var.machine_pools : + mp.aws_node_pool.node_drain_grace_period == null ? true : ( + mp.aws_node_pool.node_drain_grace_period >= 0 && + mp.aws_node_pool.node_drain_grace_period <= 10080 && + mp.aws_node_pool.node_drain_grace_period == floor(mp.aws_node_pool.node_drain_grace_period) + ) + ]) + error_message = "Each machine_pools.aws_node_pool.node_drain_grace_period must be null or a whole number between 0 and 10080 (minutes)." + } }As per coding guidelines, “variables.tf: Add root validation blocks for cross-field map validation rules that users hit early; child modules may keep lifecycle precondition as a second line of defense” and “Variable validation with nullable values: ... use short-circuiting ... when the value is non-null.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@variables.tf` around lines 389 - 422, Add a validation block to the root variable "machine_pools" that iterates its map values and enforces nullable-safe bounds and whole-minute semantics for aws_node_pool.node_drain_grace_period: for each entry (refer to variable "machine_pools" and field aws_node_pool.node_drain_grace_period) short-circuit when the value is null, otherwise require it to be a non-negative integer and divisible by 60 (e.g., using all(keys(var.machine_pools), k -> (var.machine_pools[k].aws_node_pool.node_drain_grace_period == null) || (var.machine_pools[k].aws_node_pool.node_drain_grace_period >= 0 && var.machine_pools[k].aws_node_pool.node_drain_grace_period % 60 == 0))); include a clear validation error_message describing the allowed values.Source: Coding guidelines
🧹 Nitpick comments (1)
Makefile (1)
71-71: 💤 Low valueConsider updating the comment to reflect the new execution order.
The comment lists checks in the old order ("verify, verify-gen, lint, unit-tests, license-check, docs-lint") while the code now executes them as: license-check → docs-lint → verify-gen → unit-tests → lint → verify. Updating the comment to match execution order would improve clarity for future maintainers.
📝 Suggested comment update
-# Merge gate: verify, verify-gen, lint, unit-tests, license-check, docs-lint (fail-fast). +# Merge gate: license-check, docs-lint, verify-gen, unit-tests, lint, verify (fail-fast, ordered by speed).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 71, Update the merge gate comment text on the Makefile line that currently reads "verify, verify-gen, lint, unit-tests, license-check, docs-lint (fail-fast)" so it matches the actual execution order: "license-check → docs-lint → verify-gen → unit-tests → lint → verify (fail-fast)"; edit the comment string in the same location to replace the old sequence with the new sequence exactly as shown so future maintainers see the correct order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@variables.tf`:
- Around line 389-422: Add a validation block to the root variable
"machine_pools" that iterates its map values and enforces nullable-safe bounds
and whole-minute semantics for aws_node_pool.node_drain_grace_period: for each
entry (refer to variable "machine_pools" and field
aws_node_pool.node_drain_grace_period) short-circuit when the value is null,
otherwise require it to be a non-negative integer and divisible by 60 (e.g.,
using all(keys(var.machine_pools), k ->
(var.machine_pools[k].aws_node_pool.node_drain_grace_period == null) ||
(var.machine_pools[k].aws_node_pool.node_drain_grace_period >= 0 &&
var.machine_pools[k].aws_node_pool.node_drain_grace_period % 60 == 0))); include
a clear validation error_message describing the allowed values.
---
Nitpick comments:
In `@Makefile`:
- Line 71: Update the merge gate comment text on the Makefile line that
currently reads "verify, verify-gen, lint, unit-tests, license-check, docs-lint
(fail-fast)" so it matches the actual execution order: "license-check →
docs-lint → verify-gen → unit-tests → lint → verify (fail-fast)"; edit the
comment string in the same location to replace the old sequence with the new
sequence exactly as shown so future maintainers see the correct order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 10d68e85-af56-48b4-b6c6-44e2f105f47f
📒 Files selected for processing (7)
MakefileREADME.mdmodules/machine-pool/README.mdmodules/machine-pool/tests/aws_node_pool.tftest.hclmodules/machine-pool/variables.tfmodules/machine-pool/versions.tfvariables.tf
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- modules/machine-pool/versions.tf
- modules/machine-pool/variables.tf
- modules/machine-pool/tests/aws_node_pool.tftest.hcl
- modules/machine-pool/README.md
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: amandahla, olucasfreitas The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required |
1 similar comment
|
/retest-required |
|
/override ci/prow/rosa-hcp-private Failed due error " The vpc 'vpc-02763f495de8418b5' has dependencies and cannot be deleted.", not related to change. |
|
@amandahla: Overrode contexts on behalf of amandahla: ci/prow/rosa-hcp-private, ci/prow/rosa-hcp-public DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
8f49326
into
terraform-redhat:main
PR Summary
Adds optional
node_drain_grace_periodfield toaws_node_poolin the machine-pool module and raises the minimum rhcs provider constraint to >= 1.7.7 where the attribute is first supported.Detailed Description of the Issue
rhcs_hcp_machine_poolexposesnode_drain_grace_period(minutes) since provider v1.7.7, but the machine-pool module'saws_node_poolvariable did not surface it, leaving consumers unable to control how long the provider waits before forcibly terminating draining nodes. This PRwires the field through, validates the accepted range (0–10080), and bumps both the module and root provider minimum constraints from >= 1.7.3/1.7.6 to >= 1.7.7.
Also included: tooling fixes to
Makefileandscripts/verify-gen.shthat makemake verify-genandmake lintmore robust in isolated CI environments (Prow), replacing git-status-based doc drift detection with SHA-based hashing and isolatingterraform initto a tempdirectory.
Related Issues and PRs
#Type of Change
Previous Behavior
aws_node_poolhad nonode_drain_grace_periodfield. The rhcs provider used its own default drain timeout; module consumers had no way to override it.Behavior After This Change
aws_node_pool.node_drain_grace_period(optional number) accepts 0–10080 minutes and is passed directly torhcs_hcp_machine_pool. Omitting it preserves the provider default. Values outside [0, 10080] failterraform planwith a validation error.How to Test (Step-by-Step)
Preconditions
RHCS_TOKENand AWS credentials setTest Steps
make pre-push-checkscd modules/machine-pool && terraform test(runsaws_node_pool.tftest.hclincluding the three newnode_drain_grace_periodrun blocks)node_drain_grace_period = 60and confirm the value appears interraform showoutput.Expected Results
invalid_node_drain_grace_period_fails,valid_node_drain_grace_period_plan, andnode_drain_grace_period_null_plan.node_drain_grace_period = 10081fails plan validation.Proof of the Fix
Breaking Changes
Breaking Change Details / Migration Plan
The minimum rhcs provider constraint is raised from >= 1.7.3 (module) / >= 1.7.6 (root) to >= 1.7.7. Consumers pinned below 1.7.7 must upgrade their provider before applying. The
aws_node_poolinterface change is additive (new optional field with null default).Developer Verification Checklist
rhcsresources/variables), I linked official Red Hat or cited ROSA HCP documentation that supports reference alignment, or I explained why the change still belongs in-repo perModule scope (AWS-only vs core HCP)in.cursor/rules/rosa-hcp-terraform.mdc.[JIRA-TICKET] | [TYPE]: <MESSAGE>.make pre-push-checkspasses (or each step:verify,verify-gen,lint,unit-tests,license-check,docs-lint).make terraform-docs).Summary by CodeRabbit
Release Notes
New Features
Tests
Chores