OCM-24713 | feat(machine-pool): add node_drain_grace_period attribute#145

Merged
openshift-merge-bot[bot] merged 1 commit into
terraform-redhat:mainfrom
amandahla:OCM-24713-node-drain-grace-period
Jun 10, 2026

Conversation

@amandahla

@amandahla amandahla commented May 15, 2026

Member

PR Summary

Adds an optional node_drain_grace_period field to aws_node_pool in the machine-pool module and raises the minimum rhcs provider constraint to >= 1.7.7, the first version that supports the attribute.

Detailed Description of the Issue

rhcs_hcp_machine_pool has exposed node_drain_grace_period (minutes) since provider v1.7.7, but the machine-pool module's aws_node_pool variable did not surface it, leaving consumers unable to control how long the provider waits before forcibly terminating draining nodes. This PR wires the field through, validates the accepted range (0–10080), and bumps both the module and root provider minimum constraints from >= 1.7.3/1.7.6 to >= 1.7.7.

Also included: tooling fixes to Makefile and scripts/verify-gen.sh that make the verify-gen and lint targets more robust in isolated CI environments (Prow). They replace git-status-based doc drift detection with SHA-based hashing and isolate terraform init to a temp directory.

Related Issues and PRs

  • Jira: OCM-24713
  • Fixes: #
  • Related PR(s):
  • Related design/docs:

Type of Change

  • feat - adds a new module capability or new user-facing behavior.
  • fix - resolves incorrect module behavior or bug.
  • docs - updates documentation only.
  • style - formatting/naming changes with no logic impact.
  • refactor - module code restructuring with no behavior change.
  • test - adds or updates tests only.
  • chore - maintenance work (tooling, housekeeping, non-product code).
  • build - changes build system, packaging, or dependencies for build output.
  • ci - changes CI pipelines, jobs, or automation workflows.
  • perf - improves performance without changing intended behavior.

Previous Behavior

aws_node_pool had no node_drain_grace_period field. The rhcs provider used its own default drain timeout; module consumers had no way to override it.

Behavior After This Change

aws_node_pool.node_drain_grace_period (optional number) accepts 0–10080 minutes and is passed directly to rhcs_hcp_machine_pool. Omitting it preserves the provider default. Values outside [0, 10080] fail terraform plan with a validation error.
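For illustration, a consumer of the submodule might set the new field roughly as follows. This is a minimal sketch, not taken from the PR diff: the module label is hypothetical, the input values are placeholders borrowed from the test fixtures discussed in this thread, and the real module may require further inputs (for example replicas or autoscaling settings).

```hcl
# Hypothetical consumer configuration; all values are placeholders.
module "example_machine_pool" {
  source = "./modules/machine-pool"

  cluster_id        = "fake-cluster-123"
  name              = "test-pool"
  subnet_id         = "subnet-fake123"
  openshift_version = "4.15.0"

  aws_node_pool = {
    instance_type = "m5.xlarge"
    tags          = {}

    # Minutes, 0 to 10080. Omit the attribute (or set it to null)
    # to keep the provider default.
    node_drain_grace_period = 60
  }
}
```

Setting the value to 10081 in the same block should be rejected at plan time by the module's validation.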

How to Test (Step-by-Step)

Preconditions

  • Terraform >= 1.5.7
  • rhcs provider >= 1.7.7
  • RHCS_TOKEN and AWS credentials set

Test Steps

  1. make pre-push-checks
  2. cd modules/machine-pool && terraform test (runs aws_node_pool.tftest.hcl including the three new node_drain_grace_period run blocks)
  3. In a live environment: deploy a machine pool with node_drain_grace_period = 60 and confirm the value appears in terraform show output.

Expected Results

  • All unit tests pass, including invalid_node_drain_grace_period_fails, valid_node_drain_grace_period_plan, and node_drain_grace_period_null_plan.
  • Setting node_drain_grace_period = 10081 fails plan validation.
  • Live deployment reflects the configured drain timeout.
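The failing-plan expectation above could be expressed as a run block along these lines. This is a sketch modeled on the run blocks suggested elsewhere in this thread, not a copy of the merged test; it assumes the test file already declares a mock rhcs provider with the alias no_override.

```hcl
# Sketch only: assumes a mock provider aliased rhcs.no_override is declared
# earlier in the same .tftest.hcl file.
run "invalid_node_drain_grace_period_fails" {
  command = plan

  providers = {
    rhcs = rhcs.no_override
  }

  variables {
    cluster_id        = "fake-cluster-123"
    name              = "test-pool"
    subnet_id         = "subnet-fake123"
    openshift_version = "4.15.0"

    aws_node_pool = {
      instance_type           = "m5.xlarge"
      tags                    = {}
      node_drain_grace_period = 10081 # one past the documented maximum
    }
  }

  # The run passes only if validation on var.aws_node_pool fails.
  expect_failures = [
    var.aws_node_pool,
  ]
}
```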

Proof of the Fix

  • Screenshots:
  • Videos:
  • Logs/CLI output:
  • Other artifacts:

Breaking Changes

  • No breaking changes
  • Yes, this PR introduces a breaking change (describe impact and migration plan below)

Breaking Change Details / Migration Plan

The minimum rhcs provider constraint is raised from >= 1.7.3 (module) / >= 1.7.6 (root) to >= 1.7.7. Consumers pinned below 1.7.7 must upgrade their provider before applying. The aws_node_pool interface change is additive (new optional field with null default).
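For consumers who pin the provider themselves, the upgrade amounts to a constraint like the one below. This is a sketch of a consumer-side versions.tf; the Terraform version floor is taken from the preconditions listed in this PR, and the rest of the consumer's configuration is assumed.

```hcl
terraform {
  required_version = ">= 1.5.7"

  required_providers {
    rhcs = {
      source = "terraform-redhat/rhcs"
      # 1.7.7 is the first release that supports node_drain_grace_period.
      version = ">= 1.7.7"
    }
  }
}
```

After changing the constraint, running terraform init -upgrade lets Terraform select a provider release that satisfies it.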

Developer Verification Checklist

  • AWS-only changes: If this PR is mainly AWS-only (no rhcs resources/variables), I linked official Red Hat or cited ROSA HCP documentation that supports reference alignment, or I explained why the change still belongs in-repo per Module scope (AWS-only vs core HCP) in .cursor/rules/rosa-hcp-terraform.mdc.
  • I checked if this affects terraform-rhcs-rosa-classic and submitted (or already submitted) a companion PR when needed.
  • Commit subject/title follows [JIRA-TICKET] | [TYPE]: <MESSAGE>.
  • PR description clearly explains both what changed and why.
  • Relevant Jira/GitHub issues and related PRs are linked.
  • Tests were added/updated where appropriate.
  • I manually tested the change.
  • make pre-push-checks passes (or each step: verify, verify-gen, lint, unit-tests, license-check, docs-lint).
  • Documentation was added/updated where appropriate (see make terraform-docs).
  • Any risk, limitation, or follow-up work is documented.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for configuring node drain grace period in machine pools.
    • Added capacity reservation preference option for AWS node pools.
  • Tests

    • Added test coverage for node drain grace period behavior.
  • Chores

    • Updated minimum rhcs provider version to 1.7.7.
    • Reordered pre-push validation checks.

@coderabbitai

coderabbitai Bot commented May 15, 2026

Review Change Stack

Walkthrough

This PR extends the aws_node_pool schema with an optional node_drain_grace_period field, adds validation rules and test coverage, updates documentation and provider version requirements, and reorders the Makefile pre-push-checks target for faster feedback iteration.

Changes

aws_node_pool schema extensions

| Layer | File(s) | Summary |
|---|---|---|
| Variable schema and validation | variables.tf, modules/machine-pool/variables.tf | Root and module aws_node_pool objects gain optional node_drain_grace_period field. Module-level validation enforces 0–10080 minute range, integer values, and non-negative constraint with distinct error messages. |
| node_drain_grace_period test coverage | modules/machine-pool/tests/aws_node_pool.tftest.hcl | Mock provider defaults set node_drain_grace_period to null. Three plan-run tests verify validation failure at 10081, success and wiring at 60, and explicit null wiring. |
| Documentation and provider version | modules/machine-pool/README.md, README.md, modules/machine-pool/versions.tf | Module README documents new field and updates rhcs provider requirement from ≥1.7.3 to ≥1.7.7. Root README extends aws_node_pool schema documentation with capacity_reservation_preference field. |

CI/Build tooling

| Layer | File(s) | Summary |
|---|---|---|
| pre-push-checks target reordering | Makefile | Reordered recipe execution: license-check and docs-lint run first for rapid feedback, followed by verify-gen, unit-tests, lint, with verify last. |

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

lgtm

Suggested reviewers

  • gdbranco
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the main feature addition: the new node_drain_grace_period attribute for the machine-pool module, directly corresponding to the primary changes in the PR. |
| Description check | ✅ Passed | The description follows the template structure, covers the problem (missing field exposure), why it's needed (consumer control), what changed (field addition and provider bump), and testing approach. Most required sections are completed. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Pr Checklist Claims Vs Evidence (Generic) | ✅ Passed | All 6 checked items satisfied: commit format correct, detailed description present, Jira issue linked, tests added, documentation updated, migration plan documented. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
variables.tf (2)

415-415: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document node_drain_grace_period range and requirements in the variable description.

The machine_pools description doesn't mention the valid range (0–10080 minutes) or provider requirements for the new node_drain_grace_period field. Users will encounter validation errors without clear guidance from the variable documentation. As per coding guidelines, document minimum OpenShift version requirements in the variable description when a feature needs a specific minimum version.

Consider updating the description to include the valid range and any version/provider requirements mentioned in the PR summary (RHCS provider >= 1.7.6).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@variables.tf` at line 415, Update the machine_pools variable description to
document the new node_drain_grace_period field: state that
node_drain_grace_period is specified in minutes and must be within the range
0–10080, indicate whether 0 disables graceful drain, and add provider/version
requirements (RHCS provider >= 1.7.6) plus the minimum OpenShift version needed
for this feature; reference the machine_pools variable and the
node_drain_grace_period field in the description so users get validation
guidance and required versions.

383-416: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add root-level validation for node_drain_grace_period to catch errors early.

The validation for node_drain_grace_period (0–10080 range) exists only at the module level. Users won't see validation errors until module instantiation, which delays feedback. As per coding guidelines, add root validation blocks for cross-field map validation rules that users hit early; child modules may keep lifecycle precondition as a second line of defense.

🛡️ Proposed validation block to add
   default     = {}
   description = "Provides a typed map to add multiple machine pools after cluster creation. Each key is an arbitrary label; each value aligns with the [machine-pool](./modules/machine-pool) submodule (required: name, subnet_id, openshift_version, aws_node_pool). Optional fields match that module's optional inputs; omit autoscaling to use a fixed replica count with autoscaling disabled."
+
+  validation {
+    condition = alltrue([
+      for _, mp in var.machine_pools :
+      mp.aws_node_pool.node_drain_grace_period == null ? true : (
+        mp.aws_node_pool.node_drain_grace_period >= 0 &&
+        mp.aws_node_pool.node_drain_grace_period <= 10080
+      )
+    ])
+    error_message = "node_drain_grace_period must be between 0 and 10080 minutes (7 days)."
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@variables.tf` around lines 383 - 416, Add a root-level validation block to
variable "machine_pools" that checks each map entry's
aws_node_pool.node_drain_grace_period (when present) is between 0 and 10080;
implement the condition using a for-expression over var.machine_pools and
try(...) to allow missing values (e.g., require either
try(entry.value.aws_node_pool.node_drain_grace_period, null) == null or the
numeric range check), and provide a clear error_message mentioning machine_pools
and aws_node_pool.node_drain_grace_period so users get immediate validation
before module instantiation.
🧹 Nitpick comments (1)
modules/machine-pool/tests/aws_node_pool.tftest.hcl (1)

157-235: ⚡ Quick win

Consider adding boundary value tests for comprehensive coverage.

The current tests cover invalid (>10080), valid (60), and null cases. Consider adding tests for the boundary values (0, 10080) and a negative value (-1) to ensure the validation correctly handles edge cases.

🧪 Suggested additional test cases
# Test minimum boundary value (0 minutes)
run "node_drain_grace_period_zero_plan" {
  command = plan

  providers = {
    rhcs = rhcs.no_override
  }

  variables {
    cluster_id        = "fake-cluster-123"
    name              = "test-pool"
    subnet_id         = "subnet-fake123"
    openshift_version = "4.15.0"

    aws_node_pool = {
      instance_type           = "m5.xlarge"
      tags                    = {}
      node_drain_grace_period = 0
    }
  }

  assert {
    condition     = var.aws_node_pool.node_drain_grace_period == 0
    error_message = "Expected node_drain_grace_period to be 0."
  }
}

# Test maximum boundary value (10080 minutes)
run "node_drain_grace_period_max_plan" {
  command = plan

  providers = {
    rhcs = rhcs.no_override
  }

  variables {
    cluster_id        = "fake-cluster-123"
    name              = "test-pool"
    subnet_id         = "subnet-fake123"
    openshift_version = "4.15.0"

    aws_node_pool = {
      instance_type           = "m5.xlarge"
      tags                    = {}
      node_drain_grace_period = 10080
    }
  }

  assert {
    condition     = var.aws_node_pool.node_drain_grace_period == 10080
    error_message = "Expected node_drain_grace_period to be 10080."
  }
}

# Test negative value fails validation
run "negative_node_drain_grace_period_fails" {
  command = plan

  providers = {
    rhcs = rhcs.no_override
  }

  variables {
    cluster_id        = "fake-cluster-123"
    name              = "test-pool"
    subnet_id         = "subnet-fake123"
    openshift_version = "4.15.0"

    aws_node_pool = {
      instance_type           = "m5.xlarge"
      tags                    = {}
      node_drain_grace_period = -1
    }
  }

  expect_failures = [
    var.aws_node_pool,
  ]
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/machine-pool/tests/aws_node_pool.tftest.hcl` around lines 157 - 235,
Add boundary and negative-value tests for aws_node_pool.node_drain_grace_period:
create new tftest runs similar to existing ones using run names like
"node_drain_grace_period_zero_plan", "node_drain_grace_period_max_plan", and
"negative_node_drain_grace_period_fails"; for the zero and max cases set
aws_node_pool.node_drain_grace_period to 0 and 10080 respectively and add
asserts that the variable equals those values, and for the negative case set it
to -1 and include expect_failures referencing var.aws_node_pool to ensure
validation rejects negative values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@variables.tf`:
- Line 415: Update the machine_pools variable description to document the new
node_drain_grace_period field: state that node_drain_grace_period is specified
in minutes and must be within the range 0–10080, indicate whether 0 disables
graceful drain, and add provider/version requirements (RHCS provider >= 1.7.6)
plus the minimum OpenShift version needed for this feature; reference the
machine_pools variable and the node_drain_grace_period field in the description
so users get validation guidance and required versions.
- Around line 383-416: Add a root-level validation block to variable
"machine_pools" that checks each map entry's
aws_node_pool.node_drain_grace_period (when present) is between 0 and 10080;
implement the condition using a for-expression over var.machine_pools and
try(...) to allow missing values (e.g., require either
try(entry.value.aws_node_pool.node_drain_grace_period, null) == null or the
numeric range check), and provide a clear error_message mentioning machine_pools
and aws_node_pool.node_drain_grace_period so users get immediate validation
before module instantiation.

---

Nitpick comments:
In `@modules/machine-pool/tests/aws_node_pool.tftest.hcl`:
- Around line 157-235: Add boundary and negative-value tests for
aws_node_pool.node_drain_grace_period: create new tftest runs similar to
existing ones using run names like "node_drain_grace_period_zero_plan",
"node_drain_grace_period_max_plan", and
"negative_node_drain_grace_period_fails"; for the zero and max cases set
aws_node_pool.node_drain_grace_period to 0 and 10080 respectively and add
asserts that the variable equals those values, and for the negative case set it
to -1 and include expect_failures referencing var.aws_node_pool to ensure
validation rejects negative values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0e7a0434-1cf8-4b44-aa8b-195ec0e02860

📥 Commits

Reviewing files that changed from the base of the PR and between 6a843a1 and 603866d.

📒 Files selected for processing (5)
  • README.md
  • modules/machine-pool/README.md
  • modules/machine-pool/tests/aws_node_pool.tftest.hcl
  • modules/machine-pool/variables.tf
  • variables.tf

@amandahla amandahla marked this pull request as draft May 15, 2026 19:39
@amandahla amandahla force-pushed the OCM-24713-node-drain-grace-period branch from 603866d to b5bdcd0 Compare May 18, 2026 16:54

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@variables.tf`:
- Line 394: Remove the unsupported variable node_drain_grace_period from
variables.tf and any references to it in the module configuration and calls
(e.g., where it would be passed into rhcs_hcp_machine_pool); the
rhcs_hcp_machine_pool resource in terraform-redhat/rhcs v1.7.6 does not support
node_drain_grace_period, so delete the variable declaration and any usages to
match the upstream provider schema.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d1c1b21f-cf6e-46b9-8537-29fc7eeba296

📥 Commits

Reviewing files that changed from the base of the PR and between 603866d and b5bdcd0.

📒 Files selected for processing (5)
  • README.md
  • modules/machine-pool/README.md
  • modules/machine-pool/tests/aws_node_pool.tftest.hcl
  • modules/machine-pool/variables.tf
  • variables.tf
✅ Files skipped from review due to trivial changes (2)
  • README.md
  • modules/machine-pool/tests/aws_node_pool.tftest.hcl
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/machine-pool/variables.tf
  • modules/machine-pool/README.md

Comment thread variables.tf
@amandahla amandahla force-pushed the OCM-24713-node-drain-grace-period branch from b5bdcd0 to 34eadb4 Compare June 8, 2026 18:00
@amandahla amandahla marked this pull request as ready for review June 8, 2026 18:02
Comment thread variables.tf
Comment thread versions.tf
Comment thread modules/machine-pool/tests/aws_node_pool.tftest.hcl Outdated
@amandahla amandahla force-pushed the OCM-24713-node-drain-grace-period branch from 34eadb4 to 22af8ce Compare June 9, 2026 14:36
@amandahla

Member Author

@coderabbitai review the changes

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== Root variables.tf ===" 
cat variables.tf

echo ""
echo "=== modules/machine-pool/variables.tf ==="
cat modules/machine-pool/variables.tf

echo ""
echo "=== modules/machine-pool/main.tf ==="
cat modules/machine-pool/main.tf

echo ""
echo "=== modules/machine-pool/tests/aws_node_pool.tftest.hcl ==="
cat modules/machine-pool/tests/aws_node_pool.tftest.hcl

echo ""
echo "=== modules/machine-pool/README.md (node_drain section) ==="
grep -n "node_drain" modules/machine-pool/README.md

echo ""
echo "=== README.md (node_drain section) ==="
grep -n "node_drain" README.md

Length of output: 36546



@amandahla: Here is my review of the PR changes.


Review: OCM-24713 node_drain_grace_period attribute

Overall

The implementation is clean and well-structured. The new attribute is correctly declared, validated, and threaded through (via the whole-object pass-through aws_node_pool = var.aws_node_pool in main.tf). The three new test cases cover the key scenarios. A few minor observations below.


modules/machine-pool/variables.tf & variables.tf

1. optional(number) accepts fractional values

Terraform's number type accepts floats (e.g., 60.5). The attribute semantics are "integer minutes," but there is no validation guard against a non-integer value like 1440.5. If the RHCS provider's API enforces integer-only values, Terraform would accept the input and then fail at apply time with a provider-level error instead of a clear validation message.

Consider adding a floor/ceiling check:

validation {
  condition = var.aws_node_pool.node_drain_grace_period == null ? true : (
    var.aws_node_pool.node_drain_grace_period >= 0 &&
    var.aws_node_pool.node_drain_grace_period <= 10080 &&
    floor(var.aws_node_pool.node_drain_grace_period) == var.aws_node_pool.node_drain_grace_period
  )
  error_message = "node_drain_grace_period must be a whole number between 0 and 10080 minutes (7 days)."
}

The same applies to the root variables.tf validation block.


modules/machine-pool/tests/aws_node_pool.tftest.hcl

2. No boundary-value tests (0 and 10080)

The valid_node_drain_grace_period_plan run tests 60, which is well within range. It would be good practice to also test the inclusive boundaries 0 (minimum) and 10080 (maximum) to confirm the >=/<= operators are correctly anchored. A single extra run with 0 is the most critical edge case (a "disable drain" semantic).

3. No test for a negative value

A test with e.g. node_drain_grace_period = -1 with expect_failures = [var.aws_node_pool] would explicitly document that negative inputs are rejected. This is especially useful given that 0 carries valid meaning (immediate eviction).

4. Inconsistent provider alias between valid_node_drain_grace_period_plan and node_drain_grace_period_null_plan

valid_node_drain_grace_period_plan uses rhcs.no_override while node_drain_grace_period_null_plan uses rhcs.with_override. The reason is sensible (a null computed field needs the mock default to be assertable), but a short inline comment would make the intent clear for future contributors:

# Uses with_override so the computed aws_node_pool object is fully known at plan time,
# allowing the null assertion to be evaluated.
run "node_drain_grace_period_null_plan" {

modules/machine-pool/main.tf

No changes needed — node_drain_grace_period flows automatically because aws_node_pool is passed as a whole object.


Documentation

Both README.md and modules/machine-pool/README.md are correctly regenerated and show the new node_drain_grace_period = optional(number) field.


Summary

| Severity | Item |
|---|---|
| ⚠️ Minor | number type allows floats; consider adding an integer guard in both validation blocks |
| 💡 Suggestion | Add boundary tests for 0 and 10080 |
| 💡 Suggestion | Add a negative-value rejection test |
| 💡 Suggestion | Add a comment explaining why with_override is used for the null test |

@amandahla

Member Author

/retest-required

@amandahla

Member Author

Summary

Severity Item
⚠️ Minor number type allows floats; consider adding an integer guard in both validation blocks
To be addressed by the provider.
💡 Suggestion Add boundary tests for 0 and 10080
Nitpicking
💡 Suggestion Add a negative-value rejection test
Nitpicking
💡 Suggestion Add a comment explaining why with_override is used for the null test
Nitpicking

Comment thread tests/machine_pools.tftest.hcl Outdated
@olucasfreitas

Contributor

The provider already models this as Int64, but the module still accepts 60.5 and only rejects it downstream in provider validation. Since Jira and the provider docs describe whole minutes, I’d still consider a simple integer guard worthwhile if we want the module’s fail-fast behavior to match the public contract.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Amanda Hager Lopes de Andrade Katz <amanda.katz@redhat.com>
@amandahla amandahla force-pushed the OCM-24713-node-drain-grace-period branch from 22af8ce to a7a0d4a Compare June 9, 2026 19:02
@amandahla

Member Author

The provider already models this as Int64, but the module still accepts 60.5 and only rejects it downstream in provider validation. Since Jira and the provider docs describe whole minutes, I’d still consider a simple integer guard worthwhile if we want the module’s fail-fast behavior to match the public contract.

Good catch. I added a check for integer and non-negative values.
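A guard of that kind might look like the sketch below. It is illustrative only and not the merged code: the object type is trimmed to the fields that appear in this thread, and the error messages are hypothetical wording. Separate validation blocks give each failure its own message, in line with the "distinct error messages" noted in the walkthrough.

```hcl
# Illustrative sketch, not the merged code. The real variable has more
# attributes than the three shown here.
variable "aws_node_pool" {
  type = object({
    instance_type           = string
    tags                    = optional(map(string))
    node_drain_grace_period = optional(number)
  })

  # Non-negative check; a null value short-circuits to true.
  validation {
    condition = var.aws_node_pool.node_drain_grace_period == null ? true : (
      var.aws_node_pool.node_drain_grace_period >= 0
    )
    error_message = "The node_drain_grace_period value must not be negative."
  }

  # Upper bound: 10080 minutes is 7 days.
  validation {
    condition = var.aws_node_pool.node_drain_grace_period == null ? true : (
      var.aws_node_pool.node_drain_grace_period <= 10080
    )
    error_message = "The node_drain_grace_period value must not exceed 10080 minutes (7 days)."
  }

  # Integer check: floor() leaves whole numbers unchanged, so 60.5 fails.
  validation {
    condition = var.aws_node_pool.node_drain_grace_period == null ? true : (
      floor(var.aws_node_pool.node_drain_grace_period) == var.aws_node_pool.node_drain_grace_period
    )
    error_message = "The node_drain_grace_period value must be a whole number of minutes."
  }
}
```

With this in place, an input such as 60.5 would fail at plan time in the module rather than later in provider validation.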

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
variables.tf (1)

389-422: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add root fail-fast validation for machine_pools[*].aws_node_pool.node_drain_grace_period.

Line 400 widens the public API, but this root variable block still has no validation for nullable-safe bounds (and whole-minute values). That pushes invalid inputs to downstream/module failures instead of failing at the root boundary.

Suggested patch
 variable "machine_pools" {
   type = map(object({
@@
   default     = {}
   description = "Provides a typed map to add multiple machine pools after cluster creation. Each key is an arbitrary label; each value aligns with the [machine-pool](./modules/machine-pool) submodule (required: name, subnet_id, openshift_version, aws_node_pool). Optional fields match that module's optional inputs; omit autoscaling to use a fixed replica count with autoscaling disabled."
+
+  validation {
+    condition = alltrue([
+      for _, mp in var.machine_pools :
+      mp.aws_node_pool.node_drain_grace_period == null ? true : (
+        mp.aws_node_pool.node_drain_grace_period >= 0 &&
+        mp.aws_node_pool.node_drain_grace_period <= 10080 &&
+        mp.aws_node_pool.node_drain_grace_period == floor(mp.aws_node_pool.node_drain_grace_period)
+      )
+    ])
+    error_message = "Each machine_pools.aws_node_pool.node_drain_grace_period must be null or a whole number between 0 and 10080 (minutes)."
+  }
 }

As per coding guidelines, “variables.tf: Add root validation blocks for cross-field map validation rules that users hit early; child modules may keep lifecycle precondition as a second line of defense” and “Variable validation with nullable values: ... use short-circuiting ... when the value is non-null.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@variables.tf` around lines 389 - 422, Add a validation block to the root
variable "machine_pools" that iterates its map values and enforces nullable-safe
bounds and whole-minute semantics for aws_node_pool.node_drain_grace_period: for
each entry (refer to variable "machine_pools" and field
aws_node_pool.node_drain_grace_period) short-circuit when the value is null,
otherwise require it to be a whole number between 0 and 10080 (e.g., using
alltrue([for _, mp in var.machine_pools :
mp.aws_node_pool.node_drain_grace_period == null ? true :
(mp.aws_node_pool.node_drain_grace_period >= 0 &&
mp.aws_node_pool.node_drain_grace_period <= 10080 &&
mp.aws_node_pool.node_drain_grace_period ==
floor(mp.aws_node_pool.node_drain_grace_period))])); include a clear validation
error_message describing the allowed values.

Source: Coding guidelines

🧹 Nitpick comments (1)
Makefile (1)

71-71: 💤 Low value

Consider updating the comment to reflect the new execution order.

The comment lists checks in the old order ("verify, verify-gen, lint, unit-tests, license-check, docs-lint") while the code now executes them as: license-check → docs-lint → verify-gen → unit-tests → lint → verify. Updating the comment to match execution order would improve clarity for future maintainers.

📝 Suggested comment update
-# Merge gate: verify, verify-gen, lint, unit-tests, license-check, docs-lint (fail-fast).
+# Merge gate: license-check, docs-lint, verify-gen, unit-tests, lint, verify (fail-fast, ordered by speed).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 71, Update the merge gate comment text on the Makefile line
that currently reads "verify, verify-gen, lint, unit-tests, license-check,
docs-lint (fail-fast)" so it matches the actual execution order: "license-check
→ docs-lint → verify-gen → unit-tests → lint → verify (fail-fast)"; edit the
comment string in the same location to replace the old sequence with the new
sequence exactly as shown so future maintainers see the correct order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@variables.tf`:
- Around line 389-422: Add a validation block to the root variable
"machine_pools" that iterates its map values and enforces nullable-safe bounds
and whole-minute semantics for aws_node_pool.node_drain_grace_period: for each
entry (refer to variable "machine_pools" and field
aws_node_pool.node_drain_grace_period) skip the checks when the value is null,
otherwise require it to be a whole number between 0 and 10080 minutes (e.g.,
using alltrue([for _, mp in var.machine_pools :
mp.aws_node_pool.node_drain_grace_period == null ? true :
(mp.aws_node_pool.node_drain_grace_period >= 0 &&
mp.aws_node_pool.node_drain_grace_period <= 10080 &&
mp.aws_node_pool.node_drain_grace_period ==
floor(mp.aws_node_pool.node_drain_grace_period))])); include a clear validation
error_message describing the allowed values.

---

Nitpick comments:
In `@Makefile`:
- Line 71: Update the merge gate comment text on the Makefile line that
currently reads "verify, verify-gen, lint, unit-tests, license-check, docs-lint
(fail-fast)" so it matches the actual execution order: "license-check →
docs-lint → verify-gen → unit-tests → lint → verify (fail-fast)"; edit the
comment string in the same location to replace the old sequence with the new
sequence exactly as shown so future maintainers see the correct order.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 10d68e85-af56-48b4-b6c6-44e2f105f47f

📥 Commits

Reviewing files that changed from the base of the PR and between 22af8ce and a7a0d4a.

📒 Files selected for processing (7)
  • Makefile
  • README.md
  • modules/machine-pool/README.md
  • modules/machine-pool/tests/aws_node_pool.tftest.hcl
  • modules/machine-pool/variables.tf
  • modules/machine-pool/versions.tf
  • variables.tf
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • modules/machine-pool/versions.tf
  • modules/machine-pool/variables.tf
  • modules/machine-pool/tests/aws_node_pool.tftest.hcl
  • modules/machine-pool/README.md

@amandahla amandahla requested a review from olucasfreitas June 9, 2026 20:49
@olucasfreitas

Contributor

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm label Jun 9, 2026
@openshift-ci

openshift-ci Bot commented Jun 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amandahla, olucasfreitas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [amandahla,olucasfreitas]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amandahla

Member Author

/retest-required

@amandahla

Member Author

/override ci/prow/rosa-hcp-private
/override ci/prow/rosa-hcp-public

Failed due to the error "The vpc 'vpc-02763f495de8418b5' has dependencies and cannot be deleted.", which is not related to this change.

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

@amandahla: Overrode contexts on behalf of amandahla: ci/prow/rosa-hcp-private, ci/prow/rosa-hcp-public

Details

In response to this:

/override ci/prow/rosa-hcp-private
/override ci/prow/rosa-hcp-public

Failed due to the error "The vpc 'vpc-02763f495de8418b5' has dependencies and cannot be deleted.", which is not related to this change.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit 8f49326 into terraform-redhat:main Jun 10, 2026
12 checks passed