
Conversation

Member

@gberenice gberenice commented Jan 5, 2026

what

  • Updates how we handle the RPM lock file: the script now waits on the actual lock holder (fuser /var/lib/rpm/.rpm.lock) and on common package processes (dnf, yum, rpm) instead of deleting the lock.
  • Bumps AWS provider.

why

  • With the previous approach:
    • We were racing the RPM database: dnf (or yum) sometimes starts on its own right after boot (makecache/updates). The script then also tries to import the Tailscale GPG key during dnf install, but the RPM DB is already locked → GPG check failed.
    • The mitigation was risky: deleting /var/lib/rpm/.rpm.lock can corrupt the RPM DB. Also, other holders (yum, rpm, or any process holding the lock) aren’t caught.
Screenshot 2026-01-05 at 11 58 21
  • AWS provider constraint wasn't updated properly.
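
The wait-instead-of-delete approach described in the first bullet could look roughly like the sketch below. It is not the template's actual code: the variable names (LOCK_FILE, PKG_PROCS, WAIT_TIMEOUT, WAIT_INTERVAL) and the pkg_busy helper are assumptions made for illustration, and only the fuser check on /var/lib/rpm/.rpm.lock and the dnf/yum/rpm process names come from the description above.

```shell
# Sketch only. LOCK_FILE, PKG_PROCS, WAIT_TIMEOUT and WAIT_INTERVAL are
# illustrative names, not taken from the module's userdata template.
LOCK_FILE="${LOCK_FILE:-/var/lib/rpm/.rpm.lock}"
PKG_PROCS="${PKG_PROCS:-dnf yum rpm}"
WAIT_TIMEOUT="${WAIT_TIMEOUT:-300}"   # seconds before giving up
WAIT_INTERVAL="${WAIT_INTERVAL:-5}"   # seconds between checks

# Succeeds (returns 0) while the RPM database looks busy.
pkg_busy() {
  # A process has the lock file open.
  if command -v fuser >/dev/null 2>&1 && fuser "$LOCK_FILE" >/dev/null 2>&1; then
    return 0
  fi
  # A package manager is running (exact process-name match).
  local name
  for name in $PKG_PROCS; do
    if pgrep -x "$name" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# Blocks until the RPM database is free; fails after WAIT_TIMEOUT seconds.
# The lock file itself is never deleted.
wait_pkg_lock() {
  local waited=0
  while pkg_busy; do
    if [ "$waited" -ge "$WAIT_TIMEOUT" ]; then
      echo "package manager still busy after ${WAIT_TIMEOUT}s" >&2
      return 1
    fi
    sleep "$WAIT_INTERVAL"
    waited=$((waited + WAIT_INTERVAL))
  done
}
```

Calling wait_pkg_lock before each dnf invocation lets an automatic makecache or update run finish first, rather than racing it for the RPM database.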

references

Summary by CodeRabbit

  • New Features

    • Added new security input options for encrypted uploads and SSL requests enforcement.
  • Chores

    • Updated AWS provider requirement to version 6.0+.
    • Bumped module dependencies to latest versions.
  • Bug Fixes

    • Enhanced provisioning script robustness with improved error handling and retry logic for transient failures.
    • Added OS validation to ensure compatibility with Amazon Linux 2023.


@gberenice gberenice requested a review from a team as a code owner January 5, 2026 14:33
@gberenice gberenice requested a review from oycyc January 5, 2026 14:33

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough


This PR updates infrastructure-as-code documentation and shell scripts to reflect newer AWS provider versions (>=6.0), bumps module dependencies (ssm_policy 2.0.1→2.0.2, tailscale_subnet_router 1.4.0→1.8.0), adds input variables for encryption and SSL enforcement, and significantly refactors the EC2 userdata script with improved error handling, centralized configuration, OS guards, and modular retry logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • oycyc
  • westonplatter

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | Issue #80 reports an 'Unsupported attribute' error for the region attribute, but the PR changes focus on RPM lock handling and AWS provider bumping, with no modifications to address the core region attribute resolution issue. | Review whether the AWS provider bump from >= 5.0 to >= 6.0 resolves the region attribute error in the tailscale module, or if additional code changes are needed to fix issue #80. |
| Out of Scope Changes check | ❓ Inconclusive | Changes to userdata.sh.tmpl (RPM lock handling, journald tuning, retry helpers) and README.md updates are within scope; however, the connection between these changes and resolving the region attribute error in issue #80 is unclear. | Clarify whether the AWS provider version bump alone resolves issue #80 or if the userdata script changes are expected to mitigate related infrastructure issues. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix!: safe rpm lock handling + AWS provider bump' accurately summarizes the two main changes in the PR: improved RPM lock handling and AWS provider version update. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
userdata.sh.tmpl (1)

51-77: Core RPM lock handling fix addresses the race condition effectively.

The wait_pkg_lock function properly waits on /var/lib/rpm/.rpm.lock using fuser and checks for active dnf/rpm processes—this aligns with the PR objective to avoid deleting the lock file.

Minor concern with the fallback logic (lines 60-61): When fuser is unavailable, checking file existence (-e) doesn't indicate the lock is held—the file may exist but be unlocked. Consider using lsof as an alternative, or accept this as a conservative "better safe than sorry" fallback.

Also, the PR description mentions checking for yum, which isn't in the pgrep list. On AL2023, yum typically symlinks to dnf, so this is likely fine.

Optional: Add lsof fallback if preferred
     if command -v fuser >/dev/null 2>&1; then
       if fuser /var/lib/rpm/.rpm.lock >/dev/null 2>&1; then busy=true; fi
+    elif command -v lsof >/dev/null 2>&1; then
+      if lsof /var/lib/rpm/.rpm.lock >/dev/null 2>&1; then busy=true; fi
     else
       # Best-effort if fuser is missing
       if [ -e /var/lib/rpm/.rpm.lock ]; then busy=true; fi
     fi
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee4b10d and ef99d95.

📒 Files selected for processing (3)
  • README.md
  • userdata.sh.tmpl
  • versions.tf
🧰 Additional context used
📓 Path-based instructions (1)
**/*.tf

⚙️ CodeRabbit configuration file

**/*.tf: You're a Terraform expert who has thoroughly studied all the documentation from Hashicorp https://developer.hashicorp.com/terraform/docs and OpenTofu https://opentofu.org/docs/.
You have a strong grasp of Terraform syntax and prioritize providing accurate and insightful code suggestions.
As a fan of the Cloud Posse / SweetOps ecosystem, you incorporate many of their best practices https://docs.cloudposse.com/best-practices/terraform/ while balancing them with general Terraform guidelines.

Files:

  • versions.tf
🧠 Learnings (3)
📚 Learning: 2024-11-21T13:30:01.588Z
Learnt from: gberenice
Repo: masterpointio/terraform-aws-tailscale PR: 41
File: main.tf:0-0
Timestamp: 2024-11-21T13:30:01.588Z
Learning: In this Terraform module (`main.tf`), read permissions (`ssm:GetParameter`) for SSM parameters are managed by the SSM Agent module (`masterpointio/ssm-agent/aws`), so adding `ssm:GetParameter` permissions in the custom `ssm_policy` module is unnecessary.

Applied to files:

  • README.md
📚 Learning: 2024-11-18T13:00:36.189Z
Learnt from: gberenice
Repo: masterpointio/terraform-spacelift-aws-integrations PR: 2
File: versions.tf:5-8
Timestamp: 2024-11-18T13:00:36.189Z
Learning: If a Terraform module does not include any AWS resources, there's no need to declare the AWS provider.

Applied to files:

  • README.md
  • versions.tf
📚 Learning: 2024-10-30T16:38:33.362Z
Learnt from: gberenice
Repo: masterpointio/terraform-spacelift-automation PR: 3
File: modules/spacelift-automation/variables.tf:0-0
Timestamp: 2024-10-30T16:38:33.362Z
Learning: Spacelift requires a specific Terraform version number; it does not support version constraints like "~> 1.7.0".

Applied to files:

  • README.md
  • versions.tf
🔇 Additional comments (8)
README.md (2)

160-176: Documentation updates are consistent with code changes.

The Requirements, Providers, and Modules sections correctly reflect:

  • AWS provider >= 6.0 (matches versions.tf)
  • Module version bumps for ssm_policy (2.0.2) and tailscale_subnet_router (1.8.0)

These appear to be auto-generated by the pre-commit terraform-docs hook.


196-197: New security-focused inputs are a good addition.

The allow_encrypted_uploads_only and allow_ssl_requests_only inputs enhance the S3 bucket security posture for session logging, supporting encryption and SSL enforcement requirements.

userdata.sh.tmpl (5)

1-6: Solid shell script initialization with strict error handling.

The set -Eeuo pipefail is the right choice for infrastructure scripts—it ensures errors are caught early. The logging setup with tee and logger provides good observability.


13-25: Robust OS compatibility guard with sensible fallback.

Good defensive practice to verify the script runs only on Amazon Linux 2023. The primary /etc/os-release check with a fallback to /etc/system-release handles edge cases well.
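
A guard of this kind might be sketched as follows. The function name is_al2023 and the two override variables are hypothetical, added only so the check can be exercised off-instance. The matched values (ID "amzn", VERSION_ID "2023", and an "Amazon Linux release 2023" prefix in /etc/system-release) are assumptions about what Amazon Linux 2023 reports, not lines copied from the template.

```shell
# Sketch only: returns 0 when the host looks like Amazon Linux 2023.
# OS_RELEASE_FILE and SYSTEM_RELEASE_FILE are hypothetical overrides for testing.
is_al2023() {
  local os_release="${OS_RELEASE_FILE:-/etc/os-release}"
  local system_release="${SYSTEM_RELEASE_FILE:-/etc/system-release}"

  if [ -r "$os_release" ]; then
    # Source in a subshell so ID and VERSION_ID do not leak into the caller.
    ( . "$os_release"; [ "${ID:-}" = "amzn" ] && [ "${VERSION_ID:-}" = "2023" ] )
    return
  fi

  # Fallback for images without a readable os-release file.
  [ -r "$system_release" ] && grep -q 'Amazon Linux release 2023' "$system_release"
}
```

A provisioning script would typically call it once near the top, for example `is_al2023 || { echo "unsupported OS" >&2; exit 1; }`, so that nothing else runs on an unexpected image.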


79-89: Clean retry helper with exponential backoff.

The retry function with exponential backoff (up to ~62s total) combined with wait_pkg_lock provides robust package installation. The pkg wrapper keeps the installation commands concise.
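
For readers without the template open, a retry helper with exponential backoff can be sketched like this. The signature and the RETRY_BASE_DELAY variable are assumptions. One schedule that adds up to the roughly 62 s mentioned above is six attempts with sleeps of 2, 4, 8, 16 and 32 seconds between them, but the script's real parameters are not shown in this review.

```shell
# Sketch only: retry MAX_ATTEMPTS COMMAND [ARGS...]
# Sleeps RETRY_BASE_DELAY seconds (default 2) after the first failure and
# doubles the delay after each subsequent failure.
retry() {
  local max_attempts="$1"; shift
  local delay="${RETRY_BASE_DELAY:-2}"
  local attempt=1

  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A package wrapper could then wait for the RPM lock and run something like `retry 6 dnf -y install tailscale`, which keeps each install line in the script short.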


96-104: Tailscale installation flow is well-structured.

The sequence—install prerequisites, add repo, refresh cache, install—follows best practices. Using || true for makecache is appropriate since cache refresh shouldn't block installation.


117-135: Tailscale startup sequence handles readiness and secrets appropriately.

Good practices:

  • Readiness wait loop before tailscale up (40s max)
  • set +x to prevent authkey exposure in logs

Note on line 134: The || { echo "WARNING..."; } makes tailscale up failures non-fatal. This is reasonable for resilience (the instance stays functional for debugging), but transient network issues during provisioning could leave the router unconnected. Consider whether this aligns with your operational preferences—some teams prefer a hard failure here.
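
The startup pattern discussed here (bounded readiness wait, tracing disabled around the secret, non-fatal tailscale up) might be sketched as below. The helper names wait_until and start_tailscale, the TS_AUTHKEY variable, and the use of systemctl is-active as the readiness probe are all assumptions for illustration. The template may probe readiness and inject the key differently.

```shell
# Sketch only: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls once per second until COMMAND succeeds or the timeout elapses.
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Hypothetical startup wrapper; TS_AUTHKEY is an assumed variable name.
start_tailscale() {
  wait_until 40 systemctl is-active --quiet tailscaled \
    || echo "WARNING: tailscaled not ready after 40s" >&2

  set +x  # make sure the auth key is never echoed by shell tracing
  tailscale up --auth-key="${TS_AUTHKEY:?auth key not set}" || {
    # Non-fatal on purpose: the instance stays up for debugging.
    echo "WARNING: tailscale up failed" >&2
  }
}
```

Teams that prefer a hard failure could replace the warning branch with `exit 1`, so that a router which never joins the tailnet fails provisioning instead of staying up unconnected.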

versions.tf (1)

7-7: AWS provider constraint is appropriate, but the issue description needs correction.

Updating to AWS provider >= 6.0 is a good move. In v6.0, the data.aws_region data source deprecated its name attribute in favor of region. If Issue #80 involves data.aws_region attribute access, ensure any references have been updated from .name to .region to remain compatible with v6+.
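
One quick way to check for leftover .name references is a recursive search over the module's .tf files. This is a minimal sketch: the function name find_region_name_refs is hypothetical, and the regular expression only catches direct data.aws_region.<label>.name references, not values passed through locals or module outputs.

```shell
# Sketch only: list data.aws_region.<label>.name references under a directory.
# Prints file:line:text for each hit; prints nothing when there are none.
find_region_name_refs() {
  local dir="${1:-.}"
  # /dev/null is passed as an extra file so grep always prints file names.
  find "$dir" -type f -name '*.tf' -exec \
    grep -nE 'data\.aws_region\.[A-Za-z0-9_-]+\.name([^A-Za-z0-9_]|$)' /dev/null {} +
}
```

Running `find_region_name_refs .` from the module root should print nothing once every reference uses .region.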

Member

@Gowiem Gowiem left a comment


@gberenice question on provider bump!

     aws = {
       source  = "hashicorp/aws"
-      version = ">= 5.0"
+      version = ">= 6.0"
Member


@gberenice you mentioned that the AWS Provider constraint wasn't updated properly. When did that happen, and what change in AWS usage required the bump? Technically this should be a major provider rev, I believe, so I want to make sure we're doing this right.

Member Author


@Gowiem This happened within PR #76, and caused this recently reported issue: This object has no argument, nested block, or exported attribute named "region".

PR title is already configured for a major bump.

@gberenice gberenice merged commit ce80388 into main Jan 6, 2026
6 checks passed
@gberenice gberenice deleted the fix/rpm-lock branch January 6, 2026 10:00
gberenice added a commit that referenced this pull request Jan 6, 2026
🤖 I have created a release *beep* *boop*
---


## [2.0.0](v1.11.0...v2.0.0) (2026-01-06)


### ⚠ BREAKING CHANGES

* safe rpm lock handling + AWS provider bump ([#82](#82))

### Bug Fixes

* safe rpm lock handling + AWS provider bump ([#82](#82)) ([ce80388](ce80388))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: masterpointbot[bot] <177651640+masterpointbot[bot]@users.noreply.github.com>
Co-authored-by: Veronika Gnilitska <[email protected]>

Development

Successfully merging this pull request may close these issues.

This object has no argument, nested block, or exported attribute named "region".

3 participants