Add private-network VM install modes (DNS-01, self-signed)#1106
Add private-network VM install modes (DNS-01, self-signed)#1106RAVEENSR wants to merge 2 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 38 minutes and 26 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughAdds two new TLS modes ( ChangesNew TLS Modes for Advanced VM Installer
Sequence Diagram(s)sequenceDiagram
participant Operator
participant install_advanced as install-advanced.sh
participant lib_tls as lib-tls.sh
participant OpenSSL
participant Docker_LEGO as Docker / LEGO
participant Caddy
participant systemd
Operator->>install_advanced: run --config amp-config.env
rect rgba(100, 149, 237, 0.5)
Note over install_advanced: selfsigned mode
install_advanced->>lib_tls: generate_selfsigned_ca(cert_dir)
lib_tls->>OpenSSL: generate CA key + cert
lib_tls->>OpenSSL: generate leaf CSR + sign with SANs
OpenSSL-->>lib_tls: ca.crt, fullchain.pem, privkey.pem
lib_tls->>install_advanced: validate_cert → abort on failure
end
rect rgba(60, 179, 113, 0.5)
Note over install_advanced: letsencrypt-dns mode
install_advanced->>lib_tls: issue_dns01_cert(cert_dir)
lib_tls->>Docker_LEGO: docker run goacme/lego:v4.35.2 --dns provider run
Docker_LEGO-->>lib_tls: AMP_HOST_CONSOLE.crt/.key
lib_tls-->>install_advanced: fullchain.pem, privkey.pem
end
install_advanced->>Caddy: start container with cert bind-mount
Caddy-->>install_advanced: up
rect rgba(60, 179, 113, 0.5)
Note over install_advanced: letsencrypt-dns renewal
install_advanced->>lib_tls: install_renewal_timer(cert_dir, env_file)
lib_tls->>systemd: write amp-cert-renew.service + .timer
lib_tls->>systemd: daemon-reload + enable + start
end
install_advanced-->>Operator: Access URLs (+ CA import note for selfsigned)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
808e33b to
551b969
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
deployments/vm/lib-tls.sh (2)
49-65: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winAdd
serverAuthEKU and reconsider the 10-year leaf validity for the self-signed leaf.The leaf only carries
subjectAltName. Two interop gaps for theselfsignedmode:
- Missing
extendedKeyUsage = serverAuth(andkeyUsage). Apple platforms (Safari/macOS/iOS) reject TLS server certs withoutserverAutheven when the CA is trusted.- The default
days=3650leaf exceeds Apple's 398-day maximum validity for TLS leaf certificates, so such clients will reject the cert regardless of trust.If the target audience is browser/HTTPS clients, consider emitting
serverAuthEKU on the leaf and using a shorter leaf validity (long-lived CA, short-lived leaf).♻️ Proposed extension tweak
openssl x509 -req -in "${dir}/leaf.csr" -CA "${dir}/ca.crt" -CAkey "${dir}/ca.key" \ -CAcreateserial -days "$days" \ - -extfile <(printf 'subjectAltName=%s\n' "$san") \ + -extfile <(printf 'subjectAltName=%s\nbasicConstraints=CA:FALSE\nkeyUsage=digitalSignature,keyEncipherment\nextendedKeyUsage=serverAuth\n' "$san") \ -out "${dir}/leaf.crt" >/dev/null 2>&1 || die "leaf signing failed"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/vm/lib-tls.sh` around lines 49 - 65, The generate_selfsigned_ca function's leaf certificate generation is missing critical extensions required by Apple platforms. The openssl x509 command that creates the leaf certificate (which currently only includes subjectAltName via the extfile parameter) needs to be enhanced to also include extendedKeyUsage with serverAuth and appropriate keyUsage extensions in the same extfile. Additionally, the default leaf validity of 3650 days exceeds Apple's 398-day maximum limit for TLS certificates, so consider reducing the leaf certificate validity to a shorter duration (such as 90 or 365 days) while keeping the CA validity long, since the days parameter is currently being applied uniformly to both. Update the printf command that generates the extension file content to include the additional required extensions alongside the existing subjectAltName.
102-134: 🩺 Stability & Availability | 🔵 TrivialAdd validation for docker
--env-fileformat compatibility inamp-config.env.The template format is docker-compatible (plain
KEY=valuelines), but users could manually addexportprefixes or quoted values during editing. Docker's--env-fileparser silently treats these as literal variable names (e.g.,export FOO=barbecomes a var namedexport FOO), causing lego to receive malformed or missing credentials without obvious error. Consider adding a pre-flight check or validation function to ensure the file format matches docker's parser expectations before passing it to the renewal job.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/vm/lib-tls.sh` around lines 102 - 134, The render_renewal_units function uses the env_file parameter directly in the docker run command without validating that it conforms to Docker's --env-file format expectations. Create a validation function that checks the env_file for lines containing export prefixes or quoted values that Docker's parser will silently treat as literal variable names, causing credential delivery failures. Call this validation function before the env_file is passed to the docker run command in the ExecStart directive to ensure users receive explicit errors rather than silent credential failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deployments/vm/lib-tls.sh`:
- Around line 82-95: The issue_dns01_cert function only binds the cert_dir
directory into the lego container but does not mount file-based DNS provider
credentials that may be referenced in environment variables such as
GCE_SERVICE_ACCOUNT_FILE. Add logic to detect when credential environment
variables reference file paths and bind-mount those files (or their parent
directories) into the container using additional -v arguments in the docker run
command, similar to how envargs is already being passed. The same fix should
also be applied to the renewal service function at line 118.
- Around line 71-76: The _lego_cred_env_args function uses compgen -e to list
only exported variables, but the load_config function sources configuration
files without using set -a, so credential variables like AWS_ACCESS_KEY_ID and
CF_DNS_API_TOKEN are not exported. Fix this by ensuring variables are exported
when sourcing the configuration file: either modify the load_config function to
wrap the source command with set -a before and set +a after sourcing the config
file, or require that credential variables in amp-config.env use export
statements. Additionally, in the _lego_cred_env_args function, simplify the grep
regex pattern by removing the redundant AWS_REGION$ condition since the AWS_
prefix already matches it.
In `@deployments/vm/tests/run.sh`:
- Line 671: The assertion checking for the renew action uses a broad substring
match that can false-pass when matching text like "renewal" instead of just the
"renew" command token. Replace the current check using `has "$units" ' renew'`
with a more specific assertion that verifies the ExecStart= command line ends
with the "renew" action token rather than doing a simple substring match,
ensuring the lego action is correctly set to renew and not matching partial
words.
In `@documentation/docs/getting-started/on-a-vm.mdx`:
- Line 303: The selfsigned/TLS section header is ambiguous because it describes
a private install that needs outbound internet (line 303) but also mentions
"Fully offline" (line 395), which could be misinterpreted as requiring no
internet access. Reword the section header from its current phrasing to
explicitly state "offline TLS / no external CA" to clarify that "offline" refers
to not requiring an external Certificate Authority, not an air-gapped network
with no internet. Update both the header at line 303 and the related text at
line 395 to use this clearer terminology.
---
Nitpick comments:
In `@deployments/vm/lib-tls.sh`:
- Around line 49-65: The generate_selfsigned_ca function's leaf certificate
generation is missing critical extensions required by Apple platforms. The
openssl x509 command that creates the leaf certificate (which currently only
includes subjectAltName via the extfile parameter) needs to be enhanced to also
include extendedKeyUsage with serverAuth and appropriate keyUsage extensions in
the same extfile. Additionally, the default leaf validity of 3650 days exceeds
Apple's 398-day maximum limit for TLS certificates, so consider reducing the
leaf certificate validity to a shorter duration (such as 90 or 365 days) while
keeping the CA validity long, since the days parameter is currently being
applied uniformly to both. Update the printf command that generates the
extension file content to include the additional required extensions alongside
the existing subjectAltName.
- Around line 102-134: The render_renewal_units function uses the env_file
parameter directly in the docker run command without validating that it conforms
to Docker's --env-file format expectations. Create a validation function that
checks the env_file for lines containing export prefixes or quoted values that
Docker's parser will silently treat as literal variable names, causing
credential delivery failures. Call this validation function before the env_file
is passed to the docker run command in the ExecStart directive to ensure users
receive explicit errors rather than silent credential failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 393f3b00-bb03-4405-87eb-36ff4d76e812
📒 Files selected for processing (8)
deployments/vm/install-advanced.shdeployments/vm/install-vm.shdeployments/vm/lib-advanced.shdeployments/vm/lib-bootstrap.shdeployments/vm/lib-tls.shdeployments/vm/tests/preflight.shdeployments/vm/tests/run.shdocumentation/docs/getting-started/on-a-vm.mdx
The advanced VM installer only supported public-facing TLS (ACME
HTTP-01/TLS-ALPN-01 and an upstream-terminated mode), none of which
work on a VM with no public ingress. Add two modes for private
networks with internet egress, both of which only produce a cert and
then reuse the existing byoc serving path (Caddy cert mounts,
validate_cert) so lib-vm.sh is untouched:
- letsencrypt-dns: public-trusted certs via ACME DNS-01 (the CA reads
a TXT record and never connects to the VM, so split-horizon A
records are fine). Issued by a dockerized lego with a daily systemd
renewal timer.
- selfsigned: an offline local CA + leaf for fully air-gapped hosts.
load_config exports the sourced config so DNS-provider credentials
(CF_*, AWS_*, GCE_*, ...) actually reach the dockerized lego, which
forwards only exported vars via docker run -e; without this,
token-based providers would silently get no credentials.
Also folds in robustness fixes found during real-VM testing: the
advisory DNS pre-flight no longer reports success when no hostname
resolves; lego is pinned to v4.x (v5 dropped the global flags the
installer passes, breaking issuance); and bootstrap raises
fs.inotify.max_user_{instances,watches} so a clean k3d install does
not exhaust the default ceiling ("Too many open files").
All three TLS modes verified end-to-end on real GCP VMs.
Restructure the Advanced VM install docs around the network the VM lives on, so a reader follows one self-contained path instead of cross-referencing. Split into per-network tabs (public vs private), document the letsencrypt-dns and selfsigned modes inline, and update the intro to frame private-network installs as a first-class option.
551b969 to
8a943e5
Compare
Purpose
The VM installer has two entry points today — the simple installer (
install-vm.sh, sslip.io + Let's Encrypt on:443) and the advanced installer (install-advanced.shwithletsencrypt | byoc | upstream). Both assume the VM is reachable from the public internet. Many companies will not expose the console/API publicly and want Agent Manager reachable only from their private network. The blocker is TLS: publicletsencrypt(TLS-ALPN-01) needs the ACME CA to connect inbound to the VM on:443, which a private VM cannot accept.Related to #1017 (VM standalone install).
Resolves #1081
Goals
Add two TLS modes to the advanced installer so a VM with private ingress and internet egress can run Agent Manager without ever being publicly reachable:
letsencrypt-dns(recommended) — public-trusted certificates via the ACME DNS-01 challenge. The CA validates a DNS TXT record and never connects to the VM, so it works behind a firewall.Arecords may point at the VM's private IP (split-horizon).selfsigned— fully offline: a generated local CA + leaf for teams with neither a public DNS zone nor an internal CA.byoc(internal-CA cert) andupstream(internal LB) already cover the remaining private variants and are unchanged.Approach
Both new modes only produce a cert/key on disk and then reuse the existing
byocserving path (Caddyfile rendering, container cert mounts, andvalidate_cert's SAN/wildcard coverage checks) — solib-vm.shis untouched and the serving logic is shared.deployments/vm/lib-tls.sh: pure helpers (tls_san_list,build_lego_args,render_renewal_units) plus side-effecting issuance (issue_dns01_certvia the dockerizedgoacme/legoimage,generate_selfsigned_cavia openssl,install_renewal_timer).letsencrypt-dnsissues a cert covering every service host + the*.<AGENTS_BASE>wildcard, copies it to/opt/amp/certs/, and installs a dailysystemdtimer (lego renew+caddy reload).selfsignedgenerates a local CA + leaf and writes the CA to/opt/amp/certs/ca.crtfor distribution to client trust stores.validate_configlearns the two modes (letsencrypt-dnsrequiresDNS_PROVIDER+ACME_EMAIL;selfsignedrequires nothing extra), the--inittemplate documents them, and--dry-runpreviews the issuance/generation plan without doing it.User stories
As a platform operator at a security-conscious company, I can install Agent Manager on a private-network-only VM with trustworthy TLS — either public-trusted certificates via DNS-01 (no public ingress) or a generated local CA when fully offline.
Release note
The advanced VM installer now supports private-network-only deployments via two new TLS modes:
letsencrypt-dns(public-trusted certificates through the ACME DNS-01 challenge, with automatic renewal) andselfsigned(a generated local CA for fully offline installs).Documentation
Updated
documentation/docs/getting-started/on-a-vm.mdx: new "Private network (no public exposure)" section, the two new modes added to the TLS-modes and config-key tables, DNS split-horizon guidance for DNS-01, and troubleshooting entries.Training
N/A — no training content impact.
Certification
N/A — no impact on certification exams.
Marketing
N/A — no marketing content for this change.
Automation tests
Extended
deployments/vm/tests/run.sh(SAN list, lego arg builder, renewal-unit rendering,validate_configfor both modes,--inittemplate, and--dry-runfor both modes) anddeployments/vm/tests/preflight.sh(local-CA generation produces a cert whose SANs cover every host + the agent wildcard, verified viavalidate_cert+ openssl). Both suites pass; the changed scripts are shellcheck-clean.The suites are hermetic (no Docker/cluster). A live DNS-01 issuance against a real zone and a
selfsignedbrowser-trust check on a GCP VM are planned as an out-of-band smoke test, mirroring the earlier VM-install smoke tests (this PR is a draft pending that).Security checks
amp-config.envat install time and the renewal config copy is written0600; none are committed.Samples
N/A — no new samples. The docs include a DNS-01 (AWS Route 53) walkthrough.
Related PRs
Builds on the VM standalone (#1017) and advanced-install work in
deployments/vm/.Migrations (if applicable)
N/A — additive feature; no migration required. Switching
TLS_MODEon an existing install only re-renders Caddy and re-issues/regenerates the certificate.Test environment
Unit + preflight suites run locally on macOS (bash 3.2). The installer targets Linux VMs (Ubuntu/Debian, bash 4+); the real-VM smoke test is pending (see Automation tests).
Learning
Key insight: the ACME DNS-01 challenge validates a DNS TXT record rather than connecting to the host, so it can issue public-trusted (and wildcard) certificates for a server with no public ingress. Used
lego(single-binary ACME client, ~150 DNS providers) to keep the stockcaddy:2image and reuse the existingbyocserving path rather than building a custom Caddy image with DNS plugins.Summary by CodeRabbit
New Features
Improvements
Documentation