Add HTTPS support for Ironic vmedia server with CI trust chain#214
Add HTTPS support for Ironic vmedia server with CI trust chain#214
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds optional HTTPS support for the Ironic ISO/vmedia server: new CA/cert generation scripts, an HTTPS static server, playbook/task updates to conditionally start/use TLS and manage secrets, CI workflow steps/env to generate and export certs, deployment script updates to inject certs, and validation for the optional cert. Changes
Sequence Diagram(s)sequenceDiagram
participant CI as CI Workflow
participant GenCA as generate_ironic_ca.sh
participant GenCert as generate_ironic_cert.sh
participant Deploy as deploy_phase.sh
participant Ansible as Ansible Playbooks
participant VMedia as https_server.py / http server
participant Ironic as Ironic/node
CI->>GenCA: run to create CA (ca.crt, ca.key)
GenCA-->>CI: CA artifacts
CI->>GenCert: run to create server cert/key (SAN = lzBmcIP)
GenCert-->>CI: ENCLAVE_IRONIC_CERT/KEY/CA
CI->>Deploy: inject cert/key/ca into phase_vars
Deploy->>Ansible: start playbooks with HTTPS vars
Ansible->>VMedia: start vmedia server (https_server.py) or http server
VMedia-->>Ansible: server ready on (https|http)://lzBmcIP:iso_port
Ansible->>Ironic: configure boot ISO URL (choose http/https)
Ironic->>VMedia: request agent.x86_64.iso
VMedia-->>Ironic: serve ISO
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
scripts/deployment/https_server.py (1)
6-9: Consider adding argument validation for clearer error messages.If arguments are missing,
IndexErrorwill be raised with an unclear message. Since the Ansible task controls invocation, this is low risk, but explicit validation would aid debugging.♻️ Proposed argument validation
+if len(sys.argv) != 5: + sys.exit(f"Usage: {sys.argv[0]} <port> <directory> <certfile> <keyfile>") + port = int(sys.argv[1]) directory = sys.argv[2] certfile = sys.argv[3] keyfile = sys.argv[4]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/deployment/https_server.py` around lines 6 - 9, Add explicit argument validation in https_server.py before using sys.argv to avoid IndexError: check that len(sys.argv) >= 5 (or use argparse) and if not, raise a clear error or print a usage message indicating required arguments (port, directory, certfile, keyfile). Place the check above the assignments to port, directory, certfile, keyfile so the script fails with a descriptive message instead of an IndexError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@playbooks/tasks/configure_hardware_ironic_setup.yaml`:
- Around line 19-29: The current PID-only liveness checks (tasks using
iso_server_pidfile and iso_server_check) must be made config-aware: when a PID
exists and the process is running, inspect the process command/args or bound
socket to determine if it was started with the current ironic_https_configured
value and current certs; if there is a mismatch (protocol or cert fingerprint
differs) treat the server as stale and restart it (terminate PID and start
fresh). Also update the PEM rewrite tasks (the tasks that rewrite PEM files) to
trigger the same restart flow after cert rotation so an existing HTTPS process
is restarted and begins serving the new certificate.
In `@playbooks/tasks/deploy_ironic.yaml`:
- Around line 38-40: The current check sets ironic_https_configured true if only
ironicHttpsCertificate exists, which later causes undefined-variable errors when
ironicHttpsKey is missing; update the ansible.builtin.set_fact that sets
ironic_https_configured to require both ironicHttpsCertificate and
ironicHttpsKey to be defined and non-empty (e.g., test both | defined and |
length > 0) so HTTPS is only enabled when both PEM values are present, and
ensure downstream tasks that dereference ironicHttpsKey (in
playbooks/tasks/configure_hardware_ironic_setup.yaml and this task) will only
run when ironic_https_configured is true.
In `@scripts/deployment/generate_ironic_ca.sh`:
- Around line 27-46: The script writes the CA private key into SUSHY_DIR and
makes it world-readable; move ca.key out of the mounted SUSHY_DIR into a
non-mounted, restricted location (e.g. a local secure dir under WORKING_DIR like
WORKING_DIR/ironic-ca or a secure temp dir), write ca.crt into SUSHY_DIR only,
set strict permissions on the private key (chmod 600) and avoid sudoing it to
644, and update the other script generate_ironic_cert.sh to reference the new
private-key path (keep ca.crt path unchanged for start_sushy_tools.sh). Ensure
you modify references to ca.key and ca.crt in generate_ironic_ca.sh and
generate_ironic_cert.sh accordingly.
In `@scripts/deployment/generate_ironic_cert.sh`:
- Around line 42-49: The script loads dev-scripts config using
ENCLAVE_CLUSTER_NAME but then computes LZ_VM_NAME from CLUSTER_NAME, causing
mismatched cluster selection; update the code so ENCLAVE_CLUSTER_NAME is the
single source of truth: ensure ENCLAVE_CLUSTER_NAME is initialized
(ENCLAVE_CLUSTER_NAME="${ENCLAVE_CLUSTER_NAME:-enclave-test}"), call
load_devscripts_config, and then derive LZ_VM_NAME from ENCLAVE_CLUSTER_NAME
(e.g., LZ_VM_NAME="${ENCLAVE_CLUSTER_NAME}_landingzone_0") instead of using
CLUSTER_NAME, and remove or sync CLUSTER_NAME to avoid divergence.
In `@scripts/deployment/https_server.py`:
- Around line 14-16: The SSLContext initialization using ssl.PROTOCOL_TLS_SERVER
allows older TLS versions; update the context created in the code that sets ctx
= ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) to explicitly enforce TLS 1.2+ by
setting the context's minimum_version (e.g., ctx.minimum_version =
ssl.TLSVersion.TLSv1_2) or equivalent API before calling ctx.load_cert_chain and
ctx.wrap_socket; adjust the code around ctx, load_cert_chain, and server.socket
= ctx.wrap_socket(server.socket, server_side=True) to ensure only TLSv1.2 and
higher are permitted.
---
Nitpick comments:
In `@scripts/deployment/https_server.py`:
- Around line 6-9: Add explicit argument validation in https_server.py before
using sys.argv to avoid IndexError: check that len(sys.argv) >= 5 (or use
argparse) and if not, raise a clear error or print a usage message indicating
required arguments (port, directory, certfile, keyfile). Place the check above
the assignments to port, directory, certfile, keyfile so the script fails with a
descriptive message instead of an IndexError.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26c4be6b-b693-4f2d-bc12-00e3ede704bf
📒 Files selected for processing (12)
.github/workflows/e2e-deployment.ymlconfig/certificates.example.yamlplaybooks/tasks/configure_hardware_ironic_boot.yamlplaybooks/tasks/configure_hardware_ironic_cleanup.yamlplaybooks/tasks/configure_hardware_ironic_setup.yamlplaybooks/tasks/deploy_ironic.yamlscripts/deployment/deploy_phase.shscripts/deployment/generate_ironic_ca.shscripts/deployment/generate_ironic_cert.shscripts/deployment/https_server.pyscripts/infrastructure/start_sushy_tools.shvalidations.sh
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@playbooks/tasks/configure_hardware_ironic_setup.yaml`:
- Around line 25-29: The current check only tests PID existence; update the
"Check if ISO server process is running" task (the task that reads
iso_server_pidfile and registers iso_server_check) to validate the process
command before trusting the pidfile: read /proc/{{ iso_server_pidfile }}
contents to get pid, then inspect /proc/<pid>/cmdline or use ps -o args= for
that pid and confirm it matches expected identifiers (e.g., "http.server" or
"https_server.py") and only then set iso_server_check as authoritative; apply
the same validated-check logic to the later tasks that decide to kill or skip
starting the ISO server (the tasks that act on iso_server_check and the pidfile)
so you never kill or reuse a recycled/unrelated PID.
In `@playbooks/tasks/deploy_ironic.yaml`:
- Around line 66-82: Update the two containers.podman.podman_secret tasks for
"ironic-vmedia-cert" and "ironic-vmedia-key" to use force: true so Podman
actually updates secrets on reruns, and ensure the httpd container task that
currently uses state: started will be restarted when those secrets change—either
change that task to state: restarted (or set recreate: true if supported) or add
a notify/handler (e.g., "Restart httpd container") and have the secret tasks
notify it so the httpd container is recreated/restarted after secret rotation.
In `@scripts/deployment/generate_ironic_cert.sh`:
- Around line 109-128: The script generate_ironic_cert.sh currently exports only
ENCLAVE_IRONIC_CERT and ENCLAVE_IRONIC_KEY to GITHUB_ENV; also export the CA
certificate referenced by CA_CRT (the CA PEM at variable CA_CRT) as
ENCLAVE_IRONIC_CA in the same GITHUB_ENV/stdout blocks and ensure
phase_vars.yaml is populated with that CA value so downstream playbooks can
consume it; then update playbooks playbooks/tasks/deploy_ironic.yaml and
playbooks/tasks/configure_hardware_ironic_setup.yaml to stop using
validate_certs: false and instead point ansible.builtin.uri’s ca_path parameter
to the deployed CA file (from phase_vars ENCLAVE_IRONIC_CA) so HTTPS readiness
probes validate against the correct trust chain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 44b3ce9d-f89b-48b6-96e8-bfbcd2e52011
📒 Files selected for processing (6)
playbooks/tasks/configure_hardware_ironic_cleanup.yamlplaybooks/tasks/configure_hardware_ironic_setup.yamlplaybooks/tasks/deploy_ironic.yamlscripts/deployment/generate_ironic_ca.shscripts/deployment/generate_ironic_cert.shscripts/deployment/https_server.py
🚧 Files skipped from review as they are similar to previous changes (3)
- playbooks/tasks/configure_hardware_ironic_cleanup.yaml
- scripts/deployment/https_server.py
- scripts/deployment/generate_ironic_ca.sh
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@playbooks/tasks/configure_hardware_ironic_setup.yaml`:
- Around line 150-159: The HTTPS verification task "Verify ISO is accessible via
temporary HTTPS server" currently unconditionally sets ca_path to "{{
ironic_ca_certfile | default(omit) }}", which can still pass an empty value;
change the task to only provide ca_path when a custom CA is actually set by
replacing the ca_path assignment with a conditional that uses omit when
ironic_ca_certfile is undefined or empty (e.g. use a Jinja expression testing
ironic_ca_certfile is defined and not empty, otherwise omit). Ensure the
variable name referenced is ironic_ca_certfile and mirror the same conditional
fix applied in deploy_ironic.yaml.
In `@playbooks/tasks/deploy_ironic.yaml`:
- Around line 66-79: The task sets ironic_ca_certfile whenever
ironic_https_configured is true but only writes the file when
ironicHttpsCaCertificate is provided, so ca_path may point to a non-existent
file; update the playbook to only define/set the ironic_ca_certfile variable
when ironicHttpsCaCertificate is defined and non-empty (i.e., add the same
conditional used for the copy task), and ensure any downstream use of ca_path
(the verification task that consumes ironic_ca_certfile) checks that
ironic_ca_certfile is defined before passing it to ca_path to avoid referencing
a missing file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 949748e8-b7db-4a41-b15f-aec0d7900689
📒 Files selected for processing (5)
playbooks/tasks/configure_hardware_ironic_cleanup.yamlplaybooks/tasks/configure_hardware_ironic_setup.yamlplaybooks/tasks/deploy_ironic.yamlscripts/deployment/deploy_phase.shscripts/deployment/generate_ironic_cert.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- playbooks/tasks/configure_hardware_ironic_cleanup.yaml
- scripts/deployment/deploy_phase.sh
- scripts/deployment/generate_ironic_cert.sh
88d14c5 to
2be96de
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/infrastructure/start_sushy_tools.sh (1)
243-251: LGTM - Consider adding a debug log for the skip case.The CA installation logic is correct: path mapping aligns with the container mount, the trust anchor location is appropriate for RHEL/Fedora-based images, and the
&&chain ensures atomicity. Withset -euo pipefail, a failure to install an expected CA will correctly abort the script rather than silently continuing with broken TLS.For observability, you might want to add an
elsebranch with a debug/info message when the CA file isn't found, which can help during troubleshooting:💡 Optional improvement for observability
if [ -f "${SUSHY_DIR}/ca.crt" ]; then info "Installing Ironic ISO server CA into sushy-tools container trust store..." sudo podman exec "$CONTAINER_NAME" bash -c \ "cp /root/sushy/ca.crt /etc/pki/ca-trust/source/anchors/ironic-iso-ca.crt && update-ca-trust" success "Ironic ISO server CA installed in sushy-tools trust store" +else + info "No Ironic ISO server CA found, skipping trust store update (HTTPS not configured)" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/infrastructure/start_sushy_tools.sh` around lines 243 - 251, Add an else branch to the existing CA-installation if-block so we log when the CA file is missing; specifically, in the block that checks [ -f "${SUSHY_DIR}/ca.crt" ] (which currently calls info, podman exec cp/update-ca-trust, and success), add an else that emits a concise info/debug message (using the same info function) like "CA not found at ${SUSHY_DIR}/ca.crt; skipping Ironic ISO CA installation for ${CONTAINER_NAME}" to improve observability when the install is skipped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@playbooks/tasks/deploy_ironic.yaml`:
- Around line 295-305: The HTTPS vmedia verification task "Verify HTTPS vmedia
server accessibility" uses a hardcoded port 8083 which doesn't exist; replace
the literal ":8083" on the URL with a configurable port variable (e.g.,
vmedia_https_port) and default it to the actual service port (use the
metal3-ironic HTTP_PORT value 6180 or the existing iso_server_port) so the URL
becomes "https://{{ lzBmcIP }}:{{ vmedia_https_port }} /" (ensure
vmedia_https_port is set or defaults to iso_server_port or 6180) and keep the
existing conditional use of ironic_ca_certfile and ironic_https_configured
unchanged.
---
Nitpick comments:
In `@scripts/infrastructure/start_sushy_tools.sh`:
- Around line 243-251: Add an else branch to the existing CA-installation
if-block so we log when the CA file is missing; specifically, in the block that
checks [ -f "${SUSHY_DIR}/ca.crt" ] (which currently calls info, podman exec
cp/update-ca-trust, and success), add an else that emits a concise info/debug
message (using the same info function) like "CA not found at
${SUSHY_DIR}/ca.crt; skipping Ironic ISO CA installation for ${CONTAINER_NAME}"
to improve observability when the install is skipped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 251c3d46-1675-480d-8ffc-e7fd92aca715
📒 Files selected for processing (12)
.github/workflows/e2e-deployment.ymlconfig/certificates.example.yamlplaybooks/tasks/configure_hardware_ironic_boot.yamlplaybooks/tasks/configure_hardware_ironic_cleanup.yamlplaybooks/tasks/configure_hardware_ironic_setup.yamlplaybooks/tasks/deploy_ironic.yamlscripts/deployment/deploy_phase.shscripts/deployment/generate_ironic_ca.shscripts/deployment/generate_ironic_cert.shscripts/deployment/https_server.pyscripts/infrastructure/start_sushy_tools.shvalidations.sh
✅ Files skipped from review due to trivial changes (2)
- config/certificates.example.yaml
- playbooks/tasks/configure_hardware_ironic_cleanup.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- playbooks/tasks/configure_hardware_ironic_boot.yaml
- scripts/deployment/https_server.py
- .github/workflows/e2e-deployment.yml
- scripts/deployment/generate_ironic_ca.sh
- scripts/deployment/generate_ironic_cert.sh
819e90b to
5efbfb3
Compare
5efbfb3 to
aa9f4df
Compare
83ad7a6 to
73d4157
Compare
d002c94 to
23d99ba
Compare
6ef8442 to
c73a165
Compare
Rename all http_server_* Ansible variables to protocol-neutral iso_server_* names and update default pid/log file paths accordingly. Introduce a Python HTTPS server (https_server.py) alongside the existing plain-HTTP fallback. Ironic playbooks now branch on ironic_https_configured to start the correct server, write TLS cert/key to temp files, and verify the ISO via the appropriate scheme. Cleanup removes the temp cert/key files and the corresponding podman secrets. deploy_ironic.yaml derives ironic_https_configured from ironicHttpsCertificate and mounts vmedia TLS secrets into the httpd container when configured. For CI, introduce a CA-based trust chain so sushy-tools (the BMC emulator) trusts the certificate used to serve ISOs: - generate_ironic_ca.sh: generates a CA before infrastructure starts and writes key+cert to the sushy-tools working directory. - generate_ironic_cert.sh: after the Landing Zone is up, reads lzBmcIP via SSH, generates a server key+CSR locally, signs with the CA, and exports cert+key to GITHUB_ENV. - start_sushy_tools.sh: installs the CA into the container trust store via `podman exec update-ca-trust` after health check. - deploy_phase.sh: injects ENCLAVE_IRONIC_HTTPS, ENCLAVE_IRONIC_CERT, and ENCLAVE_IRONIC_KEY into phase_vars.yaml as YAML block scalars. - e2e-deployment.yml: enables HTTPS in both jobs, adds generate_ironic_ca (before infra creation) and generate_ironic_cert (before Phase 7) steps, and tracks them in the failure-reporting step_names/step_order arrays. Signed-off-by: Rafa Porres Molina <rporresm@redhat.com> Assisted-by: Claude <noreply@anthropic.com>
c73a165 to
cd69c96
Compare
|
Tarball created: |
Introduce end-to-end HTTPS serving for Ironic virtual media ISOs,
including a CA-based trust chain for CI.
HTTPS vmedia support (Ansible)
ironic containers; set IRONIC_EXTERNAL_IP and VMEDIA_TLS_PORT so
configure-ironic.sh generates https:// virtual media URLs.
replace all hardcoded port references with ironic_vmedia_https_port.
the Ironic container, using a file:// URL for boot_iso.
block/rescue: on timeout, capture podman logs httpd and re-fail with a
descriptive message.
CA trust chain for CI
write key+cert to ironic-ca/ (never relabeled by container SELinux
mounts) and copy into the sushy-tools working directory.
via SSH, sign a server CSR with the CA, export cert+key to GITHUB_ENV.
podman exec update-ca-trust; log when CA is absent.
ENCLAVE_IRONIC_KEY into phase_vars.yaml as YAML block scalars; use
printf "%s\n" when feeding PEM values to while-read to prevent the
closing PEM marker from being silently dropped.
Test HTTPS in connected, HTTP in disconnected
only exercises the HTTP-only code path (port 6180, no certs injected).
Generate Ironic ISO server TLS certificate step in both jobs so the
step is skipped when HTTPS is not configured.
check so the disconnected run skips CA generation and logs an
informative message instead.
Signed-off-by: Rafa Porres Molina rporresm@redhat.com
Assisted-by: Claude noreply@anthropic.com
https://redhat.atlassian.net/browse/MGMT-22877
Summary by CodeRabbit
New Features
Chores
Tests/Validation