
OCPBUGS-78589: baremetal: add serial console logging for bootstrap VM #10400

Open
elfosardo wants to merge 1 commit into openshift:main from elfosardo:bootstrap-vm-serial-logs

Conversation

@elfosardo
Contributor

@elfosardo elfosardo commented Mar 17, 2026

The bootstrap VM created by the installer does not have a serial console
log file configured, making it impossible to diagnose boot failures when
the VM is unreachable via SSH. Master/worker VMs already have this via
dev-scripts.

Add a serial device with a log file at
/var/log/libvirt/qemu/<name>-serial0.log, which is automatically
collected by the existing CI gather step. The serial target type is left
unset so that libvirt auto-selects the appropriate device for each
architecture. Append a console= kernel argument using the
architecture-appropriate device (ttyS0 for x86_64, ttyAMA0 for aarch64,
ttysclp0 for s390x, hvc0 for ppc64le) so that kernel and userspace
output is directed to the serial console.
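
As an illustration of the console= selection described above, here is a minimal Go sketch. The device names are the ones listed in the description; the identifiers consoleDevices and appendConsoleArg, and the placeholder fips=1 argument, are hypothetical and not taken from the installer source. It shows the "append only when mapped" behavior: an unmapped architecture leaves the kernel arguments untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// consoleDevices maps a host CPU architecture to the serial console device
// named in the PR description. The variable name is hypothetical.
var consoleDevices = map[string]string{
	"x86_64":  "ttyS0",
	"aarch64": "ttyAMA0",
	"s390x":   "ttysclp0",
	"ppc64le": "hvc0",
}

// appendConsoleArg returns a copy of kargs with console=<device> appended
// when arch has a mapping, and an unchanged copy otherwise.
func appendConsoleArg(kargs []string, arch string) []string {
	out := append([]string{}, kargs...)
	if dev, ok := consoleDevices[arch]; ok {
		out = append(out, "console="+dev)
	}
	return out
}

func main() {
	// "fips=1" is only a placeholder for whatever arguments already exist.
	base := []string{"fips=1"}
	for _, arch := range []string{"x86_64", "aarch64", "s390x", "ppc64le", "riscv64"} {
		fmt.Printf("%-8s %s\n", arch, strings.Join(appendConsoleArg(base, arch), " "))
	}
}
```

Copying the slice before appending keeps the caller's arguments from being modified through a shared backing array.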

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Mar 17, 2026
@openshift-ci-robot
Contributor

@elfosardo: This pull request references Jira Issue OCPBUGS-78589, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The bootstrap VM created by the installer does not have a serial console log file configured, making it impossible to diagnose boot failures when the VM is unreachable via SSH. Master/worker VMs already have this via dev-scripts. Add a serial device with a log file at /var/log/libvirt/qemu/<name>-serial0.log, which is automatically collected by the existing CI gather step.


@coderabbitai

coderabbitai bot commented Mar 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e7b13def-ec12-42ba-8cad-719e8dcc9107

📥 Commits

Reviewing files that changed from the base of the PR and between d0e3df7 and a9540c0.

📒 Files selected for processing (2)
  • pkg/infrastructure/baremetal/bootstrap.go
  • pkg/infrastructure/baremetal/bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/infrastructure/baremetal/bootstrap_test.go
  • pkg/infrastructure/baremetal/bootstrap.go

Walkthrough

Add a per-domain serial device with logging to the bootstrap domain; pass architecture into live-ISO selection and conditionally append an architecture-mapped console=<device> kernel arg; tests updated to assert serial presence, log file and append flag, and that serial Target remains nil; per-architecture tests now always require serials.

Changes

Cohort / File(s) and Summary

Bootstrap implementation (pkg/infrastructure/baremetal/bootstrap.go):
Append a new DomainSerial with Log.File set to /var/log/libvirt/qemu/<name>-serial0.log and Log.Append = "on" to domainDef.Devices.Serials; introduce arch := capabilities.Host.CPU.Arch and pass it to getLiveISO; add an architecture→console device mapping and conditionally append console=<device> to kernel args when mapped. No other control-flow or error-path changes.

Unit tests (pkg/infrastructure/baremetal/bootstrap_test.go):
Update tests to assert dom.Devices.Serials is non-empty, that the first serial includes a Log with expected File path and Append set to "on", and that the serial Target is nil; extend per-architecture tests to require serials regardless of graphics configuration.
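
To make the serial-device change concrete, the following Go sketch builds a serial element with a log file and no target, then renders it with encoding/xml. It deliberately uses small local stand-in types (serialDevice, serialLog, serialTarget) rather than the real libvirt XML bindings, so every identifier here is hypothetical; the type="pty" attribute and the domain name example-bootstrap are assumptions as well. Only the log path pattern and the append="on" value come from the summary above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the libvirt domain XML types; they model only
// the fields discussed in this PR.
type serialLog struct {
	File   string `xml:"file,attr"`
	Append string `xml:"append,attr,omitempty"`
}

type serialTarget struct {
	Type string `xml:"type,attr,omitempty"`
}

type serialDevice struct {
	XMLName xml.Name      `xml:"serial"`
	Type    string        `xml:"type,attr"` // "pty" below is an assumption
	Target  *serialTarget `xml:"target"`    // left nil so libvirt can auto-select
	Log     *serialLog    `xml:"log"`
}

// serialLogPath follows the /var/log/libvirt/qemu/<name>-serial0.log pattern.
func serialLogPath(name string) string {
	return "/var/log/libvirt/qemu/" + name + "-serial0.log"
}

// newSerial builds a serial device that logs to a file and has no target.
func newSerial(name string) serialDevice {
	return serialDevice{
		Type: "pty",
		Log:  &serialLog{File: serialLogPath(name), Append: "on"},
	}
}

// renderSerial marshals the device; encoding/xml writes nothing for nil pointers.
func renderSerial(name string) (string, error) {
	out, err := xml.MarshalIndent(newSerial(name), "", "  ")
	return string(out), err
}

// hasTargetElement reports whether the rendered XML contains a <target> element.
func hasTargetElement(doc string) bool {
	return strings.Contains(doc, "<target")
}

func main() {
	doc, err := renderSerial("example-bootstrap")
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
	fmt.Println("target element present:", hasTargetElement(doc))
}
```

The checks described for the unit tests (serial present, log file path, append flag, nil target) map directly onto the fields of newSerial's return value.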

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@openshift-ci openshift-ci bot requested review from honza and iurygregory March 17, 2026 10:02
@elfosardo
Contributor Author

/cc @zaneb @tdomnesc

@openshift-ci openshift-ci bot requested review from tdomnesc and zaneb March 17, 2026 10:07
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/infrastructure/baremetal/bootstrap_test.go (1)

25-28: Add arch-specific assertions for serial target compatibility.

These assertions cover log wiring well, but they don’t guard against invalid serial target/model on non-x86 arches. Consider extending TestConfigureDomainArch (or adding a focused test) to validate serial target settings per arch.

As per coding guidelines, "-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/baremetal/bootstrap_test.go` around lines 25 - 28, The
current assertions in TestConfigureDomainArch only validate serial log wiring
(dom.Devices.Serials and dom.Devices.Serials[0].Log) but miss validating serial
target/model per architecture; update TestConfigureDomainArch (or add a focused
test) to assert that dom.Devices.Serials[0].Target.Name and
dom.Devices.Serials[0].Target.Model (or the equivalent fields on the Serial
struct) are set to the expected values for each supported arch (e.g., on x86
expect target "isa-serial" or model "chardev"/whatever the project standard is,
and on non-x86 assert the compatible target/model or absence thereof), iterating
or branching by the test's arch input to fail fast on incompatible combos and
keeping the existing log assertions (Serials, Log, Log.File, Log.Append).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/baremetal/bootstrap.go`:
- Around line 80-85: The serial target is being hardcoded to "isa-serial" which
breaks non-x86 bootstrap domains; update newDomain() (or the constructor that
builds the libvirtxml.DomainSerialTarget) to accept the guest architecture
(e.g., arch string) and conditionally set DomainSerialTarget.Type and
DomainSerialTargetModel.Name based on that architecture (or simply omit Type and
Model to let libvirt auto-select), ensuring you adjust all callers to pass the
architecture and change the DomainSerialTarget instantiation instead of always
using "isa-serial".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59d69587-6b6e-4c00-b0f5-f1d17a074932

📥 Commits

Reviewing files that changed from the base of the PR and between b3e61d0 and 46881fd.

📒 Files selected for processing (2)
  • pkg/infrastructure/baremetal/bootstrap.go
  • pkg/infrastructure/baremetal/bootstrap_test.go

@elfosardo
Contributor Author

/retest

Member

@zaneb zaneb left a comment

The serial log seems to only contain logs from the bootloader. I think you also need to add console=ttyS0 to the kernel command line at line 261.

@elfosardo elfosardo force-pushed the bootstrap-vm-serial-logs branch from 46881fd to 68d4847 Compare March 18, 2026 09:29
@elfosardo elfosardo requested a review from zaneb March 18, 2026 09:30
@openshift-ci-robot
Contributor

@elfosardo: This pull request references Jira Issue OCPBUGS-78589, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

The bootstrap VM created by the installer does not have a serial console
log file configured, making it impossible to diagnose boot failures when
the VM is unreachable via SSH.

Add a serial device with a log file at
/var/log/libvirt/qemu/<name>-serial0.log, which is automatically
collected by the existing CI gather step. Append console=ttyS0 to the
kernel command line so that kernel and userspace output is directed to
the serial console.

Set the serial target type per architecture inside configureDomainArch
(isa-serial for x86_64, system-serial for aarch64, spapr-vio-serial for
ppc64le, sclp-serial for s390x) rather than hardcoding a single type, so
the bootstrap domain is valid on all supported platforms.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/baremetal/bootstrap.go`:
- Around line 258-262: The kernel console argument is hardcoded to
"console=ttyS0"; update the kargs construction in bootstrap.go to choose the
console string based on capabilities.Host.CPU.Arch (used elsewhere in this
file). Map architectures to devices: "x86_64" -> "console=ttyS0", "aarch64" ->
"console=ttyAMA0", "s390x" -> "console=ttysclp0", "ppc64le" -> "console=hvc0",
and fallback to "console=ttyS0" if unknown; append that chosen console to the
existing kargs (and keep the existing fips handling). Ensure you reference
capabilities.Host.CPU.Arch and the kargs variable when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b6f4e595-aa88-4704-959d-b494f79d6f05

📥 Commits

Reviewing files that changed from the base of the PR and between 46881fd and 68d4847.

📒 Files selected for processing (2)
  • pkg/infrastructure/baremetal/bootstrap.go
  • pkg/infrastructure/baremetal/bootstrap_test.go

@elfosardo elfosardo force-pushed the bootstrap-vm-serial-logs branch from 68d4847 to d0e3df7 Compare March 18, 2026 12:02
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/infrastructure/baremetal/bootstrap.go (1)

259-264: Consolidate arch mappings to avoid future drift.

consoleDevice and serialTargetType duplicate the same architecture keyspace in different functions. A shared config map (console + serial type together) would reduce maintenance risk when adding/changing architectures.

♻️ Suggested refactor sketch
+type archDeviceConfig struct {
+	ConsoleDev string
+	SerialType string
+}
+
+var archDeviceConfigs = map[string]archDeviceConfig{
+	"x86_64":  {ConsoleDev: "ttyS0",    SerialType: "isa-serial"},
+	"aarch64": {ConsoleDev: "ttyAMA0",  SerialType: "system-serial"},
+	"s390x":   {ConsoleDev: "ttysclp0", SerialType: "sclp-serial"},
+	"ppc64le": {ConsoleDev: "hvc0",     SerialType: "spapr-vio-serial"},
+}

Use archDeviceConfigs[arch].ConsoleDev in createLiveVolume and archDeviceConfigs[arch].SerialType in configureDomainArch.
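
One caveat with indexing the shared map directly: for an unknown architecture, archDeviceConfigs[arch] yields a zero-value struct, so ConsoleDev and SerialType would silently be empty strings. A minimal sketch of a safer lookup is shown below. The struct and map repeat the refactor suggestion; the helper lookupArchConfig is a hypothetical addition, not part of the PR.

```go
package main

import "fmt"

// archDeviceConfig and archDeviceConfigs mirror the suggested refactor.
type archDeviceConfig struct {
	ConsoleDev string
	SerialType string
}

var archDeviceConfigs = map[string]archDeviceConfig{
	"x86_64":  {ConsoleDev: "ttyS0", SerialType: "isa-serial"},
	"aarch64": {ConsoleDev: "ttyAMA0", SerialType: "system-serial"},
	"s390x":   {ConsoleDev: "ttysclp0", SerialType: "sclp-serial"},
	"ppc64le": {ConsoleDev: "hvc0", SerialType: "spapr-vio-serial"},
}

// lookupArchConfig (hypothetical helper) uses the comma-ok form so callers
// can tell "unknown architecture" apart from a real configuration.
func lookupArchConfig(arch string) (archDeviceConfig, bool) {
	cfg, ok := archDeviceConfigs[arch]
	return cfg, ok
}

func main() {
	for _, arch := range []string{"s390x", "riscv64"} {
		if cfg, ok := lookupArchConfig(arch); ok {
			fmt.Printf("%s: console=%s serial=%s\n", arch, cfg.ConsoleDev, cfg.SerialType)
		} else {
			fmt.Printf("%s: no mapping, leave console and serial type unset\n", arch)
		}
	}
}
```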

As per coding guidelines, **: -Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Also applies to: 384-389

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/baremetal/bootstrap.go` around lines 259 - 264, The
arch-specific console and serial mappings are duplicated (consoleDevice and
serialTargetType); create a single shared map named archDeviceConfigs mapping
architecture string to a small struct with fields ConsoleDev and SerialType,
then update createLiveVolume to use archDeviceConfigs[arch].ConsoleDev and
update configureDomainArch to use archDeviceConfigs[arch].SerialType, removing
the duplicated maps and adding safe lookup/default behavior for unknown arch
values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 010bb5d5-49ec-4649-b558-60127db4cda3

📥 Commits

Reviewing files that changed from the base of the PR and between 68d4847 and d0e3df7.

📒 Files selected for processing (2)
  • pkg/infrastructure/baremetal/bootstrap.go
  • pkg/infrastructure/baremetal/bootstrap_test.go

@elfosardo
Contributor Author

/retest

3 similar comments
@elfosardo
Contributor Author

/retest

@elfosardo
Contributor Author

/retest

@elfosardo
Contributor Author

/retest

@elfosardo
Contributor Author

@zaneb when you have a moment please check this again

Member

@zaneb zaneb left a comment

So now we're capturing all of the startup messages and the login prompt.
If that's enough then this looks good, but if we want the whole journal then we apparently need to either set systemd.journald.forward_to_console on the kernel command line (although this can apparently cause performance issues, according to the man page for journald.conf) or set up a systemd service that runs journalctl -f and redirects to the serial port.

	}
	if t, ok := serialTargetType[arch]; ok && len(dom.Devices.Serials) > 0 {
		dom.Devices.Serials[0].Target.Type = t
		dom.Devices.Serials[0].Target.Model.Name = t
Member

From what I can gather in the libvirt docs you can just leave these blank and it will do the Right Thing for the platform.

Contributor Author

yep, I was being explicit here but we can leave it to automation

@elfosardo
Contributor Author

So now we're capturing all of the startup messages and the login prompt. If that's enough then this looks good, but if we want the whole journal then we apparently need to either set systemd.journald.forward_to_console on the kernel command line (although this can apparently cause performance issues, according to the man page for journald.conf) or set up a systemd service that runs journalctl -f and redirects to the serial port.

@zaneb I think we can consider that for a follow-up; what we have now should be good for a first pass.
We need to evaluate the two options. It's true that forwarding the full journal is quite expensive in terms of performance, but I think adding a journalctl -f service adds more complexity, as we would also need to touch the ignition config (which, on the other hand, could be a good thing in terms of control).
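
For reference, the first of the two options could look roughly like the Go sketch below: an opt-in switch that adds the journald forwarding argument to the kernel command line. The argument name comes from the review comment; the =1 value form, the bootstrapKargs helper, and the forwardJournal flag are assumptions for illustration and are not part of this PR. The journalctl -f service option is not sketched here because it would also involve the ignition config.

```go
package main

import (
	"fmt"
	"strings"
)

// journalForwardArg is the kernel argument mentioned in the review; the "=1"
// boolean form is an assumption based on the journald.conf man page.
const journalForwardArg = "systemd.journald.forward_to_console=1"

// bootstrapKargs (hypothetical helper) returns a copy of base and appends
// the forwarding argument only when forwardJournal is true, so the
// potentially expensive behavior stays opt-in.
func bootstrapKargs(base []string, forwardJournal bool) []string {
	out := append([]string{}, base...)
	if forwardJournal {
		out = append(out, journalForwardArg)
	}
	return out
}

func main() {
	base := []string{"console=ttyS0"}
	fmt.Println(strings.Join(bootstrapKargs(base, false), " "))
	fmt.Println(strings.Join(bootstrapKargs(base, true), " "))
}
```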

@elfosardo elfosardo force-pushed the bootstrap-vm-serial-logs branch from d0e3df7 to a9540c0 Compare March 20, 2026 08:31
@elfosardo elfosardo requested a review from zaneb March 20, 2026 08:31
@zaneb
Member

zaneb commented Mar 20, 2026

You can add a separate system ignition file like the one on line 233, so it's not too bad in terms of complexity to add a service. But this looks like a win for now.
/approve

@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2026
@elfosardo
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

@elfosardo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-metal-ovn-two-node-fencing | a9540c0 | link | false | /test e2e-metal-ovn-two-node-fencing
ci/prow/e2e-metal-ipi-ovn-ipv6 | a9540c0 | link | true | /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-metal-ipi-ovn-dualstack | a9540c0 | link | false | /test e2e-metal-ipi-ovn-dualstack


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
