-
Notifications
You must be signed in to change notification settings - Fork 425
daemon: Dump systemctl status rpm-ostreed on load failure #2642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
daemon: Dump systemctl status rpm-ostreed on load failure #2642
Conversation
Prep for a future patch; `os` conflicts with the standard Go package.
This is yet another patch which should help debugging https://bugzilla.redhat.com/show_bug.cgi?id=1958812 like situations in the future. I also have coreos/rpm-ostree#2932 cooking but it's going to take a while to cycle down.
osImageURL, osVersion, err = nodeUpdaterClient.GetBootedOSImageURL() | ||
if err != nil { | ||
// If this fails for some reason, let's dump the unit status | ||
// into our logs to aid future debugging. | ||
cmd := exec.Command("systemctl", "status", "rpm-ostreed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does printing status will capture enough error logs or should we print here rpm-ostreed journal log from current boot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status will capture the last few lines of logs at the point in time this happens, which I think will be enough.
openshift/must-gather#244 will help for the general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-agnostic-upgrade |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
@cgwalters: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This is a hackier an alternative to coreos#2932 that we can ship immediately because we won't block on SELinux policy. For historical reasons, the daemon ends up doing a lot of initialization before even claiming the DBus name. For example, it calls `ostree_sysroot_load()`. We also end up scanning the RPM database, and actually parse all the GPG keys in `/etc/pki/rpm-gpg` etc. This causes two related problems: - By doing all this work before claiming the bus name, we race against the (pretty low) DBus service timeout of 25s. - If something hard fails at initialization, the client can't easily see the error because it just appears in the systemd journal. The client will just see a service timeout. By explicitly using `systemctl start rpm-ostreed.service`, systemd does all of the error checking for us without involving `dbus-broker` as a middleman. Further, by using `systemctl status rpm-ostreed.service` on failure, we reuse systemd's nice rendering of the status of the unit instead of reinventing our own. This PR effectively replicates openshift/machine-config-operator#2642 in our code.
just to xref, coreos/rpm-ostree#2945 is driving this down into rpm-ostree, so eventually we'll want to remove this code. But right now, shipping this PR involved literally nothing more than me submitting it and having it merge, but changing rpm-ostree in RHEL will take much longer and involve a lot more exciting paperwork. |
This is a hackier an alternative to coreos#2932 that we can ship immediately because we won't block on SELinux policy. For historical reasons, the daemon ends up doing a lot of initialization before even claiming the DBus name. For example, it calls `ostree_sysroot_load()`. We also end up scanning the RPM database, and actually parse all the GPG keys in `/etc/pki/rpm-gpg` etc. This causes two related problems: - By doing all this work before claiming the bus name, we race against the (pretty low) DBus service timeout of 25s. - If something hard fails at initialization, the client can't easily see the error because it just appears in the systemd journal. The client will just see a service timeout. By explicitly using `systemctl start rpm-ostreed.service`, systemd does all of the error checking for us without involving `dbus-broker` as a middleman. Further, by using `systemctl status rpm-ostreed.service` on failure, we reuse systemd's nice rendering of the status of the unit instead of reinventing our own. This PR effectively replicates openshift/machine-config-operator#2642 in our code.
This is a hackier an alternative to #2932 that we can ship immediately because we won't block on SELinux policy. For historical reasons, the daemon ends up doing a lot of initialization before even claiming the DBus name. For example, it calls `ostree_sysroot_load()`. We also end up scanning the RPM database, and actually parse all the GPG keys in `/etc/pki/rpm-gpg` etc. This causes two related problems: - By doing all this work before claiming the bus name, we race against the (pretty low) DBus service timeout of 25s. - If something hard fails at initialization, the client can't easily see the error because it just appears in the systemd journal. The client will just see a service timeout. By explicitly using `systemctl start rpm-ostreed.service`, systemd does all of the error checking for us without involving `dbus-broker` as a middleman. Further, by using `systemctl status rpm-ostreed.service` on failure, we reuse systemd's nice rendering of the status of the unit instead of reinventing our own. This PR effectively replicates openshift/machine-config-operator#2642 in our code.
daemon: Rename
os
variable tohostos
Prep for a future patch;
os
conflicts with the standard Go package.daemon: Dump systemctl status rpm-ostreed on load failure
This is yet another patch which should help debugging
https://bugzilla.redhat.com/show_bug.cgi?id=1958812
like situations in the future.
I also have coreos/rpm-ostree#2932
cooking but it's going to take a while to cycle down.