Skip to content

tests/toolbox: adjust to be runnable on RHCOS#4017

Open
Rolv-Apneseth wants to merge 1 commit intocoreos:testing-develfrom
Rolv-Apneseth:toolbox_test
Open

tests/toolbox: adjust to be runnable on RHCOS#4017
Rolv-Apneseth wants to merge 1 commit intocoreos:testing-develfrom
Rolv-Apneseth:toolbox_test

Conversation

@Rolv-Apneseth
Copy link
Member

Closes openshift/os#1035

This will should enable running the basic toolbox test for RHCOS.

To explain a couple of the reasons it would not work in its current state in RHCOS, and the changes this PR makes to get around them:

  1. No machinectl command on RHCOS. I've instead used su - core to run commands as the core user. Also has the minor benefit of propagating error codes, though I opted to keep the explicit checks anyway.
  2. Grep and awk calls were looking only for fedora-toolbox-. I simply expanded the search to also accept rhel-toolbox-.
  3. The rhel-9.8 variant is using this toolbox - this is the one that adds the most complexity to this script. Commands for this one are completely different, the script uses root podman and requires extra configuration. I have a butane config to setup the tool as required, and I've defined and overridden functions to test the same functionality across this script and the modern toolbox binary.

I would love feedback, as I'm not sure all the decisions I made were the best ones. For example, I could see a case for keeping the machinectl path for systems which have it available, as it actually simulates an interactive login. Also, do we even need to test that older toolbox script? Either way, I'm happy to make any changes.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully adapts the toolbox test to be runnable on RHCOS by replacing machinectl with su - core, expanding search patterns for toolboxes, and cleverly handling an older toolbox script via function overrides. The changes are well-structured and make the test more robust. I've identified a bug in the toolbox_count function that could cause the script to fail, and I've also suggested a refactoring to improve the clarity of the code that handles the older toolbox version. Overall, this is a great improvement.

@Rolv-Apneseth
Copy link
Member Author

Anybody know why that CI is failing?

./tmt-test-wrapper-inner.sh-script-00-1-default-0-default-0: line 1: tmt: command not found

Also, should these be pointing at a fork? I assume this was for testing / debugging:

git clone --recurse-submodules https://github.com/Roshan-R/fedora-coreos-config testcode
...
git -C testcode fetch https://github.com/Roshan-R/fedora-coreos-config kola-testing-farm:gluetool/kola-testing-farm

@Rolv-Apneseth
Copy link
Member Author

/retest

version: 1.6.0
storage:
files:
# Toolbox configuration for older streams using https://github.com/coreos/toolbox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this exactly?

Copy link
Member Author

@Rolv-Apneseth Rolv-Apneseth Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry, wasn't sure how to word it. I suppose it's only RHCOS 9.x and older that use the coreos toolbox

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Toolbox configuration for RHCOS variants < 10.x using https://github.com/coreos/toolbox

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I see. So this is wholly for the legacy toolbox?

TBH in that case I would just put it in the test itself for the legacy case:

cat <<EOF > /var/home/core/.toolboxrc
REGISTRY=quay.io
IMAGE=fedora/fedora-toolbox:latest
TOOLBOX_NAME=fedora-toolbox-latest
AUTHFILE=/dev/null
EOF

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, RHCOS 10 also use the legacy coreos/toolbox implementation. So it's a RHCOS vs FCOS thing right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was testing it was only 9.8 that had it, not even c9s.

Here's the toolbox package metadata from the release browser for each (not sure where else to get a list of packages):

  • 9.8: [ "toolbox", "0", "0.1.2", "1.rhaos4.22.el9", "noarch" ]
  • 10.0: [ "toolbox", "0", "0.3", "1.el10", "x86_64" ]
  • c9s: [ "toolbox", "0", "0.3", "1.el9", "x86_64" ]
  • c10s: [ "toolbox", "0", "0.3", "1.el10", "x86_64" ]

Comment on lines +14 to +16
- path: /var/home/core/config.json
contents:
inline: "{}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the point of this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will double check but I believe when I was running it, the coreos toolbox would fail if the file didn't exist

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it fails without that. Either:

Error: credential file is not accessible: faccessat /var/lib/kubelet/config.json: no such file or directory

Or, if specifying AUTHFILE:

Error: credential file is not accessible: faccessat /var/home/core/config.json: no such file or directory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so you mean coreos/toolbox fails if a credentials file doesn't exist? Does it work if you use AUTHFILE=/dev/null ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not:

Error: unable to copy from source docker://quay.io/fedora/fedora-toolbox:latest: initializing source docker://quay.io/fedora/fedora-toolbox:latest: getting username and password: reading JSON file "/dev/null": unmarshaling JSON at "/dev/null": unexpected end of JSON input

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, this whole butane file is only for coreos/toolbox


# IMPORTANT: Commands are run indirectly via `su - core` to re-create the
# user environment needed for unprivileged podman functionality.
run_as_core()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use a similar function to this in a few other tests (search for runascoreuser). might be a good idea to put a function like this in commonlib.sh and then use that everywhere.

migt be good to point AI at this and it could do it in like 10s.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note though.. that machinectl shell is a more proper way to get an environment that more closely models what a user session would have. Maybe we should continue to use machinectl shell here.

cc @travier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh - hmm. I now see the comment above that mentions RHCOS doesn't have machinectl. I wonder if we should just add it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was a big part of the complication with getting it to run on rhcos. I did consider adding it to the common lib but I wanted to see if we were gonna keep the machinectl approach for systems that support it first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's see what @travier thinks on the machinectl in RHCOS thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer we keep using machinectl on FCOS. I think I had tried with su, I don't remember why it did not work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my question was more: should we add machinectl to RHCOS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so if we don't need it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't need it for something else. I don't remember why it's in FCOS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had confirmed this works on both, which is why I didn't choose to split it here, but I don't mind keeping machinectl for FCOS if we think it's worth doing it the more "proper" way when we can

Fixes openshift/os#1035

RHCOS incompatabilities and adjustments made by this commit:
1. No `machinectl` command on RHCOS, so instead use `su - core` to run
commands as the core user. Also has the minor benefit of propagating
error codes, though I opted to keep the explicit checks anyway.
2. Grep and awk calls were looking only for `fedora-toolbox-`. Simply
expand the search to also accept `rhel-toolbox-`.
3. The rhel-9.8 variant is using [this
toolbox](https://github.com/coreos/toolbox). This is the one that adds
the most complexity to this test, as commands are completely different,
the script uses root podman and requires extra configuration.
Comment on lines +6 to +13
# RHEL registry requires credentials - use quay.io registry and a Fedora image for testing
- path: /var/home/core/.toolboxrc
contents:
inline: |
REGISTRY=quay.io
IMAGE=fedora/fedora-toolbox:latest
TOOLBOX_NAME=fedora-toolbox-latest
AUTHFILE=/var/home/core/config.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that will not test the same thing as the legacy coreos/toolbox behaves differently when there are special labels on the image and I think I remember those being there in the RHEL one but not the upstream containers/toolbox one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind explaining, I don't really understand. Would I need to get it to run the RHEL images here then? How do I get the required authentication?

I assume this is what you're referring to with the labels

@travier
Copy link
Member

travier commented Mar 20, 2026

As this will never be used in FCOS, I think you should keep the current FCOS test as is and duplicate it in RHCOS instead. That way we will not have to undo the change here when we move away from the legacy one in RHCOS and we'll only have to delete it there.

@Rolv-Apneseth
Copy link
Member Author

As this will never be used in FCOS, I think you should keep the current FCOS test as is and duplicate it in RHCOS instead. That way we will not have to undo the change here when we move away from the legacy one in RHCOS and we'll only have to delete it there.

But then any changes/additions will need to be made in 2 places even though they'll eventually both be using containers/toolbox right?

For example, I was thinking we should add at least a line in this test checking toolbox works when $SHELL is unset or not exported, like that issue we had on openshift nodes running rhcos 10.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EL9: Fix toolbox test

3 participants