
Conversation

@gabriel-samfira

On Flatcar, files in /run are recreated after the initrd pivots to the full system and executes init: anything written to /run during the ignition run is clobbered. This change adds a unit file that writes the file post-boot instead.

Fixes #480

@gabriel-samfira gabriel-samfira force-pushed the fix-flatcar-deployments branch 2 times, most recently from 8abbd06 to 9f6a122 on January 16, 2025 14:25
@mnaser
Member

mnaser commented Jan 23, 2025

@gabriel-samfira do you happen to know why the flatcar CI fails? :(

@gabriel-samfira
Author

@mnaser looking

@gabriel-samfira
Author

gabriel-samfira commented Jan 23, 2025

The logs don't give much of a hint as to what the problem might be. I will try to run the tests on a local OpenStack deployment, connect to the VMs and debug.

Might take a few days. Need to finish some documentation.

@gabriel-samfira
Author

Hi @mnaser

I tried to run the integration tests using:

./hack/stack.sh
export KUBE_TAG=v1.27.4
./hack/run-integration-tests.sh

But I seem to be running into errors like:

Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager keystoneauth1.exceptions.http.BadRequest: Invalid input for field/attribute trust. Value: {'roles': [{'id': '08cfd96596a445c493bca5af45593ea0', 'name': 'load-balancer_member', 'domain_id': None, 'description': None, 'options': {}, 'links': {'self': 'http://194.104.235.18/identity/v3/roles/08cfd96596a445c493bca5af45593ea0'}}, {'id': 'e463dc2438804143b2f2f44881beb0b5', 'name': 'reader', 'domain_id': None, 'description': None, 'options': {'immutable': True}, 'links': {'self': 'http://194.104.235.18/identity/v3/roles/e463dc2438804143b2f2f44881beb0b5'}}, {'id': '751a70d5b6ba4dd492ef0d2ee5aeb99e', 'name': 'member', 'domain_id': None, 'description': None, 'options': {'immutable': True}, 'links': {'self': 'http://194.104.235.18/identity/v3/roles/751a70d5b6ba4dd492ef0d2ee5aeb99e'}}, {'id': '12ba90b05b224b569812dbf10a2b6f06', 'name': 'anotherrole', 'domain_id': None, 'description': None, 'options': {}, 'links': {'self': 'http://194.104.235.18/identity/v3/roles/12ba90b05b224b569812dbf10a2b6f06'}}], 'delegation_depth': 0, 'id': '2ca7c13e7d7b4f568bd8d81cc096b136', 'trustor_user_id': '8285cc483d2c4899a3aa8e5efa331d88', 'trustee_user_id': '81b0e24a4d82446eb679433b2b004641', 'project_id': '04a0323b42a54819a0ecf51d8a3d8fc2', 'impersonation': True, 'expires_at': None, 'remaining_uses': None, 'deleted_at': None, 'redelegated_trust_id': None, 'redelegation_count': 0, 'roles_links': {'self': 'http://194.104.235.18/identity/v3/2ca7c13e7d7b4f568bd8d81cc096b136/roles', 'next': None, 'previous': None}, 'links': {'self': 'http://194.104.235.18/identity/v3/OS-TRUST/trusts/2ca7c13e7d7b4f568bd8d81cc096b136'}}. Additional properties are not allowed ('delegation_depth' was unexpected) (HTTP 400) (Request-ID: req-760632b6-16f8-4faa-ba32-f0e844f81e3f)
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager 
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager During handling of the above exception, another exception occurred:
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager 
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager Traceback (most recent call last):
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager   File "/opt/stack/magnum/magnum/conductor/handlers/common/trust_manager.py", line 34, in create_trustee_and_trust
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager     trust = osc.keystone().create_trust(
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager   File "/opt/stack/magnum/magnum/common/keystone.py", line 218, in create_trust
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager     raise exception.TrustCreateFailed(
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager magnum.common.exception.TrustCreateFailed: Failed to create trust for trustee 81b0e24a4d82446eb679433b2b004641.
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR magnum.conductor.handlers.common.trust_manager 
Feb 11 11:44:09 test-bm2 magnum-conductor[2282404]: ERROR oslo_messaging.rpc.server [None req-b780c8d2-0194-42a4-82f6-614da8c9689b demo demo] Exception during message handling: magnum.common.exception.TrusteeOrTrustToClusterFailed: Failed to create trustee or trust for Cluster: 8de65e9d-8c3b-436e-841f-761926eaaab5

What version of devstack/openstack are you running in the CI? Using the main branches doesn't seem to work even for basic functionality at the moment.

On flatcar, files in /run are recreated after the initrd pivots to the
full system and executes init. This change adds a unit file that writes
the file post-boot. Files written in /run during ignition run will be
clobbered.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
@gabriel-samfira gabriel-samfira force-pushed the fix-flatcar-deployments branch 2 times, most recently from 271891b to 83f06a3 on February 12, 2025 18:09
Flatcar images do not seem to be GPT-compliant. See:

https://bugs.launchpad.net/nova/+bug/2091114

for details.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
@gabriel-samfira
Author

gabriel-samfira commented Feb 12, 2025

Hi @mnaser

There were 2 issues:

  • The Flatcar image seems to have an invalid GPT partition table according to Nova; a recent change in Nova validates the image. We can work around it.
  • The v1.27.15 Flatcar image does not exist, but curl was lacking the --fail flag, so it downloaded a file containing the text NoSuchKey and uploaded that to Glance.

I think the latest available image for flatcar is v1.27.4. Switching to that made things work in my local env. Could we build a new image for Flatcar?
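For illustration, the --fail fix described above could look something like this (a sketch only; the function and variable names are illustrative, not taken from the repo's scripts):

```shell
# Hedged sketch of the curl fix: fail the download on HTTP errors
# instead of saving the error body (e.g. S3's "NoSuchKey" XML) and
# uploading it to Glance as if it were an image.
fetch_image() {
  url="$1"
  out="$2"
  # --fail: exit non-zero on HTTP 4xx/5xx instead of writing the error page
  curl --fail --silent --show-error --location -o "$out" "$url"
}
```

With --fail, a 404 on the image URL aborts the pipeline instead of silently producing a bogus Glance image.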

@gabriel-samfira
Author

The Rocky tests seem to fail for the same reason (the image URL returns 404).

@mnaser
Member

mnaser commented Feb 26, 2025

@gabriel-samfira thanks for all the catches and updates.

#494

I'm working on using more strongly typed code with unit tests for all features; I will try to incorporate your change here:

https://github.com/vexxhost/magnum-cluster-api/blob/ff11ab3475555966d846338f16d9120d2905a84a/src/features/operating_system.rs

More specifically:

additional_config: Some(serde_yaml::to_string(&Config {
    systemd: Some(Systemd {
        units: Some(vec![
            Unit {
                name: "[email protected]".into(),
                enabled: Some(true),
                dropins: None,
                contents: None,
                mask: None,
            },
            Unit {
                name: "kubeadm.service".into(),
                enabled: Some(true),
                dropins: Some(vec![Dropin {
                    name: "10-flatcar.conf".into(),
                    contents: Some(
                        indoc!(r#"
                            [Unit]
                            Requires=containerd.service coreos-metadata.service
                            After=containerd.service coreos-metadata.service
                            [Service]
                            EnvironmentFile=/run/metadata/flatcar
                        "#).into(),
                    ),
                }]),
                contents: None,
                mask: None,
            },
        ]),
    }),
    ..Default::default()
}).unwrap()),

Also, for the Flatcar image builds: we've resorted to building images here:

https://github.com/vexxhost/capo-image-elements

Do you know what needs to be done for the Flatcar image so it can be built with diskimage-builder, to avoid the whole dance with Packer? Otherwise, if you know the image is being tested/shipped somewhere else, like in CAPO, we can use that one too.

@gabriel-samfira
Author

> Do you know what needs to be done for the Flatcar image so it can be built with diskimage-builder, to avoid the whole dance with Packer?

I think the easiest way would be to use the systemd-sysext enablement in Flatcar to ensure we have the needed binaries. This can be done via ignition on boot, or we can bundle the sysext images directly in the OS image.

> Otherwise, if you know the image is being tested/shipped somewhere else, like in CAPO, we can use that one too.

Will have a look if there are any public images we can consume.

@mnaser
Member

mnaser commented Feb 26, 2025

If we do it on boot, we are potentially blocked by network issues, so ideally we'd bundle it, and even better if we can do it in diskimage-builder.

@mnaser
Member

mnaser commented Mar 6, 2025

@gabriel-samfira I plan on integrating this change with the rust-y side of things, which will make this a lot cleaner and testable, but I was curious if you have any updates on the image side. Perhaps sysext is the way to go for now, but I'm not sure if it has limitations.

@gabriel-samfira
Author

I am testing the sysext approach today. Will post an update as soon as I have one.

@gabriel-samfira
Author

I can confirm that using sysext works great. To create an image suitable for CAPO, the steps are as follows:

sudo apt-get install bzip2 docker.io git
git clone https://github.com/flatcar/sysext-bakery.git
cd sysext-bakery
./bake_flatcar_image.sh --vendor openstack --fetch kubernetes:kubernetes-v1.31.4-x86-64.raw

The resulting image can be uploaded to OpenStack and can be used to deploy a fully functional cluster.

The script is actually quite simple: it downloads the sysext, mounts the root partition of the Flatcar image, creates a folder for the sysext, copies the sysext over, and enables it.

I am not sure if it's worth adding this to diskimage-builder, but if that is desirable, I think it can be done relatively simply.
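The copy-and-enable part of those steps can be sketched roughly like this, operating on an already-mounted root filesystem (the paths are assumptions based on the description above, not copied from bake_flatcar_image.sh, which also handles loop-mounting the image):

```shell
# Hedged sketch: install a sysext image into a mounted Flatcar rootfs
# and enable it. systemd-sysext merges extensions found under
# /etc/extensions; storing the image under /opt/extensions and
# symlinking it is an assumption, not taken from the script.
install_sysext() {
  rootfs="$1"
  sysext_raw="$2"
  name="$(basename "$sysext_raw")"
  # keep the sysext image on the persistent partition
  mkdir -p "$rootfs/opt/extensions"
  cp "$sysext_raw" "$rootfs/opt/extensions/$name"
  # enable it via a symlink that systemd-sysext will pick up at boot
  mkdir -p "$rootfs/etc/extensions"
  ln -sf "/opt/extensions/$name" "$rootfs/etc/extensions/$name"
}
```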

@mnaser
Member

mnaser commented Mar 12, 2025

I think it would be really neat if we could, because I'd like to maintain elements in the capo-image-elements repo to be able to build any sort of base image.

The longer-term goal is that all those images get updated automatically; after they undergo validation through MCAPI, they can be "promoted".

@okozachenko1203
Member

@gabriel-samfira could you resolve the conflicts? We've reimplemented the Python parts in Rust. You can see how configure-kube-proxy.sh is configured in the Rust implementation: https://github.com/vexxhost/magnum-cluster-api/blob/main/src/features/mod.rs#L193

@gabriel-samfira
Author

gabriel-samfira commented Aug 18, 2025

Hi @okozachenko1203

It might take a while as I'm currently in crunch mode on a couple of projects and don't have access to a test deployment. The fix is fairly straightforward: we need a systemd unit file that creates /run/kubeadm/configure-kube-proxy.sh (and any other needed files stored in /run), because Flatcar uses ignition to configure the system, and ignition runs in the initrd before the system is fully booted. As a result, the /run partition is overwritten (well, technically it's a tmpfs that gets remounted, so it's wiped) before init is executed inside the final system. So anything inside /run just disappears.
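For illustration, such a unit could be generated like this. This is only a sketch under the assumption that the script is staged in a persistent location (here /etc/kubeadm) at image or ignition time; the unit name and paths are made up, not taken from this PR:

```shell
# Hedged sketch: write a oneshot unit that recreates the /run file
# after the pivot, since anything ignition puts in /run is wiped when
# the final system remounts the tmpfs. Unit and path names here are
# illustrative assumptions.
write_recreate_unit() {
  unit_dir="$1"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/kubeadm-run-files.service" <<'EOF'
[Unit]
Description=Recreate /run/kubeadm files clobbered after the initrd pivot
Before=kubeadm.service

[Service]
Type=oneshot
ExecStartPre=/usr/bin/mkdir -p /run/kubeadm
ExecStart=/usr/bin/cp /etc/kubeadm/configure-kube-proxy.sh /run/kubeadm/configure-kube-proxy.sh

[Install]
WantedBy=multi-user.target
EOF
}
```

Ordering the unit Before=kubeadm.service ensures the script exists by the time kubeadm tries to run it.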

The simpler (and cleaner, I think) solution would be to just place that script somewhere else (somewhere in /opt/, perhaps?). What do you think?

@okozachenko1203
Member

> It might take a while as I'm currently in crunch mode on a couple of projects and don't have access to a test deployment. The fix is fairly straightforward: we need a systemd unit file that creates /run/kubeadm/configure-kube-proxy.sh (and any other needed files stored in /run), because Flatcar uses ignition to configure the system, and ignition runs in the initrd before the system is fully booted. As a result, the /run partition is overwritten (well, technically it's a tmpfs that gets remounted, so it's wiped) before init is executed inside the final system. So anything inside /run just disappears.
>
> The simpler (and cleaner, I think) solution would be to just place that script somewhere else (somewhere in /opt/, perhaps?). What do you think?

That sounds reasonable. I agree with keeping it in /opt.


Development

Successfully merging this pull request may close these issues.

Flatcar fails to start due to missing /run/kubeadm/configure-kube-proxy.sh
