overlay.d/15fcos: add a migration script to move to OCI images #3355

Open
wants to merge 1 commit into
base: testing-devel
from

Conversation

jbtrystram
Contributor

@jbtrystram jbtrystram commented Feb 7, 2025

To simplify testing for coreos/fedora-coreos-tracker#1823,
ship a script that fakes the ostree origin so the system appears to be
on an OCI deployment.

Just ship the migration script for now, without the systemd unit, to allow testing.

@jbtrystram jbtrystram changed the title from "[testing] deploy container images by default" to "overlay.d/15fcos: add a migration script to move to OCI images" Feb 7, 2025
@jbtrystram
Contributor Author

There is at least one missing bit here: I found out that when faking the origin file, rpm-ostree status doesn't pick up the change immediately.
Something like rpm-ostree usroverlay makes it pick up the change, but obviously that's not great. I tried restarting the rpm-ostreed service and found a refresh subcommand, but no luck.

@dustymabe
Member

dustymabe commented Feb 27, 2025

One thing I notice that isn't great is it still says CustomOrigin even after the update to the container from the registry:

[core@cosa-devsh ~]$ rpm-ostree status 
State: idle
Deployments:
● quay.io/fedora/fedora-coreos@sha256:40c0e515ed93cb8a9d5a773fb5a206b13e2edda02d92c6011a84399ceb31ed03
             CustomOrigin: Fedora CoreOS testing stream
                  Version: 41.20250215.1.0 (2025-02-17T11:44:02Z)

  ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918
             CustomOrigin: Fedora CoreOS testing stream
                  Version: 41.20250130.1.0 (2025-01-31T20:20:28Z)

is that expected?

@dustymabe
Member

Also on the system I see a bunch of these messages. Not sure if we can do anything about them:

Feb 27 22:08:58 cosa-devsh rpm-ostree[3591]: failed to query container image base metadata: Reading manifest data from commit: Missing ostree.manifest-digest metadata on merge commit

@jbtrystram
Contributor Author

Also on the system I see a bunch of these messages. Not sure if we can do anything about them:

Feb 27 22:08:58 cosa-devsh rpm-ostree[3591]: failed to query container image base metadata: Reading manifest data from commit: Missing ostree.manifest-digest metadata on merge commit

I guess that's a byproduct of faking out the origin file.

@jbtrystram
Contributor Author

One thing I notice that isn't great is it still says CustomOrigin even after the update to the container from the registry:
...
is that expected?

Yes. Because otherwise we cannot communicate which stream the node is following, as the digest does not show that information.

@jbtrystram jbtrystram force-pushed the oci-migration-script branch from f330048 to 526b944 Compare March 3, 2025 09:46
@jbtrystram
Contributor Author

jbtrystram commented Mar 3, 2025

@dustymabe @jlebon I tweaked this script to add some logging, handle failures, and allow overriding the Cincinnati address for proxied environments.

I also stop rpm-ostree and zincati while overwriting the origin file to avoid a race or conflict.
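
For readers following along, the two changes described above could look roughly like the sketch below. It is not the script from this PR: `CINCINNATI_URL`, `SYSTEMCTL`, `cincinnati_graph_url` and `with_services_stopped` are hypothetical names, the default address is the one Zincati prints in its logs, and the `/v1/graph` query is assumed to follow the usual Fedora CoreOS Cincinnati form.

```shell
# Assumption: CINCINNATI_URL is a hypothetical override knob; the default is
# the Cincinnati service address that Zincati reports in its logs.
CINCINNATI_URL="${CINCINNATI_URL:-https://updates.coreos.fedoraproject.org}"

# Assumption: SYSTEMCTL is overridable so the flow can be dry-run,
# for example with SYSTEMCTL=echo.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Build the update graph URL for a stream and architecture.
cincinnati_graph_url() {
    local stream="$1" basearch="$2"
    printf '%s/v1/graph?basearch=%s&stream=%s\n' \
        "$CINCINNATI_URL" "$basearch" "$stream"
}

# Run a command while zincati and rpm-ostreed are stopped, then start
# zincati again whether or not the command succeeded. rpm-ostreed is not
# restarted here on the assumption that it is started on demand.
with_services_stopped() {
    local rc=0
    "$SYSTEMCTL" stop zincati.service rpm-ostreed.service
    "$@" || rc=$?
    "$SYSTEMCTL" start zincati.service
    return "$rc"
}
```

In a proxied environment this would be driven with something like `CINCINNATI_URL=https://proxy.example.com ./script`, where the proxy address is only a placeholder.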

@jbtrystram jbtrystram force-pushed the oci-migration-script branch from 526b944 to 9e5f50e Compare March 3, 2025 09:53
@jbtrystram
Contributor Author

jbtrystram commented Mar 3, 2025

I just found out about ostree admin set-origin (manpage).
So I experimented with:

ostree admin set-origin fedora-coreos ostree-remote-image:fedora:registry:$imgref --index 0 -s custom-url=$imgref  -s custom-description="Fedora CoreOS testing stream"

Instead of rewriting the origin file manually.

The command does not error out, but I don't see any changes.

@dustymabe
Member

One thing I notice that isn't great is it still says CustomOrigin even after the update to the container from the registry:
...
is that expected?

Yes. Because otherwise we cannot communicate which stream the node is following, as the digest does not show that information.

Hmm. So basically we are changing the origin file forever here and what's reported isn't necessarily what's true any more. i.e. if someone runs this script today it will end up with:

[origin]
container-image-reference=ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918
custom-url=quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918
custom-description=Fedora CoreOS testing stream

i.e. custom-description=Fedora CoreOS testing stream isn't true anymore if the user rebases to stable or next, right?

Maybe we need to find some way to sanitize the origin file once we're done with all of this?

@dustymabe
Member

Talked a little more with @jlebon today. We don't feel we need to block the releases tomorrow on getting this in because people can just download/run the script if they want. At the same time if we feel this is ready we also won't block it from going into the release either since it would only get run if someone knew about it and executed it.

@jbtrystram
Contributor Author

Hmm. So basically we are changing the origin file forever here and what's reported isn't necessarily what's true any more. i.e. if someone runs this script today it will end up with:

[origin]
container-image-reference=ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918
custom-url=quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918
custom-description=Fedora CoreOS testing stream

i.e. custom-description=Fedora CoreOS testing stream isn't true anymore if the user rebases to stable or next, right?

Maybe we need to find some way to sanitize the origin file once we're done with all of this?

I updated the script to insert $stream instead of the hardcoded "testing" in the custom description, it was a leftover, thanks for catching that.
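
A minimal sketch of what parameterizing the description by stream could look like, assuming the same three origin keys quoted above. `render_origin` is a hypothetical helper name, not a function from the actual script.

```shell
# Print an origin file for a digest-pinned image on the given stream.
# render_origin is a hypothetical helper; the keys mirror the origin file
# quoted in this thread.
render_origin() {
    local imgref="$1" stream="$2"
    cat <<EOF
[origin]
container-image-reference=ostree-remote-image:fedora:registry:${imgref}
custom-url=${imgref}
custom-description=Fedora CoreOS ${stream} stream
EOF
}
```

Calling `render_origin "$imgref" "$stream"` with `stream=next` yields a description of "Fedora CoreOS next stream" rather than the hardcoded "testing".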

However, I am not sure about "changing the origin file forever". Wouldn't a manual rebase create a new origin file with the deployment? I don't think rpm-ostree will forward custom origin fields to a new origin.
I am going to test that.

@jbtrystram
Contributor Author

jbtrystram commented Mar 12, 2025

With the changes in coreos/zincati#1273, dropping the following in /run/zincati/booted-status-override.json triggers the migration:

{
    "booted": true,
    "container-image-reference": "ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos:$STREAM",
    "container-image-reference-digest": "sha256:$DIGEST",
    "base-commit-meta": {
        "fedora-coreos.stream": "$STREAM"
    },
    "checksum": "oci-rebase",
    "version": "$VERSION"
}

This works without having to change the origin file, so the rpm-ostree status output stays accurate.

[root@cosa-devsh core]# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; update staged: 41.20250302.2.0; reboot delayed due to active user sessions
Deployments:
  ostree-remote-image:fedora:docker://quay.io/fedora/fedora-coreos:testing
                   Digest: sha256:7bb94d1516fc6e368eb9cf07a4fe23ca7378e2e1ddb76c6a1902f75cb5c9203a
                  Version: 41.20250302.2.0 (2025-03-04T16:52:41Z)
                     Diff: 147 upgraded

● fedora:fedora/x86_64/coreos/testing
                  Version: 41.20241215.2.0 (2024-12-17T00:07:38Z)
                   Commit: 0e93f07d8d11856eb0773bde01757337320b26eb823cb5faff2c89d2849edb0a
             GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
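
As a rough illustration, the override file described above could be generated along these lines. This is a sketch based only on the JSON template in this comment: `write_status_override` and its optional destination argument are hypothetical, and the real script may derive the values differently.

```shell
# Write a Zincati status override for the booted deployment.
# Arguments: stream, image digest (without the "sha256:" prefix), version,
# and an optional destination path (defaults to the well-known location).
write_status_override() {
    local stream="$1" digest="$2" version="$3"
    local dest="${4:-/run/zincati/booted-status-override.json}"
    cat > "$dest" <<EOF
{
    "booted": true,
    "container-image-reference": "ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos:${stream}",
    "container-image-reference-digest": "sha256:${digest}",
    "base-commit-meta": {
        "fedora-coreos.stream": "${stream}"
    },
    "checksum": "oci-rebase",
    "version": "${version}"
}
EOF
}
```

Because the file lives under /run, it does not survive a reboot, which fits the goal of leaving the origin file untouched.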

@jbtrystram jbtrystram force-pushed the oci-migration-script branch from fda85ac to 36c64e4 Compare March 12, 2025 14:41
@jbtrystram
Contributor Author

OK, I updated the script to do that and tested it.
It's way less invasive now.

Also on the system I see a bunch of these messages. Not sure if we can do anything about them:

Feb 27 22:08:58 cosa-devsh rpm-ostree[3591]: failed to query container image base metadata: Reading manifest data from commit: Missing ostree.manifest-digest metadata on merge commit

This does not happen anymore.

@jbtrystram jbtrystram force-pushed the oci-migration-script branch 2 times, most recently from 4c98f34 to f73de15 Compare March 13, 2025 11:01
@dustymabe
Member

ok with the latest updates from the coreos/zincati#1273 I'm seeing this error when running the migration script as part of an ExecStartPre=

Mar 13 13:45:39 cosa-devsh systemd[1]: zincati.service: start operation timed out. Terminating.
Mar 13 13:45:39 cosa-devsh systemd[1]: zincati.service: Failed with result 'timeout'.
Mar 13 13:45:39 cosa-devsh systemd[1]: Failed to start zincati.service - Zincati Update Agent.
Mar 13 13:45:49 cosa-devsh systemd[1]: zincati.service: Scheduled restart job, restart counter is at 1.
Mar 13 13:45:49 cosa-devsh systemd[1]: Starting zincati.service - Zincati Update Agent...
Mar 13 13:45:49 cosa-devsh coreos-oci-rebase[4105]: Saving rpm-ostree status.
Mar 13 13:45:49 cosa-devsh coreos-oci-rebase[4105]: Fetching cincinnati update graph for stream next on x86_64
Mar 13 13:45:50 cosa-devsh coreos-oci-rebase[4105]: Found OCI image quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918 in the update graph that matches the local deployment version.
Mar 13 13:45:50 cosa-devsh coreos-oci-rebase[4105]: Writing a status override file for the booted deployment in /run/zincati/booted-status-override.json
Mar 13 13:45:50 cosa-devsh coreos-oci-rebase[4105]: Zincati will rebase to an OCI image for the next update.
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::cli::agent] starting update agent (zincati 0.0.29)
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::cincinnati] Cincinnati service: https://updates.coreos.fedoraproject.org
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::cli::agent] agent running on node 'c26ec99392314d71a26afd4a59238a18', in update group 'default'
Mar 13 13:45:50 cosa-devsh zincati[4136]: [DEBUG zincati::metrics] started metrics service on Unix-domain socket '/run/zincati/public/metrics.promsock'
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::update_agent::actor] registering as the update driver for rpm-ostree
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Loaded sysroot 
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Locked sysroot 
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Initiated txn Deploy for client(dbus:1.87 unit:zincati.service uid:981): /org/projectatomic/rpmostree1/fedora_coreos
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Process [pid: 4156 uid: 981 unit: zincati.service] connected to transaction progress
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Txn Deploy on /org/projectatomic/rpmostree1/fedora_coreos successful
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Unlocked sysroot
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Process [pid: 4156 uid: 981 unit: zincati.service] disconnected from transaction progress
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: In idle state; will auto-exit in 61 seconds
Mar 13 13:45:50 cosa-devsh zincati[4136]: [DEBUG zincati::update_agent::actor] no other local finalized deployments found; no update targets will be excluded.
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::update_agent::actor] initialization complete, auto-updates logic enabled
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::strategy] update strategy: immediate
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::update_agent::actor] reached steady state, periodically polling for updates
Mar 13 13:45:50 cosa-devsh systemd[1]: Started zincati.service - Zincati Update Agent.
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::cincinnati] current release detected as not a dead-end
Mar 13 13:45:50 cosa-devsh zincati[4136]: [DEBUG zincati::cincinnati] MOTD updated with no dead-end state
Mar 13 13:45:50 cosa-devsh zincati[4136]: [INFO  zincati::update_agent::actor] target release '41.20250302.1.0' selected, proceeding to stage it
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Loaded sysroot 
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Locked sysroot 
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Initiated txn Rebase for client(dbus:1.95 unit:zincati.service uid:981): /org/projectatomic/rpmostree1/fedora_coreos
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Process [pid: 4203 uid: 981 unit: zincati.service] connected to transaction progress
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Txn Rebase on /org/projectatomic/rpmostree1/fedora_coreos failed: Old and new refs are equal: ostree-remote-image:fedora:docker://quay.io/fedora/fedora-coreos:next
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Unlocked sysroot
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: Process [pid: 4203 uid: 981 unit: zincati.service] disconnected from transaction progress
Mar 13 13:45:50 cosa-devsh rpm-ostree[3183]: In idle state; will auto-exit in 61 seconds
Mar 13 13:45:50 cosa-devsh zincati[4136]: [ERROR zincati::update_agent::actor] failed to stage deployment: rpm-ostree deploy failed:
Mar 13 13:45:50 cosa-devsh zincati[4136]:     error: Old and new refs are equal: ostree-remote-image:fedora:docker://quay.io/fedora/fedora-coreos:next
Mar 13 13:45:50 cosa-devsh zincati[4136]:     
Mar 13 13:46:52 cosa-devsh systemd[1]: rpm-ostreed.service: Deactivated successfully.
Mar 13 13:46:52 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 22.362s CPU time, 452.1M memory peak.

This is the file that got created:

[core@cosa-devsh ~]$ cat /run/zincati/booted-status-override.json 
{
    "booted": true,
    "container-image-reference": "ostree-remote-image:fedora:registry:quay.io/fedora/fedora-coreos@sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918",
    "container-image-reference-digest" : "sha256:fcd6c0e85b1f80ba23b01d280db9f3e273ba9e4bfde9d00820d5141404ae0918",
    "base-commit-meta": {
        "fedora-coreos.stream": "next"
    },
    "checksum": "c1fed7e59bad26ced87cfad72681259a886579c35e41de8ad6ebfeb79c49297c",
    "version": "41.20250130.1.0"
}

To simplify testing for coreos/fedora-coreos-tracker#1823,
this script writes a Zincati status override containing fake rpm-ostree
status output, so that the node appears to be on an OCI deployment.

Zincati looks for the override at the well-known path /run/zincati/booted-status-override.json.
Its content triggers the OCI code path in Zincati.

This will later run as ExecStartPre in the zincati.service environment.

Just ship the migration script for now, without the zincati service
changes, to allow testing.

See coreos/fedora-coreos-tracker#1823 (comment)
Requires coreos/zincati#1273
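
For anyone wanting to try the ExecStartPre wiring before the service changes ship, a drop-in could be written along the lines of the sketch below. Everything here is an assumption for illustration: `write_zincati_dropin`, the drop-in file name, and the script path passed in are hypothetical and not part of this PR.

```shell
# Write a systemd drop-in that runs the given script before Zincati starts.
# Arguments: path to the migration script, and an optional drop-in directory.
write_zincati_dropin() {
    local script="$1"
    local dir="${2:-/etc/systemd/system/zincati.service.d}"
    mkdir -p "$dir"
    cat > "${dir}/10-oci-migration.conf" <<EOF
[Service]
ExecStartPre=${script}
EOF
}
```

After writing the drop-in, `systemctl daemon-reload` followed by a restart of zincati.service would be needed for it to take effect.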
@jbtrystram jbtrystram force-pushed the oci-migration-script branch from f73de15 to c75c447 Compare March 13, 2025 14:35
@jbtrystram
Contributor Author

ok with the latest updates from the coreos/zincati#1273 I'm seeing this error when running the migration script as part of an ExecStartPre=
...

Just for reference this was fixed with coreos/zincati@2ff39a6

@dustymabe
Member

Let's reference the (newly created) coreos/fedora-coreos-tracker#1890 issue in commit messages for this.

2 participants