Skip to content

Conversation

@MggMuggins
Copy link
Contributor

@MggMuggins MggMuggins commented Mar 21, 2025

Description

Fixes LP#2083029

When using systemd-networkd as a backend, netplan try backs up /run/systemd/network before applying the new configuration. If the user fails to confirm the new configuration before the timeout is up, Netplan restores the old configuration by moving the backup into /run/systemd/network and invokes networkctl reload to apply the known-good configuration.

However, networkd will not/cannot read config files if their ownership is not root:systemd-network.

In effect, netplan try does not revert network changes at all when using networkd.

Additionally, following the Ubuntu patches to NetworkManager in 23.10 (see netplan everywhere), netplan generate is called (via DBus) when NetworkManager refreshes its config. This "reverts the revert" from netplan try, so restoring the old config doesn't work with NetworkManager in Ubuntu >23.10 either.

This PR adds an autopkg test for netplan try so as to catch these regressions before they are released.

Implementation Notes

I took advantage of the stamp that exists when netplan try is running. There doesn't appear to me to be a way to prevent NetworkManager from calling generate on reload.

I considered that this may be prone to races if NetworkManager takes its time reloading the configuration; however, it looks like netplan try calls command_apply with sync=True, so it shouldn't delete the stamp until after NetworkManager has finished loading its config.

Test Plan

Using the default netplan configuration in a LXD instance:

network:
    version: 2
    ethernets:
        enp5s0:
            dhcp4: true

Replace the configuration with a static IP and run netplan try --timeout 3, and allow the timeout to elapse.

network:
    version: 2
    ethernets:
        enp5s0:
            dhcp4: false
            addresses:
              - 10.58.215.80/24

Before the fix in Jammy, the static IP remains assigned to enp5s0 after the revert. In Noble and later, the interface has the IPv4 removed, but a new dhcp lease is not acquired.

networkctl status --all will report the device as unmanaged. In Jammy:

● 2: enp5s0
                     Link File: /usr/lib/systemd/network/99-default.link
                  Network File: /run/systemd/network/10-netplan-enp5s0.network
                          Type: ether
                         State: routable (unmanaged)
                  Online state: online

Checklist

  • Runs make check successfully.
  • Retains code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad. LP#2083029

Test failures look like Launchpad having trouble.

@MggMuggins
Copy link
Contributor Author

I opened this PR prematurely; the current changes do not fix the bug. I would mark this as a draft but I don't have permission to do so. Still working on this...

@slyon slyon added community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments. Canonical by Canonical employees outside the Netplan team labels Mar 24, 2025
@MggMuggins MggMuggins force-pushed the fix-try branch 2 times, most recently from 36a7101 to 77f321f Compare March 25, 2025 17:31
@MggMuggins
Copy link
Contributor Author

MggMuggins commented Mar 25, 2025

This is ready for review.

Test failures:

  • HTTP request failures (look transient)
  • Debian testing autopkgtes

The integration tests pass when run manually in a VM/container (Noble, Oracular, Trixie).

I was able to reproduce the Debian testing failure using autopkgtest locally. I ran the Ubuntu autopkgtests locally; they fail too but with an error message:

test_netplan_try (__main__.TestNetworkManager.test_netplan_try) ... eth43 eth43 Error: NetworkManager is not running.

I'm willing to continue chasing these but I'd appreciate a review of this PR to make sure that this is a reasonable set of fixes before I sink more time into the CI. Thanks!

@MggMuggins MggMuggins changed the title Fix netplan try revert when using systemd-networkd Fix netplan try revert Mar 25, 2025
@slyon
Copy link
Collaborator

slyon commented Mar 31, 2025

@MggMuggins Thank you very much for your work on this. I retriggered some of the test cases that failed for odd reasons. The scenarios autopkgtests are now failing for Ubuntu & Debian. This seems related to the netplan try command. Can you please investigate what's happening there?

@MggMuggins
Copy link
Contributor Author

Yeah, this is definitely a bug in my tests; fixing these is my priority for this morning. Thanks for your patience.

@MggMuggins
Copy link
Contributor Author

Tests are green.

Also added a commit to fix debug logging of the merged config.

@slyon slyon changed the title Fix netplan try revert Fix netplan try revert (LP: #2083029) Apr 10, 2025
@slyon slyon self-requested a review April 10, 2025 13:32
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much this contribution! This is an excellent analysis and test coverage. Much appreciated! It solves an important issue with netplan try.

I left some minor comments inline. Let me know what you think.

I think this might also solve https://bugs.launchpad.net/netplan/+bug/2027583

@MggMuggins MggMuggins force-pushed the fix-try branch 2 times, most recently from 194ca7b to 76ae3f6 Compare April 11, 2025 21:43
Signed-off-by: Wesley Hershberger <[email protected]>
systemd-networkd will not read .netdev and .network files unless the
group is set to 'systemd-network'.

Fixes LP#2083029

Signed-off-by: Wesley Hershberger <[email protected]>
Generate may be called by NetworkManager (via Dbus) in Ubuntu >23.10.

While `netplan try` is waiting for revert, there is no situation where
the generator should be called.

Signed-off-by: Wesley Hershberger <[email protected]>
Read from the beginning of the file instead of the end.

Signed-off-by: Wesley Hershberger <[email protected]>
@MggMuggins
Copy link
Contributor Author

Thanks for the review! Test failure should clear on a re-run.

Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for addressing my remarks. Test are now all green. I just pushed a little rename, as suggested by @MggMuggins (b87c3a5).

LGTM!

@slyon slyon merged commit 84302d7 into canonical:main Apr 14, 2025
16 checks passed
@MggMuggins MggMuggins deleted the fix-try branch April 14, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Canonical by Canonical employees outside the Netplan team community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants