gvforwarder as a systemd service #1003

Open
wants to merge 1 commit into
base: release-4.17

@vyasgun vyasgun commented Jan 21, 2025

  • Create a tap device using nmcli with a hardcoded mac address
  • Start gvforwarder systemd service which will use this device

Based on the following code:
#673
cfergeau@03a4054

openshift-ci bot commented Jan 21, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Jan 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gbraad for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vyasgun
Author

vyasgun commented Jan 21, 2025

/test all

@cfergeau
Contributor

I'd recommend also picking up the changes from https://github.com/cfergeau/snc/commits/gvisor_service/ which update the unit files used in the PR to use a unit file close to https://github.com/containers/gvisor-tap-vsock/tree/main/contrib/systemd

With the current code, I still have this question/concern #673 (comment)

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from fb9c40e to 6278601 Compare January 21, 2025 09:54
@vyasgun
Author

vyasgun commented Jan 22, 2025

/retest

@vyasgun
Author

vyasgun commented Jan 23, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 04595ac to 24924c0 Compare January 23, 2025 06:21
@vyasgun
Author

vyasgun commented Jan 23, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 24924c0 to 327901a Compare January 24, 2025 06:40
@vyasgun
Author

vyasgun commented Jan 24, 2025

/retest

@vyasgun
Author

vyasgun commented Jan 24, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 327901a to 5264a19 Compare January 27, 2025 11:23
@vyasgun
Author

vyasgun commented Jan 27, 2025

/retest

@vyasgun
Author

vyasgun commented Jan 28, 2025

/retest

@vyasgun
Author

vyasgun commented Feb 3, 2025

/retest

@vyasgun
Author

vyasgun commented Feb 6, 2025

/retest

@vyasgun
Author

vyasgun commented Feb 6, 2025

/retest

@vyasgun
Author

vyasgun commented Feb 6, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from ccc602e to 4e0a92e Compare February 7, 2025 08:47
@vyasgun
Author

vyasgun commented Feb 7, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 4e0a92e to a999631 Compare February 7, 2025 13:15
@vyasgun
Author

vyasgun commented Feb 7, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from a999631 to d1501b4 Compare February 10, 2025 10:26
@vyasgun
Author

vyasgun commented Feb 10, 2025

/retest

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch 2 times, most recently from b83b47e to 53cf03b Compare February 10, 2025 13:10
@vyasgun vyasgun changed the title [WIP] [Spike] gvforwarder as a systemd service gvforwarder as a systemd service Feb 10, 2025
@vyasgun vyasgun marked this pull request as ready for review February 10, 2025 13:17
@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 53cf03b to 6cf746e Compare February 10, 2025 13:41
Contributor

@cfergeau cfergeau left a comment

Looks good to me, thanks a lot for putting this into shape/testing it.

@vyasgun
Author

vyasgun commented Feb 11, 2025

/test e2e-snc

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from 6cf746e to abece15 Compare February 11, 2025 03:37
- Create a tap device using nmcli with a hardcoded mac address
- Start gvforwarder systemd service which will use this device

Signed-off-by: vyasgun <[email protected]>
@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from abece15 to cf5affc Compare February 11, 2025 03:38
# when tap device is in use.
${SSH} core@${VM_IP} 'sudo bash -x -s' <<EOF
nmcli connection add type tun ifname tap0 con-name tap0 mode tap autoconnect yes 802-3-ethernet.cloned-mac-address 5A:94:EF:E4:0C:EE
EOF
Contributor

Isn't this equivalent to ${SSH} core@${VM_IP} 'sudo nmcli connection add type tun ifname tap0 con-name tap0 mode tap autoconnect yes 802-3-ethernet.cloned-mac-address 5A:94:EF:E4:0C:EE'?
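
A way to check this locally, as a minimal sketch: the script below swaps in a stub `nmcli` (an assumption made so nothing touches a real VM or NetworkManager) that prints the arguments it receives, then runs the same command once through `bash -x -s` with a heredoc and once through `bash -c`. The ssh hop and `sudo` are left out; only the argument handling of the two forms is compared.

```shell
#!/usr/bin/env bash
# Stub nmcli placed first on PATH; it prints each argument on its own line.
# This is a stand-in for the real tool, used only for this comparison.
stub_dir=$(mktemp -d)
cat > "${stub_dir}/nmcli" <<'STUB'
#!/bin/sh
printf '%s\n' "$@"
STUB
chmod +x "${stub_dir}/nmcli"
PATH="${stub_dir}:${PATH}"

CMD='nmcli connection add type tun ifname tap0 con-name tap0 mode tap autoconnect yes 802-3-ethernet.cloned-mac-address 5A:94:EF:E4:0C:EE'

# Form used in the PR: the shell reads the script from stdin.
# The -x trace goes to stderr, so it is discarded here.
via_stdin=$(bash -x -s 2>/dev/null <<EOF
${CMD}
EOF
)

# Form suggested in the review: the command is passed as a single argument.
via_arg=$(bash -c "${CMD}")

if [ "${via_stdin}" = "${via_arg}" ]; then
    echo "both forms pass the same arguments to nmcli"
fi
```

For a single command without shell metacharacters the two forms should behave the same; the heredoc form mainly pays off when several commands need to run in one root shell.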

Collaborator

@gbraad gbraad Feb 17, 2025

In short, you are questioning why this needs to be wrapped in a sudo bash -x -s.
Would an error occur otherwise? I do not see any characters that would be wrongly interpreted by the host shell (as zsh might do).

@anjannath How was this solved for the self-sufficient bundle?

Member

@anjannath anjannath Feb 17, 2025

This is not being changed for the self-sufficient bundle. It has been tested with the existing setup, in which a container image runs gvforwarder and also carries a DHCP client script that configures the interface using the DHCP service from gvisor-tap-vsock.

Member

I think we can also scp the NetworkManager config file to /etc/NetworkManager/system-connections instead of running nmcli commands; there's a config file in: https://github.com/containers/gvisor-tap-vsock/blob/main/contrib/networkmanager/vsock0.nmconnection

[connection]
id=tap0
type=tun
autoconnect=true
interface-name=tap0

[tun]
mode=2

[802-3-ethernet]
cloned-mac-address=5A:94:EF:E4:0C:EE

[ipv4]
method=auto

[proxy]
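
As a rough sketch of that approach, the helper below (`write_tap_profile` is a hypothetical name) writes the profile quoted above to a local file and restricts its permissions, since NetworkManager expects keyfiles to be readable by root only. The copy step is shown as comments because it needs a running VM; `SCP`, `SSH` and `VM_IP` are assumed to be the variables already used by the snc scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the tap0 keyfile profile to the given path.
write_tap_profile() {
    local dest="$1"
    cat > "${dest}" <<'EOF'
[connection]
id=tap0
type=tun
autoconnect=true
interface-name=tap0

[tun]
mode=2

[802-3-ethernet]
cloned-mac-address=5A:94:EF:E4:0C:EE

[ipv4]
method=auto

[proxy]
EOF
    # Keyfiles must not be readable by other users, so restrict to the owner.
    chmod 600 "${dest}"
}

profile="$(mktemp -d)/tap0.nmconnection"
write_tap_profile "${profile}"
echo "wrote ${profile}"

# Assumed follow-up against the VM (not executed here):
#   ${SCP} "${profile}" core@${VM_IP}:/tmp/tap0.nmconnection
#   ${SSH} core@${VM_IP} 'sudo install -o root -g root -m 600 /tmp/tap0.nmconnection /etc/NetworkManager/system-connections/'
#   ${SSH} core@${VM_IP} 'sudo nmcli connection reload'
```

Compared with `nmcli connection add`, this keeps the profile as a reviewable file in the repository, at the cost of one extra copy step.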

@cfergeau
Contributor

@praveenkumar can you take a look at this PR? you also looked into this in the past.

Collaborator

@gbraad gbraad left a comment

I would approve, but I question the use of the command. First, why is it invoked like this? And beyond that, why is the creation of the tap not part of a systemd unit by itself? As in that case you can depend on it...

systemctl daemon-reload
systemctl enable gvisor-tap-vsock.service
systemctl enable gv-user-network@tap0.service
Collaborator

👍

Collaborator

... I like how this can be targeted with %i, but this 'depends' on actions performed previously by creating this device.

For this increment, this would work, but it would most likely change with the self-sufficient bundle.

Contributor

The nmcli command creates a NetworkManager configuration file in the bundle, so the self-sufficient bundle should be similar from that perspective?

@cfergeau
Contributor

why is the creation of the tap not part of a systemd unit by itself? As in that case you can depend on it...

In a way, the creation of the tap is part of a systemd unit: it's added to the NetworkManager configuration files, and NetworkManager is started through systemd.
The gv-user-network unit has dependencies on the network device:

After=NetworkManager.service
BindsTo=sys-devices-virtual-net-%i.device
After=sys-devices-virtual-net-%i.device
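
To make the `%i` part concrete, here is a small sketch of how these three lines read once the template is instantiated for `tap0`. `expand_unit_deps` is a hypothetical helper that only does a textual substitution; it is valid for simple interface names, and names with special characters would need `systemd-escape` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the [Unit] dependencies for one instance name
# by replacing the %i specifier with the interface name.
expand_unit_deps() {
    local iface="$1"
    sed "s/%i/${iface}/g" <<'EOF'
After=NetworkManager.service
BindsTo=sys-devices-virtual-net-%i.device
After=sys-devices-virtual-net-%i.device
EOF
}

expand_unit_deps tap0
```

The resulting `sys-devices-virtual-net-tap0.device` unit corresponds to `/sys/devices/virtual/net/tap0`, so the service is only started once the tap device exists and is stopped if the device goes away.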

tee /etc/systemd/system/gv-user-network@.service <<TEE
[Unit]
Description=gvisor-tap-vsock Network Traffic Forwarder
After=NetworkManager.service
Member

Should we run this after NetworkManager-wait-online.service? As per man NetworkManager-wait-online.service, it delays reaching the network-online target until NetworkManager reports on D-Bus that startup is complete.

Contributor

Not sure NetworkManager-wait-online.service can complete before gv-user-network service is started, so I'm not sure it would work to order them the opposite way.

Member

The way I see it, the gv-user-network service runs the forwarder command for an existing interface, and the NetworkManager-wait-online service makes sure that the interface exists because it waits until all the network profiles are enabled. Or is my understanding wrong here?

Contributor

@cfergeau cfergeau Feb 26, 2025

The gv-user-network unit file has:

BindsTo=sys-devices-virtual-net-%i.device
After=sys-devices-virtual-net-%i.device

This makes sure the interface exists before the gv-user-network unit tries to start.

man NetworkManager-wait-online.service says:

  • Startup is not complete as long as NetworkManager profiles are in an activating state.
  • When a device reaches the activate state depends on its configuration. For example, with a profile that has both IPv4 and IPv6 enabled, by default, NetworkManager considers the device as fully activated already when only one of the address families is ready.
    [...]

From this, it is not clear when a tun interface reaches the activate state. If NM waits until it gets an IP for example, this means gvforwarder must be running before the interface "activates", in which case it's problematic to order it after NetworkManager-wait-online.service. Maybe a tun interface is activated before getting an IP, in which case it would be less problematic, but I don't know from just reading the man page.

However, ordering the unit after NetworkManager and after the tun device is available seems enough to me, I don't think ordering it after NetworkManager-wait-online.service brings us anything useful?
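
For checking what a unit is actually ordered after, a minimal sketch: `ordered_after` is a hypothetical helper that extracts the unit names from `After=` lines. It is fed the dependency lines from this thread here; on the VM the same filter could presumably be fed from `systemctl show -p After gv-user-network@tap0.service`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every unit named in After= lines read from stdin,
# one per line (After= may carry several space-separated units).
ordered_after() {
    sed -n 's/^After=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Dependency lines as discussed in this thread, instantiated for tap0.
unit_deps='After=NetworkManager.service
BindsTo=sys-devices-virtual-net-tap0.device
After=sys-devices-virtual-net-tap0.device'

if printf '%s\n' "${unit_deps}" | ordered_after | grep -qx 'NetworkManager-wait-online.service'; then
    echo "ordered after NetworkManager-wait-online.service"
else
    echo "not ordered after NetworkManager-wait-online.service"
fi
```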

Member

it is not clear when a tun interface reaches the activate state.

Yes, and looking at the CI failure I think it hit the network failure, so let's stick to NetworkManager.service only instead of NetworkManager-wait-online.service.

Author

I tried this and we should keep the After value as it is

@vyasgun vyasgun force-pushed the spike/gvforwarder-service branch from f4bab4e to cf5affc Compare February 26, 2025 10:17

openshift-ci bot commented Feb 26, 2025

@vyasgun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-snc | cf5affc | link | true | /test e2e-snc |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Projects
Status: Ready for review
5 participants