Commit 674fd6c

confd: allow both ends of a veth pair to be assigned to containers

Previously at least one end of a veth pair had to remain in the host namespace, because that end created and destroyed the pair. Assigning both ends to containers left no one to create it.

Select a deterministic primary end so exactly one side creates the pair. When the primary is itself a container interface, create the pair in the host namespace before the container starts; CNI host-device then moves each end into its container. Teardown is deferred to the container removal script so the pair does not linger and block re-creation.

Drop the now-obsolete limitation notes from the documentation and YANG, and add a regression test connecting two containers over a veth pair.

Fixes: #941
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
1 parent 0229057 commit 674fd6c

14 files changed

Lines changed: 240 additions & 27 deletions

doc/ChangeLog.md

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ All notable changes to the project are documented in this file.
 
 ### Fixes
 
+- Fix #941: a VETH pair can now connect two containers directly, with both
+  ends assigned to containers.
 - Enabling IP masquerading in the firewall no longer enables IP forwarding on
   all interfaces. This has been an issue ever since the firewall support was
   introduced in v25.10.0

doc/container.md

Lines changed: 3 additions & 5 deletions
@@ -668,11 +668,9 @@ set:
 
 For an example of both, see the next section.
 
-> [!IMPORTANT]
-> **VETH Pair Limitation:** When using VETH pairs with containers, at least
-> one side of the pair must remain in the host namespace. It is currently
-> not possible to create VETH pairs where both ends are assigned to different
-> containers. One end must always be accessible from the host.
+> [!TIP]
+> Both ends of a VETH pair may be assigned to containers, connecting two
+> containers directly without involving the host namespace.
 
 [^3]: Something which the container bridge network type does behind the
     scenes with one end of an automatically created VETH pair.

src/confd/src/cni.c

Lines changed: 10 additions & 0 deletions
@@ -414,6 +414,16 @@ int cni_netdag_gen_iface(struct dagger *net, const char *ifname,
 		return -EIO;
 
 	fprintf(fp, "container -a -f delete network %s >/dev/null\n", ifname);
+
+	/* If this end belongs to a veth pair, the kernel keeps the pair
+	 * alive after CNI host-device returns the interface to the host
+	 * namespace. Remove it here, once the container is gone, so the
+	 * pair does not linger and block a later re-creation. Tolerant:
+	 * the peer's teardown may already have removed it.
+	 */
+	if (lydx_get_child(dif, "veth"))
+		fprintf(fp, "ip link del dev %s 2>/dev/null || true\n", ifname);
+
 	fclose(fp);
 
 	if (cni_type == IFT_BRIDGE)
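
The teardown lines added in this hunk can be tried in isolation. The following is a minimal sketch, not the confd implementation: `write_teardown()` and its `is_veth` flag are hypothetical stand-ins for the YANG lookup done with `lydx_get_child(dif, "veth")`, and only the two script lines are taken from the diff.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of confd): write the container removal
 * script lines for one interface to an already opened stream.
 * Returns 0 on success, -1 if writing failed.
 */
static int write_teardown(FILE *fp, const char *ifname, bool is_veth)
{
	/* Detach the network from all containers, as in the existing code. */
	if (fprintf(fp, "container -a -f delete network %s >/dev/null\n", ifname) < 0)
		return -1;

	/*
	 * For a veth end, also remove the link itself.  The delete is made
	 * tolerant with "|| true" because removing either end of a veth
	 * pair removes both, so the peer's script may have run first.
	 */
	if (is_veth &&
	    fprintf(fp, "ip link del dev %s 2>/dev/null || true\n", ifname) < 0)
		return -1;

	return 0;
}
```

Because deleting one end of a veth pair removes the whole pair, whichever removal script runs second finds nothing left to delete, which is why the shell line ignores the error rather than failing the teardown.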

src/confd/src/if-veth.c

Lines changed: 14 additions & 9 deletions
@@ -21,23 +21,28 @@
 bool veth_is_primary(struct lyd_node *cif)
 {
 	struct lyd_node *peer, *veth;
+	bool self_cni, peer_cni;
 	const char *peername;
 
 	veth = lydx_get_child(cif, "veth");
 	peername = lydx_get_cattr(veth, "peer");
 	peer = lydx_find_by_name(lyd_parent(cif), "interface", peername);
 
-	/* At the moment, CNI code relies on one side of the pair
-	 * remaining in the host namespace, and that that interface
-	 * takes care of creating the pair.
+	self_cni = lydx_get_child(cif, "container-network") != NULL;
+	peer_cni = lydx_get_child(peer, "container-network") != NULL;
+
+	/* When exactly one end is handed to a container (CNI host-device),
+	 * the other end stays in the host namespace and creates the pair.
 	 */
-	if (lydx_get_child(cif, "container-network"))
-		return false;
-	if (lydx_get_child(peer, "container-network"))
-		return true;
+	if (self_cni != peer_cni)
+		return peer_cni;
 
-	return strcmp(lydx_get_cattr(cif, "name"),
-		      lydx_get_cattr(veth, "peer")) < 0;
+	/* Neither or both ends are container interfaces: pick a stable
+	 * primary by name so exactly one end creates the pair. When both
+	 * ends are containers the pair is still created in the host
+	 * namespace first, then moved into each container by CNI host-device.
+	 */
+	return strcmp(lydx_get_cattr(cif, "name"), peername) < 0;
 }
 
 int ifchange_cand_infer_veth(sr_session_ctx_t *session, const char *path)
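
The selection rule in the new `veth_is_primary()` can be checked without libyang. The sketch below is an illustration only: `struct veth_end` and `veth_end_is_primary()` are hypothetical names that replace the YANG node lookups with plain fields, under the assumption that an end counts as a container interface exactly when it has a `container-network` child.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one interface node of a veth pair. */
struct veth_end {
	const char *name;  /* interface name, e.g. "veth0a" */
	bool in_container; /* true if the end has a container-network child */
};

/*
 * Same decision as in the diff: if exactly one end goes to a container,
 * the host-side end is primary.  Otherwise (neither or both in
 * containers) the end whose name sorts first is primary.
 */
static bool veth_end_is_primary(const struct veth_end *self,
				const struct veth_end *peer)
{
	if (self->in_container != peer->in_container)
		return peer->in_container;

	return strcmp(self->name, peer->name) < 0;
}
```

For any pair with distinct names, exactly one of the two calls (`self`/`peer` swapped) returns true, which is the property the commit relies on so that only one side creates the pair.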

src/confd/src/interfaces.c

Lines changed: 36 additions & 1 deletion
@@ -518,6 +518,14 @@ static int veth_gen_del(struct lyd_node *dif, FILE *sh)
 	if (!veth_is_primary(dif))
 		return 0;
 
+	/* When the primary end is itself a container interface it currently
+	 * lives in the container's namespace, so a host-namespace delete here
+	 * would fail and abort the teardown. Its removal is handled after the
+	 * container is gone, see cni_netdag_gen_iface().
+	 */
+	if (lydx_get_child(dif, "container-network"))
+		return 0;
+
 	return link_gen_del(dif, sh);
 }
 
@@ -571,6 +579,28 @@ static int netdag_gen_iface_del(struct dagger *net, struct lyd_node *dif,
 	return 0;
 }
 
+/*
+ * Both ends of a veth pair can be handed to containers, leaving no
+ * host-side interface to create the pair. Have the primary end create it
+ * in the host namespace early (NETDAG_INIT_PHYS, before the container is
+ * (re)started); CNI host-device then moves each end into its container.
+ */
+static int veth_gen_host(struct dagger *net, struct lyd_node *dif, struct lyd_node *cif)
+{
+	const char *ifname = lydx_get_cattr(cif, "name");
+	FILE *ip;
+	int err;
+
+	ip = dagger_fopen_net_init(net, ifname, NETDAG_INIT_PHYS, "init.ip");
+	if (!ip)
+		return -EIO;
+
+	err = veth_gen(dif, cif, ip);
+	fclose(ip);
+
+	return err;
+}
+
 static sr_error_t netdag_gen_iface_timeout(struct dagger *net, const char *ifname, const char *iftype)
 {
 	if (!strcmp(iftype, "infix-if-type:ethernet")) {
@@ -604,8 +634,13 @@ static sr_error_t netdag_gen_iface(sr_session_ctx_t *session, struct dagger *net
 
 	if ((err = cni_netdag_gen_iface(net, ifname, dif, cif))) {
 		/* error or managed by CNI/podman */
-		if (err > 0)
+		if (err > 0) {
 			err = 0; /* done, nothing more to do here */
+
+			if (op == LYDX_OP_CREATE && lydx_get_child(cif, "veth") &&
+			    veth_is_primary(cif))
+				err = veth_gen_host(net, dif, cif);
+		}
 		goto err;
 	}
 
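
The two decisions added to `interfaces.c` can be summarized as pure functions. This is a sketch under stated assumptions, not code from the repository: the enum, `plan_veth_del()`, and `veth_needs_host_precreate()` are hypothetical names, and the boolean parameters stand in for the checks the diff performs on the YANG nodes and on the return value of `cni_netdag_gen_iface()`.

```c
#include <stdbool.h>

/* Hypothetical summary of what veth_gen_del() does for one end. */
enum veth_del_plan {
	VETH_DEL_NONE,     /* not the primary end: nothing to do here */
	VETH_DEL_DEFERRED, /* primary is in a container: left to the removal script */
	VETH_DEL_NOW,      /* primary is in the host namespace: delete the link */
};

static enum veth_del_plan plan_veth_del(bool is_primary, bool in_container)
{
	if (!is_primary)
		return VETH_DEL_NONE;

	/* A host-namespace delete would fail while the end is in a container. */
	return in_container ? VETH_DEL_DEFERRED : VETH_DEL_NOW;
}

/*
 * Mirrors the condition in netdag_gen_iface(): the pair is created early
 * in the host namespace only for a CNI-managed interface that is being
 * created, is a veth end, and is the primary end of its pair.
 */
static bool veth_needs_host_precreate(bool cni_managed, bool is_create,
				      bool is_veth, bool is_primary)
{
	return cni_managed && is_create && is_veth && is_primary;
}
```

Keeping the create and delete decisions keyed on the same primary end means the end that triggers creation is also the one whose deletion is deferred when it lives in a container.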

src/confd/yang/confd/infix-if-container.yang

Lines changed: 1 addition & 5 deletions
@@ -59,11 +59,7 @@ submodule infix-if-container {
 
   identity host {
     base container-network;
-    description "Host device, e.g., one end of a VETH pair or other host interface.
-
-                 Note: When using VETH pairs, at least one side must remain in the
-                 host namespace. Both ends of a VETH pair cannot be assigned to
-                 different containers.";
+    description "Host device, e.g., one end of a VETH pair or other host interface.";
   }
 
   /*

src/confd/yang/confd/infix-if-veth.yang

Lines changed: 1 addition & 5 deletions
@@ -13,11 +13,7 @@ submodule infix-if-veth {
 
   organization "KernelKit";
   contact "kernelkit@googlegroups.com";
-  description "Linux virtual Ethernet pair extension for ietf-interfaces.
-
-               Note: When using VETH pairs with containers, at least one side
-               of the pair must remain in the host namespace. Both ends of a
-               VETH pair cannot be assigned to different containers.";
+  description "Linux virtual Ethernet pair extension for ietf-interfaces.";
 
   revision 2023-06-05 {
     description "Initial revision.";

test/case/containers/all.yaml

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@
 - name: Container with VETH Pair
   case: veth/test.py
 
+- name: VETH Pair Between Two Containers
+  case: internal_link/test.py
+
 - name: Container Volume Persistence
   case: volume/test.py
 

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+test.adoc
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+=== VETH Pair Between Two Containers
+
+ifdef::topdoc[:imagesdir: {topdoc}../../test/case/containers/internal_link]
+
+==== Description
+
+Verify that a VETH pair can connect two containers directly, with *both*
+ends handed to containers and neither remaining in the host namespace.
+
+....
+.------------.                          .------------.
+|    left    |                          |   right    |
+|  veth0a ===|========= veth ===========|=== veth0b  |
+'------------' 10.0.0.1        10.0.0.2 '------------'
+....
+
+The pair is created in the host namespace then each end is moved into
+its container when starting up. Connectivity is verified by pinging
+across the pair, from inside one container's network namespace to the
+other end's address.
+
+==== Topology
+
+image::topology.svg[VETH Pair Between Two Containers topology, align=center, scaledwidth=75%]
+
+==== Sequence
+
+. Set up topology and attach to target DUT
+. Create VETH pair with both ends assigned to containers
+. Verify both containers have started
+. Verify {LEFT} reaches {RIGHT} over the internal VETH pair
+
+
