
Commit 5e99ea7

Add support for multi-process port sharing with CIBIR. (#5798)
## Description

Fixes #5795

The XDP datapath can be configured to intercept packets based on QUIC Connection ID instead of local port. This behavior existed in MsQuic but was not heavily exercised until recently. One issue was that MsQuic always attempted to reserve UDP / TCP sockets for each application server process, so multiple server processes that wanted to share a single port ran into port collision errors.

This PR adds support for CIBIR across multiple processes on the same port and documents the behavior.

## Potential options to allow for multi-process port sharing:

- **Option 1:** MsQuic delegates port protection to applications and provides best practice recommendations.
  > **Analysis:** Deferring the responsibility of port protection and isolation to the application has the upside of enabling the most potential scenarios, but it could also be a footgun.
- **Option 2:** MsQuic makes sure *some* persistent reservation exists at a port.
  > **Analysis:** Note that LookUpPersistentReservation does not require admin privileges, but CreatePersistentReservation does. This is useful in that if any reservation exists on a port, we can reasonably trust that an admin knew what they were doing when they created it. This adds safety and ensures that consumers of CIBIR know what they are doing.
- **Option 3:** MsQuic creates per-proc sockets with SIO_CPU_AFFINITY, but does not reserve the port.
  > **Analysis:** If another unrelated app creates a socket with SIO_CPU_AFFINITY, it can bind to the CIBIR shared port. For all other apps, trying to bind a socket to a CIBIR port will result in a collision.

## Option chosen: 1

MsQuic's stance is that the application takes responsibility for book-keeping and protecting shared local ports when using XDP + CIBIR.

- Multiple MsQuic processes in CIBIR+XDP mode can share a local port for **server sockets only.**
- Applications should not assume the shared port is safe from other non-MsQuic processes binding to it.
  > MsQuic will NOT make an OS port reservation for server sockets when CIBIR+XDP is enabled. For client sockets, on the other hand, MsQuic will always make OS port reservations.
- Applications using server sockets + CIBIR/XDP must specify a well-known local port.

## What changed

- Server sockets with XDP+CIBIR both enabled/available skip OS port reservation and OS socket creation and rely solely on XDP.
  > Any failure plumbing XDP rules bubbles up to the app as a socket creation error. There is no fallback to OS sockets.
- Client sockets with XDP+CIBIR both enabled/available still do OS port reservation and socket creation, but rely on XDP.
  > Any failure plumbing XDP rules silently falls back to using OS sockets. CIBIR transport negotiation can still work without XDP.
- Server sockets with CIBIR enabled but XDP not available/enabled do OS port reservation and fall back to OS sockets.
- Client sockets with CIBIR enabled but XDP not available/enabled do OS port reservation and fall back to OS sockets.

## Port protection options

- Windows has the `CreatePersistentUdpPortReservation` API (https://learn.microsoft.com/en-us/windows/win32/api/iphlpapi/nf-iphlpapi-createpersistentudpportreservation), which allows sysadmins to pre-allocate a block of ports and disallow other applications from binding to them.
- A well-known CIBIR registry key can be used to record shared ports, and sysadmins can coordinate their system so that other apps will not bind to those ports.
- ALE policies: applications can configure WFP to block other apps from binding to certain ports.

## Testing

A new DataPathTest was added.

## Documentation

Settings.md
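The four cases under "What changed" amount to a small decision table. The C sketch below restates them so the server/client asymmetry is easy to see. `SOCKET_PLAN` and `PlanCibirSocket` are hypothetical names for illustration, not MsQuic's internal datapath API, and the function only models sockets that already have CIBIR enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the setup rules for CIBIR-enabled sockets.
 * Not MsQuic's internal API; names are illustrative only. */
typedef struct SOCKET_PLAN {
    bool ReserveOsPort;   /* create an OS socket / port reservation */
    bool UseXdp;          /* plumb XDP rules for this socket */
    bool XdpErrorIsFatal; /* XDP plumbing failure surfaces as a socket creation error */
} SOCKET_PLAN;

static SOCKET_PLAN PlanCibirSocket(bool IsServer, bool XdpAvailable)
{
    SOCKET_PLAN Plan;
    if (IsServer && XdpAvailable) {
        /* Shared-port server: no OS reservation, so there is nothing to fall back to. */
        Plan.ReserveOsPort = false;
        Plan.UseXdp = true;
        Plan.XdpErrorIsFatal = true;
    } else {
        /* Clients always reserve; servers without XDP use plain OS sockets.
         * Clients with XDP silently fall back to OS sockets on XDP errors. */
        Plan.ReserveOsPort = true;
        Plan.UseXdp = XdpAvailable;
        Plan.XdpErrorIsFatal = false;
    }
    /* Every socket needs at least one receive path. */
    assert(Plan.ReserveOsPort || Plan.UseXdp);
    return Plan;
}
```

The only row without an OS fallback is the shared-port server, which is exactly the case where MsQuic no longer holds the port and the application becomes responsible for protecting it.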
1 parent cbfea02 commit 5e99ea7

27 files changed

Lines changed: 862 additions & 632 deletions

docs/CIBIR.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# CIBIR

## What is it

See the [IETF draft](https://datatracker.ietf.org/doc/html/draft-banks-quic-cibir) for CIBIR.

Without CIBIR, MsQuic programs [XDP](./XDP.md) to filter and demux packets based on address and port number.
With CIBIR, XDP instead filters and demuxes packets based on address, port number, and QUIC connection ID.

CIBIR allows two or more separate server processes to share a single
port on the same machine, as long as their CIBIR IDs are different.
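To make the demux rule concrete, here is a simplified C model of matching an incoming packet against per-process CIBIR rules on a shared port. It assumes the destination connection ID has already been parsed out of the QUIC header, ignores the address part of the match, and assumes each rule expects its ID bytes at a fixed offset in the destination CID (mirroring the length/offset pair MsQuic stores with each CIBIR ID). `CIBIR_RULE` and `CibirDemux` are hypothetical names, and this is not how XDP-for-Windows rules are actually expressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ID_LENGTH 8 /* assumed upper bound for this sketch */

/* Hypothetical per-process rule: local port plus the CIBIR ID bytes expected
 * at byte `Offset` of the destination connection ID. */
typedef struct CIBIR_RULE {
    uint16_t LocalPort;
    uint8_t Offset;
    uint8_t IdLength;
    uint8_t Id[SKETCH_MAX_ID_LENGTH];
} CIBIR_RULE;

static bool CibirRuleMatches(const CIBIR_RULE* Rule, uint16_t DestPort,
                             const uint8_t* DestCid, size_t DestCidLength)
{
    assert(Rule->IdLength <= SKETCH_MAX_ID_LENGTH);
    if (DestPort != Rule->LocalPort) {
        return false;
    }
    if ((size_t)Rule->Offset + Rule->IdLength > DestCidLength) {
        return false; /* CID too short to carry this ID */
    }
    return memcmp(DestCid + Rule->Offset, Rule->Id, Rule->IdLength) == 0;
}

/* Returns the index of the first matching rule, or -1 if no process owns the packet. */
static int CibirDemux(const CIBIR_RULE* Rules, size_t RuleCount, uint16_t DestPort,
                      const uint8_t* DestCid, size_t DestCidLength)
{
    for (size_t i = 0; i < RuleCount; ++i) {
        if (CibirRuleMatches(&Rules[i], DestPort, DestCid, DestCidLength)) {
            return (int)i;
        }
    }
    return -1;
}
```

With two server processes on port 4433 using IDs `AA 01` and `AA 02`, each one only receives packets whose destination CID carries its own ID; the port number alone no longer decides the owner.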

## CIBIR port sharing logic

- Applications must provide a well-known local port for server sockets when using CIBIR and XDP.
- **IMPORTANT:** MsQuic will **NOT** reserve an OS port for server sockets when both CIBIR and XDP are enabled and available.
- Client sockets can never share ports, so MsQuic always reserves an OS port for them.
- The responsibility for book-keeping shared ports and ensuring robust protection for those shared ports is delegated to the application.
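Since the book-keeping is the application's job, a deployment tool might run a check like the following before launching another server on a shared port. This is a hypothetical sketch: `SHARED_PORT_TABLE` and `RegisterCibirServer` are invented for illustration, persistence (for example in a registry key) is left out, and it assumes every ID sits at CID offset 0, so an ID that is a prefix of another counts as a collision because one packet could match both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SHARED_SERVERS 16
#define MAX_CIBIR_ID_BYTES 8 /* assumed bound for this sketch */

typedef struct SHARED_PORT_ENTRY {
    uint16_t Port;
    uint8_t IdLength;
    uint8_t Id[MAX_CIBIR_ID_BYTES];
} SHARED_PORT_ENTRY;

typedef struct SHARED_PORT_TABLE {
    SHARED_PORT_ENTRY Entries[MAX_SHARED_SERVERS];
    size_t Count;
} SHARED_PORT_TABLE;

/* True if two IDs (both assumed at CID offset 0) could match the same packet,
 * i.e. one is a prefix of the other. */
static bool IdsOverlap(const uint8_t* A, uint8_t ALength, const uint8_t* B, uint8_t BLength)
{
    uint8_t Common = ALength < BLength ? ALength : BLength;
    return memcmp(A, B, Common) == 0;
}

/* Records a server on (Port, Id). Rejects port 0 (a well-known port is
 * required), bad ID lengths, a full table, and IDs that collide with a server
 * already registered on the same port. */
static bool RegisterCibirServer(SHARED_PORT_TABLE* Table, uint16_t Port,
                                const uint8_t* Id, uint8_t IdLength)
{
    assert(Table->Count <= MAX_SHARED_SERVERS);
    if (Port == 0 || IdLength == 0 || IdLength > MAX_CIBIR_ID_BYTES ||
        Table->Count == MAX_SHARED_SERVERS) {
        return false;
    }
    for (size_t i = 0; i < Table->Count; ++i) {
        const SHARED_PORT_ENTRY* E = &Table->Entries[i];
        if (E->Port == Port && IdsOverlap(E->Id, E->IdLength, Id, IdLength)) {
            return false;
        }
    }
    SHARED_PORT_ENTRY* New = &Table->Entries[Table->Count++];
    New->Port = Port;
    New->IdLength = IdLength;
    memcpy(New->Id, Id, IdLength);
    return true;
}
```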

## Port protection recommendations for shared ports

### Option 1: Persistent port reservations (Recommended)

MsQuic strongly recommends that applications use the Windows [persistent port reservation API](https://learn.microsoft.com/en-us/windows/win32/api/iphlpapi/nf-iphlpapi-createpersistentudpportreservation) to secure shared CIBIR ports before serving multi-process CIBIR traffic on a shared port.
- A system admin creates the persistent reservation once, as a one-time setup step.
- Registry keys are a good option for book-keeping persistent port reservations.
- Persistent port reservations survive reboots, providing robust protection even if processes crash.
- A persistent reservation takes CIBIR ports out of the ephemeral port pool and forbids sockets from binding to them unless the socket is associated with the persistent reservation token, which can only happen in an elevated process.
- This way, an unsuspecting application process won't accidentally be assigned an ephemeral port that collides with a CIBIR port.
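A minimal sketch of the one-time admin setup, assuming a Windows build linked against `iphlpapi.lib` and `ws2_32.lib`. It calls `CreatePersistentUdpPortReservation`, which takes the start port in network byte order and returns a token identifying the reservation; the calling process must be elevated. `ReserveSharedCibirPorts` is a hypothetical wrapper name, recording the token (for example in a registry key) is left out, and on non-Windows builds the wrapper simply reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#include <iphlpapi.h>
#pragma comment(lib, "iphlpapi.lib")
#pragma comment(lib, "ws2_32.lib")
#endif

/* Hypothetical wrapper. Returns 0 on success and stores the reservation token;
 * otherwise returns a nonzero Win32 error code (or -1 on non-Windows builds). */
static long ReserveSharedCibirPorts(uint16_t StartPort, uint16_t NumberOfPorts,
                                    uint64_t* Token)
{
    assert(Token != NULL);
    assert(NumberOfPorts > 0);
#ifdef _WIN32
    ULONG64 ReservationToken = 0;
    /* StartPort must be in network byte order; NumberOfPorts is a plain count.
     * Requires an elevated (admin) process. */
    ULONG Status = CreatePersistentUdpPortReservation(
        htons(StartPort), NumberOfPorts, &ReservationToken);
    if (Status == NO_ERROR) {
        *Token = (uint64_t)ReservationToken;
    }
    return (long)Status;
#else
    (void)StartPort;
    (void)NumberOfPorts;
    *Token = 0;
    return -1; /* Persistent port reservations are a Windows-only API. */
#endif
}
```

Because the reservation persists across reboots, this runs once per machine (for example from a setup script) rather than on every server start.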

### Option 2: WFP ALE (Application Layer Enforcement) filters

As an alternative, applications can use the [Windows Filtering Platform (WFP)](https://learn.microsoft.com/en-us/windows/win32/fwp/windows-filtering-platform-start-page) to create ALE filters that block unauthorized bind attempts to CIBIR ports.

ALE filters operate at the [bind (resource assignment) and receive/accept authorization layers](https://learn.microsoft.com/en-us/windows/win32/fwp/ale-layers) (`FWPM_LAYER_ALE_RESOURCE_ASSIGNMENT_V4/V6` and `FWPM_LAYER_ALE_AUTH_RECV_ACCEPT_V4/V6`). A filter can be configured to block any process from binding to a specific UDP port unless it matches an allowed application path or security descriptor.

docs/Settings.md

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ These parameters are accessed by calling [GetParam](./api/GetParam.md) or [SetPa
 |-------------------------------------------|---------------------------|-----------|-----------------------------------------------------------|
 | `QUIC_PARAM_LISTENER_LOCAL_ADDRESS`<br> 0 | QUIC_ADDR | Get-only | Get the full address tuple the server is listening on. |
 | `QUIC_PARAM_LISTENER_STATS`<br> 1 | QUIC_LISTENER_STATISTICS | Get-only | Get statistics specific to this Listener instance. |
-| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | The CIBIR well-known idenfitier. |
+| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | Sets a [CIBIR](./CIBIR.md) (CID-Based Identification and Routing) well-known identifier. |
 | `QUIC_PARAM_DOS_MODE_EVENTS`<br> 2 | BOOLEAN | Both | The Listener opted in for DoS Mode event. |
 | `QUIC_PARAM_LISTENER_PARTITION_INDEX`<br> (preview) | uint16_t | Both | The partition to use for listener callback events and incoming connections. |

docs/XDP.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# MsQuic over XDP

To avoid confusion, "XDP" refers to [XDP-for-windows](https://github.com/microsoft/xdp-for-windows).
MsQuic does not support Linux XDP as a datapath.

## What is XDP

XDP enables received packets to completely bypass the OS networking stack.

Applications can subscribe to XDP ring buffers to post packets to send
and to process packets received through AF_XDP sockets.

Additionally, applications can program XDP with the logic that decides
which packets to filter and what to do with them.

For instance: "drop all packets with a UDP header and destination port
42."
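As a rough model of what such a rule does, the C function below evaluates "drop UDP destination port 42" against a raw packet. It is a plain-C illustration under simplifying assumptions (the buffer starts at the IPv4 header with no Ethernet header, IPv4 only, no checksum validation) and is not the XDP-for-Windows program or rule API; `EvaluateRule` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum XDP_SKETCH_ACTION { ACTION_PASS, ACTION_DROP } XDP_SKETCH_ACTION;

/* Hypothetical rule evaluation: drop UDP datagrams whose destination port
 * equals DropPort, pass everything else. */
static XDP_SKETCH_ACTION EvaluateRule(const uint8_t* Packet, size_t Length, uint16_t DropPort)
{
    assert(Packet != NULL || Length == 0);
    if (Length < 20 || (Packet[0] >> 4) != 4) {
        return ACTION_PASS; /* too short or not IPv4 */
    }
    size_t IhlBytes = (size_t)(Packet[0] & 0x0F) * 4; /* IPv4 header length */
    if (IhlBytes < 20 || Length < IhlBytes + 8) {
        return ACTION_PASS; /* malformed or truncated UDP header */
    }
    if (Packet[9] != 17) {
        return ACTION_PASS; /* IP protocol is not UDP */
    }
    const uint8_t* Udp = Packet + IhlBytes;
    uint16_t DestPort = (uint16_t)((Udp[2] << 8) | Udp[3]); /* network byte order */
    return DestPort == DropPort ? ACTION_DROP : ACTION_PASS;
}
```

Replacing `ACTION_DROP` with a "redirect to AF_XDP socket" action gives the shape of the rules MsQuic programs, as described in the port reservation logic.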

## Port reservation logic

The logic MsQuic programs into XDP looks like:
"redirect all packets with destination port X to an AF_XDP socket."

This runs into the issue of **packet stealing.** If an unrelated process
binds an OS socket to the same port that MsQuic used to program XDP, XDP will steal
that traffic out from under it.

This is why MsQuic, by default, also creates an OS UDP socket on the same port as the AF_XDP
socket: holding the port prevents other processes from binding to it, so MsQuic plays nicely with the rest of the stack.

There are *exceptions* to this port reservation:

- Sometimes, MsQuic may create a TCP OS socket instead, or both TCP and UDP (see [QTIP](./QTIP.md)).
- Sometimes, MsQuic may NOT create any OS sockets at all (see [CIBIR](./CIBIR.md)).
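Why does holding an OS socket protect the port? The POSIX sketch below shows the principle: while one UDP socket is bound to a port, a second plain `bind` to the same address and port fails with `EADDRINUSE`, so no unrelated socket can end up expecting traffic that XDP would redirect. MsQuic's Windows datapath uses Winsock rather than this code, port-reuse socket options change the outcome, and `BindUdp`/`LocalPort` are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a UDP socket to 127.0.0.1:Port (0 lets the OS pick an ephemeral port).
 * Returns the descriptor, or -1 with errno set. */
static int BindUdp(uint16_t Port)
{
    int Fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (Fd < 0) {
        return -1;
    }
    struct sockaddr_in Addr;
    memset(&Addr, 0, sizeof(Addr));
    Addr.sin_family = AF_INET;
    Addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    Addr.sin_port = htons(Port);
    if (bind(Fd, (struct sockaddr*)&Addr, sizeof(Addr)) != 0) {
        int Saved = errno;
        close(Fd);
        errno = Saved; /* preserve bind's error for the caller */
        return -1;
    }
    return Fd;
}

/* Returns the local port a bound socket holds, or 0 on error. */
static uint16_t LocalPort(int Fd)
{
    assert(Fd >= 0);
    struct sockaddr_in Addr;
    socklen_t Len = sizeof(Addr);
    if (getsockname(Fd, (struct sockaddr*)&Addr, &Len) != 0) {
        return 0;
    }
    return ntohs(Addr.sin_port);
}
```

Usage: bind a "holder" socket, read its port with `LocalPort`, then attempt `BindUdp` on that port again; the second bind fails for as long as the holder stays open.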

## MsQuic over XDP general architecture

```mermaid
flowchart TB

%% =========================
%% NIC + RSS
%% =========================
NIC["NIC interface"]

RSS1["RSS queue"]
RSS2["RSS queue"]

NIC --> RSS1
NIC --> RSS2

%% =========================
%% XDP FILTER ENGINE
%% =========================
subgraph XDP_ENGINE["XDP FILTER ENGINE"]

XDP_PROG1["XDP::XDP program"]
XDP_PROG2["XDP::XDP program"]

XDP_RULES["XDP::XDP RULES"]

AFXDP1["AF_XDP Socket"]
AFXDP2["AF_XDP Socket"]

RSS1 -->|packet data| XDP_PROG1
RSS2 -->|packet data| XDP_PROG2

XDP_PROG1 --> XDP_RULES
XDP_PROG2 --> XDP_RULES

XDP_RULES --> AFXDP1
XDP_RULES --> AFXDP2

end

%% =========================
%% PACKET DEMUX
%% =========================
DEMUX["Packet DE-MUX logic"]

AFXDP1 --> DEMUX
AFXDP2 --> DEMUX

%% =========================
%% CXPLAT SOCKET POOL
%% =========================
subgraph CXPLAT_POOL["CXPLAT SOCKET POOL HASH TABLE"]

CX1["CXPLAT Socket"]
CX2["CXPLAT Socket"]
CX3["CXPLAT Socket"]
CX4["CXPLAT Socket"]

end

DEMUX --> CX1
DEMUX --> CX2
DEMUX --> CX3
DEMUX --> CX4

%% =========================
%% FIND BINDING LOGIC
%% =========================
BIND["FIND BINDING LOGIC"]

CX1 --> BIND
CX2 --> BIND
CX3 --> BIND
CX4 --> BIND

%% =========================
%% MSQUIC OBJECTS
%% =========================
subgraph MSQUIC_OBJECTS["MSQUIC OBJECTS"]

CONN1["Connection"]
CONN2["Connection"]
CONN3["Connection"]
LIST1["Listener"]
LIST2["Listener"]

end

BIND --> CONN1
BIND --> CONN2
BIND --> CONN3
BIND --> LIST1
BIND --> LIST2
```
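The "CXPLAT SOCKET POOL HASH TABLE" stage in the diagram maps a demuxed packet to the socket that owns it. The sketch below models that step as a chained hash table keyed by local UDP port only; the real pool's key (which likely also involves addresses) and its structures are not shown in this commit, so `POOL_SOCKET`, `SOCKET_POOL`, `PoolInsert`, and `PoolLookup` are hypothetical names under that simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BUCKETS 64

/* Hypothetical stand-in for a CXPLAT socket entry: keyed by local UDP port,
 * pointing at the binding that will find the connection or listener. */
typedef struct POOL_SOCKET {
    uint16_t LocalPort;
    void* Binding;
    struct POOL_SOCKET* Next; /* chaining for hash collisions */
} POOL_SOCKET;

typedef struct SOCKET_POOL {
    POOL_SOCKET* Buckets[POOL_BUCKETS];
} SOCKET_POOL;

static size_t PoolHash(uint16_t Port)
{
    return (size_t)Port % POOL_BUCKETS;
}

static void PoolInsert(SOCKET_POOL* Pool, POOL_SOCKET* Socket)
{
    assert(Pool != NULL && Socket != NULL);
    size_t Bucket = PoolHash(Socket->LocalPort);
    Socket->Next = Pool->Buckets[Bucket];
    Pool->Buckets[Bucket] = Socket;
}

/* Returns the socket for a packet's destination port, or NULL to drop it. */
static POOL_SOCKET* PoolLookup(const SOCKET_POOL* Pool, uint16_t Port)
{
    for (POOL_SOCKET* S = Pool->Buckets[PoolHash(Port)]; S != NULL; S = S->Next) {
        if (S->LocalPort == Port) {
            return S;
        }
    }
    return NULL;
}
```

After the lookup, the socket's binding performs the "find binding" step from the diagram, routing the packet to a connection or listener.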

src/core/connection.c

Lines changed: 6 additions & 3 deletions
@@ -6721,11 +6721,14 @@ QuicConnParamSet(
 memcpy(Connection->CibirId + 1, Buffer, BufferLength);
 
 QuicTraceLogConnInfo(
-CibirIdSet,
+CibirIdSetInfo,
 Connection,
-"CIBIR ID set (len %hhu, offset %hhu)",
+"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
 Connection->CibirId[0],
-Connection->CibirId[1]);
+Connection->CibirId[1],
+(unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]));
 
 return QUIC_STATUS_SUCCESS;
 }
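The new trace argument packs the CIBIR ID bytes into a 64-bit value for `0x%llx` formatting. `QuicCibirIdToUint64` itself is not shown in this diff, so the following is only a plausible sketch of such a helper, assuming the ID is at most 8 bytes and its first byte becomes the most significant; `CibirIdToUint64Sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Packs up to 8 CIBIR ID bytes into a uint64_t, first byte most significant,
 * so "0x%llx" prints the bytes in wire order (minus any leading zero bytes). */
static uint64_t CibirIdToUint64Sketch(const uint8_t* Id, uint8_t Length)
{
    assert(Length <= 8);
    uint64_t Value = 0;
    for (uint8_t i = 0; i < Length; ++i) {
        Value = (Value << 8) | Id[i];
    }
    return Value;
}
```

Because leading zero bytes vanish from the hex value (an ID of `00 2A` prints as `0x2a`), logging the length next to the value, as the new trace format does, keeps the logged ID unambiguous.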

src/core/listener.c

Lines changed: 12 additions & 6 deletions
@@ -835,11 +835,14 @@ QuicListenerAcceptConnection(
 
 if (Connection->CibirId[0] != 0) {
 QuicTraceLogConnInfo(
-CibirIdSet,
+CibirIdSetInfo,
 Connection,
-"CIBIR ID set (len %hhu, offset %hhu)",
+"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
 Connection->CibirId[0],
-Connection->CibirId[1]);
+Connection->CibirId[1],
+(unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]));
 }
 
 if (!QuicConnGenerateNewSourceCid(Connection, TRUE)) {
@@ -885,11 +888,14 @@ QuicListenerParamSet(
 memcpy(Listener->CibirId + 1, Buffer, BufferLength);
 
 QuicTraceLogVerbose(
-ListenerCibirIdSet,
-"[list][%p] CIBIR ID set (len %hhu, offset %hhu)",
+ListenerCibirIdSetInfo,
+"[list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
 Listener,
 Listener->CibirId[0],
-Listener->CibirId[1]);
+Listener->CibirId[1],
+(unsigned long long)QuicCibirIdToUint64(
+Listener->CibirId + 2,
+Listener->CibirId[0]));
 
 return QUIC_STATUS_SUCCESS;
 }
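Reading `QuicListenerParamSet` together with its trace statement suggests how the stored `CibirId` array is laid out: the `QUIC_PARAM_LISTENER_CIBIR_ID` buffer (an offset byte followed by the ID bytes) is copied to `CibirId + 1`, byte 0 holds the ID length, byte 1 the offset, and the ID itself starts at byte 2. The sketch below reproduces that inferred layout. The assignment of byte 0, the 6-byte maximum, and the requirement of at least one ID byte are assumptions for the sketch, and `CIBIR_STORAGE`/`StoreCibirId` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CIBIR_LENGTH 6 /* assumed limit for this sketch */

/* Inferred layout: [0] = ID length, [1] = offset, [2..] = ID bytes. */
typedef struct CIBIR_STORAGE {
    uint8_t Bytes[2 + SKETCH_MAX_CIBIR_LENGTH];
} CIBIR_STORAGE;

/* Buffer is the parameter value: one offset byte followed by the ID bytes. */
static bool StoreCibirId(CIBIR_STORAGE* Storage, const uint8_t* Buffer, uint32_t BufferLength)
{
    assert(Storage != NULL && Buffer != NULL);
    if (BufferLength < 2 || BufferLength > 1 + SKETCH_MAX_CIBIR_LENGTH) {
        return false; /* need an offset byte plus 1..MAX ID bytes */
    }
    Storage->Bytes[0] = (uint8_t)(BufferLength - 1); /* ID length */
    memcpy(Storage->Bytes + 1, Buffer, BufferLength); /* offset, then ID */
    return true;
}
```

On the application side, the same offset-plus-ID buffer would presumably be passed through the MsQuic `SetParam` call with `QUIC_PARAM_LISTENER_CIBIR_ID`, after which the `0x%llx` trace value is computed from `CibirId + 2` and the length in byte 0.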

src/generated/linux/connection.c.clog.h

Lines changed: 14 additions & 8 deletions
@@ -853,21 +853,27 @@ tracepoint(CLOG_CONNECTION_C, LocalInterfaceSet , arg1, arg3);\
 
 
 /*----------------------------------------------------------
-// Decoder Ring for CibirIdSet
-// [conn][%p] CIBIR ID set (len %hhu, offset %hhu)
+// Decoder Ring for CibirIdSetInfo
+// [conn][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
 // QuicTraceLogConnInfo(
-CibirIdSet,
+CibirIdSetInfo,
 Connection,
-"CIBIR ID set (len %hhu, offset %hhu)",
+"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
 Connection->CibirId[0],
-Connection->CibirId[1]);
+Connection->CibirId[1],
+(unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]));
 // arg1 = arg1 = Connection = arg1
 // arg3 = arg3 = Connection->CibirId[0] = arg3
 // arg4 = arg4 = Connection->CibirId[1] = arg4
+// arg5 = arg5 = (unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]) = arg5
 ----------------------------------------------------------*/
-#ifndef _clog_5_ARGS_TRACE_CibirIdSet
-#define _clog_5_ARGS_TRACE_CibirIdSet(uniqueId, arg1, encoded_arg_string, arg3, arg4)\
-tracepoint(CLOG_CONNECTION_C, CibirIdSet , arg1, arg3, arg4);\
+#ifndef _clog_6_ARGS_TRACE_CibirIdSetInfo
+#define _clog_6_ARGS_TRACE_CibirIdSetInfo(uniqueId, arg1, encoded_arg_string, arg3, arg4, arg5)\
+tracepoint(CLOG_CONNECTION_C, CibirIdSetInfo , arg1, arg3, arg4, arg5);\
 
 #endif
 
src/generated/linux/connection.c.clog.h.lttng.h

Lines changed: 15 additions & 7 deletions
@@ -912,27 +912,35 @@ TRACEPOINT_EVENT(CLOG_CONNECTION_C, LocalInterfaceSet,
 
 
 /*----------------------------------------------------------
-// Decoder Ring for CibirIdSet
-// [conn][%p] CIBIR ID set (len %hhu, offset %hhu)
+// Decoder Ring for CibirIdSetInfo
+// [conn][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
 // QuicTraceLogConnInfo(
-CibirIdSet,
+CibirIdSetInfo,
 Connection,
-"CIBIR ID set (len %hhu, offset %hhu)",
+"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
 Connection->CibirId[0],
-Connection->CibirId[1]);
+Connection->CibirId[1],
+(unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]));
 // arg1 = arg1 = Connection = arg1
 // arg3 = arg3 = Connection->CibirId[0] = arg3
 // arg4 = arg4 = Connection->CibirId[1] = arg4
+// arg5 = arg5 = (unsigned long long)QuicCibirIdToUint64(
+Connection->CibirId + 2,
+Connection->CibirId[0]) = arg5
 ----------------------------------------------------------*/
-TRACEPOINT_EVENT(CLOG_CONNECTION_C, CibirIdSet,
+TRACEPOINT_EVENT(CLOG_CONNECTION_C, CibirIdSetInfo,
 TP_ARGS(
 const void *, arg1,
 unsigned char, arg3,
-unsigned char, arg4),
+unsigned char, arg4,
+unsigned long long, arg5),
 TP_FIELDS(
 ctf_integer_hex(uint64_t, arg1, (uint64_t)arg1)
 ctf_integer(unsigned char, arg3, arg3)
 ctf_integer(unsigned char, arg4, arg4)
+ctf_integer(uint64_t, arg5, arg5)
 )
 )
 
src/generated/linux/datapath_winuser.c.clog.h

Lines changed: 46 additions & 0 deletions
@@ -159,6 +159,52 @@ tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathTestSetIpv6TrafficClassFailed , arg2
 
 
 
+/*----------------------------------------------------------
+// Decoder Ring for DatapathCibirWarning
+// [data][%p] CIBIR detected, %s
+// QuicTraceLogWarning(
+DatapathCibirWarning,
+"[data][%p] CIBIR detected, %s",
+Socket,
+"Skipping OS port reservation for this server socket.");
+// arg2 = arg2 = Socket = arg2
+// arg3 = arg3 = "Skipping OS port reservation for this server socket." = arg3
+----------------------------------------------------------*/
+#ifndef _clog_4_ARGS_TRACE_DatapathCibirWarning
+#define _clog_4_ARGS_TRACE_DatapathCibirWarning(uniqueId, encoded_arg_string, arg2, arg3)\
+tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathCibirWarning , arg2, arg3);\
+
+#endif
+
+
+
+
+/*----------------------------------------------------------
+// Decoder Ring for DatapathCibirIdUsed
+// [data][%p] Using CIBIR ID (len %hhu, id 0x%llx)
+// QuicTraceLogWarning(
+DatapathCibirIdUsed,
+"[data][%p] Using CIBIR ID (len %hhu, id 0x%llx)",
+Socket,
+Config->CibirIdLength,
+(unsigned long long)QuicCibirIdToUint64(
+Config->CibirId,
+Config->CibirIdLength));
+// arg2 = arg2 = Socket = arg2
+// arg3 = arg3 = Config->CibirIdLength = arg3
+// arg4 = arg4 = (unsigned long long)QuicCibirIdToUint64(
+Config->CibirId,
+Config->CibirIdLength) = arg4
+----------------------------------------------------------*/
+#ifndef _clog_5_ARGS_TRACE_DatapathCibirIdUsed
+#define _clog_5_ARGS_TRACE_DatapathCibirIdUsed(uniqueId, encoded_arg_string, arg2, arg3, arg4)\
+tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathCibirIdUsed , arg2, arg3, arg4);\
+
+#endif
+
+
+
+
 /*----------------------------------------------------------
 // Decoder Ring for DatapathRecvEmpty
 // [data][%p] Dropping datagram with empty payload.
