# PacketFrame example config.
#
# Interface names, VLAN IDs, and prefixes in this file are illustrative —
# they match the reference EFG deployment in SPEC.md and will not apply
# verbatim to any other host. Substitute your own.
#
# Grammar is documented in SPEC.md §6. Unknown directives are fatal;
# every interface in an `attach` directive must exist at startup.
global
metrics-textfile /var/lib/node_exporter/textfile/packetframe.prom
log-level info
bpffs-root /sys/fs/bpf/packetframe
state-dir /var/lib/packetframe/state
# Pause between per-iface attaches so each link settles before the
# next attach touches the driver. SPEC.md §11.8 — on some drivers
# XDP attach briefly bounces the link; if multiple attach ifaces
# share a bridge master, bouncing two ports inside one STP
# reconvergence window has been observed to trigger L2 loops. 0s
# disables; recommended >=2s whenever multiple ifaces share a bridge.
attach-settle-time 2s
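# Rough startup cost, assuming the pause is applied only between
# consecutive attaches (an assumption; SPEC.md §11.8 is authoritative):
# the five `attach` lines in the fast-path module below give
# 4 pauses x 2s = about 8 s of added startup time.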
module fast-path
# Attach native XDP to every interface carrying forwarded traffic we want
# on the fast path. Omit HA links, out-of-band management, and anything
# terminating tunnels locally. See SPEC.md §4.3.
attach eth0 native
attach eth2 native
attach eth3 native
attach eth4 native
attach eth5 native
# Allowlist: a packet matches if src OR dst falls in any prefix
# (SPEC.md §4.2). Asymmetric (dst-only) matching would leave reply
# traffic on the conntrack path.
allow-prefix 23.191.200.0/24
allow-prefix6 2001:db8::/48
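# Illustration of the src-OR-dst match with the IPv4 prefix above
# (host addresses are made up for the example):
#   23.191.200.10 -> 198.51.100.7    matched (src in 23.191.200.0/24)
#   198.51.100.7  -> 23.191.200.10   matched (dst in 23.191.200.0/24),
#                                    so replies stay on the fast path
#   198.51.100.7  -> 203.0.113.9     not matched (neither side in a prefix)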
# Dry-run mode: program counts but does not redirect. Live-updatable via
# `packetframe reconfigure` once v0.1 ships reconfigure.
dry-run on
# Circuit breaker: denominator is matched traffic, not rx_total
# (SPEC.md §4.9). Trips when drop_unreachable + err_fib_other exceeds
# 1% of matched over 5 consecutive 5-second samples.
circuit-breaker drop-ratio 0.01 of matched window 5s threshold 5
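# Worked arithmetic for the line above (sample counts are illustrative):
#   a 5 s sample with 1,000,000 matched packets breaches the ratio when
#   drop_unreachable + err_fib_other > 10,000 (1% of matched).
#   `threshold 5` with `window 5s` means the breach has to persist for
#   5 consecutive samples, i.e. about 25 s, before the breaker trips.
#   A clean sample in between presumably resets the count (an assumption
#   read from "consecutive"; SPEC.md §4.9 is authoritative).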
# MSS clamping for fast-pathed TCP SYN/SYN-ACK packets (v0.2.4+,
# SPEC.md §4.x — closes the §11.4 iptables-bypass gap). Standard
# iptables `-A FORWARD ... TCPMSS --set-mss N` rules don't fire on
# XDP-redirected traffic because bpf_redirect_map skips netfilter;
# this directive runs the equivalent mutation inline before the
# redirect.
#
# Lookup precedence (most specific wins). The clamp only lowers an
# announced MSS that is higher than the configured value; it never
# raises one:
#   1. mss-clamp <cidr> via <iface> <mtu>   (prefix + egress iface)
#   2. mss-clamp <cidr> <mtu>               (prefix, any egress)
#   3. mss-clamp via <iface> <mtu>          (egress iface, any prefix)
#   4. mss-clamp <mtu>                      (global default)
#
# Prefix matches src OR dst (mirrors allow-prefix semantics) so a
# single rule covers both directions of a flow. Clamped on both SYN
# and SYN-ACK so each end's announced MSS is constrained per-direction.
# See docs/runbooks/mss-clamp.md for MSS vs MTU math + troubleshooting.
#
# mss-clamp via eth2 1360 # outbound: leaving WAN
# mss-clamp 23.191.201.0/24 via eth2 1360 # outbound, scoped to one customer
# mss-clamp 1360 # global fallback for all matched
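#
# Worked example with all three rules above uncommented. This is a
# sketch: host addresses are made up, and it assumes the trailing
# number is the MSS ceiling in bytes, as with iptables `--set-mss`
# (the grammar above names it <mtu>, so confirm against
# docs/runbooks/mss-clamp.md):
#   SYN 23.191.200.10 -> 23.191.201.9, egress eth2, announced MSS 1460
#     most specific match is the prefix + egress rule: rewritten to 1360
#   SYN 23.191.200.10 -> 198.51.100.7, egress eth2, announced MSS 1460
#     no prefix rule matches, `via eth2` does: rewritten to 1360
#   SYN 23.191.200.10 -> 198.51.100.7, egress eth3, announced MSS 1200
#     only the global rule matches, and 1200 is already below 1360:
#     left unchanged
# MSS arithmetic, assuming plain IPv4 and TCP headers without options:
#   MSS = path MTU - 20 (IPv4) - 20 (TCP), so 1360 fits a 1400-byte
#   path MTU. With IPv6 the fixed header is 40 bytes, so the same path
#   MTU gives 1340.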
# Driver workaround for the pre-Linux-v6.8 rvu-nicpf native XDP bug
# (SPEC.md §11.1(c); upstream fix is commit 04f647c8e456). Values:
#   auto - detect rvu-nicpf via /sys and apply only on native attaches
#          (default; safe across driver families).
#   on   - force-apply regardless of driver.
#   off  - never apply (set this after upgrading to Linux v6.8+ or a
#          kernel with the fix backported).
# driver-workaround rvu-nicpf-head-shift auto
# --- Option F custom FIB. Cutting over to `custom-fib` requires bird
# feeding this daemon through the configured `route-source` (see
# below) AND bird's kernel export dropping BGP routes. See
# docs/runbooks/custom-fib.md for the full cutover sequence and
# rollback procedure. Do not flip this in production without
# validating on staging first.
# Forwarding path selector. Values:
#   kernel-fib - (default) use bpf_fib_lookup() against the kernel
#                FIB. Today's behavior; the permanent rollback
#                option.
#   custom-fib - consult the module's own LPM-trie FIB populated
#                from the route-source feed. Requires a
#                `route-source` directive (`bgp` or `bmp`).
#   compare    - run both lookups, forward via kernel result, bump
#                disagreement counters. Pre-cutover validation
#                mode; expect 2× the per-packet FIB cost. Temporary.
# forwarding-mode kernel-fib
# Route-source feed for the custom FIB. Pick exactly one when
# forwarding-mode is `custom-fib` or `compare`. Two kinds:
#
#   bgp <addr>:<port> local-as <asn> peer-as <asn> [router-id <ipv4>]
#     iBGP listener: bird dials in via `protocol bgp packetframe
#     { neighbor <addr> port <port> as <asn>; ... }`. Bird's
#     export filter runs after best-path selection, so we get
#     bird's selected paths verbatim. Recommended for production
#     today (bird 2.x / 3.x master don't ship RFC 9069 Loc-RIB BMP).
#
#   bmp <addr>:<port> [require-loc-rib]
#     BMP station: only safe to drive forwarding from if the
#     emitter sends RFC 9069 Loc-RIB peer_type=3 frames (FRR has
#     this; bird 2.x/3.x does not). The `require-loc-rib` flag
#     hard-rejects pre/post-policy frames so misconfiguration
#     fails loudly instead of silently forwarding on wrong routes.
#
# route-source bgp 127.0.0.1:1179 local-as 401401 peer-as 401401 router-id 103.17.154.7
# route-source bmp 127.0.0.1:6543 require-loc-rib
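#
# Sketch of how the `bgp` example above lines up with bird's side,
# substituting the example values into the bird snippet quoted earlier
# (the `...` stays elided on purpose; channels and filters are
# site-specific):
#   packetframe: route-source bgp 127.0.0.1:1179 local-as 401401 peer-as 401401
#   bird:        protocol bgp packetframe { neighbor 127.0.0.1 port 1179 as 401401; ... }
# In bird, the `as` on the `neighbor` line is the remote AS as bird
# sees it, i.e. this side's `local-as`. Both are 401401 here because
# the session is iBGP.
# For pre-cutover validation, a plausible pairing (an assumption, not a
# tested recipe; see docs/runbooks/custom-fib.md) is:
#   forwarding-mode compare
#   route-source bgp 127.0.0.1:1179 local-as 401401 peer-as 401401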
# ECMP default hash mode (3/4/5-tuple). Applied globally today;
# per-group overrides could come later if bird ever surfaces a
# hint via BMP. Rarely needs changing from the default 5.
# ecmp-default-hash-mode 5
# Custom-FIB map sizes. **Parsed but not yet runtime-applied** —
# aya and the kernel allocate BPF maps at compile-time sizes set
# in crates/modules/fast-path/bpf/src/maps.rs. Changing these in
# the config has no effect until the BPF ELF is rebuilt with the
# matching constant. Defaults (2²¹ v4 / 2²⁰ v6) comfortably cover
# the current DFZ.
# fib-v4-max-entries 2097152
# fib-v6-max-entries 1048576
# nexthops-max-entries 8192
# ecmp-groups-max-entries 1024
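# Arithmetic check on the defaults: 2^21 = 2097152 and 2^20 = 1048576,
# which are the values shown for fib-v4-max-entries and
# fib-v6-max-entries above.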
# Connected fast-path (v0.2.1+). Declare connected/local prefixes
# that bird sees as `direct1`-origin — the ones under `birdc show
# route protocol direct1`. Without these, custom-fib mode falls
# through to kernel slow path for every inbound packet to a
# connected destination because the iBGP-supplied NEXT_HOP for a
# direct route is a self-IP that the neighbour resolver can't map
# to a useful MAC.
#
# When declared, NetlinkNeighborResolver walks the kernel's ARP
# table for hosts in <cidr> reachable via <iface>, and synthesizes
# per-/32 RouteEvent::Add events into FibProgrammer with state=Resolved
# and the host's real MAC. The /32 wins over the /24 in LPM, and
# XDP redirects directly to the host. Recovers ~57 percentage points
# of bypass for matched_dst_only on a typical EFG. No-op when omitted.
#
# local-prefix 23.191.200.0/24 via br1337 # customer LAN
# local-prefix 10.88.1.0/24 via br88 # Ceph internal network
# local-prefix 10.10.1.0/24 via br0 # other internal LAN
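#
# Illustration of the per-/32 synthesis described above (the host
# address and MAC are made up for the example):
#   declared:     local-prefix 23.191.200.0/24 via br1337
#   kernel ARP:   23.191.200.42 at 02:00:00:aa:bb:cc on br1337
#   synthesized:  23.191.200.42/32, state=Resolved, MAC 02:00:00:aa:bb:cc
#   A lookup for dst 23.191.200.42 matches both the /24 and the /32;
#   LPM picks the /32, so XDP redirects straight to the host.
#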
# Optional `arp-scavenge` tail flag (v0.2.1+): probe every host IP
# in the prefix at startup so quiet LANs (storage networks where
# hosts only do intra-/24 L2 traffic) populate the kernel ARP cache
# and the per-/32 fast-path lights up. Capped at /22 (≤ 1024 hosts)
# to avoid kernel ARP storms. Off by default.
#
# SAFETY (v0.2.2+): probes go ONLY on the operator-declared `via
# <iface>`, never through kernel route lookup. ARP traffic cannot
# escape that iface's L2 broadcast domain. **Do NOT declare
# arp-scavenge on an IX-attached iface** — broadcasting ARP onto
# an IX violates IX ToS (MANRS, anti-DoS rules). For internal LANs
# only (storage, management, customer LANs).
# local-prefix 10.88.1.0/24 via br88 arp-scavenge
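# Size check against the /22 cap: an IPv4 prefix of length n spans
# 2^(32-n) addresses, so /22 = 1024, /24 = 256, /21 = 2048. The
# 10.88.1.0/24 example above is within the cap; a /21 would exceed it
# (how an oversized prefix is handled is not stated here).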
# Synthetic IPv4 default route (v0.2.1, issue #31). Custom-FIB only
# has the prefixes bird's iBGP feed advertised. For destinations
# bird doesn't have specific routes for (RFC 1918, CGNAT,
# test-net, anything outside DFZ), the LPM lookup misses and the
# packet falls to slow path → eats a conntrack entry and gets
# dropped upstream anyway. With `fallback-default`, the resolver
# injects a /0 catch-all into the FIB; XDP redirects directly to
# upstream instead. Same upstream behavior, just no kernel /
# conntrack involvement. Relieves ~25% of conntrack pressure on a
# busy customer (measured on a Tor exit relay).
#
# In `fallback-default via <iface> nexthop <ipv4>`, `<iface>` must be
# the iface carrying default-route traffic (typically the
# upstream/transit peer's iface). `<ipv4>` is the peer's IP; it must
# already be a known kernel ARP entry (or be one of the BGP nexthops
# the resolver already knows about).
#
# fallback-default via eth3 nexthop 194.110.60.50
# XDP-time bogon block (v0.2.1, issue #33). When a packet's dst
# falls in any `block-prefix` AND the packet is otherwise
# allowlist-matched, the program returns XDP_DROP rather than
# passing the packet to the kernel with XDP_PASS. This drops
# bogon-bound traffic (RFC 1918, CGNAT, test-net) at the earliest
# possible point, saving the skb allocation, netfilter walk, and
# conntrack capacity that would otherwise be burned on packets that
# just get RST'd upstream anyway. An empty list means no behavior
# change; the operator opts in.
#
# PacketFrame refuses to start when a `block-prefix` overlaps an
# `allow-prefix` or `local-prefix`: that is a config bug that would
# silently drop traffic to declared customer prefixes.
#
# block-prefix 10.0.0.0/8 # RFC 1918
# block-prefix 172.16.0.0/12 # RFC 1918
# block-prefix 192.168.0.0/16 # RFC 1918
# block-prefix 100.64.0.0/10 # CGNAT (Tailscale, mobile carriers)
# block-prefix 169.254.0.0/16 # link-local
# block-prefix 192.0.2.0/24 # TEST-NET-1
# block-prefix 198.51.100.0/24 # TEST-NET-2
# block-prefix 203.0.113.0/24 # TEST-NET-3
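#
# Illustration with `allow-prefix 23.191.200.0/24` and the list above
# uncommented (host addresses are made up for the example):
#   23.191.200.10 -> 10.1.2.3       matched via src, dst in 10.0.0.0/8:    XDP_DROP
#   23.191.200.10 -> 100.64.0.5     matched via src, dst in 100.64.0.0/10: XDP_DROP
#   23.191.200.10 -> 23.191.201.9   dst in no block-prefix: unaffected by this list
# Overlap check, going by the startup rule stated above: enabling
# `block-prefix 10.0.0.0/8` together with the earlier example
# `local-prefix 10.88.1.0/24 via br88` would be an overlap, so startup
# would be expected to fail until one of the two is narrowed or removed.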