tcpstats(4): add TCP connection statistics character device#2079

Closed
randomizedcoder wants to merge 1 commit into freebsd:main from randomizedcoder:tcpstats

@randomizedcoder

PR: Add tcpstats(4) — TCP Connection Statistics Character Device

Summary

This PR adds tcpstats, a kernel module that creates a read-only character device /dev/tcpstats for streaming per-connection TCP statistics to userspace. When read, the device iterates every TCP connection in a single kernel pass and emits fixed-size 320-byte records via uiomove().
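For a userspace consumer, fixed-size records mean the byte stream can be cut into 320-byte slices with no framing logic. The following is a minimal sketch of that idea, not code from the PR: `iter_records` is a hypothetical helper, and it assumes the device reports end of data once a pass is complete, which the description does not spell out.

```python
import io
from typing import BinaryIO, Iterator

RECORD_SIZE = 320  # record size stated in the PR description


def iter_records(stream: BinaryIO, record_size: int = RECORD_SIZE) -> Iterator[bytes]:
    """Yield whole records from a binary stream until it reports end of data."""
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            return  # end of data (assumed to mark the end of one pass)
        if len(chunk) != record_size:
            raise ValueError(f"short record: got {len(chunk)} of {record_size} bytes")
        yield chunk


# Demonstration on an in-memory stream. With the module loaded, the stream
# would presumably be open("/dev/tcpstats", "rb") instead.
fake_device = io.BytesIO(bytes(3 * RECORD_SIZE))
records = list(iter_records(fake_device))
print(len(records))  # 3
```

The in-memory stream keeps the example runnable on any machine; only the record size is taken from the PR.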

Why not existing tools?

  • netstat -an — userspace tool, no per-connection metrics (cwnd, RTT, retransmits, ECN), high syscall overhead
  • siftr(4) — logs to file on packet events, not on-demand snapshots; no filtering; no structured binary output
  • tcp_blackbox — ring-buffer trace logging for debugging, not real-time monitoring
  • kern.ipc.tcp_pcblist sysctl — returns xinpcb structs but requires complex userspace parsing and multiple syscalls

tcpstats provides a purpose-built, zero-copy, filterable, binary-stable interface for TCP connection monitoring.

Design Overview

Video

YouTube video: https://youtu.be/e7uPr9q4Lmg

Original repo

This kernel module was originally developed and extensively tested in this repo:
https://github.com/randomizedcoder/bsd-xtcp

Architecture

  • Character device: /dev/tcpstats (compact default fields), /dev/tcpstats-full (all fields)
  • Single-pass iteration: Uses INP_ALL_ITERATOR with INPLOOKUP_RLOCKPCB for consistent snapshot
  • Fixed-size records: 320-byte struct tcp_stats_record with 52-byte spare for future expansion
  • Named profiles: Up to 16 sysctl-created filter profiles, each with a /dev/tcpstats/<name> device node
  • VNET-aware: Uses CURVNET_SET/CURVNET_RESTORE for jail compatibility

Security Model (5 layers)

  1. File permissions: 0440 root:network
  2. Write rejection: EPERM on write open
  3. Credential visibility: cr_canseeinpcb() per connection
  4. FD limit: dev.tcpstats.max_open_fds (default 16) → EMFILE
  5. Reader limit: dev.tcpstats.max_concurrent_readers (default 32) → EBUSY
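From a client's point of view, the layers above surface as distinct errno values. The sketch below shows one way a monitoring tool might turn them into readable messages. `describe_error` and the message strings are hypothetical, and the description does not say whether EBUSY is returned from open or from read, so the mapping is keyed on the errno alone.

```python
import errno

# errno values named in the security model. EACCES is the ordinary Unix
# result of failing the 0440 root:network permission check and is not
# specific to tcpstats.
LIMIT_ERRORS = {
    errno.EACCES: "caller is neither root nor in group network",
    errno.EPERM: "device is read-only; opening for write is rejected",
    errno.EMFILE: "open FD limit reached (dev.tcpstats.max_open_fds)",
    errno.EBUSY: "reader limit reached (dev.tcpstats.max_concurrent_readers)",
}


def describe_error(exc: OSError) -> str:
    """Map an OSError from the device to a human-readable explanation."""
    return LIMIT_ERRORS.get(exc.errno, f"unexpected error: {exc.strerror}")


print(describe_error(OSError(errno.EBUSY, "Device busy")))
```

A caller would wrap its `open()` and `read()` calls in `try`/`except OSError` and pass the exception to the helper.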

DoS Protections

  • Concurrent FD limit (tunable via sysctl)
  • Concurrent reader limit (tunable via sysctl)
  • Read timeout: dev.tcpstats.max_read_duration_ms (default 5000ms)
  • Per-FD rate limiting: dev.tcpstats.min_read_interval_ms
  • Signal interruptibility: checks SIGPENDING(curthread) every 1024 sockets
  • Voluntary preemption: kern_yield(PRI_USER) every 1024 sockets
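The control flow implied by the last three bullets can be modelled in userspace. This is only an illustration of the pattern (check for a pending signal, check the deadline, then yield, once every 1024 items), not the kernel code. The function and parameter names are hypothetical, and checking the timeout at the same cadence as the signal check is an assumption.

```python
import time
from typing import Callable, Iterable, Tuple

CHECK_INTERVAL = 1024  # sockets between checks, per the PR text


def bounded_scan(
    sockets: Iterable[object],
    max_duration_s: float,
    clock: Callable[[], float] = time.monotonic,
    signal_pending: Callable[[], bool] = lambda: False,
    yield_cpu: Callable[[], None] = lambda: None,
) -> Tuple[int, str]:
    """Visit every item, running the periodic checks every CHECK_INTERVAL items."""
    start = clock()
    visited = 0
    for _ in sockets:
        visited += 1
        if visited % CHECK_INTERVAL == 0:
            if signal_pending():                  # stands in for SIGPENDING(curthread)
                return visited, "interrupted"
            if clock() - start > max_duration_s:  # stands in for max_read_duration_ms
                return visited, "timeout"
            yield_cpu()                           # stands in for kern_yield(PRI_USER)
    return visited, "done"


print(bounded_scan(range(3000), max_duration_s=5.0))  # (3000, 'done')
```

Batching the checks keeps the per-socket cost low while still bounding how long a single read can run uninterrupted.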

ABI Stability

  • Record size is _Static_assert'd to 320 bytes at compile time
  • Protocol version field (tsr_version) allows future evolution
  • Filter struct version field (TSF_VERSION) for forward compatibility
  • 52-byte spare in record, 16-byte spare in filter struct
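A userspace reader can mirror the compile-time size check with `ctypes`. The layout below is a placeholder rather than the real `struct tcp_stats_record`: only the total size (320), the spare size (52), and the field name `tsr_version` come from the PR. The position and width of `tsr_version` and the opaque `payload` field are assumptions made for illustration.

```python
import ctypes
import sys

RECORD_SIZE = 320  # from the PR
SPARE_SIZE = 52    # from the PR


class TcpStatsRecord(ctypes.Structure):
    """Placeholder layout; the real field order is defined in tcp_statsdev.h."""
    _fields_ = [
        ("tsr_version", ctypes.c_uint32),  # assumed: 32-bit, at offset 0
        ("payload", ctypes.c_uint8 * (RECORD_SIZE - 4 - SPARE_SIZE)),  # opaque stand-in
        ("spare", ctypes.c_uint8 * SPARE_SIZE),
    ]


# Userspace analogue of the kernel's _Static_assert on the record size.
assert ctypes.sizeof(TcpStatsRecord) == RECORD_SIZE


def parse_record(raw: bytes, expected_version: int = 1) -> TcpStatsRecord:
    """Copy one raw record into the structure and reject unknown versions."""
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    rec = TcpStatsRecord.from_buffer_copy(raw)
    if rec.tsr_version != expected_version:
        raise ValueError(f"unsupported record version {rec.tsr_version}")
    return rec


sample = (1).to_bytes(4, sys.byteorder) + bytes(RECORD_SIZE - 4)
print(parse_record(sample).tsr_version)  # 1
```

Checking the version before touching any other field is what lets a reader fail cleanly if the record format evolves.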

Filter System

The filter parser is a dual-compile module (kernel and userspace) supporting:

  • Port filters: Up to 8 local/remote ports, network byte order, duplicate detection
  • State filters: Exclude or include-only modes with 11 TCP states + 3 aliases
  • Address filters: IPv4 CIDR and IPv6 prefix matching with host-bit validation
  • IP version: ipv4_only / ipv6_only flags with conflict detection
  • Field selection: 10 field groups with bitwise composition
  • Format: Compact (default fields) or full (all fields)

All directives are ANDed. Empty/zero fields mean "match any".
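The "ANDed, empty means match any" rule can be sketched as follows. `Conn`, `Filter`, and `matches` are hypothetical names covering a subset of the directives, and the real filter struct is a C structure with a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    local_port: int
    remote_port: int
    state: str
    ipv6: bool


@dataclass(frozen=True)
class Filter:
    local_ports: frozenset = frozenset()     # empty set: any local port
    remote_ports: frozenset = frozenset()    # empty set: any remote port
    exclude_states: frozenset = frozenset()  # empty set: no state excluded
    ipv4_only: bool = False
    ipv6_only: bool = False


def matches(f: Filter, c: Conn) -> bool:
    """Return True only if the connection passes every non-empty directive."""
    if f.local_ports and c.local_port not in f.local_ports:
        return False
    if f.remote_ports and c.remote_port not in f.remote_ports:
        return False
    if c.state in f.exclude_states:
        return False
    if f.ipv4_only and c.ipv6:
        return False
    if f.ipv6_only and not c.ipv6:
        return False
    return True


https_only = Filter(local_ports=frozenset({443}), exclude_states=frozenset({"TIME_WAIT"}))
print(matches(https_only, Conn(443, 51000, "ESTABLISHED", ipv6=False)))  # True
print(matches(https_only, Conn(443, 51000, "TIME_WAIT", ipv6=False)))    # False
```

Each directive can only narrow the result, so a default-constructed `Filter()` matches every connection.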

Static Analysis Results

The following analysis tools were run on the original out-of-tree module with zero warnings:

| Tool | Version | Result |
|---|---|---|
| GCC -Wall -Wextra -Werror | 13.2 | PASS |
| GCC -fanalyzer | 13.2 | PASS |
| clang scan-build | 17 | PASS |
| clang-tidy | 17 | PASS |
| cppcheck | 2.13 | PASS |
| semgrep | latest | PASS |
| flawfinder | 2.0 | PASS |
| clang-format (check) | 17 | PASS |

Memory Safety Verification

| Tool | Platform | Result |
|---|---|---|
| Valgrind memcheck (filter parser) | FreeBSD 14.3 | 0 errors |
| AddressSanitizer (filter parser) | FreeBSD 15.0 | PASS |
| UndefinedBehaviorSanitizer (filter parser) | FreeBSD 15.0 | PASS |

ATF Tests

Filter Parser Tests (tests/sys/netinet/tcpstats_filter_test.c)

37 ATF C test cases covering the userspace/kernel dual-compile filter parser:

| Category | Tests | Coverage |
|---|---|---|
| Empty/whitespace | 2 | Empty string, whitespace-only input |
| Port parsing | 4 | Single, multiple, max (8), both directions |
| State parsing | 4 | Exclude, exclude multiple, include-only, case insensitivity |
| IPv4 parsing | 3 | Exact match, /24 CIDR, /0 wildcard |
| IPv6 parsing | 3 | Loopback, compressed, link-local /10 |
| Flags/format/fields | 4 | ipv4_only flag, format=full, field groups, full combo |
| Structural rejections | 4 | Non-printable chars, unknown directive, missing/empty value |
| Port rejections | 5 | Zero, overflow (65536), leading zero, duplicate, >8 ports |
| State rejections | 2 | Unknown state, exclude+include conflict |
| IPv4 rejections | 2 | Host bits set in CIDR, bad octet |
| IPv6 rejections | 2 | Host bits set in prefix, multiple :: |
| Conflict rejections | 2 | ipv4_only+ipv6_only, IPv4 addr + ipv6_only |

Compiled in userspace linking against tcp_statsdev_filter.c and libprivateatf-c.
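The rejection rules listed for ports and prefixes are easy to restate in a few lines. The sketch below is a Python restatement of those rules, not the parser in `tcp_statsdev_filter.c`. It takes already-split tokens so that it does not guess at the directive syntax, and it uses the standard library's `ipaddress` module, whose `strict=True` mode rejects prefixes with host bits set.

```python
import ipaddress

MAX_PORTS = 8  # per-direction limit stated in the PR


def validate_ports(tokens):
    """Apply the port rejection rules: zero, overflow, leading zero, duplicate, >8."""
    if len(tokens) > MAX_PORTS:
        raise ValueError(f"more than {MAX_PORTS} ports")
    ports = []
    for tok in tokens:
        if not (tok.isascii() and tok.isdigit()):
            raise ValueError(f"not a decimal number: {tok!r}")
        if len(tok) > 1 and tok[0] == "0":
            raise ValueError(f"leading zero: {tok!r}")
        value = int(tok)
        if value == 0 or value > 65535:
            raise ValueError(f"port out of range: {tok!r}")
        if value in ports:
            raise ValueError(f"duplicate port: {tok!r}")
        ports.append(value)
    return ports


def validate_prefix(text):
    """Parse an IPv4 or IPv6 prefix, rejecting any with host bits set."""
    return ipaddress.ip_network(text, strict=True)


def is_rejected(check, arg):
    """Return True if the validator raises ValueError for this input."""
    try:
        check(arg)
    except ValueError:
        return True
    return False


print(validate_ports(["80", "443"]))                # [80, 443]
print(is_rejected(validate_ports, ["080"]))         # True (leading zero)
print(is_rejected(validate_prefix, "10.0.0.1/24"))  # True (host bits set)
print(validate_prefix("fe80::/10"))                 # fe80::/10
```

Rejecting leading zeros and duplicates keeps each filter string to a single canonical spelling.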

Kernel Module Lifecycle Tests (tests/sys/netinet/tcpstats_test.sh)

3 ATF shell test cases (require root):

| Test Case | Verifies |
|---|---|
| kmod_load_unload | kldload, kldstat -q -m, /dev/tcpstats char device exists, kldunload |
| dev_readable | dd if=/dev/tcpstats succeeds after load |
| sysctl_exists | max_open_fds, max_concurrent_readers, max_read_duration_ms, reads_total, active_fds all present |

24-Hour Soak Test Results

Configuration

  • 10,000 concurrent TCP connections maintained throughout
  • Tested on both FreeBSD 14.3-RELEASE and 15.0-RELEASE
  • 576 samples collected over 24 hours (one every 2.5 minutes)

Memory

  • M_TCPSTATS malloc type: Use=0, Memory=0 at every sample
  • Zero memory leaks across 576 samples on both platforms

Stability

  • FreeBSD 15.0: 0 health failures
  • FreeBSD 14.3: 1 transient non-fatal event (self-recovered)

Counters

  • 8.3 million records emitted
  • 11 million sockets visited
  • 0 uiomove() errors
  • 0 signal interrupts
  • Invariant visited == emitted + sum(skipped) held at every sample
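The counter invariant in the last bullet is a one-line check. In the sketch below the function name, the skip-reason keys, and the sample numbers are all illustrative; the PR does not list the actual skip reasons or counter names here.

```python
def invariant_holds(visited, emitted, skipped):
    """Check visited == emitted + sum(skipped) for one sample of the counters."""
    return visited == emitted + sum(skipped.values())


# Illustrative sample: 1000 sockets visited, 750 emitted, 250 skipped.
sample_skips = {"filtered": 200, "not_visible": 50}  # hypothetical reason names
print(invariant_holds(1000, 750, sample_skips))  # True
print(invariant_holds(1000, 760, sample_skips))  # False
```

A soak harness would run this against each collected sample and flag the first one that fails.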

Performance Benchmarks

Filter Parser

  • Simple filters: 3-8ns per call
  • Complex filters (full combo): ~820ns per call
  • IPv6 prefix parsing: ~400ns per call

Read Path

  • 1,000 connections: ~275µs per full read
  • 10,000 connections: ~4.4ms per full read
  • Scales linearly with connection count
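The read-path figures can be broken down per connection with some quick arithmetic. The helper names below are hypothetical, and the payload calculation assumes every connection is emitted (no filter) at 320 bytes per record.

```python
RECORD_SIZE = 320  # bytes per record, from the PR

# connections -> time per full read in nanoseconds, from the figures above
MEASURED_NS = {1_000: 275_000, 10_000: 4_400_000}


def per_connection_ns(connections, total_ns):
    """Average time spent per connection in one full read."""
    return total_ns / connections


def snapshot_bytes(connections):
    """Bytes copied out for one unfiltered snapshot."""
    return connections * RECORD_SIZE


for conns, total_ns in MEASURED_NS.items():
    print(conns, per_connection_ns(conns, total_ns), snapshot_bytes(conns))
# 1000 275.0 320000
# 10000 440.0 3200000
```

The two data points work out to about 275 ns and 440 ns per connection, so the growth is roughly proportional to connection count rather than exactly so. An unfiltered 10,000-connection snapshot is about 3.2 MB.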

Concurrent Readers

  • Up to 32 simultaneous readers tested
  • No lock contention (read-only iteration with read lock)

DTrace Probe Verification

7 SDT probes registered and firing (when compiled with -DTCPSTATS_DTRACE):

| Probe | Args | Verified |
|---|---|---|
| tcpstats:::read-entry | uio_resid, filter_flags | Yes |
| tcpstats:::read-done | error, records_emitted, elapsed_ns | Yes |
| tcpstats:::filter-skip | inpcb_ptr, reason_code | Yes |
| tcpstats:::filter-match | inpcb_ptr | Yes |
| tcpstats:::fill-done | elapsed_ns, record_size | Yes |
| tcpstats:::profile-create | name | Yes |
| tcpstats:::profile-destroy | name | Yes |

VM Build & Test Results

Built and tested on two FreeBSD VMs with partial source trees (/usr/src/sys only). Module sources were rsynced and build system files patched in-place.

Build

| Platform | Kernel | Compiler | Warnings | Result |
|---|---|---|---|---|
| FreeBSD 15.0-RELEASE amd64 | GENERIC | cc (clang 19), -Werror, gnu17 | 0 | tcpstats.ko (48184 bytes) |
| FreeBSD 14.4-RELEASE amd64 | GENERIC | cc (clang 18), -Werror, gnu99 | 0 | tcpstats.ko |
| FreeBSD 14.3-RELEASE amd64 | GENERIC | cc (clang 18), -Werror, gnu99 | 0 | tcpstats.ko (48528 bytes) |

Smoke Test (load / read / unload)

All three platforms:

  • kldload succeeds, kldstat -v shows module loaded
  • /dev/tcpstats created as cr--r----- root:network
  • All sysctls registered under dev.tcpstats.*
  • dd if=/dev/tcpstats bs=320 count=100 reads valid binary records with correct header
  • kldunload succeeds cleanly
  • dmesg shows tcpstats: loaded (TCP_STATS_VERSION=1, TSF_VERSION=2) / tcpstats: unloaded
  • Zero panics, faults, or error messages in dmesg

ATF Filter Parser Tests (37 test cases)

| Platform | Passed | Failed |
|---|---|---|
| FreeBSD 15.0-RELEASE | 37/37 | 0 |
| FreeBSD 14.4-RELEASE | 37/37 | 0 |
| FreeBSD 14.3-RELEASE | 37/37 | 0 |

ATF Shell Tests (3 test cases)

| Platform | Passed | Failed |
|---|---|---|
| FreeBSD 15.0-RELEASE | 3/3 | 0 |
| FreeBSD 14.4-RELEASE | 3/3 | 0 |
| FreeBSD 14.3-RELEASE | 3/3 | 0 |

Summary

| Test | FreeBSD 15.0 | FreeBSD 14.4 | FreeBSD 14.3 |
|---|---|---|---|
| Build (tcpstats.ko, -Werror) | PASS | PASS | PASS |
| Load / unload cycle | PASS | PASS | PASS |
| /dev/tcpstats created | PASS | PASS | PASS |
| Sysctls registered | PASS | PASS | PASS |
| Read binary records | PASS | PASS | PASS |
| Filter parser ATF (37 tests) | 37/37 | 37/37 | 37/37 |
| Shell lifecycle ATF (3 tests) | 3/3 | 3/3 | 3/3 |
| dmesg errors | None | None | None |

Files Changed

New Files (9)

| File | Description |
|---|---|
| sys/netinet/tcp_statsdev.c | Main module: cdev, read/ioctl, record fill, sysctl, DTrace |
| sys/netinet/tcp_statsdev.h | Public header: struct tcp_stats_record, ioctls |
| sys/netinet/tcp_statsdev_filter.c | Filter string parser (dual kernel/userspace) |
| sys/netinet/tcp_statsdev_filter.h | Filter parser API |
| sys/modules/tcpstats/Makefile | Module build file |
| share/man/man4/tcpstats.4 | Man page (mdoc format) |
| tests/sys/netinet/tcpstats_filter_test.c | ATF C tests for filter parser |
| tests/sys/netinet/tcpstats_test.sh | ATF shell tests for kmod lifecycle |
| PR_SUBMISSION.md | This document |

Modified Files (5)

| File | Change |
|---|---|
| sys/modules/Makefile | Added tcpstats to SUBDIR |
| sys/conf/files | Added tcp_statsdev.c and tcp_statsdev_filter.c |
| sys/conf/options | Added TCPSTATS option |
| share/man/man4/Makefile | Added tcpstats.4 to MAN list |
| tests/sys/netinet/Makefile | Added ATF test entries and build config |

@github-actions

github-actions bot commented Mar 16, 2026

Thank you for taking the time to contribute to FreeBSD!

Some of the files have special handling:

Important

@concussious wants to review changes to share/man/

Important

@ngie-eign wants to review changes to tests

Add a loadable kernel module that creates a read-only character device
/dev/tcpstats for streaming per-connection TCP statistics to userspace.
Each read iterates every TCP connection in a single kernel pass and emits
fixed-size 320-byte records via uiomove(9).

Features:
- Single-pass INP_ALL_ITERATOR with INPLOOKUP_RLOCKPCB for consistent snapshots
- Filter system supporting ports, TCP states, IPv4/IPv6 CIDR, field selection
- Named filter profiles via sysctl with per-profile device nodes
- Five-layer security model (permissions, credential checks, resource limits)
- DoS protections: FD/reader limits, read timeout, signal interruptibility
- VNET-aware for jail compatibility
- Optional DTrace SDT probes and sysctl statistics counters

Tested on FreeBSD 14.3, 14.4, and 15.0 (amd64 GENERIC) with zero
compiler warnings under -Werror. Includes 37 ATF C tests for the filter
parser and 3 ATF shell tests for kmod lifecycle verification.

Signed-off-by: Dave Seddon <dave.seddon.ca@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@randomizedcoder
Author

Doh!

spmzt requested review from glebius and spmzt on March 16, 2026, 09:18
@spmzt
Member

spmzt commented Mar 16, 2026

Have you considered importing this module as a port?
I strongly believe it would be more appropriate for this module to be under ports rather than in the tree.

@bms

bms commented Mar 16, 2026

Is there not some indirect functional overlap here with BBLog and generic-ebpf (a potential future merge candidate)? I realise that has been called out in the initial comments for the submission, but the Project has other concerns regarding the authorship and provenance of this submission.

I intend to extend BBLog to do Delayed ACK profiling. It isn't clear if there is benefit here in either direction. I eyeballed bsd-xtcp very quickly and don't see profiling for the D-ACK vs Nagle implosion going on. xnu itself does already do this profiling.

So this submission might be better termed a "socket scraper", really.

@randomizedcoder
Author

Thanks for the replies guys.

Oh, sorry, I didn't realize ports also has kernel modules, but I see it does. I'm not really sure on the "rules" for the decision for a module to be in freebsd-src, or in the freebsd-ports. I'm happy to close this and move it to ports if that makes more sense, but I wouldn't really call this a "port", cos this is only for freebsd. As you probably know, Linux uses netlink for the tcp-diag stuff, so this isn't a "port" from Linux.

@spmzt
Member

spmzt commented Mar 16, 2026

> Oh, sorry, I didn't realize ports also has kernel modules, but I see it does. I'm not really sure on the "rules" for the decision for a module to be in freebsd-src, or in the freebsd-ports. I'm happy to close this and move it to ports if that makes more sense, but I wouldn't really call this a "port", cos this is only for freebsd. As you probably know, Linux uses netlink for the tcp-diag stuff, so this isn't a "port" from Linux.

You're right, sometimes it's a gray area when deciding between ports and src.
However, ports are not meant for portable/application kernel modules only.
In fact, we have many FreeBSD-specific kernel modules in the ports:
pkg search kmod
As a rule of thumb, if it doesn't need to be in the src tree, it probably shouldn't be there.

I suggest you create a port instead. Please refer to the Porter's Handbook:
https://docs.freebsd.org/en/books/porters-handbook/

Add me to the CC list of your PR on Bugzilla.
I'll look into that.

randomizedcoder added a commit to randomizedcoder/freebsd-ports that referenced this pull request Mar 17, 2026
Kernel module providing system-wide TCP socket statistics via /dev/tcpstats.
Streams fixed-size 320-byte records with per-connection TCP metrics including
addresses, ports, TCP state, congestion control parameters, RTT measurements,
retransmit counts, ECN statistics, and process attribution.

Follows feedback from freebsd-src PR freebsd/freebsd-src#2079 to move
to ports.

Tested on FreeBSD 15.0, 14.4, and 14.3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@randomizedcoder
Author

Closing this PR in favor of freebsd/freebsd-ports#497
