Decision (v1)
Replace fs_usage with FIFO (named-pipe) bait as the sole read sensor for v1. The agent plants each bait as a named pipe and holds the write end; any process that opens the bait to read it unblocks our open() — that open is the detection. Pure shell, unprivileged, cross-platform, and no kdebug — which dissolves the single-consumer collision that originally motivated this issue (#94/#95).
eslogger is deferred to a documented follow-up (see bottom), to be added only if/when we want scan-path coverage or race-free attribution. fs_usage is removed.
Why FIFO is sufficient for the real threat (verified)
The headline threat (Shai-Hulud-class npm worms) reads canonical credential paths by name (~/.aws/credentials, ~/.npmrc, ~/.config/gcloud/..., azureProfile.json) via readFileSync and cloud-SDK config loaders. Verified locally:
A blind readFileSync on a FIFO drains our served content instantly.
existsSync returns true for a FIFO, and an existsSync-guarded read still drains it — the typical stealer pattern.
The serving loop + grep -F path prefilter is pure shell, no dependency.
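The rendezvous verified above can be reproduced with standard tools. The following is a minimal sketch, not the agent's code: the bait path, the decoy content, and the `cat`-based reader are illustrative assumptions standing in for an `existsSync`-guarded `readFileSync`.

```shell
#!/bin/sh
# Sketch only: path, decoy content, and the cat-based reader are assumptions.
tmpdir=$(mktemp -d)
bait="$tmpdir/credentials"
content='aws_access_key_id = DECOY'

mkfifo "$bait"

# Simulated stealer: an existence-guarded read of the bait by name.
( sleep 1; [ -e "$bait" ] && cat "$bait" > "$tmpdir/stolen" ) &

# The redirection opens the FIFO write-only, which blocks until a reader
# opens the other end. Getting past this line is the detection.
printf '%s\n' "$content" > "$bait"
detected=yes

wait                      # let the reader drain the pipe and exit
stolen=$(cat "$tmpdir/stolen")
rm -rf "$tmpdir"
echo "detected=$detected stolen=$stolen"
```

The writer never inspects the reader here; it only learns that some process opened the path, which is why attribution is treated as a separate, best-effort step.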
The worm also runs trufflehog, which skips non-regular files (if !entry.Type().IsRegular() { continue }, confirmed in trufflehog source — scanDir, Chunks, scanSymlink), so it walks past a FIFO. This does not matter for canonical-path bait, because the worm reads those paths by name, independently of the trufflehog sweep. The trufflehog-skip only blinds bait discoverable solely by scanning a non-canonical path — which is not Thumper's placement strategy. (Symlink tricks don't help: trufflehog applies the regular-file test to the resolved target.)
Known blind spots (accepted for v1)
A worm that calls statSync(path).isFile() before reading skips a FIFO (isFile() is false for a pipe; verified). Atypical for credential stealers.
mmap-based readers can't read a FIFO (mmap → ENODEV). Most readers use read().
Scan-only discovery of a non-canonical decoy path (trufflehog, above).
All three are covered by the deferred eslogger follow-up (a real ES sensor sees the open of a plain regular file regardless of type or discovery method, with inline attribution). Not in v1.
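The first blind spot comes down to how a FIFO answers file-type checks. As a small illustration (the path is an assumption, and the shell tests are only analogues of the Node.js calls named above), the same split shows up with `test`:

```shell
#!/bin/sh
# Sketch: how shell file tests classify a FIFO bait (path is an assumption).
tmpdir=$(mktemp -d)
bait="$tmpdir/azureProfile.json"
mkfifo "$bait"

if [ -e "$bait" ]; then exists=true;  else exists=false;  fi   # analogue of existsSync
if [ -f "$bait" ]; then regular=true; else regular=false; fi   # analogue of statSync().isFile()
if [ -p "$bait" ]; then pipe=true;    else pipe=false;    fi   # is it a named pipe?

rm -rf "$tmpdir"
echo "exists=$exists regular=$regular pipe=$pipe"
# prints: exists=true regular=false pipe=true
```

An existence check passes, so an existence-guarded reader proceeds to the read; a regular-file check fails, so a reader that insists on a regular file walks past the bait.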
Serving model + integration
plant(): mkfifo "$path" instead of curl -o "$path"; fetch the bait content once and cache it; the watcher writes the cached content into the pipe on each read. Keep the existing traversal/symlink/overwrite guards.
verify_planted(): treat [ -p "$p" ] as planted. It already only stat-checks (never reads), so it does not self-trigger.
Serving loop replaces watch_fs_usage(): while :; do <snapshot reader pid>; printf '%s' "$content" > "$fifo"; fire ...; done.
Reuse fire / debounce / is_noise unchanged.
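The pieces above could fit together roughly as follows. This is a sketch under stated assumptions, not the agent's implementation: `fire` is a stub for the existing alert function, the decoy content is inlined rather than fetched and cached, and `serve` takes a hypothetical `MAX_READS` argument so the demo terminates (the loop described above runs forever).

```shell
#!/bin/sh
# Sketch only. fire() is a stand-in for the agent's existing alert function;
# the decoy content is inlined here instead of fetched and cached.
fire() { echo "ALERT read of $1"; }

plant() {                          # plant PATH
  if [ -e "$1" ] || [ -L "$1" ]; then return 1; fi   # overwrite/symlink guard
  mkfifo -m 600 "$1"
}

verify_planted() { [ -p "$1" ]; }  # stat-only check, never opens the pipe

serve() {                          # serve PATH CONTENT MAX_READS
  fifo=$1 content=$2 remaining=$3
  trap '' PIPE                     # a reader that closes early must not kill the loop
  while [ "$remaining" -gt 0 ]; do
    # Opening the write end blocks until a reader opens the bait.
    { printf '%s\n' "$content" 2>/dev/null || :; } > "$fifo" && fire "$fifo"
    remaining=$((remaining - 1))
    sleep 1                        # crude spacing so one reader is not served twice
  done
}

tmpdir=$(mktemp -d)
bait="$tmpdir/.npmrc"
plant "$bait" && verify_planted "$bait" && planted=yes

( sleep 1; cat "$bait" > /dev/null ) &         # simulated reader
alerts=$(serve "$bait" '//registry.npmjs.org/:_authToken=DECOY' 1)
wait
rm -rf "$tmpdir"                   # teardown: leave no orphaned pipe behind
echo "$alerts"
```

Firing on a successful open rather than a successful write is a deliberate choice in this sketch: the open is the detection, and a reader that closes early still counts as a read attempt.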
PID/user attribution (best-effort)
After our open(O_WRONLY) unblocks, the reader is parked in read() with the pipe still open. Snapshot lsof "$path" (or fuser) before writing content to grab the reader's pid, then feed the existing user_of_pid. This is less racy than today's post-hoc lookup (the reader is guaranteed to still hold the pipe). Degrade gracefully to "reader unknown" when lsof misses.
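The ordering described above (open, snapshot, then write) needs the write end held open on its own file descriptor, since a plain `printf > "$fifo"` opens and writes in one step. A hedged sketch, assuming `lsof` is installed and that its `-F pa` field output carries one `p<pid>` line per process and one `a<mode>` line per open file; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch only. Assumes lsof is installed and that its -F field output has one
# "p<pid>" line per process and one "a<mode>" line per open file (r, w, or u).
reader_pid_from_lsof() {           # reads `lsof -F pa` output on stdin
  # Keep only holders with read access, so the agent's own write end is skipped.
  awk '/^p/ { pid = substr($0, 2) }
       /^ar$/ { print pid; exit }'
}

snapshot_reader() {                # prints the reader pid, or "unknown"
  pid=$(lsof -F pa -- "$1" 2>/dev/null | reader_pid_from_lsof)
  echo "${pid:-unknown}"
}

tmpdir=$(mktemp -d)
bait="$tmpdir/credentials"
mkfifo "$bait"
( sleep 1; cat "$bait" > "$tmpdir/stolen" ) &   # simulated reader

exec 3> "$bait"                    # blocks until the reader opens the bait
reader=$(snapshot_reader "$bait")  # reader is parked in read(); look it up now
printf '%s\n' 'DECOY' >&3          # only then serve the content
exec 3>&-                          # close the write end so the reader sees EOF

wait
stolen=$(cat "$tmpdir/stolen")
rm -rf "$tmpdir"
echo "reader=$reader"
```

When `lsof` is missing or reports nothing for the path, `reader` falls back to `unknown`, which matches the graceful degradation called for above.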
Relationship to related sensor issues
[…] mkfifo instead of on the trace.)
#28 (agent atime fallback sensor is broken): atime becomes the sole last-resort fallback (when mkfifo is unsupported, e.g. on some network mounts), so fixing its stat-format bug matters more, not less.
Acceptance
Reading a planted FIFO bait fires the HMAC callback — pure shell, unprivileged — on macOS and Linux.
Two agents on one host both detect reads of their own bait simultaneously (no kdebug contention).
Agent exit removes its FIFOs (no orphaned, blocking pipes).
The callback includes the reader's pid/user when obtainable, degrading gracefully when not.
Deferred: eslogger follow-up (not v1)
For scan-path coverage + race-free attribution, add eslogger open as an optional macOS sensor: macOS 13+, root, JSON parsed dependency-free via plutil -extract <keypath> raw -o - -, multi-client (no kdebug). Verified feasible on macOS 26.5.1 (event type open, plutil extraction of event.open.file.path / process.executable.path / pid). TCC/Full Disk Access in the launchd/MDM deploy path is the open question. Tracked separately when prioritized.
Update (2026-06-26): validated layered attribution design
The "normal file" gap (a worm that statSync().isFile()-guards, mmaps, or only scan-discovers a path walks past a FIFO — blind spots #1–#3 above) and the pid-attribution goal are now handled by a layered sensor rather than FIFO-alone. Each layer was empirically validated on macOS 26.5.1:
Primary detection — atime tripwire on a REGULAR-FILE bait. Re-armable (touch -a -t <past>; APFS relatime bumps atime on read; poll stat). Catches every reader type incl. the FIFO blind spots, and satisfies "normal file." Honors all sensor constraints (regular file, no kdebug, no mount, no privilege). No pid — detection only. Depends on Agent atime fallback sensor is broken #28 (atime is now a primary layer, not a last resort).
Deterministic pid — FIFO companion. Planted alongside the regular-file bait. Agent polls the write end (open(O_WRONLY|O_NONBLOCK) → ENXIO until a reader is parked); on rendezvous the reader is parked in read() with its fd open, and an inode/realpath scan of full lsof output yields the exact reader pid+name. Validated 8/8 deterministic. Correctness bug found: lsof -t <path> does not match FIFOs (returns nothing) — attribution must scan full lsof and match the bait by inode/realpath, excluding the agent's own pid.
Best-effort pid shortlist — churn-ledger + bait constellation. A lightweight proc_listpids-delta ledger (no kdebug/ESF) records short-lived processes; when an atime bait trips, the processes newly-spawned-and-alive across the read window form a suspect set, and intersecting the suspect sets of several constellation baits narrows it. Validated: 100% capture of the reader, but a ~9-process shortlist under realistic concurrent churn (NOT a unique pid) — rank within it by stealer-shape heuristics (interpreter, open-file count, live outbound socket, parent lineage). Honest limit: a pre-existing long-lived reader emits no new-process signal and can't be attributed this way.
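The atime tripwire in the first layer can be sketched with `touch` and `stat`. The helper names and bait path below are assumptions, and whether a plain read actually bumps atime depends on the filesystem and its mount options, so the sketch reports the outcome rather than presuming it:

```shell
#!/bin/sh
# Sketch only: bait path and helper names are assumptions. Whether a plain read
# bumps atime depends on the filesystem and its mount options (e.g. noatime).
atime_of() {                       # last-access time as epoch seconds
  stat -c %X -- "$1" 2>/dev/null || stat -f %a -- "$1"   # GNU stat, else BSD/macOS
}

rearm() { touch -a -t 200001010000 "$1"; }   # push atime into the past

tripped() {                        # tripped PATH ARMED_ATIME
  [ "$(atime_of "$1")" -gt "$2" ]
}

tmpdir=$(mktemp -d)
bait="$tmpdir/credentials"
printf '%s\n' 'aws_access_key_id = DECOY' > "$bait"

rearm "$bait"
armed=$(atime_of "$bait")

cat "$bait" > /dev/null            # simulated read of the regular-file bait
if tripped "$bait" "$armed"; then result=tripped; else result=quiet; fi
echo "armed=$armed result=$result"
rm -rf "$tmpdir"
```

A poller would call `tripped` on an interval and call `rearm` after each alert. The sketch yields no pid, consistent with the layer being detection only.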
Net: detection on a normal file under all constraints (atime); a deterministic pid for the dominant raw-read worm (FIFO companion); a best-effort shortlist for the rest — with no FDA / kext / ESF / kdebug / mount. The "read + pid under all six constraints" grail is provably impossible on macOS (every read-with-pid kernel hook — ESF, kdebug, DTrace, OpenBSM, Kauth/MACF, FSE_ACCESS_GRANTED — is FDA/entitlement/kext-gated or dead; verified across ~44 empirical + literature investigations). This layered design is the validated maximum.
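For the FIFO companion, matching the bait by inode in full `lsof` output could look like the sketch below. It assumes `lsof -F pcfti` field output (`p<pid>` and `c<command>` per process, then `f<fd>`, `t<type>`, `i<inode>` per open file); the function names and the canned sample are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes `lsof -F pcfti` output: "p<pid>" and "c<command>" per
# process, then "f<fd>", "t<type>", "i<inode>" per open file.
fifo_holders() {                   # fifo_holders INODE SELF_PID  (lsof -F on stdin)
  awk -v want="$1" -v self="$2" '
    function emit() {
      if (type == "FIFO" && inode == want && pid != self) print pid, cmd
      type = ""; inode = ""
    }
    /^p/ { emit(); pid = substr($0, 2) }
    /^c/ { cmd = substr($0, 2) }
    /^f/ { emit() }
    /^t/ { type = substr($0, 2) }
    /^i/ { inode = substr($0, 2) }
    END  { emit() }
  '
}

inode_of() { ls -di -- "$1" | awk '{ print $1 }'; }

# Canned lsof output: the agent (pid 100) and a reader (pid 200) on inode 4242,
# plus an unrelated regular file.
sample='p100
cthumper
f3
tFIFO
i4242
p200
cnode
f21
tFIFO
i4242
f22
tREG
i777'

suspects=$(printf '%s\n' "$sample" | fifo_holders 4242 100)
echo "$suspects"
# prints: 200 node
```

Against a live system the input would come from something like `lsof -F pcfti 2>/dev/null | fifo_holders "$(inode_of "$bait")" "$$"`, run while the reader is still parked in `read()`.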
Operational risks to handle in v1
With no writer attached (for example, after the watcher dies), a reader's open(O_RDONLY) blocks indefinitely, which can hang legitimate or attacker readers and breaks the illusion. Mitigations: watcher liveness + restart (#98, heartbeat reports healthy while the watcher is dead; #99, dead watcher is never restarted in live-sync mode), and a teardown that removes the FIFOs on exit so a dead agent leaves no hanging pipes. Consider a non-blocking/timeout serving strategy.
A reader that closes the pipe before the content is written leaves the writer with EPIPE; trap or ignore SIGPIPE in the serving loop.
Two agents that mkfifo the same path collide (ties into the per-host question in #95: singleton lock is per-install, not per-host, so two agents run on one host and collide on fs_usage).
Update (2026-06-27): definitive mechanisms only
Product decision: only definitive mechanisms. Layer 3 (churn-ledger best-effort shortlist) and the bait-constellation intersection are dropped — they capture the reader but only within a ~9-candidate shortlist, never a definitive pid, and a tripwire must not guess. PR #161 closed.
The two definitive layers remain: