Summary
When the fs_usage watcher pipeline dies, the agent logs nothing. watch_fs_usage is a single pipeline, fs_usage | grep | while read …; done, followed by return 0:
agent/thumper_agent.sh:454 $cmd 2>/dev/null | grep --line-buffered -F "$@" | while read -r line; do
agent/thumper_agent.sh:470 return 0
If fs_usage exits (e.g. it lost the kdebug trace to another agent — see fs_usage-singleton issue), the while read loop hits EOF, the pipeline ends, the backgrounded watcher subshell exits, and watch_fs_usage returns 0 as if nothing happened. No error, no warning, no "watcher stopped" line anywhere.
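The masking can be reproduced without fs_usage. The sketch below is only an illustration: fake_producer is a hypothetical stand-in for fs_usage, and the loop body is a placeholder. It relies on the standard shell rule that, without pipefail, a pipeline's exit status is that of its last stage.

```shell
#!/bin/sh
# Stand-in for fs_usage (hypothetical): emits one event, then exits non-zero.
fake_producer() {
    echo "open /tmp/bait"
    return 3
}

# Same shape as the watcher pipeline: producer | grep | while read.
run_pipeline() {
    fake_producer | grep --line-buffered -F "bait" | while read -r line; do
        : # a real watcher would handle "$line" here
    done
    # $? is the status of the last pipeline stage (the while loop),
    # so the producer's exit status 3 is not visible here.
    echo "pipeline status: $?"
}

run_pipeline
```

The producer's failure never reaches the caller, which matches the silent return 0 described above.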
Evidence (observed live)
thumper-dev's full log after its watcher was knocked out:
[11:16:58] heartbeat every 60s (pid 99333)
[11:16:58] watching 1 bait file(s) via fs_usage
That is the last line. The watcher, fs_usage, and grep are all gone, but the log gives zero indication the sensor stopped.
Impact
An operator has no way to know read detection has stopped. The endpoint looks fine. Diagnosing this required manually walking the process tree.
Proposed direction
- After the pipeline ends, log loudly at error level: watcher exited unexpectedly (fs_usage stopped), with the exit status of fs_usage.
- Distinguish "fs_usage never started / not permitted" (return 1, line 450) from "fs_usage started then died"; both are currently invisible in the sync-loop path.
- Consider treating an immediate exit as fatal-or-fallback rather than return 0.
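One possible shape for the first two points, as a rough sketch rather than a patch against thumper_agent.sh. The names log_error and watch_with_status, the "hit:" output, the temp-file approach, and the return code 2 are all assumptions made for illustration; the agent's real logging helper and return conventions are not shown in this issue.

```shell
#!/bin/sh
# Hypothetical logger; the agent's real log helper is not shown in this issue.
log_error() {
    printf '[%s] ERROR: %s\n' "$(date +%H:%M:%S)" "$*" >&2
}

# Sketch of a watcher that reports how its producer ended.
# $1 is the producer command (fs_usage in the agent); the rest are grep -F arguments.
# Variables are deliberately not "local" so the sketch stays plain POSIX sh.
watch_with_status() {
    cmd=$1
    shift
    status_file=$(mktemp) || return 1

    {
        # Record the producer's exit status, which the pipeline would otherwise discard.
        rc=0
        $cmd 2>/dev/null || rc=$?
        echo "$rc" >"$status_file"
    } | grep --line-buffered -F "$@" | while read -r line; do
        printf 'hit: %s\n' "$line"
    done

    producer_status=$(cat "$status_file")
    rm -f "$status_file"
    log_error "watcher exited unexpectedly (fs_usage stopped), exit status ${producer_status:-unknown}"
    # Assumed convention: 2 means "started, then died"; 1 stays "never started".
    return 2
}
```

A caller could then branch on the return value, for example restarting or falling back when it sees 2. Writing the status to a temp file avoids depending on bash-only PIPESTATUS, in case the agent runs under /bin/sh.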
Related: fs_usage-singleton, dead-watcher-not-restarted, exit-logs.