fs_usage watcher dies silently — no log when the sensor pipeline stops #96

Description

@LiorFink00

Summary

When the fs_usage watcher pipeline dies, the agent logs nothing. watch_fs_usage runs an fs_usage | grep | while read …; done pipeline and then returns 0:

agent/thumper_agent.sh:454   $cmd 2>/dev/null | grep --line-buffered -F "$@" | while read -r line; do
agent/thumper_agent.sh:470   return 0

If fs_usage exits (e.g. it lost the kdebug trace to another agent; see the fs_usage-singleton issue), the while read loop hits EOF, the pipeline ends, the backgrounded watcher subshell exits, and watch_fs_usage returns 0 as if nothing happened. The exit status of fs_usage is discarded along the way. No error, no warning, no "watcher stopped" line anywhere.
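The behavior can be reproduced without fs_usage. The following is a minimal sketch, assuming the agent runs under bash; fake_sensor and watch_demo are hypothetical stand-ins, not names from thumper_agent.sh. It shows a pipeline of the same shape swallowing a failing producer:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for fs_usage: emits one event, then dies with status 3.
fake_sensor() {
  echo "open /tmp/bait.txt"
  return 3
}

# Same shape as the watcher: producer | grep | while read, then return 0.
watch_demo() {
  fake_sensor 2>/dev/null | grep --line-buffered -F "bait" | while read -r line; do
    : # the real agent would handle the event here
  done
  # The producer's status 3 is never inspected, so the caller sees success.
  return 0
}

watch_demo
echo "watch_demo returned $?"   # prints: watch_demo returned 0
```

Even without the explicit return 0, the pipeline's own status would be that of the while loop (0 at EOF), so the producer's failure stays hidden unless it is read explicitly.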

Evidence (observed live)

thumper-dev's full log after its watcher was knocked out:

[11:16:58] heartbeat every 60s (pid 99333)
[11:16:58] watching 1 bait file(s) via fs_usage

That is the last line. The watcher, fs_usage, and grep are all gone, but the log gives zero indication the sensor stopped.

Impact

An operator has no way to know read detection has stopped. The endpoint looks fine. Diagnosing this required manually walking the process tree.

Proposed direction

  • After the pipeline ends, log loudly at error level, e.g. watcher exited unexpectedly (fs_usage stopped), and include the exit status of fs_usage.
  • Distinguish "fs_usage never started / not permitted" (return 1, line 450) from "fs_usage started then died" — both are currently invisible in the sync-loop path.
  • Consider treating an immediate exit as fatal-or-fallback rather than return 0.
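One way the direction above could look, as a rough sketch rather than a patch: it assumes bash (for PIPESTATUS and SECONDS), and watch_with_status, log_error, the 2-second "immediate exit" threshold, and the return codes 1 and 2 are all hypothetical choices, not taken from thumper_agent.sh.

```bash
#!/usr/bin/env bash
# Hypothetical logger; the real agent's log helper may differ.
log_error() { printf '[%s] ERROR: %s\n' "$(date +%H:%M:%S)" "$*" >&2; }

# Usage: watch_with_status SENSOR_CMD PATTERN [MIN_UPTIME_SECONDS]
# Runs SENSOR_CMD | grep | while read, then reports why the pipeline ended.
watch_with_status() {
  local sensor=$1 pattern=$2 min_uptime=${3:-2}
  local started=$SECONDS
  local -a st

  "$sensor" 2>/dev/null | grep --line-buffered -F -- "$pattern" | while read -r line; do
    : # the real agent would handle the event here
  done
  # Must be captured immediately: any later command overwrites PIPESTATUS.
  st=("${PIPESTATUS[@]}")

  local sensor_rc=${st[0]}
  local elapsed=$(( SECONDS - started ))

  if (( elapsed < min_uptime )); then
    # Assumed heuristic: a near-instant exit means the sensor never really started.
    log_error "watcher exited immediately (sensor never started or not permitted?), status $sensor_rc"
    return 1
  fi
  log_error "watcher exited unexpectedly (sensor stopped), status $sensor_rc"
  return 2
}
```

A caller could then branch on the result, for example treating 1 as fatal-or-fallback and 2 as a candidate for restart. If the sensor command is wrapped (e.g. by sudo), the status recorded is the wrapper's, which may or may not reflect the exit status of fs_usage itself.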

Related: fs_usage-singleton, dead-watcher-not-restarted, exit-logs.

Metadata

Labels

bug (Something isn't working)
