Is your feature request related to a problem? Please describe
Filing this separately from #1398: that thread tracks the user-facing symptom, and @rrva's comment proposes a denylist of common config/doc extensions. This issue proposes a different mechanism — an allowlist derived from Serena's own Language enum — that addresses the same symptom by deriving from existing capability instead of curating a list.
When I'm working in a project, the agent has to read non-code files all the time just to figure out what's going on — CLAUDE.md, README, design docs, CHANGELOG, lockfiles, license files, dotfiles, IDE config, build manifests. The serena-hooks remind hook counts every one of those reads toward its threshold and after the third in a row it issues a permissionDecision="deny" telling the agent to use Serena's symbolic tools (find_symbol, get_symbols_overview, find_referencing_symbols) instead. But those tools don't work on non-code files. There are no symbols in a markdown or license file for the LSP/tree-sitter layer to parse. The advice can't be followed.
What I'm asking for: let the agent read non-code files directly without the hook treating those reads as evidence of drift. The hook is doing the right thing on actual source files. It just shouldn't be firing on files where its recommendation has no corresponding capability.
Two consequences worth flagging. The agent stops paying attention to denies, since every misfire is a tool turn burned on a recommendation that can't apply — which is exactly the drift the hook is supposed to catch. Worse, when a deny fires it both resets the counter (hooks.py:298) and disables the hook entirely for the next 120 seconds (_MIN_DENY_INTERVAL_SECONDS at hooks.py:132, gating logic at hooks.py:268). So a misfire on a doc read silently gives the agent a 2-minute window in which actual drift on real source files would be invisible to the hook.
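To make the second consequence concrete, here is a minimal simulation of the behavior described above. It is not the actual code in hooks.py: the class, its fields, and the choice to skip counting during the cooldown are assumptions made for illustration. Only the threshold of three and the 120-second interval come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DENY_INTERVAL_SECONDS = 120  # interval quoted above
READ_THRESHOLD = 3  # "after the third in a row"


@dataclass
class RemindHookSketch:
    """Hypothetical, simplified model of the remind hook's deny logic."""

    consecutive_reads: int = 0
    last_deny_at: Optional[float] = None

    def on_read(self, now: float) -> str:
        """Return "deny" or "allow" for a Read at time `now` (in seconds)."""
        in_cooldown = (
            self.last_deny_at is not None
            and now - self.last_deny_at < MIN_DENY_INTERVAL_SECONDS
        )
        if in_cooldown:
            # Assumption: while the interval is active the hook does nothing,
            # so reads of real source files pass through uncounted.
            return "allow"
        self.consecutive_reads += 1
        if self.consecutive_reads >= READ_THRESHOLD:
            self.consecutive_reads = 0  # the deny resets the counter
            self.last_deny_at = now
            return "deny"
        return "allow"
```

In this model, three doc reads at t=0, 1, 2 trigger a deny, and every Read issued before the interval expires is allowed regardless of what file it targets.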
Describe the solution you'd like
Let the agent read non-code files (Markdown, plain text, lockfiles, dotfiles, license files, JSON/YAML/TOML configs, etc.) directly without the remind hook counting those reads toward its threshold. Concretely, a Read should only count when the file is one Serena can actually navigate symbolically — i.e. when its path matches a non-experimental language in the Language enum.
The predicate already exists in the codebase. Language.get_source_fn_matcher() at src/solidlsp/ls_config.py:210 returns a FilenameMatcher for each supported language, and Language.iter_all(include_experimental=False) at line 160 iterates the non-experimental ones. So the check fits in roughly four lines:
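from solidlsp.ls_config import Language

def _is_symbolically_navigable(file_path: str) -> bool:
    # True if any non-experimental language's filename matcher accepts the path.
    return any(
        lang.get_source_fn_matcher().is_relevant_filename(file_path)
        for lang in Language.iter_all(include_experimental=False)
    )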
In ToolUseCounter.update() at src/serena/hooks.py:187, before incrementing the counters, call this predicate on the Read's file_path and return early if it's False. The tool_input payload is already captured into self._tool_input at hooks.py:57 and never read afterwards, so the wiring is small.
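A rough sketch of how the early return could look, using stand-ins so it runs on its own. ReadCounterSketch, SOURCE_PATTERNS, and the exact threshold comparison are hypothetical; the real ToolUseCounter has more state, and the real predicate would query the Language matchers rather than a hard-coded pattern tuple.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical stand-in for the patterns exposed by the non-experimental
# Language matchers; the real hook would query solidlsp.ls_config instead.
SOURCE_PATTERNS = ("*.py", "*.ts", "*.go", "*.rs", "*.java")


def is_symbolically_navigable(file_path: str) -> bool:
    """Return True if any stand-in source pattern matches the file name."""
    name = file_path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in SOURCE_PATTERNS)


@dataclass
class ReadCounterSketch:
    """Simplified, hypothetical stand-in for ToolUseCounter."""

    threshold: int = 3
    consecutive_reads: int = 0

    def update(self, tool_name: str, tool_input: dict) -> bool:
        """Record one tool call; return True if the hook would deny."""
        if tool_name not in ("Read", "Grep"):
            return False  # other tools are ignored in this sketch
        if tool_name == "Read":
            file_path = tool_input.get("file_path", "")
            if not is_symbolically_navigable(file_path):
                # Non-code read: return before touching the counter.
                return False
        self.consecutive_reads += 1
        if self.consecutive_reads >= self.threshold:
            self.consecutive_reads = 0
            return True
        return False
```

With this shape, any number of README.md reads leaves the counter at zero, while three reads of src/app.py still produce a deny. Grep is counted unconditionally, matching the Read-only scoping described next.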
Scoping the change to Read only is deliberate. Grep fans out across many files of mixed types and has no clean "is this grep symbolically navigable" answer. Read is also where the false positives concentrate — every config-file open, every README consult, every CLAUDE.md read. Scoping to Read also sharpens the rest of the hook toward Grep, which is closer to its original intent (catching grep-heavy drift on code).
Semantically: the hook fires only when its recommendation could actually be followed. New non-code file types don't need to be enumerated; they automatically don't trigger. New supported languages flow through whenever they're added to the Language enum.
Per-Read cost is a fnmatch loop over the non-experimental languages' file patterns — worst case ~80-100 checks (a non-matching path traverses every matcher). Probably doesn't register against the existing pickle I/O the hook already does, but worth measuring in the PR.
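One way the measurement could be done, as a sketch: time the worst case (a file name that matches nothing) against a synthetic pattern list of about the size mentioned above. PATTERNS here is made up for the benchmark; the real numbers would have to come from the actual Language matchers.

```python
import timeit
from fnmatch import fnmatch

# Hypothetical pattern set of roughly the size discussed above; the real
# patterns would come from the Language matchers.
PATTERNS = [f"*.ext{i}" for i in range(100)]


def matches_any(filename: str) -> bool:
    """Worst case: a non-matching name is tested against every pattern."""
    return any(fnmatch(filename, pattern) for pattern in PATTERNS)


def mean_seconds_per_check(filename: str, runs: int = 1000) -> float:
    """Average wall-clock time of one full pass over PATTERNS."""
    return timeit.timeit(lambda: matches_any(filename), number=runs) / runs


if __name__ == "__main__":
    cost_us = mean_seconds_per_check("CHANGELOG.md") * 1e6
    print(f"{cost_us:.1f} us per non-matching Read")
```

Comparing that figure with the time the hook spends loading and saving its pickled state would show whether the predicate is measurable at all.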
Describe alternatives you've considered
The denylist proposed by @rrva in #1398 addresses the same symptom and would also solve it. The decisive difference: an extension denylist fails open. Any new non-code extension that isn't on the list (.tfvars, .proto, .csv, .dockerfile, .editorconfig, etc.) will keep producing the same false-positive nudge until someone files another ticket. An extension allowlist derived from Language fails closed: anything not explicitly known to be symbolically navigable is excluded automatically. The maintenance burden also lives in one place that's already being maintained for other reasons — whenever a new language is added, its file matcher ships with it.
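The difference can be shown with two small predicates. Both extension sets below are illustrative assumptions, not the list proposed in #1398 and not Serena's actual matchers.

```python
from pathlib import PurePosixPath

# Both sets are illustrative assumptions, not the lists from #1398 or Serena.
DENYLIST_NON_CODE = {".md", ".txt", ".json", ".yaml", ".yml", ".toml", ".lock"}
ALLOWLIST_SOURCE = {".py", ".ts", ".go", ".rs", ".java"}


def counts_with_denylist(path: str) -> bool:
    """A Read counts unless its extension is a known non-code extension."""
    return PurePosixPath(path).suffix.lower() not in DENYLIST_NON_CODE


def counts_with_allowlist(path: str) -> bool:
    """A Read counts only if its extension is a known source extension."""
    return PurePosixPath(path).suffix.lower() in ALLOWLIST_SOURCE


if __name__ == "__main__":
    for path in ["infra/prod.tfvars", "data.csv", ".editorconfig", "LICENSE"]:
        print(path, counts_with_denylist(path), counts_with_allowlist(path))
```

For every path in that loop the denylist version still counts the Read (a false positive until the list is extended), while the allowlist version ignores it without any list maintenance.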
Bumping the thresholds (3 → 5, etc.) just delays the same problem; I tried it before filing. Doc reads shouldn't count at all because the recommendation can't apply to them.
Disabling the remind hook entirely defeats the purpose. The hook is genuinely useful when it fires on actual source files. The fix shouldn't be to turn it off.
Additional context
A trace from one of my own recent sessions, showing the deny firing inside a burst of pure markdown reads — and the agent ignoring the recommendation because there's nothing else it can do:
20:01:03 Read docs/plans/<redacted>.md
20:01:12 Read docs/plans/<redacted>.md
20:01:16 Read docs/plans/<redacted>.md
20:01:17 DENY "Too many consecutive read calls without using symbolic tools."
20:01:20 Read docs/plans/<redacted>.md ← retry, 3 seconds after the deny
20:01:24 Read docs/plans/<redacted>.md
... 4 more Reads of the same file in the next 14 seconds ...
The trace shows three consecutive Reads of the same docs/plans/*.md file: the agent is paging through a long implementation plan. Every Read in the burst is on a markdown file where symbolic navigation has no meaning, so the deny's recommendation to use find_symbol / get_symbols_overview / find_referencing_symbols has no possible response. The counter resets, the agent re-issues the same Read 3 seconds later, then reads the same file five more times in the next ~14 seconds. The net effect of the deny is one wasted tool turn; the agent's behavior is unchanged because there's no symbolic alternative to pivot to. The same pattern recurs across multiple unrelated projects in my session logs.
Markdown, JSON, YAML, and TOML are in the Language enum (lines 116-134), but flagged experimental and listed in is_experimental() at lines 173-188. So Language.iter_all(include_experimental=False) excludes them, which is the behavior we want. Users on default Serena setups don't have a Markdown LSP running, so the nudge would be wrong on .md reads in that case. Users who explicitly opt into Markdown/YAML/JSON/TOML LSP via project.yml are self-selecting for the nudge to apply.
The lightweight-import constraint at hooks.py:17 should be OK with this change. solidlsp.ls_config only pulls in the Language enum and FilenameMatcher; the actual language servers live in solidlsp.language_servers.* and aren't loaded by ls_config unless Language.get_ls_class() is called, which the predicate doesn't touch (the if TYPE_CHECKING: from solidlsp import SolidLanguageServer at line 11 confirms it).
There is a new coupling implied here: hooks.py would gain a dependency on solidlsp.ls_config.Language. If you'd rather keep the hook independent of the LSP config layer, a colocated SYMBOLIC_EXTENSIONS constant in hooks.py that mirrors the matchers is an acceptable alternative — slightly less DRY, fully decoupled. Either works; just flagging the coupling so it's an explicit design choice rather than a buried one.
ReadFileTool's own docstring at src/serena/tools/file_tools.py:27-28 already frames this correctly:
"Generally, symbolic operations like find_symbol or find_referencing_symbols should be preferred if you know which symbols you are looking for."
The "if you know which symbols you are looking for" qualifier is exactly the condition the remind hook should be checking, and currently isn't.
This change introduces no new config knobs. Happy to open a PR: it's small, ~10 LOC plus tests covering Python, Markdown, license files, novel non-code extensions, and experimental-language opt-in.