Allow Reads of non-code files (e.g. .md, .txt) without the symbolic-tools nudge

# Is your feature request related to a problem? Please describe

Filing this separately from #1398: that thread tracks the user-facing symptom, and [@rrva's comment](https://github.com/oraios/serena/issues/1398#issuecomment-4302420795) proposes a denylist of common config/doc extensions. This issue proposes a different mechanism for the same symptom: an allowlist derived from Serena's own `Language` enum, so the hook reuses a capability that already exists instead of relying on a curated list.

When I'm working in a project, the agent has to read non-code files all the time just to figure out what's going on — CLAUDE.md, README, design docs, CHANGELOG, lockfiles, license files, dotfiles, IDE config, build manifests. The `serena-hooks remind` hook counts every one of those reads toward its threshold and after the third in a row it issues a `permissionDecision="deny"` telling the agent to use Serena's symbolic tools (`find_symbol`, `get_symbols_overview`, `find_referencing_symbols`) instead. But those tools don't work on non-code files. There are no symbols in a markdown or license file for the LSP/tree-sitter layer to parse. The advice can't be followed.

What I'm asking for: let the agent read non-code files directly without the hook treating those reads as evidence of drift. The hook is doing the right thing on actual source files. It just shouldn't be firing on files where its recommendation has no corresponding capability.

Two consequences worth flagging. First, the agent learns to ignore denies, since every misfire burns a tool turn on a recommendation that can't apply, and an agent that ignores the hook is exactly the drift the hook is supposed to catch. Second, and worse, when a deny fires it both resets the counter (`hooks.py:298`) and disables the hook entirely for the next 120 seconds (`_MIN_DENY_INTERVAL_SECONDS` at `hooks.py:132`, gating logic at `hooks.py:268`). So a misfire on a doc read silently gives the agent a 2-minute window in which actual drift on real source files would be invisible to the hook.
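To make the blind window concrete, here is a toy model of the behavior described above. It is a simplification under stated assumptions, not the code in `hooks.py`: `CooldownModel` and `on_read` are hypothetical names, and I'm assuming reads during the cooldown are simply not counted.

```python
from dataclasses import dataclass
from typing import Optional

DENY_THRESHOLD = 3  # the third consecutive read triggers the deny
MIN_DENY_INTERVAL_SECONDS = 120  # mirrors _MIN_DENY_INTERVAL_SECONDS


@dataclass
class CooldownModel:
    """Toy model of the described behavior; hypothetical, not Serena's class."""

    count: int = 0
    last_deny_at: Optional[float] = None

    def on_read(self, now: float) -> str:
        # Inside the cooldown the hook is effectively off (assumed: reads are not counted).
        in_cooldown = (
            self.last_deny_at is not None
            and now - self.last_deny_at < MIN_DENY_INTERVAL_SECONDS
        )
        if in_cooldown:
            return "allow (hook suppressed)"
        self.count += 1
        if self.count >= DENY_THRESHOLD:
            # A deny resets the counter and starts the cooldown.
            self.count = 0
            self.last_deny_at = now
            return "deny"
        return "allow"


model = CooldownModel()
# Three doc reads at t=0..2, then source-file reads at t=10, 50 and 100.
for t in (0, 1, 2, 10, 50, 100):
    print(t, model.on_read(t))
```

In this model the deny lands on the third doc read at t=2, and every later read in the run, including the ones meant to stand for real source files, passes with the hook suppressed.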

# Describe the solution you'd like

Let the agent read non-code files (Markdown, plain text, lockfiles, dotfiles, license files, JSON/YAML/TOML configs, etc.) directly without the `remind` hook counting those reads toward its threshold. Concretely, a Read should only count when the file is one Serena can actually navigate symbolically — i.e. when its path matches a non-experimental language in the `Language` enum.

The predicate already exists in the codebase. `Language.get_source_fn_matcher()` at `src/solidlsp/ls_config.py:210` returns a `FilenameMatcher` for each supported language, and `Language.iter_all(include_experimental=False)` at line 160 of the same file iterates the non-experimental ones. So the check fits in a few lines:

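```python
from solidlsp.ls_config import Language


def _is_symbolically_navigable(file_path: str) -> bool:
    """Return True if any non-experimental language's matcher accepts the path."""
    return any(
        lang.get_source_fn_matcher().is_relevant_filename(file_path)
        for lang in Language.iter_all(include_experimental=False)
    )
```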

In `ToolUseCounter.update()` at `src/serena/hooks.py:187`, before incrementing the counters, call this predicate on the Read's `file_path` and return early if it's `False`. The `tool_input` payload is already captured into `self._tool_input` at `hooks.py:57` and never read afterwards, so the wiring is small.
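A rough sketch of that early return, to show the shape of the change. `ReadAwareCounter` and `counted_calls` are hypothetical stand-ins (the real `ToolUseCounter` has more state than this), and the predicate is injected so the sketch runs without Serena installed. A Read with no `file_path` still counts here, which is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ReadAwareCounter:
    """Hypothetical stand-in for ToolUseCounter; not Serena's actual class."""

    is_navigable: Callable[[str], bool]
    counted_calls: int = 0

    def update(self, tool_name: str, tool_input: Dict[str, Any]) -> None:
        if tool_name == "Read":
            file_path = tool_input.get("file_path")
            # Early return: a Read of a file with no symbolic alternative
            # leaves the counter untouched.
            if isinstance(file_path, str) and not self.is_navigable(file_path):
                return
        # Everything else (Grep, navigable Reads, Reads without a path) counts.
        self.counted_calls += 1


# Toy predicate for the demo; the proposal would pass _is_symbolically_navigable.
counter = ReadAwareCounter(is_navigable=lambda path: path.endswith(".py"))
counter.update("Read", {"file_path": "README.md"})
counter.update("Read", {"file_path": "src/app.py"})
print(counter.counted_calls)  # prints 1
```

Only the `.py` Read is counted; the README read passes through without moving the counter.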

Scoping the change to Read only is deliberate. Grep fans out across many files of mixed types and has no clean "is this grep symbolically navigable" answer. Read is also where the false positives concentrate: every config-file open, every README consult, every CLAUDE.md read. Filtering out non-code Reads also leaves the hook weighted toward Grep and source-file Reads, which is closer to its original intent (catching grep-heavy drift on code).

Semantically: the hook fires only when its recommendation could actually be followed. New non-code file types don't need to be enumerated; they automatically don't trigger. New supported languages flow through whenever they're added to the `Language` enum.

Per-Read cost is a fnmatch loop over the non-experimental languages' file patterns — worst case ~80-100 checks (a non-matching path traverses every matcher). Probably doesn't register against the existing pickle I/O the hook already does, but worth measuring in the PR.
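For the measurement, something along these lines would give a first number. The patterns are synthetic stand-ins sized to the worst-case estimate above, not Serena's real matcher patterns, and `matches_any` / `mean_seconds_per_check` are names I made up for the sketch.

```python
import fnmatch
import timeit

# Synthetic stand-ins: 100 patterns, roughly the worst-case count estimated above.
PATTERNS = [f"*.ext{i}" for i in range(100)]


def matches_any(file_path: str) -> bool:
    """Return True if any pattern accepts the path; a miss walks every pattern."""
    return any(fnmatch.fnmatch(file_path, pattern) for pattern in PATTERNS)


def mean_seconds_per_check(file_path: str, repeats: int = 1000) -> float:
    """Average wall-clock time of one full check over `repeats` runs."""
    return timeit.timeit(lambda: matches_any(file_path), number=repeats) / repeats


if __name__ == "__main__":
    per_check = mean_seconds_per_check("docs/plans/notes.md")
    print(f"worst-case miss: {per_check * 1e6:.1f} us per Read")
```

The PR version would time the real predicate against the hook's existing pickle load/save rather than these synthetic patterns.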

# Describe alternatives you've considered

The denylist proposed by @rrva in #1398 addresses the same symptom and would also solve it. The decisive difference: an extension *denylist* fails open. Any new non-code extension that isn't on the list (`.tfvars`, `.proto`, `.csv`, `.dockerfile`, `.editorconfig`, etc.) will keep producing the same false-positive nudge until someone files another ticket. An extension *allowlist* derived from `Language` fails closed: anything not explicitly known to be symbolically navigable is excluded automatically. The maintenance burden also lives in one place that's already being maintained for other reasons — whenever a new language is added, its file matcher ships with it.
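A small illustration of the fail-open versus fail-closed difference. Both extension sets are made up for the example; they are not the list from #1398 and not Serena's matchers.

```python
from pathlib import PurePosixPath

# Both sets are illustrative only.
DENYLIST_NON_CODE = {".md", ".txt", ".json", ".yaml", ".toml"}
ALLOWLIST_CODE = {".py", ".ts", ".go", ".rs"}


def counts_with_denylist(path: str) -> bool:
    """Fails open: anything not listed as non-code still counts toward the nudge."""
    return PurePosixPath(path).suffix not in DENYLIST_NON_CODE


def counts_with_allowlist(path: str) -> bool:
    """Fails closed: only files known to be navigable count toward the nudge."""
    return PurePosixPath(path).suffix in ALLOWLIST_CODE


for path in ("main.py", "README.md", "prod.tfvars"):
    print(path, counts_with_denylist(path), counts_with_allowlist(path))
```

The two approaches agree on `main.py` and `README.md`. They diverge on `prod.tfvars`, which nobody listed: the denylist still counts it, the allowlist does not.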

Bumping the thresholds (3 → 5, etc.) just delays the same problem; I tried it before filing. Doc reads shouldn't count at all because the recommendation can't apply to them.

Disabling the `remind` hook entirely defeats the purpose. The hook is genuinely useful when it fires on actual source files. The fix shouldn't be to turn it off.

# Additional context

A trace from one of my own recent sessions, showing the deny firing inside a burst of pure markdown reads — and the agent ignoring the recommendation because there's nothing else it can do:

```
20:01:03  Read   docs/plans/<redacted>.md
20:01:12  Read   docs/plans/<redacted>.md
20:01:16  Read   docs/plans/<redacted>.md
20:01:17  DENY   "Too many consecutive read calls without using symbolic tools."
20:01:20  Read   docs/plans/<redacted>.md      ← retry, 3 seconds after the deny
20:01:24  Read   docs/plans/<redacted>.md
   ... 4 more Reads of the same file in the next 14 seconds ...
```

Three consecutive Reads of the same `docs/plans/*.md` file — the agent paging through a long implementation plan. Every Read in the burst is on a markdown file where symbolic navigation has no meaning, so the deny's recommendation to use `find_symbol` / `get_symbols_overview` / `find_referencing_symbols` has no possible response. The counter resets, the agent re-issues the same Read 3 seconds later, then reads the same file five more times in the next ~14 seconds. Net effect of the deny is one wasted tool turn; the agent's behavior is unchanged because there's no symbolic alternative to pivot to. The same pattern recurs across multiple unrelated projects in my session logs.

Markdown, JSON, YAML, and TOML *are* in the `Language` enum (lines 116-134 of `ls_config.py`), but they are flagged experimental and listed in `is_experimental()` at lines 173-188. So `Language.iter_all(include_experimental=False)` excludes them, which is the behavior we want. Users on default Serena setups don't have a Markdown LSP running, so the nudge would be wrong on `.md` reads in that case. Users who explicitly opt into Markdown/YAML/JSON/TOML LSP via `project.yml` are self-selecting for the nudge to apply.

The lightweight-import constraint at `hooks.py:17` should be OK with this change. `solidlsp.ls_config` only pulls in the `Language` enum and `FilenameMatcher`; the actual language servers live in `solidlsp.language_servers.*` and aren't loaded by `ls_config` unless `Language.get_ls_class()` is called, which the predicate doesn't touch (the `if TYPE_CHECKING: from solidlsp import SolidLanguageServer` at line 11 confirms it).
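One way to check that claim empirically rather than by reading imports is to diff `sys.modules` around the import. `newly_loaded_modules` is a hypothetical helper for the PR's tests, not something in Serena today.

```python
import importlib
import sys
from typing import List


def newly_loaded_modules(module_name: str, prefix: str) -> List[str]:
    """Import module_name and return the modules under prefix that the import pulled in."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(name for name in set(sys.modules) - before if name.startswith(prefix))
```

In a fresh interpreter with Serena installed, `newly_loaded_modules("solidlsp.ls_config", "solidlsp.language_servers")` should come back empty if the reasoning above holds. It has to be a fresh process, since modules imported earlier won't show up in the diff.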

There is a new coupling implied here: `hooks.py` would gain a dependency on `solidlsp.ls_config.Language`. If you'd rather keep the hook independent of the LSP config layer, a colocated `SYMBOLIC_EXTENSIONS` constant in `hooks.py` that mirrors the matchers is an acceptable alternative — slightly less DRY, fully decoupled. Either works; just flagging the coupling so it's an explicit design choice rather than a buried one.

`ReadFileTool`'s own docstring at `src/serena/tools/file_tools.py:27-28` already frames this correctly:

> "Generally, symbolic operations like find_symbol or find_referencing_symbols should be preferred if you know which symbols you are looking for."

The "if you know which symbols you are looking for" qualifier is exactly the condition the remind hook should be checking, and currently isn't.

Zero new config knobs introduced by this change. Happy to PR — small, ~10 LOC plus tests covering Python/markdown/license files/novel non-code extensions/experimental-language opt-in.


Allow Reads of non-code files (e.g. .md, .txt) without the symbolic-tools nudge #1429
