fix(language-lesson): drop over-broad "is" keyword that mislabels nodes with the type-guards concept #454
The "type guards" concept pattern included the bare keyword "is", and
detectLanguageConcepts matches via unbounded substring includes(). The
2-char substring "is" appears inside common English words (this, list,
exists, analysis), so nearly every node was tagged with "type guards",
polluting the concepts passed to the LLM lesson prompt.
Remove the "is" token and rely on the more specific existing keywords
("type guard", "narrowing", "discriminated union").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
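The mechanism the commit message describes can be reproduced in a few lines. This is a minimal sketch, assuming the matcher reduces to the lowercase includes() check quoted in the PR description and that the old "type guards" list was the new list plus "is". mentionsAny and both keyword arrays are hypothetical stand-ins, not the actual detectLanguageConcepts code.

```typescript
// Hypothetical stand-ins for the "type guards" keyword list before and after this change.
const OLD_TYPE_GUARD_KEYWORDS: readonly string[] = ["is", "type guard", "narrowing", "discriminated union"];
const NEW_TYPE_GUARD_KEYWORDS: readonly string[] = ["type guard", "narrowing", "discriminated union"];

// Mirrors the unbounded substring check: text.toLowerCase().includes(keyword.toLowerCase()).
function mentionsAny(text: string, keywords: readonly string[]): boolean {
  const lower = text.toLowerCase();
  return keywords.some((keyword) => lower.includes(keyword.toLowerCase()));
}

const summary = "This function adds two numbers";
console.log(mentionsAny(summary, OLD_TYPE_GUARD_KEYWORDS)); // true: "is" is found inside "this"
console.log(mentionsAny(summary, NEW_TYPE_GUARD_KEYWORDS)); // false
```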
thejesh23 left a comment
1. Same substring-matching bug remains in other concept patterns
The root cause is includes-based matching, not the "is" token specifically. decorators still has "@" (matches every JSDoc @param/email in a summary), dependency injection has "di" (matches "audio", "edit", "directory", "modifies"), and both middleware pattern and streams use "pipe". Same pollution risk; worth a follow-up issue at minimum, or just switching to a word-boundary regex helper now.
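A word-boundary helper along the lines suggested above might look like the sketch below. matchesWholeKeyword and escapeRegExp are hypothetical names, not existing project code; the sketch uses lookbehind/lookahead for non-word characters rather than a bare \b, so keywords that start or end with punctuation still get a consistent boundary rule.

```typescript
// Escape regex metacharacters so a keyword is matched literally (e.g. "c++").
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true only when `keyword` is not flanked by a word character
// ([A-Za-z0-9_]) on either side. Matching is case-insensitive via the "i" flag.
function matchesWholeKeyword(text: string, keyword: string): boolean {
  const pattern = new RegExp(`(?<!\\w)${escapeRegExp(keyword)}(?!\\w)`, "i");
  return pattern.test(text);
}

console.log(matchesWholeKeyword("Plays an audio file", "di"));            // false
console.log(matchesWholeKeyword("Wires services via DI", "di"));          // true
console.log(matchesWholeKeyword("Writes output to a pipe", "pipe"));      // true
console.log(matchesWholeKeyword("Builds a data pipeline", "pipe"));       // false
console.log(matchesWholeKeyword("Implements type guards", "type guard")); // false (plural)
```

Two trade-offs follow from the strict boundary: plural forms such as "type guards" need their own keyword (or an optional suffix), and "@" stops matching decorator uses like "@Component" as well as "@param", so decorators would need a different signal rather than a bounded "@".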
2. No predicate-style return-type signal replaces "is"
The canonical TS type guard is a function with a type-predicate return, e.g. function foo(x): x is T. Since tags/summaries are LLM-generated prose, you're relying on the model to use the words "type guard"/"narrowing" verbatim. If a summary says e.g. "checks whether value is a User", this PR yields zero detection. Consider adding "is " (with trailing space) or a \bx is \w+\b-style check against the node's signature, if available.
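A signature-based check is the more precise half of that suggestion: a trailing-space "is " token on its own would still match prose such as "This function adds two numbers", because "this " ends in "is ". The sketch below assumes nodes expose their source signature as a plain string (the PR does not say whether they do); hasTypePredicate and TYPE_PREDICATE_RE are hypothetical names.

```typescript
// Hypothetical check against a node's source signature. Looks for a return-type
// annotation of the form `): x is T`, `): this is T`, or the assertion form
// `): asserts x is T` (assertion signatures also narrow the checked value).
const TYPE_PREDICATE_RE = /\)\s*:\s*(?:asserts\s+)?\w+\s+is\s+\S/;

function hasTypePredicate(signature: string): boolean {
  return TYPE_PREDICATE_RE.test(signature);
}

console.log(hasTypePredicate("function isUser(x: unknown): x is User"));       // true
console.log(hasTypePredicate("isLeaf(): this is Leaf"));                        // true
console.log(hasTypePredicate("function add(a: number, b: number): number"));    // false
console.log(hasTypePredicate("This function adds two numbers"));                // false
```

Being a single-line regex, this can miss signatures split across lines or reformatted by tooling; it is meant as an extra signal alongside the keyword list, not a replacement for it.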
3. Test coverage thin for the substring class
Only one negative case is added. The PR description cites three ("This function...", "Persists data", "Renders a list of items"); table-driving all three (plus a positive case where a summary genuinely mentions "type guard" / "narrowing") would lock in both the false-positive fix and the absence of precision regression.
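A table-driven version of those cases could look like the following framework-free sketch. PATTERNS and detectConcepts are hypothetical stand-ins for the fixed "type guards" entry and the existing substring matcher; the real module's names and signature may differ.

```typescript
// Hypothetical stand-in for the fixed "type guards" pattern entry.
const PATTERNS: Record<string, readonly string[]> = {
  "type guards": ["type guard", "narrowing", "discriminated union"],
};

// Same unbounded substring matching the PR describes, applied per concept.
function detectConcepts(text: string): string[] {
  const lower = text.toLowerCase();
  return Object.entries(PATTERNS)
    .filter(([, keywords]) => keywords.some((k) => lower.includes(k.toLowerCase())))
    .map(([concept]) => concept);
}

// [summary, expected concepts]: the three false positives from the PR description
// plus two genuine mentions, so a precision regression would also fail the table.
const CASES: ReadonlyArray<readonly [string, readonly string[]]> = [
  ["This function adds two numbers", []],
  ["Persists data to disk", []],
  ["Renders a list of items", []],
  ["Implements a type guard for User", ["type guards"]],
  ["Uses narrowing on a discriminated union", ["type guards"]],
];

for (const [summary, expected] of CASES) {
  const actual = detectConcepts(summary);
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`${summary}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}
console.log(`all ${CASES.length} cases passed`);
```

If the suite uses Jest or Vitest, test.each expresses the same table directly, with one reported test per row.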
Problem
detectLanguageConcepts matches keywords with text.toLowerCase().includes(keyword.toLowerCase()), i.e. unbounded substring matching. The "type guards" pattern includes the keyword "is". Because includes does not respect word boundaries, the keyword matches wherever the 2-char substring "is" appears, including inside extremely common English words found in node summaries/tags such as "this", "list", "persists", "exists", "analysis", and "visible". As a result, virtually every node, regardless of whether it has anything to do with TypeScript type guards, gets the "type guards" concept appended, polluting the detected-concepts list passed into the LLM lesson prompt.

Verified by running the exact matching logic: the node summaries "This function adds two numbers", "Persists data to disk", and "Renders a list of items" all return ['type guards']. The maintainers narrowly avoided this in the existing 'returns empty for nodes…

Fix
Remove the "is" keyword (it is too short/generic for substring matching) and rely on the more specific keywords already present ("type guard", "narrowing", "discriminated union"), e.g. "type guards": ["type guard", "narrowing", "discriminated union"]. (A more thorough fix would switch keyword matching to word-boundary matching, but removing the "is" token is the minimal correctness fix.)

Testing
Adds unit test(s) that fail before the change and pass after. The full core test suite, eslint, and tsc --noEmit all pass locally on this branch.

Found via a static correctness audit of the language-lesson concept detector.
🤖 Generated with Claude Code