fix(language-lesson): drop over-broad "is" keyword that mislabels nodes with the type-guards concept#454

Open
tirth8205 wants to merge 1 commit into Egonex-AI:main from tirth8205:fix/language-lesson-is-keyword

@tirth8205

Contributor

Problem

  • detectLanguageConcepts matches keywords with text.toLowerCase().includes(keyword.toLowerCase()), i.e. unbounded substring matching. The type guards pattern includes the keyword "is". Because includes does not respect word boundaries, the two-character substring "is" matches inside extremely common English words found in node summaries and tags, such as "this", "list", "persists", "exists", "analysis", and "visible". As a result, virtually every node gets the type guards concept appended, whether or not it has anything to do with TypeScript type guards, and this pollutes the detected-concepts list passed into the LLM lesson prompt. Verified by running the exact matching logic: node summaries of "This function adds two numbers", "Persists data to disk", and "Renders a list of items" each return ['type guards']. The maintainers narrowly avoided this in the existing 'returns empty for nodes…
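The false positive can be reproduced with a minimal sketch of the matching logic described above. The pattern table and the function name `detectBySubstring` are hypothetical stand-ins, not the project's actual code; only the `includes`-based comparison is taken from the description.

```typescript
// Hypothetical, trimmed-down pattern table; the real one has more concepts.
const CONCEPT_PATTERNS: Record<string, string[]> = {
  "type guards": ["type guard", "is", "narrowing", "discriminated union"],
};

// Substring matching as described above: no word boundaries are enforced.
function detectBySubstring(text: string): string[] {
  const haystack = text.toLowerCase();
  return Object.entries(CONCEPT_PATTERNS)
    .filter(([, keywords]) =>
      keywords.some((keyword) => haystack.includes(keyword.toLowerCase())),
    )
    .map(([concept]) => concept);
}

const summaries = [
  "This function adds two numbers", // "is" inside "This"
  "Persists data to disk", // "is" inside "Persists" and "disk"
  "Renders a list of items", // "is" inside "list"
];

const results = summaries.map(detectBySubstring);
console.log(results); // each entry is ["type guards"]
```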

Fix

  • Drop the bare "is" keyword (it is too short/generic for substring matching) and rely on the more specific keywords already present ("type guard", "narrowing", "discriminated union"), e.g. "type guards": ["type guard", "narrowing", "discriminated union"]. (A more thorough fix would switch keyword matching to word-boundary matching, but removing the "is" token is the minimal correctness fix.)
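For reference, the word-boundary alternative mentioned above might look roughly like the following. This is a sketch only; `escapeRegExp` and `matchesWholeWord` are hypothetical helper names and are not part of this PR.

```typescript
// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true only when `keyword` appears as a whole word
// or phrase, compared case-insensitively.
function matchesWholeWord(text: string, keyword: string): boolean {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i").test(text);
}

console.log(matchesWholeWord("This function adds two numbers", "is")); // false
console.log(matchesWholeWord("Checks whether value is a User", "is")); // true
console.log(matchesWholeWord("Implements a type guard for User", "type guard")); // true
console.log(matchesWholeWord("Uses type guards", "type guard")); // false (plural)
```

Two trade-offs are worth noting. Plural forms no longer match unless they are listed as keywords, and `\b` only anchors next to word characters, so keywords made of symbols such as "@" would need separate handling.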

Testing

Adds unit test(s) that fail before the change and pass after. The full core test suite, eslint, and tsc --noEmit all pass locally on this branch.

Found via a static correctness audit of the language-lesson concept detector.

🤖 Generated with Claude Code

…es with the type-guards concept

The "type guards" concept pattern included the bare keyword "is", and
detectLanguageConcepts matches via unbounded substring includes(). The
2-char substring "is" appears inside common English words (this, list,
exists, analysis), so nearly every node was tagged with "type guards",
polluting the concepts passed to the LLM lesson prompt.

Remove the "is" token and rely on the more specific existing keywords
("type guard", "narrowing", "discriminated union").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Contributor

1. Same substring-matching bug remains in other concept patterns
The root cause is includes-based matching, not the "is" token specifically. decorators still has "@" (matches every JSDoc @param/email in a summary), dependency injection has "di" (matches "audio", "edit", "directory", "modifies"), and both middleware pattern and streams use "pipe". Same pollution risk; worth a follow-up issue at minimum, or just switching to a word-boundary regex helper now.

2. No predicate-style return-type signal replaces "is"
The canonical TS type guard is function foo(x): x is T. Since tags/summary are LLM-generated prose, you're relying on the model to use the words "type guard"/"narrowing" verbatim. If summaries say e.g. "checks whether value is a User", this PR yields zero detection. Consider adding "is " (with trailing space) or a \bx is \w+\b-style check against the node's signature if available.
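A signature-based check along these lines could be sketched as follows. It assumes a node carries its raw declaration text in a `signature` field, which is a hypothetical name; the regex is illustrative and not exhaustive. One caveat on the simpler option: with plain `includes`, the keyword "is " (with trailing space) would still match "This function", so a signature check or a word-boundary match is the safer route.

```typescript
// Assumption: a node exposes its declaration text in an optional `signature` field.
interface NodeLike {
  summary: string;
  signature?: string;
}

// Matches a return-type predicate such as `): x is T`, `): this is T`,
// or `): asserts x is T`.
const TYPE_PREDICATE =
  /\)\s*:\s*(?:asserts\s+)?(?:this|[A-Za-z_$][\w$]*)\s+is\s+\S/;

function hasTypePredicate(node: NodeLike): boolean {
  return node.signature !== undefined && TYPE_PREDICATE.test(node.signature);
}

const guard: NodeLike = {
  summary: "checks whether value is a User",
  signature: "function isUser(x: unknown): x is User",
};
const plain: NodeLike = {
  summary: "This function adds two numbers",
  signature: "function add(a: number, b: number): number",
};

console.log(hasTypePredicate(guard)); // true
console.log(hasTypePredicate(plain)); // false
```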

3. Test coverage thin for the substring class
Only one negative case is added. The PR description cites three ("This function...", "Persists data", "Renders a list of items"); table-driving all three (plus a positive case where a summary genuinely mentions "type guard" / "narrowing") would lock in both the false-positive fix and the absence of precision regression.
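A table-driven version of these cases could look like the sketch below. It uses a hypothetical post-fix pattern table and a plain loop instead of any particular test runner, so it would need adapting to whatever framework the suite actually uses.

```typescript
// Hypothetical post-fix pattern table, with the "is" keyword removed.
const PATTERNS: Record<string, string[]> = {
  "type guards": ["type guard", "narrowing", "discriminated union"],
};

// Same substring matching as the current detector, applied to the fixed table.
function detect(text: string): string[] {
  const haystack = text.toLowerCase();
  return Object.entries(PATTERNS)
    .filter(([, keywords]) => keywords.some((k) => haystack.includes(k)))
    .map(([concept]) => concept);
}

const cases: Array<{ summary: string; expected: string[] }> = [
  // Negative cases cited in the PR description.
  { summary: "This function adds two numbers", expected: [] },
  { summary: "Persists data to disk", expected: [] },
  { summary: "Renders a list of items", expected: [] },
  // Positive cases guarding against a precision regression.
  { summary: "Implements a type guard for User", expected: ["type guards"] },
  { summary: "Uses narrowing to refine a union", expected: ["type guards"] },
];

const failures = cases.filter(
  (c) => JSON.stringify(detect(c.summary)) !== JSON.stringify(c.expected),
);
console.log(failures.length === 0 ? "all cases pass" : failures);
```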
