fix(base-extractor): preserve full string value across escape sequences in getStringValue by tirth8205 · Pull Request #450 · Egonex-AI/Understand-Anything

tirth8205 · 2026-06-14T18:24:20Z

Problem

tree-sitter splits a string literal's contents into MULTIPLE string_fragment nodes whenever an escape_sequence (e.g. \t, \", \n) appears between them. getStringValue returns only the FIRST string_fragment child, so everything from the first escape onward is silently dropped. I verified this against the real tree-sitter-typescript WASM grammar shipped in the repo: for the source import x from './a\tb' the string node has children [" , string_fragment 'a', escape_sequence '\t', string_fragment 'b', '], and getStringValue returns "./a" instead of the full import source. Likewise "a\"b" returns "a" (should be a"b) and "he\tllo" returns "he". The only consumer, typescript-extractor.extractImport (line 364), therefore records a truncated import source for any module path containing an escape. getStringValue is also re-exported publicly from extractors/index.ts, so external callers are affected as well.
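
A minimal sketch of the truncation described above, using a hand-built mock instead of the real WASM grammar. The TreeSitterNode interface, the mockNode helper, and firstFragmentOnly are assumptions for illustration only; the pre-fix source is not shown in this description, so firstFragmentOnly merely mimics the behavior reported here.

```typescript
// Minimal stand-in for the node shape used in this PR (assumed, not the repo's real type).
interface TreeSitterNode {
  type: string;
  text: string;
  childCount: number;
  child(i: number): TreeSitterNode | null;
}

// Hypothetical helper: build a mock node from [type, text] pairs.
function mockNode(type: string, text: string, kids: Array<[string, string]> = []): TreeSitterNode {
  const children = kids.map(([t, s]) => mockNode(t, s));
  return { type, text, childCount: children.length, child: (i) => children[i] ?? null };
}

// Assumed shape of the pre-fix logic: stop at the first string_fragment.
function firstFragmentOnly(node: TreeSitterNode): string {
  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child && child.type === "string_fragment") return child.text;
  }
  return node.text.replace(/^['"]|['"]$/g, "");
}

// Source text is './a\tb' (a backslash followed by t, as written in the file).
const importSource = mockNode("string", "'./a\\tb'", [
  ["'", "'"],
  ["string_fragment", "./a"],
  ["escape_sequence", "\\t"],
  ["string_fragment", "b"],
  ["'", "'"],
]);

console.log(firstFragmentOnly(importSource)); // prints "./a"
```

Everything after the first escape_sequence child is never visited, which is why the recorded import source ends at ./a.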

Fix

Concatenate the text of every content child (string_fragment and escape_sequence) instead of returning only the first fragment:

    export function getStringValue(node: TreeSitterNode): string {
      let value = "";
      let found = false;
      for (let i = 0; i < node.childCount; i++) {
        const child = node.child(i);
        if (child && (child.type === "string_fragment" || child.type === "escape_sequence")) {
          value += child.text;
          found = true;
        }
      }
      if (found) return value;
      return node.text.replace(/^['"]|['"]$/g, "");
    }
    …

Testing

Adds unit test(s) that fail before the change and pass after. The full core test suite, eslint, and tsc --noEmit all pass locally on this branch.

Found via a static correctness audit of the shared tree-sitter base extractor.

🤖 Generated with Claude Code

…es in getStringValue tree-sitter splits a string literal's contents into multiple string_fragment nodes whenever an escape_sequence appears between them. getStringValue returned only the first string_fragment, silently dropping everything from the first escape onward (e.g. './a\tb' became './a'). This truncated import sources for any module path containing an escape. Fix concatenates the text of every content child (string_fragment and escape_sequence) instead of returning only the first fragment, preserving the full raw value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

thejesh23

1. Function returns raw source text, not the decoded string value.
escape_sequence.text is the literal source (e.g. the two characters \ + t), so getStringValue on './a\tb' now returns ./a\tb verbatim — a backslash plus t, not a tab. The name and JSDoc ("unquoted string value") imply a decoded value; consumers comparing import sources to filesystem paths or resolved module specifiers will still be wrong, just wrong differently. Either decode the escapes here or rename/document this as "raw inner text".
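
If the "decode here" option is chosen, one possible shape is sketched below. decodeJsEscapes and SIMPLE_ESCAPES are hypothetical names, not part of this PR. The sketch covers the common JS/TS escapes (single-character, \xHH, \uHHHH, \u{...}, and line continuations) and deliberately ignores legacy octal escapes.

```typescript
// Hypothetical helper, not part of the PR: decode common JS/TS escape
// sequences found in the raw inner text of a string literal.
const SIMPLE_ESCAPES: Record<string, string> = {
  n: "\n", t: "\t", r: "\r", b: "\b", f: "\f", v: "\v", "0": "\0",
};

function decodeJsEscapes(raw: string): string {
  return raw.replace(
    /\\(?:u\{([0-9a-fA-F]+)\}|u([0-9a-fA-F]{4})|x([0-9a-fA-F]{2})|(\r\n|[\s\S]))/g,
    (_match: string, braced?: string, u4?: string, x2?: string, ch?: string) => {
      const hex = braced ?? u4 ?? x2;
      // Note: String.fromCodePoint throws for values above 0x10FFFF.
      if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
      if (ch === undefined) return "";
      // Backslash + line terminator is a line continuation and yields nothing.
      if (ch === "\n" || ch === "\r" || ch === "\r\n") return "";
      const mapped = SIMPLE_ESCAPES[ch];
      // Unknown escapes such as \" or \\ decode to the escaped character itself.
      return mapped === undefined ? ch : mapped;
    },
  );
}

// Raw text as it appears in the source file: ./a\tb (backslash + t).
const decoded = decodeJsEscapes("./a\\tb");
console.log(JSON.stringify(decoded)); // prints "./a\tb" (JSON re-escapes the real tab)
```

Keeping the decoder separate from getStringValue would let callers that want the raw inner text keep it, while import resolution could opt into the decoded form.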

2. Behavior is JS/TS-grammar-specific but the helper is re-exported as generic.
Only string_fragment/escape_sequence are recognized; Python/Go/Rust/Ruby/etc. use different child node types (string_content, interpreted_string_literal children, raw_string_literal, etc.), so for those grammars the function silently falls through to the quote-stripping path. Worth either restricting the helper to JS-family extractors or adding the other content node types — relevant to #435 (Dart extractor already calls this for string-literal imports).
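
One way to make the grammar dependence explicit is to pass the set of content node types in, as sketched below. getRawInnerText, JS_CONTENT_TYPES, and the mock node are hypothetical. The Python-style child names (string_start, string_content, string_end) are assumptions based on the node types named in this comment and should be checked against each grammar's node-types.json before use.

```typescript
// Minimal stand-in for the node shape used in this PR (assumed).
interface TreeSitterNode {
  type: string;
  text: string;
  childCount: number;
  child(i: number): TreeSitterNode | null;
}

// Hypothetical helper: build a mock node from [type, text] pairs.
function mockNode(type: string, text: string, kids: Array<[string, string]> = []): TreeSitterNode {
  const children = kids.map(([t, s]) => mockNode(t, s));
  return { type, text, childCount: children.length, child: (i) => children[i] ?? null };
}

// Content node types for JS/TS, as used by the fix in this PR.
const JS_CONTENT_TYPES: ReadonlySet<string> = new Set(["string_fragment", "escape_sequence"]);

// Hypothetical variant of getStringValue with the grammar-specific part as a parameter.
function getRawInnerText(
  node: TreeSitterNode,
  contentTypes: ReadonlySet<string> = JS_CONTENT_TYPES,
): string {
  let value = "";
  let found = false;
  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child && contentTypes.has(child.type)) {
      value += child.text;
      found = true;
    }
  }
  return found ? value : node.text.replace(/^['"]|['"]$/g, "");
}

// Assumed child layout for a prefixed Python string r"abc".
const pyString = mockNode("string", 'r"abc"', [
  ["string_start", 'r"'],
  ["string_content", "abc"],
  ["string_end", '"'],
]);

console.log(getRawInnerText(pyString, new Set(["string_content"]))); // prints abc
console.log(getRawInnerText(pyString)); // prints r"abc (quote-stripping fallback)
```

With the default JS set, the mock falls through to the quote-stripping path and keeps the r" prefix, which illustrates the silent fallback this comment describes.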

3. Test coverage misses the fallback and template-string paths.
All three new tests hit the new concatenation branch on TS grammar; the node.text.replace(/^['"]|['"]$/g, "") fallback and template_string (with template_chars / ${...}) are still unexercised, and the second assertion uses toContain rather than toBe so it would also pass on the pre-fix truncated output a. Tighten that assertion and add a fallback-path case.
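
The difference between the two assertion styles, plus a fallback-path case, can be sketched without a test runner as follows. getStringValue is copied from the PR description; mockNode and expectEqual are hypothetical helpers standing in for the real grammar and for toBe.

```typescript
// Minimal stand-in for the node shape used in this PR (assumed).
interface TreeSitterNode {
  type: string;
  text: string;
  childCount: number;
  child(i: number): TreeSitterNode | null;
}

// Hypothetical helper: build a mock node from [type, text] pairs.
function mockNode(type: string, text: string, kids: Array<[string, string]> = []): TreeSitterNode {
  const children = kids.map(([t, s]) => mockNode(t, s));
  return { type, text, childCount: children.length, child: (i) => children[i] ?? null };
}

// The fix as given in the PR description.
function getStringValue(node: TreeSitterNode): string {
  let value = "";
  let found = false;
  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child && (child.type === "string_fragment" || child.type === "escape_sequence")) {
      value += child.text;
      found = true;
    }
  }
  if (found) return value;
  return node.text.replace(/^['"]|['"]$/g, "");
}

// Hypothetical strict-equality check, playing the role of toBe.
function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
}

// Source text "a\"b": two fragments around an escape_sequence.
const escaped = mockNode("string", '"a\\"b"', [
  ['"', '"'],
  ["string_fragment", "a"],
  ["escape_sequence", '\\"'],
  ["string_fragment", "b"],
  ['"', '"'],
]);

// A containment check cannot tell the fixed output from the truncated one.
const truncated = "a";
console.log(truncated.includes("a"), getStringValue(escaped).includes("a")); // prints true true

// Strict equality pins the full raw value: a, backslash, quote, b.
expectEqual(getStringValue(escaped), 'a\\"b');

// Fallback path: a node with no content children is quote-stripped.
const bare = mockNode("string", '"plain"');
expectEqual(getStringValue(bare), "plain");
```

A template_string case is not sketched here because its child layout would have to be confirmed against the grammar first.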

thejesh23 reviewed Jun 15, 2026

tirth8205 wants to merge 1 commit into Egonex-AI:main from tirth8205:fix/base-extractor-string-escapes