Skip to content

fix(python-extractor): capture type-annotated *args/**kwargs parameters#444

Open
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/python-extractor-typed-varargs
Open

fix(python-extractor): capture type-annotated *args/**kwargs parameters#444
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/python-extractor-typed-varargs

Conversation

@tirth8205

Copy link
Copy Markdown
Contributor

Problem

In extractParams (python-extractor.ts), the typed_parameter case assumed the parameter name was a direct identifier child: const ident = findChild(child, "identifier"). For type-annotated variadic parameters such as def f(*args: int, **kwargs: str), the tree-sitter-python grammar wraps the splat in a typed_parameter whose first child is a list_splat_pattern (for *args) or dictionary_splat_pattern (for **kwargs) — the identifier is nested inside that pattern, not a direct child. Because findChild is shallow (only scans direct children), it returned null and the parameter was silently dropped.

Verified by running the extractor: def f(a: int, *args: str, b: int = 0, **kwargs: bool) returned params ["a", "b"] instead of ["a", "*args", "b", "**kwargs"]. The untyped forms (def f(*args, **kwargs)) already worked because they parse as bare list_splat_pattern/dictionary_splat_pattern children of parameters. The gap was specifically typed variadics — a very common pattern in typed Python (*args: Any, **kwargs: object).

Fix

In the typed_parameter case, check for a nested list_splat_pattern / dictionary_splat_pattern first and prefix the captured name with * / ** respectively, falling back to the existing plain-identifier lookup otherwise. The change is confined to understand-anything-plugin/packages/core/src/plugins/extractors/python-extractor.ts.

Testing

  • Added test extracts type-annotated *args and **kwargs to python-extractor.test.ts, asserting def f(a: int, *args: str, b: int = 0, **kwargs: bool) yields params ["a", "*args", "b", "**kwargs"].
  • The new test fails before the fix (expected [ 'a', 'b' ] to deeply equal [ 'a', '*args', 'b', '**kwargs' ]) and passes after.
  • Full core vitest suite is green (693 passing, +1 from this test) with no regressions.
  • tsc --noEmit (core) exits 0 and eslint is clean on both changed files.

🤖 Generated with Claude Code

The `typed_parameter` case in extractParams assumed the parameter name
was a direct `identifier` child. For type-annotated variadic parameters
such as `def f(*args: int, **kwargs: str)`, tree-sitter-python wraps the
splat in a `typed_parameter` whose first child is a `list_splat_pattern`
(`*args`) or `dictionary_splat_pattern` (`**kwargs`), with the identifier
nested inside. Since findChild is shallow, it returned null and the
parameter was silently dropped.

Now the `typed_parameter` case checks for a nested splat pattern first and
prefixes the name with `*`/`**` accordingly, falling back to the plain
identifier lookup otherwise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns before this lands.

1. findChild is shallow — fix is grammar-version-fragile. The new branches assume list_splat_pattern / dictionary_splat_pattern are direct children of typed_parameter. That holds in current tree-sitter-python, but if a future grammar revision wraps the splat (e.g., inside a pattern node) the lookup silently returns null again and we regress to the original bug with no test signal. Worth either a recursive search or a comment pinning the assumption to a grammar version.

2. Adjacent gaps still silently dropped. Same extractParams switch has no arm for positional_separator (/), keyword_separator (bare *,), or PEP 695 type_parameter lists (def f[T](x: T) — the [T] lives on function_definition, not parameters, so untouched here, but the bare *// separators do appear in the parameters node and are dropped). Untyped *args followed by keyword-only args already exercises this. Probably out of scope for the fix, but a TODO would help — and the same shape recurs for Dart's {named}/[positional] markers in #435.

3. Test gaps. The new test exercises only a top-level def. Not covered: (a) method form (def m(self, *args: int)) — the new splat branches skip the self/cls filter the plain-identifier branch applies, so a typed *self would leak through (degenerate, but documents intent); (b) async def — should be equivalent but no assertion; (c) typed **kwargs without a preceding typed *args in the same signature, to lock the dict-splat path independently.

Nit: the function-level JSDoc on extractParams (lines 5–10) still reads as if untyped splats are the only splat path — worth one line noting typed variadics are now handled too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants