fix(python-extractor): capture type-annotated *args/**kwargs parameters #444
tirth8205 wants to merge 1 commit into
The `typed_parameter` case in extractParams assumed the parameter name was a direct `identifier` child. For type-annotated variadic parameters such as `def f(*args: int, **kwargs: str)`, tree-sitter-python wraps the splat in a `typed_parameter` whose first child is a `list_splat_pattern` (`*args`) or `dictionary_splat_pattern` (`**kwargs`), with the identifier nested inside. Since findChild is shallow, it returned null and the parameter was silently dropped. Now the `typed_parameter` case checks for a nested splat pattern first and prefixes the name with `*`/`**` accordingly, falling back to the plain identifier lookup otherwise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thejesh23
left a comment
A few concerns before this lands.
1. findChild is shallow — fix is grammar-version-fragile. The new branches assume list_splat_pattern / dictionary_splat_pattern are direct children of typed_parameter. That holds in current tree-sitter-python, but if a future grammar revision wraps the splat (e.g., inside a pattern node) the lookup silently returns null again and we regress to the original bug with no test signal. Worth either a recursive search or a comment pinning the assumption to a grammar version.
2. Adjacent gaps still silently dropped. The same extractParams switch has no arm for positional_separator (/), keyword_separator (bare *,), or PEP 695 type_parameter lists (def f[T](x: T)). The [T] lives on function_definition, not parameters, so it is untouched here, but the bare * and / separators do appear in the parameters node and are dropped. Untyped *args followed by keyword-only args already exercises this. Probably out of scope for the fix, but a TODO would help, and the same shape recurs for Dart's {named}/[positional] markers in #435.
3. Test gaps. The new test exercises only a top-level def. Not covered: (a) method form (def m(self, *args: int)) — the new splat branches skip the self/cls filter the plain-identifier branch applies, so a typed *self would leak through (degenerate, but documents intent); (b) async def — should be equivalent but no assertion; (c) typed **kwargs without a preceding typed *args in the same signature, to lock the dict-splat path independently.
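Point 1 above suggests a recursive search as one way to make the splat lookup less dependent on the exact grammar shape. A minimal sketch of that idea follows. `MiniNode`, `findDescendant`, the `maxDepth` parameter, and the wrapping `pattern` node are all hypothetical stand-ins invented for illustration; they are not the extractor's real types or anything the current grammar emits.

```typescript
// Hypothetical stand-in for a tree-sitter node; the real node type has many more fields.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

// Depth-limited, pre-order descendant search (assumed helper, not part of the PR).
// maxDepth = 1 inspects direct children only, which matches a shallow lookup.
function findDescendant(node: MiniNode, type: string, maxDepth: number = 3): MiniNode | null {
  if (maxDepth <= 0) return null;
  for (const child of node.children) {
    if (child.type === type) return child;
    const nested = findDescendant(child, type, maxDepth - 1);
    if (nested) return nested;
  }
  return null;
}

const node = (type: string, text: string, children: MiniNode[] = []): MiniNode => ({
  type,
  text,
  children,
});

// Imagined future grammar shape: the splat is wrapped in an extra "pattern" node.
const wrapped = node("typed_parameter", "*args: int", [
  node("pattern", "*args", [
    node("list_splat_pattern", "*args", [node("*", "*"), node("identifier", "args")]),
  ]),
  node(":", ":"),
  node("type", "int", [node("identifier", "int")]),
]);

const shallow = findDescendant(wrapped, "list_splat_pattern", 1); // direct children only
const deep = findDescendant(wrapped, "list_splat_pattern");       // default depth of 3
console.log(shallow === null, deep !== null ? deep.text : null);
```

The depth cap is a deliberate trade-off in this sketch: an unbounded search for `identifier` could wander into the type annotation, so it is probably safer to search recursively only for the splat pattern node and then do the usual shallow `identifier` lookup inside it.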
Nit: the function-level JSDoc on extractParams (lines 5–10) still reads as if untyped splats are the only splat path — worth one line noting typed variadics are now handled too.
Problem
In `extractParams` (`python-extractor.ts`), the `typed_parameter` case assumed the parameter name was a direct `identifier` child: `const ident = findChild(child, "identifier")`. For type-annotated variadic parameters such as `def f(*args: int, **kwargs: str)`, the tree-sitter-python grammar wraps the splat in a `typed_parameter` whose first child is a `list_splat_pattern` (for `*args`) or `dictionary_splat_pattern` (for `**kwargs`); the identifier is nested inside that pattern, not a direct child. Because `findChild` is shallow (only scans direct children), it returned `null` and the parameter was silently dropped.

Verified by running the extractor: `def f(a: int, *args: str, b: int = 0, **kwargs: bool)` returned params `["a", "b"]` instead of `["a", "*args", "b", "**kwargs"]`. The untyped forms (`def f(*args, **kwargs)`) already worked because they parse as bare `list_splat_pattern` / `dictionary_splat_pattern` children of `parameters`. The gap was specifically typed variadics, a very common pattern in typed Python (`*args: Any`, `**kwargs: object`).

Fix
In the `typed_parameter` case, check for a nested `list_splat_pattern` / `dictionary_splat_pattern` first and prefix the captured name with `*` / `**` respectively, falling back to the existing plain-identifier lookup otherwise. The change is confined to `understand-anything-plugin/packages/core/src/plugins/extractors/python-extractor.ts`.

Testing
- Added `extracts type-annotated *args and **kwargs` to `python-extractor.test.ts`, asserting `def f(a: int, *args: str, b: int = 0, **kwargs: bool)` yields params `["a", "*args", "b", "**kwargs"]`.
- The test fails before the fix (`expected [ 'a', 'b' ] to deeply equal [ 'a', '*args', 'b', '**kwargs' ]`) and passes after.
- `tsc --noEmit` (core) exits 0 and `eslint` is clean on both changed files.

🤖 Generated with Claude Code
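To make the lookup order in the fix concrete, here is a rough, self-contained sketch of the splat-first logic. It is not the code from the PR: `MiniNode`, `typedParameterName`, and the hand-built nodes are hypothetical stand-ins, and `findChild` is re-implemented here only to mirror the shallow behaviour the description relies on.

```typescript
// Hypothetical stand-in for a tree-sitter node; the real node type has many more fields.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

// Shallow lookup: scans direct children only, as the PR description says findChild does.
function findChild(node: MiniNode, type: string): MiniNode | null {
  return node.children.find((c) => c.type === type) || null;
}

// Assumed helper: resolve the captured name for a single typed_parameter node.
function typedParameterName(param: MiniNode): string | null {
  // Check the nested splat patterns first and prefix the name accordingly.
  const listSplat = findChild(param, "list_splat_pattern");
  if (listSplat) {
    const ident = findChild(listSplat, "identifier");
    return ident ? `*${ident.text}` : null;
  }
  const dictSplat = findChild(param, "dictionary_splat_pattern");
  if (dictSplat) {
    const ident = findChild(dictSplat, "identifier");
    return ident ? `**${ident.text}` : null;
  }
  // Fall back to the plain identifier lookup.
  const ident = findChild(param, "identifier");
  return ident ? ident.text : null;
}

const node = (type: string, text: string, children: MiniNode[] = []): MiniNode => ({
  type,
  text,
  children,
});

// Hand-built approximations of the node shapes described above.
const typedA = node("typed_parameter", "a: int", [
  node("identifier", "a"),
  node(":", ":"),
  node("type", "int", [node("identifier", "int")]),
]);
const typedArgs = node("typed_parameter", "*args: str", [
  node("list_splat_pattern", "*args", [node("*", "*"), node("identifier", "args")]),
  node(":", ":"),
  node("type", "str", [node("identifier", "str")]),
]);
const typedKwargs = node("typed_parameter", "**kwargs: bool", [
  node("dictionary_splat_pattern", "**kwargs", [node("**", "**"), node("identifier", "kwargs")]),
  node(":", ":"),
  node("type", "bool", [node("identifier", "bool")]),
]);

const names = [typedA, typedArgs, typedKwargs].map(typedParameterName);
console.log(names); // [ 'a', '*args', '**kwargs' ]
```

In this model, a plain shallow `findChild(typedArgs, "identifier")` returns `null`, because the only identifiers sit inside the splat pattern and the `type` node. That is the same failure mode as the dropped parameters in the original bug.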