Summary
probe search -o json already gives the right high-level primitive for source discovery: language-aware blocks with line ranges, AST node types, and useful scope flags. For downstream traceability/evidence tooling, the JSON contract needs a bit more language-agnostic AST/search metadata so callers do not have to run their own repo-wide parsers.
This issue intentionally does not ask Probe to understand requirement semantics or test frameworks. Probe should not need to know what Vitest/Jest/Mocha/etc. means. The ask is only for generic facts Probe can know from the AST/search result:
- what symbol or declaration owns this block?
- what enclosing AST context/call/declaration contains this block?
- what comments are attached to the returned block?
- did a text match occur in a comment, string literal, or ordinary code token?
- can search/extract return one stable semantic owner block without merging unrelated owners?
Downstream tools can then apply their own policy. For example, Proof can decide that only comments matching Implements:, Verifies:, or MCDC count as evidence. Probe only needs to expose the source facts.
Tested with:
- CLI JSON output reports "version": "0.6.0"
- npm package installed as @probelabs/probe@0.6.0-rc315
- macOS, local CLI installed at ~/.npm-global/bin/probe
Reproduction Fixture
Create this disposable mixed-language fixture:
tmp="$(mktemp -d)"
mkdir -p "$tmp/web/src/__tests__" "$tmp/pkg/demo"
cat > "$tmp/web/src/service.ts" <<'EOF'
export class PolicyService {
  // Implements: SYS-REQ-424
  async evaluatePolicy(input: string): Promise<boolean> {
    return input.length > 0 && input !== "deny";
  }
}

// Implements: SYS-REQ-425
export const normalizeDecision = (raw: string) => {
  return raw.trim().toLowerCase();
};
EOF
cat > "$tmp/web/src/__tests__/service.test.ts" <<'EOF'
import { describe, it, expect, test } from "vitest";

// Verifies: SYS-REQ-424 [boundary]
test("accepts valid policy", () => {
  expect(true && true).toBe(true);
});

// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE
it("records witness row", () => {
  expect(true).toBe(true);
});

describe("normalization", () => {
  // Verifies: SYS-REQ-425
  it("normalizes decisions", () => {
    expect(" ALLOW ".trim().toLowerCase()).toBe("allow");
  });
});
EOF
cat > "$tmp/pkg/demo/demo.go" <<'EOF'
package demo

// Implements: SYS-REQ-426
func RunDemo(flag bool) bool {
  return flag
}
EOF
cat > "$tmp/web/src/noise.ts" <<'EOF'
export const literalOnly = "Implements: SYS-REQ-427";
export function unrelated() {
  return "SYS-REQ-428";
}
// SYS-REQ-429 appears here without an annotation verb.
export function looseComment() {
  return true;
}
EOF
Case 1: Go function ownership works and is the useful baseline
Command:
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
-o json '"SYS-REQ-426"' "$tmp"
Actual useful result excerpt:
{
"code": "// Implements: SYS-REQ-426\nfunc RunDemo(flag bool) bool {\n return flag\n}",
"node_type": "function_declaration",
"owner_symbol": "RunDemo",
"scope": "function",
"lines": [3, 6]
}
This is the shape downstream tools need: the returned block includes the leading comment and exposes the owning symbol.
Expected: preserve this behavior and expose equivalent generic owner metadata where the AST makes it knowable in JS/TS/TSX.
Case 2: TypeScript class methods and exported const arrows miss generic owner symbols
Command:
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
-o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
Actual TypeScript method result excerpt:
{
"code": " // Implements: SYS-REQ-424\n async evaluatePolicy(input: string): Promise<boolean> {\n return input.length > 0 && input !== \"deny\";\n }",
"node_type": "method_definition",
"scope": "function",
"lines": [2, 5]
}
Actual exported arrow result excerpt:
{
"code": "// Implements: SYS-REQ-425\nexport const normalizeDecision = (raw: string) => {\n return raw.trim().toLowerCase();\n};",
"node_type": "export_statement",
"scope": "declaration",
"lines": [8, 11]
}
Missing/inconsistent generic fields:
- no owner_symbol for method evaluatePolicy
- no owner_symbol for variable declarator normalizeDecision
- no generic containing declaration for the class method, e.g. containing class name
- node_type: "export_statement" is too generic for the semantic owner
Related command:
probe symbols "$tmp/web/src/service.ts" -o json
Actual symbols excerpt:
[
  {
    "file": ".../web/src/service.ts",
    "symbols": [
      {
        "name": "export_statement",
        "kind": "export",
        "signature": "export class PolicyService { ... }",
        "line": 1,
        "end_line": 6
      },
      {
        "name": "export_statement",
        "kind": "export",
        "signature": "export const normalizeDecision = (raw: string) => {",
        "line": 9,
        "end_line": 11
      }
    ]
  }
]
Expected language-agnostic shape, using generic symbol/declaration concepts:
{
  "language": "typescript",
  "node_type": "method_definition",
  "owner_symbol": "evaluatePolicy",
  "owner_qualified_symbol": "PolicyService.evaluatePolicy",
  "enclosing_symbols": [
    {"kind": "class", "name": "PolicyService"}
  ],
  "scope": "function"
}
and:
{
"language": "typescript",
"node_type": "variable_declarator",
"owner_symbol": "normalizeDecision",
"owner_qualified_symbol": "normalizeDecision",
"scope": "function"
}
No framework or domain knowledge is needed here. These are just AST owner/declaration facts.
Case 3: Callback blocks need generic enclosing-call context, not framework detection
Command:
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
-o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
Actual callback result excerpts:
{
"code": "// Verifies: SYS-REQ-424 [boundary]\ntest(\"accepts valid policy\", () => {\n expect(true && true).toBe(true);\n});",
"is_test": true,
"node_type": "arrow_function",
"scope": "test",
"lines": [3, 6]
}
{
"code": "// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE\nit(\"records witness row\", () => {\n expect(true).toBe(true);\n});",
"is_test": true,
"node_type": "arrow_function",
"scope": "test",
"lines": [8, 11]
}
{
"code": " // Verifies: SYS-REQ-425\n it(\"normalizes decisions\", () => {\n expect(\" ALLOW \".trim().toLowerCase()).toBe(\"allow\");\n });",
"is_test": true,
"node_type": "arrow_function",
"scope": "test",
"lines": [14, 17]
}
Probe does not need to know these are Vitest tests. The useful missing data is generic AST context:
- this arrow function is an argument to a call expression
- the call expression callee text is test or it
- the call expression first argument is a string literal
- the callback is nested inside another call expression with callee describe
Expected language-agnostic shape:
{
  "language": "typescript",
  "node_type": "arrow_function",
  "scope": "test",
  "enclosing_call": {
    "callee": "test",
    "first_arg_literal": "accepts valid policy",
    "line": 4
  },
  "enclosing_calls": [
    {
      "callee": "test",
      "first_arg_literal": "accepts valid policy",
      "line": 4
    }
  ]
}
For the nested callback:
{
  "enclosing_call": {
    "callee": "it",
    "first_arg_literal": "normalizes decisions",
    "line": 15
  },
  "enclosing_calls": [
    {
      "callee": "describe",
      "first_arg_literal": "normalization",
      "line": 13
    },
    {
      "callee": "it",
      "first_arg_literal": "normalizes decisions",
      "line": 15
    }
  ]
}
This remains framework-agnostic. Downstream tools can decide whether test, it, describe, or any other callee name matters.
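As a rough sketch of how a downstream consumer might use this, the snippet below builds a readable test path from the call chain. It assumes the proposed enclosing_calls field (not part of Probe 0.6.0 output) is ordered outermost to innermost, as in the example above. The EnclosingCall interface, the TEST_CALLEES set, and the testPath function are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of the proposed enclosing-call metadata; not a current Probe field.
interface EnclosingCall {
  callee: string;
  first_arg_literal?: string | null;
  line: number;
}

// Downstream policy: the callee names this particular tool treats as test containers.
// Probe itself would never need this list.
const TEST_CALLEES = new Set(["describe", "test", "it"]);

// Join the literal first arguments of the relevant calls into a readable path.
function testPath(calls: EnclosingCall[]): string {
  return calls
    .filter((c) => TEST_CALLEES.has(c.callee) && typeof c.first_arg_literal === "string")
    .map((c) => c.first_arg_literal as string)
    .join(" > ");
}

const nestedCalls: EnclosingCall[] = [
  { callee: "describe", first_arg_literal: "normalization", line: 13 },
  { callee: "it", first_arg_literal: "normalizes decisions", line: 15 },
];

console.log(testPath(nestedCalls)); // normalization > normalizes decisions
```

The callee list lives entirely in the consumer, so a tool targeting a different framework would only change TEST_CALLEES.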
Case 4: Requirement IDs inside strings or loose comments are returned as search hits
Command:
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
-o json '"SYS-REQ-427" OR "SYS-REQ-428" OR "SYS-REQ-429"' "$tmp"
Actual result excerpts:
{
"code": "export const literalOnly = \"Implements: SYS-REQ-427\";",
"node_type": "export_statement",
"scope": "declaration",
"matched_keywords": ["sys-req-427"]
}
{
"code": "export function unrelated() {\n return \"SYS-REQ-428\";\n}",
"node_type": "function_declaration",
"owner_symbol": "unrelated",
"scope": "function",
"matched_keywords": ["sys-req-428"]
}
{
"code": "// SYS-REQ-429 appears here without an annotation verb.\nexport function looseComment() {\n return true;\n}",
"node_type": "export_statement",
"owner_symbol": "looseComment",
"scope": "declaration",
"matched_keywords": ["sys-req-429"]
}
It is correct for Probe to find these textual matches. The missing generic metadata is match classification:
- did the match occur in a comment, string literal, identifier, or other code token?
- if in a comment, is the comment leading/trailing/inner relative to the returned owner?
- what are the exact comment line ranges?
- what is the comment text without requiring callers to parse code?
Expected:
{
  "matches": [
    {
      "text": "SYS-REQ-427",
      "line": 1,
      "column": 40,
      "kind": "string"
    }
  ],
  "leading_comments": []
}
and for a real leading comment:
{
  "leading_comments": [
    {
      "text": "// Implements: SYS-REQ-424",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "matches": [
    {
      "text": "SYS-REQ-424",
      "line": 2,
      "column": 18,
      "kind": "comment",
      "comment_role": "leading"
    }
  ]
}
Downstream tools can then reject string-literal matches or loose comments by policy.
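To illustrate what such a policy could look like on the consumer side, here is a minimal sketch that accepts a requirement ID only when it sits in a leading comment using one of the annotation forms. It assumes the proposed matches and leading_comments fields exist with the shapes shown above; the interfaces, the ANNOTATION pattern, and the evidenceIds function are hypothetical and not part of Probe.

```typescript
// Hypothetical shapes for the proposed match metadata; not current Probe output.
type MatchKind = "comment" | "string" | "identifier" | "code";

interface Match {
  text: string;
  line: number;
  column: number;
  kind: MatchKind;
  comment_role?: "leading" | "trailing" | "inner";
}

interface LeadingComment {
  text: string;
  start_line: number;
  end_line: number;
}

interface Result {
  matches: Match[];
  leading_comments: LeadingComment[];
}

// Downstream policy (not Probe's concern): only these annotation forms count as evidence.
const ANNOTATION = /^\/\/\s*(Implements:|Verifies:|MCDC\b)/;

// Return the matched IDs that sit inside a leading comment carrying an accepted annotation.
function evidenceIds(result: Result): string[] {
  return result.matches
    .filter((m) => m.kind === "comment" && m.comment_role === "leading")
    .filter((m) =>
      result.leading_comments.some(
        (c) => m.line >= c.start_line && m.line <= c.end_line && ANNOTATION.test(c.text.trim()),
      ),
    )
    .map((m) => m.text);
}

const literalHit: Result = {
  matches: [{ text: "SYS-REQ-427", line: 1, column: 40, kind: "string" }],
  leading_comments: [],
};

const annotated: Result = {
  leading_comments: [{ text: "// Implements: SYS-REQ-424", start_line: 2, end_line: 2 }],
  matches: [{ text: "SYS-REQ-424", line: 2, column: 18, kind: "comment", comment_role: "leading" }],
};

console.log(JSON.stringify(evidenceIds(literalHit))); // []
console.log(JSON.stringify(evidenceIds(annotated))); // ["SYS-REQ-424"]
```

The string-literal hit is rejected by token kind alone, and a loose comment without an annotation verb is rejected by the pattern, without the consumer parsing code.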
Case 5: extract can drop attached leading comments or return partial callback blocks
Command:
probe extract -o json \
"$tmp/web/src/service.ts:3" \
"$tmp/web/src/service.ts:9"
Actual result excerpt:
{
"code": " async evaluatePolicy(input: string): Promise<boolean> {\n return input.length > 0 && input !== \"deny\";\n }",
"lines": [3, 5],
"node_type": "merged_ast_line"
}
The leading comment at line 2 is not included, even though line 3 is inside the commented method.
Command:
probe extract --allow-tests -o json \
"$tmp/web/src/__tests__/service.test.ts:4" \
"$tmp/web/src/__tests__/service.test.ts:15"
Actual result excerpt:
{
"code": "test(\"accepts valid policy\", () => {",
"lines": [4, 4],
"node_type": "context"
}
Expected:
- extracting a line inside a semantic owner can optionally include attached leading comments
- extracting a line inside a callback can return the full enclosing call/callback block when requested
- extraction exposes the same generic owner/comment/context metadata as search
Possible option:
probe extract --semantic-block --allow-tests -o json "$file:$line"
Where --semantic-block means:
- choose the smallest useful semantic owner for the line
- include attached leading comments
- avoid partial fragments for functions, methods, declarations, and callback call blocks
- include generic comments, matches, and enclosing context fields
Case 6: Default search merging can combine multiple semantic owners
Command without --no-merge:
probe search --allow-tests --strict-elastic-syntax --max-results 20 \
-o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
Actual result excerpt:
{
"code": "// Verifies: SYS-REQ-424 [boundary]\ntest(\"accepts valid policy\", () => {\n expect(true && true).toBe(true);\n});\n\n// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE\nit(\"records witness row\", () => {\n expect(true).toBe(true);\n});\n\ndescribe(\"normalization\", () => {\n // Verifies: SYS-REQ-425\n it(\"normalizes decisions\", () => {\n expect(\" ALLOW \".trim().toLowerCase()).toBe(\"allow\");\n });",
"is_test": true,
"lines": [3, 17],
"matched_keywords": ["sys-req-424", "sys-req-425"]
}
This is useful for LLM context, but not for evidence tools because one result now contains multiple semantic owners and multiple requirement IDs.
Expected:
- keep current merge behavior for general search if desired
- provide a stable mode that prevents merging across semantic owner boundaries
Possible option:
probe search --semantic-blocks --allow-tests -o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
Where --semantic-blocks means:
- one semantic owner per result
- no merging across function/method/declaration/callback-owner boundaries
- attached leading comments included
- generic comments, matches, and enclosing context fields included
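Without such a mode, a consumer can only guess whether a result was merged. The sketch below shows one such guess, using the matched_keywords field that already appears in the search output above. The REQ_ID pattern and the looksMerged function are hypothetical, and the check is a heuristic: a single owner can legitimately cite more than one requirement.

```typescript
// Subset of fields already present in the search results shown above.
interface SearchResult {
  lines: [number, number];
  matched_keywords?: string[];
}

// Hypothetical ID pattern for this fixture; a real tool would configure its own.
const REQ_ID = /^sys-req-\d+$/i;

// Heuristic only: several distinct requirement IDs in one block suggest a merged result.
function looksMerged(result: SearchResult): boolean {
  const ids = new Set(
    (result.matched_keywords ?? [])
      .filter((k) => REQ_ID.test(k))
      .map((k) => k.toLowerCase()),
  );
  return ids.size > 1;
}

const mergedBlock: SearchResult = {
  lines: [3, 17],
  matched_keywords: ["sys-req-424", "sys-req-425"],
};
const singleBlock: SearchResult = {
  lines: [3, 6],
  matched_keywords: ["sys-req-424"],
};

console.log(looksMerged(mergedBlock)); // true
console.log(looksMerged(singleBlock)); // false
```

A mode that guarantees one semantic owner per result would make this kind of guesswork unnecessary.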
Proposed JSON Fields
This is intentionally generic:
{
  "file": "/path/to/file.ts",
  "language": "typescript",
  "lines": [3, 6],
  "code": "...",
  "node_type": "method_definition",
  "scope": "function",
  "owner_symbol": "evaluatePolicy",
  "owner_qualified_symbol": "PolicyService.evaluatePolicy",
  "enclosing_symbols": [
    {"kind": "class", "name": "PolicyService", "line": 1}
  ],
  "enclosing_call": null,
  "enclosing_calls": [],
  "symbol_signature": "async evaluatePolicy(input: string): Promise<boolean>",
  "leading_comments": [
    {
      "text": "// Implements: SYS-REQ-424",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "matches": [
    {
      "text": "SYS-REQ-424",
      "start_line": 2,
      "start_column": 18,
      "end_line": 2,
      "end_column": 29,
      "kind": "comment",
      "comment_role": "leading"
    }
  ]
}
For callback blocks, the same shape can include generic call context:
{
  "node_type": "arrow_function",
  "enclosing_call": {
    "callee": "it",
    "first_arg_literal": "normalizes decisions",
    "line": 15
  },
  "enclosing_calls": [
    {"callee": "describe", "first_arg_literal": "normalization", "line": 13},
    {"callee": "it", "first_arg_literal": "normalizes decisions", "line": 15}
  ]
}
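One way a consumer might use these fields is to derive an identity for an evidence record that does not depend on line numbers. The sketch below assumes the proposed owner_symbol, owner_qualified_symbol, and enclosing_symbols fields; the ProposedResult interface and the ownerKey function are hypothetical names for illustration, and the fallback order is a design guess rather than anything Probe defines.

```typescript
// Hypothetical typing of a subset of the proposed result shape; not a current Probe contract.
interface EnclosingSymbol {
  kind: string;
  name: string;
  line?: number;
}

interface ProposedResult {
  file: string;
  lines: [number, number];
  node_type: string;
  owner_symbol?: string;
  owner_qualified_symbol?: string;
  enclosing_symbols?: EnclosingSymbol[];
}

// Prefer the qualified symbol; otherwise rebuild it from the enclosing chain and the owner.
function ownerKey(r: ProposedResult): string {
  const rebuilt = [
    ...(r.enclosing_symbols ?? []).map((s) => s.name),
    r.owner_symbol ?? "<anonymous>",
  ].join(".");
  return `${r.file}#${r.owner_qualified_symbol ?? rebuilt}`;
}

const methodResult: ProposedResult = {
  file: "/path/to/file.ts",
  lines: [3, 6],
  node_type: "method_definition",
  owner_symbol: "evaluatePolicy",
  owner_qualified_symbol: "PolicyService.evaluatePolicy",
  enclosing_symbols: [{ kind: "class", name: "PolicyService", line: 1 }],
};

console.log(ownerKey(methodResult)); // /path/to/file.ts#PolicyService.evaluatePolicy
```

A key like this stays the same when unrelated edits shift the block to different lines, which matters for tools that store evidence links over time.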
Acceptance Criteria
- Go behavior remains compatible with current owner_symbol results.
- TS/JS class methods expose method owner and containing class/object where knowable.
- TS/JS exported const arrow functions expose the variable declarator name as owner where knowable.
- Callback blocks can expose generic enclosing call context: callee text, first literal argument if present, and enclosing call chain.
- probe symbols returns useful JS/TS symbol/declaration names instead of generic export_statement for common exported classes/functions/const arrows.
- JSON results expose structured attached comments with line ranges.
- JSON results expose match locations and token kind (comment, string, code, etc.).
- A search/extract mode exists for evidence-style consumers that returns one semantic owner per result and includes attached comments.
- The fixture above can be used as regression coverage for all cases.
Why this matters
Without these generic fields, downstream multi-language tools have to reparse source code using their own AST logic, which recreates language-specific behavior and makes Go, JS, TS, and TSX support diverge.
With these fields, Probe can remain the language-agnostic source-discovery layer. Downstream tools can apply their own domain policy on top.