Expose language-agnostic owner, comment, match, and enclosing-context metadata #557

## Summary

`probe search -o json` already provides the right high-level primitive for source discovery: language-aware blocks with line ranges, AST node types, and useful scope flags. For downstream traceability/evidence tooling, the JSON contract needs additional **language-agnostic AST/search metadata** so that callers do not have to run their own repo-wide parsers.

This issue intentionally does **not** ask Probe to understand requirement semantics or test frameworks. Probe should not need to know what Vitest, Jest, Mocha, or similar frameworks mean. The ask is only for generic facts Probe can know from the AST/search result:

- what symbol or declaration owns this block?
- what enclosing AST context/call/declaration contains this block?
- what comments are attached to the returned block?
- did a text match occur in a comment, string literal, or ordinary code token?
- can search/extract return one stable semantic owner block without merging unrelated owners?

Downstream tools can then apply their own policy. For example, Proof can decide that only comments matching `Implements:`, `Verifies:`, or `MCDC` count as evidence. Probe only needs to expose the source facts.

Tested with:

- CLI JSON output reports `"version": "0.6.0"`
- npm package installed as `@probelabs/probe@0.6.0-rc315`
- macOS, local CLI installed at `~/.npm-global/bin/probe`

## Reproduction Fixture

Create this disposable mixed-language fixture:

```bash
tmp="$(mktemp -d)"
mkdir -p "$tmp/web/src/__tests__" "$tmp/pkg/demo"

cat > "$tmp/web/src/service.ts" <<'EOF'
export class PolicyService {
  // Implements: SYS-REQ-424
  async evaluatePolicy(input: string): Promise<boolean> {
    return input.length > 0 && input !== "deny";
  }
}

// Implements: SYS-REQ-425
export const normalizeDecision = (raw: string) => {
  return raw.trim().toLowerCase();
};
EOF

cat > "$tmp/web/src/__tests__/service.test.ts" <<'EOF'
import { describe, it, expect, test } from "vitest";

// Verifies: SYS-REQ-424 [boundary]
test("accepts valid policy", () => {
  expect(true && true).toBe(true);
});

// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE
it("records witness row", () => {
  expect(true).toBe(true);
});

describe("normalization", () => {
  // Verifies: SYS-REQ-425
  it("normalizes decisions", () => {
    expect(" ALLOW ".trim().toLowerCase()).toBe("allow");
  });
});
EOF

cat > "$tmp/pkg/demo/demo.go" <<'EOF'
package demo

// Implements: SYS-REQ-426
func RunDemo(flag bool) bool {
    return flag
}
EOF

cat > "$tmp/web/src/noise.ts" <<'EOF'
export const literalOnly = "Implements: SYS-REQ-427";

export function unrelated() {
  return "SYS-REQ-428";
}

// SYS-REQ-429 appears here without an annotation verb.
export function looseComment() {
  return true;
}
EOF
```

## Case 1: Go function ownership works and is the useful baseline

Command:

```bash
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
  -o json '"SYS-REQ-426"' "$tmp"
```

Actual useful result excerpt:

```json
{
  "code": "// Implements: SYS-REQ-426\nfunc RunDemo(flag bool) bool {\n    return flag\n}",
  "node_type": "function_declaration",
  "owner_symbol": "RunDemo",
  "scope": "function",
  "lines": [3, 6]
}
```

This is the shape downstream tools need: the returned block includes the leading comment and exposes the owning symbol.

Expected: preserve this behavior and expose equivalent generic owner metadata where the AST makes it knowable in JS/TS/TSX.

## Case 2: TypeScript class methods and exported const arrows miss generic owner symbols

Command:

```bash
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
  -o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
```

Actual TypeScript method result excerpt:

```json
{
  "code": "  // Implements: SYS-REQ-424\n  async evaluatePolicy(input: string): Promise<boolean> {\n    return input.length > 0 && input !== \"deny\";\n  }",
  "node_type": "method_definition",
  "scope": "function",
  "lines": [2, 5]
}
```

Actual exported arrow result excerpt:

```json
{
  "code": "// Implements: SYS-REQ-425\nexport const normalizeDecision = (raw: string) => {\n  return raw.trim().toLowerCase();\n};",
  "node_type": "export_statement",
  "scope": "declaration",
  "lines": [8, 11]
}
```

Missing/inconsistent generic fields:

- no `owner_symbol` for method `evaluatePolicy`
- no `owner_symbol` for variable declarator `normalizeDecision`
- no generic containing declaration for the class method, e.g. containing class name
- `node_type: "export_statement"` is too generic for the semantic owner

Related command:

```bash
probe symbols "$tmp/web/src/service.ts" -o json
```

Actual symbols excerpt:

```json
[
  {
    "file": ".../web/src/service.ts",
    "symbols": [
      {
        "name": "export_statement",
        "kind": "export",
        "signature": "export class PolicyService { ... }",
        "line": 1,
        "end_line": 6
      },
      {
        "name": "export_statement",
        "kind": "export",
        "signature": "export const normalizeDecision = (raw: string) => {",
        "line": 9,
        "end_line": 11
      }
    ]
  }
]
```

Expected language-agnostic shape, using generic symbol/declaration concepts:

```json
{
  "language": "typescript",
  "node_type": "method_definition",
  "owner_symbol": "evaluatePolicy",
  "owner_qualified_symbol": "PolicyService.evaluatePolicy",
  "enclosing_symbols": [
    {"kind": "class", "name": "PolicyService"}
  ],
  "scope": "function"
}
```

and:

```json
{
  "language": "typescript",
  "node_type": "variable_declarator",
  "owner_symbol": "normalizeDecision",
  "owner_qualified_symbol": "normalizeDecision",
  "scope": "function"
}
```

No framework or domain knowledge is needed here. These are just AST owner/declaration facts.
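
For illustration, a consumer could rebuild the qualified name from these fields alone. This is a minimal sketch: `OwnerBlock`, `EnclosingSymbol`, and `qualifiedOwner` are hypothetical consumer-side names, the fields follow the expected shape above (Probe does not emit them for TypeScript today), and the `.` separator is an assumption.

```typescript
// Hypothetical consumer-side types mirroring the expected shape above.
interface EnclosingSymbol {
  kind: string;
  name: string;
}

interface OwnerBlock {
  node_type: string;
  owner_symbol?: string;
  enclosing_symbols?: EnclosingSymbol[];
}

// Join enclosing declaration names (outermost first) with the owner symbol.
// Returns undefined when the block has no owner, so callers can reject it.
function qualifiedOwner(block: OwnerBlock): string | undefined {
  if (!block.owner_symbol) return undefined;
  const outer = (block.enclosing_symbols ?? []).map((s) => s.name);
  return [...outer, block.owner_symbol].join(".");
}

const method: OwnerBlock = {
  node_type: "method_definition",
  owner_symbol: "evaluatePolicy",
  enclosing_symbols: [{ kind: "class", name: "PolicyService" }],
};
const arrow: OwnerBlock = {
  node_type: "variable_declarator",
  owner_symbol: "normalizeDecision",
};

console.log(qualifiedOwner(method)); // PolicyService.evaluatePolicy
console.log(qualifiedOwner(arrow)); // normalizeDecision
```

A block without `owner_symbol`, such as the current `export_statement` results, yields `undefined`, which is exactly the case downstream tools cannot resolve today.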

## Case 3: Callback blocks need generic enclosing-call context, not framework detection

Command:

```bash
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
  -o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
```

Actual callback result excerpts:

```json
{
  "code": "// Verifies: SYS-REQ-424 [boundary]\ntest(\"accepts valid policy\", () => {\n  expect(true && true).toBe(true);\n});",
  "is_test": true,
  "node_type": "arrow_function",
  "scope": "test",
  "lines": [3, 6]
}
```

```json
{
  "code": "// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE\nit(\"records witness row\", () => {\n  expect(true).toBe(true);\n});",
  "is_test": true,
  "node_type": "arrow_function",
  "scope": "test",
  "lines": [8, 11]
}
```

```json
{
  "code": "  // Verifies: SYS-REQ-425\n  it(\"normalizes decisions\", () => {\n    expect(\" ALLOW \".trim().toLowerCase()).toBe(\"allow\");\n  });",
  "is_test": true,
  "node_type": "arrow_function",
  "scope": "test",
  "lines": [14, 17]
}
```

Probe does not need to know these are Vitest tests. The useful missing data is generic AST context:

- this arrow function is an argument to a call expression
- the call expression's callee text is `test` or `it`
- the call expression's first argument is a string literal
- for the third result, the callback is nested inside another call expression with callee `describe`

Expected language-agnostic shape:

```json
{
  "language": "typescript",
  "node_type": "arrow_function",
  "scope": "test",
  "enclosing_call": {
    "callee": "test",
    "first_arg_literal": "accepts valid policy",
    "line": 4
  },
  "enclosing_calls": [
    {
      "callee": "test",
      "first_arg_literal": "accepts valid policy",
      "line": 4
    }
  ]
}
```

For the nested callback:

```json
{
  "enclosing_call": {
    "callee": "it",
    "first_arg_literal": "normalizes decisions",
    "line": 15
  },
  "enclosing_calls": [
    {
      "callee": "describe",
      "first_arg_literal": "normalization",
      "line": 13
    },
    {
      "callee": "it",
      "first_arg_literal": "normalizes decisions",
      "line": 15
    }
  ]
}
```

This remains framework-agnostic. Downstream tools can decide whether `test`, `it`, `describe`, or any other callee name matters.
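
To show how a downstream tool might consume the chain, the sketch below turns `enclosing_calls` into a readable label. `EnclosingCall`, `isTracked`, and `callPath` are hypothetical consumer-side names, and the set of tracked callees is downstream policy, not something Probe would need to know.

```typescript
// Hypothetical consumer-side view of the proposed enclosing-call fields.
interface EnclosingCall {
  callee: string;
  first_arg_literal?: string;
  line: number;
}

// Downstream policy (an assumption): which callee names this consumer cares about.
const isTracked = (callee: string): boolean =>
  callee === "describe" || callee === "test" || callee === "it";

// Build a label from the call chain (outermost first), skipping untracked
// callees and falling back to "<callee>" when there is no literal argument.
function callPath(calls: EnclosingCall[]): string {
  return calls
    .filter((c) => isTracked(c.callee))
    .map((c) => c.first_arg_literal ?? `<${c.callee}>`)
    .join(" > ");
}

const nested: EnclosingCall[] = [
  { callee: "describe", first_arg_literal: "normalization", line: 13 },
  { callee: "it", first_arg_literal: "normalizes decisions", line: 15 },
];

console.log(callPath(nested)); // normalization > normalizes decisions
```

A consumer that cares about different callee names only changes `isTracked`; nothing in the Probe output would need to change.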

## Case 4: Requirement IDs inside strings or loose comments are returned as search hits

Command:

```bash
probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
  -o json '"SYS-REQ-427" OR "SYS-REQ-428" OR "SYS-REQ-429"' "$tmp"
```

Actual result excerpts:

```json
{
  "code": "export const literalOnly = \"Implements: SYS-REQ-427\";",
  "node_type": "export_statement",
  "scope": "declaration",
  "matched_keywords": ["sys-req-427"]
}
```

```json
{
  "code": "export function unrelated() {\n  return \"SYS-REQ-428\";\n}",
  "node_type": "function_declaration",
  "owner_symbol": "unrelated",
  "scope": "function",
  "matched_keywords": ["sys-req-428"]
}
```

```json
{
  "code": "// SYS-REQ-429 appears here without an annotation verb.\nexport function looseComment() {\n  return true;\n}",
  "node_type": "export_statement",
  "owner_symbol": "looseComment",
  "scope": "declaration",
  "matched_keywords": ["sys-req-429"]
}
```

It is correct for Probe to find these textual matches. The missing generic metadata is match classification:

- did the match occur in a comment, string literal, identifier, or other code token?
- if in a comment, is the comment leading/trailing/inner relative to the returned owner?
- what are the exact comment line ranges?
- what is the comment text without requiring callers to parse `code`?

Expected:

```json
{
  "matches": [
    {
      "text": "SYS-REQ-427",
      "line": 1,
      "column": 41,
      "kind": "string"
    }
  ],
  "leading_comments": []
}
```

and for a real leading comment:

```json
{
  "leading_comments": [
    {
      "text": "// Implements: SYS-REQ-424",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "matches": [
    {
      "text": "SYS-REQ-424",
      "line": 2,
      "column": 18,
      "kind": "comment",
      "comment_role": "leading"
    }
  ]
}
```

Downstream tools can then reject string-literal matches or loose comments by policy.
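
As a sketch of what such a policy could look like on the consumer side, the filter below accepts a match only when it sits in a leading comment that carries an annotation verb. All type and function names (`Match`, `Comment`, `Result`, `evidenceMatches`) are hypothetical, the field names follow the expected JSON above, and the annotation pattern is one example policy, not Probe behavior.

```typescript
// Hypothetical consumer-side types for the proposed `matches` and
// `leading_comments` fields.
interface Match {
  text: string;
  line: number;
  column: number;
  kind: "comment" | "string" | "identifier" | "code";
  comment_role?: "leading" | "trailing" | "inner";
}

interface Comment {
  text: string;
  start_line: number;
  end_line: number;
}

interface Result {
  matches: Match[];
  leading_comments: Comment[];
}

// Example downstream policy: only annotated leading comments count as evidence.
const ANNOTATION = /^\s*\/\/\s*(Implements:|Verifies:|MCDC\b)/;

function evidenceMatches(result: Result): Match[] {
  return result.matches.filter((m) => {
    if (m.kind !== "comment" || m.comment_role !== "leading") return false;
    // The match must fall inside a leading comment that carries an annotation verb.
    return result.leading_comments.some(
      (c) => c.start_line <= m.line && m.line <= c.end_line && ANNOTATION.test(c.text),
    );
  });
}

const annotated: Result = {
  matches: [{ text: "SYS-REQ-424", line: 2, column: 18, kind: "comment", comment_role: "leading" }],
  leading_comments: [{ text: "// Implements: SYS-REQ-424", start_line: 2, end_line: 2 }],
};
const stringHit: Result = {
  matches: [{ text: "SYS-REQ-428", line: 4, column: 11, kind: "string" }],
  leading_comments: [],
};
const loose: Result = {
  matches: [{ text: "SYS-REQ-429", line: 7, column: 4, kind: "comment", comment_role: "leading" }],
  leading_comments: [
    { text: "// SYS-REQ-429 appears here without an annotation verb.", start_line: 7, end_line: 7 },
  ],
};

console.log(evidenceMatches(annotated).length); // 1
console.log(evidenceMatches(stringHit).length); // 0
console.log(evidenceMatches(loose).length); // 0
```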

## Case 5: `extract` can drop attached leading comments or return partial callback blocks

Command:

```bash
probe extract -o json \
  "$tmp/web/src/service.ts:3" \
  "$tmp/web/src/service.ts:9"
```

Actual result excerpt:

```json
{
  "code": "  async evaluatePolicy(input: string): Promise<boolean> {\n    return input.length > 0 && input !== \"deny\";\n  }",
  "lines": [3, 5],
  "node_type": "merged_ast_line"
}
```

The leading comment at line 2 is not included, even though line 3 is inside the commented method.

Command:

```bash
probe extract --allow-tests -o json \
  "$tmp/web/src/__tests__/service.test.ts:4" \
  "$tmp/web/src/__tests__/service.test.ts:15"
```

Actual result excerpt:

```json
{
  "code": "test(\"accepts valid policy\", () => {",
  "lines": [4, 4],
  "node_type": "context"
}
```

Expected:

- extracting a line inside a semantic owner can optionally include attached leading comments
- extracting a line inside a callback can return the full enclosing call/callback block when requested
- extraction exposes the same generic owner/comment/context metadata as search

Possible option:

```bash
probe extract --semantic-block --allow-tests -o json "$file:$line"
```

Where `--semantic-block` means:

- choose the smallest useful semantic owner for the line
- include attached leading comments
- avoid partial fragments for functions, methods, declarations, and callback call blocks
- include generic comments, matches, and enclosing context fields
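
One possible reading of "smallest useful semantic owner" is sketched below. It is not Probe's implementation: `OwnerRange` and `semanticBlock` are hypothetical names, and the sketch assumes the parser already knows each owner's line range and where its attached leading comment starts.

```typescript
// Hypothetical candidate owner derived from the AST of one file.
interface OwnerRange {
  name: string;
  start_line: number; // first line of the declaration itself
  end_line: number;
  leading_comment_start?: number; // first line of an attached leading comment
}

// Pick the smallest owner whose range (widened to include its attached
// leading comment) contains `line`. Returns [start, end] or undefined.
function semanticBlock(owners: OwnerRange[], line: number): [number, number] | undefined {
  let best: [number, number] | undefined;
  for (const o of owners) {
    const start = o.leading_comment_start ?? o.start_line;
    if (line < start || line > o.end_line) continue;
    if (!best || o.end_line - start < best[1] - best[0]) best = [start, o.end_line];
  }
  return best;
}

// Owner ranges for web/src/service.ts from the fixture.
const owners: OwnerRange[] = [
  { name: "PolicyService", start_line: 1, end_line: 6 },
  { name: "evaluatePolicy", start_line: 3, end_line: 5, leading_comment_start: 2 },
  { name: "normalizeDecision", start_line: 9, end_line: 11, leading_comment_start: 8 },
];

console.log(JSON.stringify(semanticBlock(owners, 3))); // [2,5]
console.log(JSON.stringify(semanticBlock(owners, 9))); // [8,11]
```

With this rule, extracting `service.ts:3` would return lines 2 to 5, so the `// Implements: SYS-REQ-424` comment travels with the method.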

## Case 6: Default search merging can combine multiple semantic owners

Command without `--no-merge`:

```bash
probe search --allow-tests --strict-elastic-syntax --max-results 20 \
  -o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
```

Actual result excerpt:

```json
{
  "code": "// Verifies: SYS-REQ-424 [boundary]\ntest(\"accepts valid policy\", () => {\n  expect(true && true).toBe(true);\n});\n\n// MCDC SYS-REQ-424: input_valid=T, not_denied=T => TRUE\nit(\"records witness row\", () => {\n  expect(true).toBe(true);\n});\n\ndescribe(\"normalization\", () => {\n  // Verifies: SYS-REQ-425\n  it(\"normalizes decisions\", () => {\n    expect(\" ALLOW \".trim().toLowerCase()).toBe(\"allow\");\n  });",
  "is_test": true,
  "lines": [3, 17],
  "matched_keywords": ["sys-req-424", "sys-req-425"]
}
```

This is useful for LLM context, but not for evidence tools because one result now contains multiple semantic owners and multiple requirement IDs.

Expected:

- keep current merge behavior for general search if desired
- provide a stable mode that prevents merging across semantic owner boundaries

Possible option:

```bash
probe search --semantic-blocks --allow-tests -o json '"SYS-REQ-424" OR "SYS-REQ-425"' "$tmp"
```

Where `--semantic-blocks` means:

- one semantic owner per result
- no merging across function/method/declaration/callback-owner boundaries
- attached leading comments included
- generic comments, matches, and enclosing context fields included
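
The merge rule could be as simple as the sketch below, which joins overlapping or adjacent blocks only when they share an owner. This is an illustration under assumptions, not Probe's merge logic: `Block`, its `owner` key, and `mergeWithinOwners` are hypothetical names.

```typescript
interface Block {
  owner: string; // hypothetical stable owner key, e.g. file plus qualified symbol
  lines: [number, number];
}

// Merge overlapping or adjacent blocks only when they belong to the same owner.
// The input array and its blocks are left unmodified.
function mergeWithinOwners(blocks: Block[]): Block[] {
  const sorted = blocks.slice().sort((a, b) => a.lines[0] - b.lines[0]);
  const out: Block[] = [];
  for (const b of sorted) {
    const last = out[out.length - 1];
    if (last && last.owner === b.owner && b.lines[0] <= last.lines[1] + 1) {
      last.lines = [last.lines[0], Math.max(last.lines[1], b.lines[1])];
    } else {
      out.push({ owner: b.owner, lines: [b.lines[0], b.lines[1]] });
    }
  }
  return out;
}

const hits: Block[] = [
  { owner: "test:accepts valid policy", lines: [3, 4] },
  { owner: "test:accepts valid policy", lines: [5, 6] },
  { owner: "it:records witness row", lines: [8, 11] },
  { owner: "it:normalizes decisions", lines: [14, 17] },
];

console.log(mergeWithinOwners(hits).length); // 3
```

The two hits inside the first callback collapse into one block covering lines 3 to 6, while the other two callbacks stay separate results.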

## Proposed JSON Fields

This is intentionally generic:

```json
{
  "file": "/path/to/file.ts",
  "language": "typescript",
  "lines": [3, 6],
  "code": "...",
  "node_type": "method_definition",
  "scope": "function",
  "owner_symbol": "evaluatePolicy",
  "owner_qualified_symbol": "PolicyService.evaluatePolicy",
  "enclosing_symbols": [
    {"kind": "class", "name": "PolicyService", "line": 1}
  ],
  "enclosing_call": null,
  "enclosing_calls": [],
  "symbol_signature": "async evaluatePolicy(input: string): Promise<boolean>",
  "leading_comments": [
    {
      "text": "// Implements: SYS-REQ-424",
      "start_line": 2,
      "end_line": 2
    }
  ],
  "matches": [
    {
      "text": "SYS-REQ-424",
      "start_line": 2,
      "start_column": 18,
      "end_line": 2,
      "end_column": 29,
      "kind": "comment",
      "comment_role": "leading"
    }
  ]
}
```

For callback blocks, the same shape can include generic call context:

```json
{
  "node_type": "arrow_function",
  "enclosing_call": {
    "callee": "it",
    "first_arg_literal": "normalizes decisions",
    "line": 15
  },
  "enclosing_calls": [
    {"callee": "describe", "first_arg_literal": "normalization", "line": 13},
    {"callee": "it", "first_arg_literal": "normalizes decisions", "line": 15}
  ]
}
```
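
For consumers written in TypeScript, the proposed fields might be typed roughly as follows. This is a sketch under assumptions: the interface names and `callContextConsistent` are hypothetical, which fields are optional is a guess, and the invariant that `enclosing_call` equals the innermost entry of `enclosing_calls` is inferred from the examples above rather than specified anywhere.

```typescript
// Hypothetical typings for the proposed fields; names follow the JSON above.
interface SymbolRef { kind: string; name: string; line?: number }
interface CallRef { callee: string; first_arg_literal?: string; line: number }
interface CommentRef { text: string; start_line: number; end_line: number }
interface MatchRef {
  text: string;
  start_line: number;
  start_column: number;
  end_line: number;
  end_column: number;
  kind: string; // "comment", "string", "code", ...
  comment_role?: string; // "leading", "trailing", "inner"
}
interface SemanticResult {
  file: string;
  language: string;
  lines: [number, number];
  code: string;
  node_type: string;
  scope?: string;
  owner_symbol?: string;
  owner_qualified_symbol?: string;
  enclosing_symbols: SymbolRef[];
  enclosing_call: CallRef | null;
  enclosing_calls: CallRef[];
  symbol_signature?: string;
  leading_comments: CommentRef[];
  matches: MatchRef[];
}

// Inferred invariant: `enclosing_call` is the last (innermost) entry of
// `enclosing_calls`, and both are empty for blocks outside any call.
function callContextConsistent(
  r: Pick<SemanticResult, "enclosing_call" | "enclosing_calls">,
): boolean {
  const chain = r.enclosing_calls;
  if (r.enclosing_call === null) return chain.length === 0;
  const innermost = chain[chain.length - 1];
  return (
    innermost !== undefined &&
    innermost.callee === r.enclosing_call.callee &&
    innermost.line === r.enclosing_call.line
  );
}

const callback = JSON.parse(`{
  "enclosing_call": {"callee": "it", "first_arg_literal": "normalizes decisions", "line": 15},
  "enclosing_calls": [
    {"callee": "describe", "first_arg_literal": "normalization", "line": 13},
    {"callee": "it", "first_arg_literal": "normalizes decisions", "line": 15}
  ]
}`) as Pick<SemanticResult, "enclosing_call" | "enclosing_calls">;

console.log(callContextConsistent(callback)); // true
```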

## Acceptance Criteria

- Go behavior remains compatible with current `owner_symbol` results.
- TS/JS class methods expose method owner and containing class/object where knowable.
- TS/JS exported const arrow functions expose the variable declarator name as owner where knowable.
- Callback blocks can expose generic enclosing call context: callee text, first literal argument if present, and enclosing call chain.
- `probe symbols` returns useful JS/TS symbol/declaration names instead of generic `export_statement` for common exported classes/functions/const arrows.
- JSON results expose structured attached comments with line ranges.
- JSON results expose match locations and token kind (`comment`, `string`, `code`, etc.).
- A search/extract mode exists for evidence-style consumers that returns one semantic owner per result and includes attached comments.
- The fixture above can be used as regression coverage for all cases.

## Why this matters

Without these generic fields, downstream multi-language tools have to reparse source code using their own AST logic, which recreates language-specific behavior and makes Go, JS, TS, and TSX support diverge.

With these fields, Probe can remain the language-agnostic source-discovery layer. Downstream tools can apply their own domain policy on top.

