
parse: add lightweight outlines for common extensions#322

Merged
justrach merged 1 commit into release/0.2.579 from fix/common-extension-parsers on Apr 25, 2026

@justrach (Owner) commented Apr 25, 2026

Summary

Follows #321 by making the newly detected extension families produce useful outlines instead of only language tags.

Adds lightweight line-oriented outline parsing for:

  • Java and Kotlin: imports, classes/interfaces/enums, methods/functions, common vars/constants
  • Svelte/Vue/Astro: script imports/functions/constants plus simple style selectors
  • shell: sourced files, functions, assignments
  • CSS/SCSS: selectors, variables, keyframes, mixins/functions
  • SQL: CREATE TABLE/VIEW/INDEX/FUNCTION/PROCEDURE/TRIGGER/TYPE
  • protobuf: imports, messages, enums, services, RPCs
  • Fortran: modules, use, types, subroutines, functions, programs
  • LLVM IR: define/declare, globals, %type = type
  • MLIR: named modules and func ops
  • TableGen: includes, classes/multiclasses, defs/defms, lets

.cc and .mm continue through the existing C/C++ parser path.
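
A hypothetical sketch of that routing, assuming an extension-to-family lookup table. Family, familyFor, and the particular extensions listed are illustrative names and coverage, not codedb's actual API. The point is only that .cc and .mm resolve to the same C/C++ family as other C++ sources, while the newly covered extensions get their own lightweight parsers.

```zig
const std = @import("std");

// Hypothetical parser families; codedb's real language enum may differ.
const Family = enum { cpp, java, kotlin, proto, fortran, shell, css, sql };

// Map a file extension (including the leading dot) to the family whose parser handles it.
fn familyFor(ext: []const u8) ?Family {
    const routes = [_]struct { suffix: []const u8, family: Family }{
        .{ .suffix = ".cc", .family = .cpp }, // existing C/C++ parser path
        .{ .suffix = ".mm", .family = .cpp }, // Objective-C++ also reuses the C/C++ path
        .{ .suffix = ".java", .family = .java },
        .{ .suffix = ".kt", .family = .kotlin },
        .{ .suffix = ".proto", .family = .proto },
        .{ .suffix = ".f90", .family = .fortran },
        .{ .suffix = ".sh", .family = .shell },
        .{ .suffix = ".scss", .family = .css },
        .{ .suffix = ".sql", .family = .sql },
    };
    for (routes) |route| {
        // Extensions are compared case-insensitively, so ".MM" matches ".mm".
        if (std.ascii.eqlIgnoreCase(route.suffix, ext)) return route.family;
    }
    return null;
}
```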

Scope

These are deliberately lightweight outline parsers, not full grammars. The goal is high-signal repo tree/symbol coverage for large corpora without adding parser dependencies.
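
To make "line-oriented" concrete, here is a minimal sketch for the protobuf family, assuming each declaration opens a line with its keyword. Kind, Symbol, leadingIdent, and parseProtoLine are hypothetical names, not the code in src/explore.zig. Each line is classified on its own with no grammar or nesting state, which is the trade-off described above: cheap and dependency-free, but blind to declarations split across lines or written in unusual layouts.

```zig
const std = @import("std");

// Hypothetical symbol kinds (`enum` is a Zig keyword, hence `enumeration`).
const Kind = enum { import, message, enumeration, service, rpc };

const Symbol = struct { kind: Kind, name: []const u8 };

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

// Return the identifier at the start of `s`, or null if there is none.
fn leadingIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

// Classify a single .proto line by its leading keyword; no state is kept between lines.
fn parseProtoLine(raw: []const u8) ?Symbol {
    const line = std.mem.trim(u8, raw, " \t\r");
    const keywords = [_]struct { kw: []const u8, kind: Kind }{
        .{ .kw = "message ", .kind = .message },
        .{ .kw = "enum ", .kind = .enumeration },
        .{ .kw = "service ", .kind = .service },
        .{ .kw = "rpc ", .kind = .rpc },
    };
    for (keywords) |entry| {
        if (std.mem.startsWith(u8, line, entry.kw)) {
            const rest = std.mem.trim(u8, line[entry.kw.len..], " \t");
            const name = leadingIdent(rest) orelse return null;
            return Symbol{ .kind = entry.kind, .name = name };
        }
    }
    if (std.mem.startsWith(u8, line, "import ")) {
        // `import "a/b.proto";` (or `import public "..."`): keep the quoted path.
        const open = std.mem.indexOfScalar(u8, line, '"') orelse return null;
        const close = std.mem.lastIndexOfScalar(u8, line, '"') orelse return null;
        if (close <= open) return null;
        return Symbol{ .kind = .import, .name = line[open + 1 .. close] };
    }
    return null;
}
```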

Validation

  • zig build test
  • zig build
  • Added a cross-extension outline test covering imports and representative symbols for every new family
  • GitHub bench-regression / bench passed

Benchmark notes

The benchmark check passed. The first run flagged codedb_edit (+14.59%) and codedb_status (+33.68%); a rerun cleared codedb_edit (+2.27%) but still flagged codedb_status at +22.87%.

The changed code only affects language outline parsing and README text. It does not touch codedb_status or telemetry.approxIndexSizeBytes, so this looks like benchmark noise or unrelated sensitivity in the status case. The rerun showed all parser/search/tree-adjacent tools within threshold:

  • codedb_outline -2.80%
  • codedb_symbol +4.02%
  • codedb_tree +3.97%
  • codedb_find -1.28%
  • codedb_search +0.16%

@github-actions

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 417329 | 427242 | +2.38% | OK |
| codedb_changes | 48039 | 48796 | +1.58% | OK |
| codedb_deps | 10365 | 7858 | -24.19% | OK |
| codedb_edit | 4784 | 5482 | +14.59% | FAIL |
| codedb_find | 56002 | 54918 | -1.94% | OK |
| codedb_hot | 88980 | 86865 | -2.38% | OK |
| codedb_outline | 214612 | 210840 | -1.76% | OK |
| codedb_read | 73709 | 74668 | +1.30% | OK |
| codedb_search | 149505 | 145987 | -2.35% | OK |
| codedb_snapshot | 2060301 | 2126995 | +3.24% | OK |
| codedb_status | 98889 | 132192 | +33.68% | FAIL |
| codedb_symbol | 66758 | 53208 | -20.30% | OK |
| codedb_tree | 60027 | 59245 | -1.30% | OK |
| codedb_word | 60529 | 58743 | -2.95% | OK |
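
The Delta column is consistent with the relative change (head - base) / base × 100; for codedb_status, (132192 - 98889) / 98889 ≈ +33.68%. The sketch below recomputes a few rows. The pass/fail rule, where only slowdowns larger than the 10.00% threshold fail, is inferred from the table (codedb_deps at -24.19% is still OK) and is an assumption, not the bench-regression workflow's actual code. deltaPercent, classify, and Status are hypothetical names.

```zig
const std = @import("std");

const Status = enum { ok, fail };

// Relative change from base to head in percent, as in the Delta column.
fn deltaPercent(base_ns: f64, head_ns: f64) f64 {
    return (head_ns - base_ns) / base_ns * 100.0;
}

// Assumed rule (inferred from the report): only slowdowns above the threshold fail,
// so large speedups such as codedb_deps at -24.19% remain OK.
fn classify(delta_pct: f64, threshold_pct: f64) Status {
    return if (delta_pct > threshold_pct) .fail else .ok;
}
```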


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27b8d81f5e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/explore.zig
Comment on lines +3922 to +3924
    if (indexOfCaseInsensitive(line, keyword)) |pos| {
        if (pos > 0 and isIdentChar(line[pos - 1])) return null;
        return extractIdent(std.mem.trimStart(u8, line[pos + keyword.len ..], " \t"));

P2: Ignore Fortran end statements when extracting declarations

The Fortran declaration matching relies on extractIdentAfterKeywordIgnoreCase, which accepts a keyword hit anywhere in a line rather than only at the start of a statement. As a result, named terminators such as "end module solver" and "end subroutine step" are parsed as new declarations, creating duplicate or incorrect symbols in outlines and symbol search for common Fortran codebases.
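
One possible fix, sketched under assumptions: anchor the keyword at the start of the trimmed statement and skip any statement that begins with end. declNameAtStatementStart and startsWithKeyword are hypothetical names, and isIdentChar and extractIdent are local stand-ins whose behavior is assumed from the quoted code. The sketch does not handle prefixed forms such as pure subroutine or integer function f, which a real fix would also need to accept.

```zig
const std = @import("std");

// Local stand-in: identifier characters as assumed for the quoted helpers.
fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

// Local stand-in: the identifier at the start of `s`, or null.
fn extractIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

// True if `line` begins with `keyword` (case-insensitive) as a whole word.
fn startsWithKeyword(line: []const u8, keyword: []const u8) bool {
    if (!std.ascii.startsWithIgnoreCase(line, keyword)) return false;
    return line.len == keyword.len or !isIdentChar(line[keyword.len]);
}

// Hypothetical statement-start matcher: the keyword must open the statement,
// and named terminators such as `end module solver` yield no declaration.
// The spaceless `endmodule solver` form also fails the `module` check.
fn declNameAtStatementStart(raw: []const u8, keyword: []const u8) ?[]const u8 {
    const line = std.mem.trim(u8, raw, " \t\r");
    if (startsWithKeyword(line, "end")) return null;
    if (!startsWithKeyword(line, keyword)) return null;
    return extractIdent(std.mem.trim(u8, line[keyword.len..], " \t"));
}
```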


Comment thread src/explore.zig
Comment on lines +4077 to +4080
    fn parseFortranTypeName(line: []const u8) ?[]const u8 {
        if (!startsWithIgnoreCase(line, "type")) return null;
        const sep = std.mem.indexOf(u8, line, "::") orelse return null;
        return extractIdent(std.mem.trimStart(u8, line[sep + 2 ..], " \t"));

P2: Restrict Fortran type parsing to actual type definitions

parseFortranTypeName currently accepts any line that starts with type and contains ::, so ordinary variable declarations such as type(MyType) :: value are emitted as .struct_def symbols named value. This misclassifies non-type symbols and significantly pollutes Fortran outlines in real projects, where derived-type variable declarations are common.
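
A sketch of a stricter check, assuming the usual Fortran spellings of a derived-type definition (type :: T, type T, type, extends(Base) :: T). parseFortranTypeDef is a hypothetical replacement, not code from the PR, and extractIdent and isIdentChar are local stand-ins. The key test is the character after type: an opening parenthesis means type(T), which declares a variable of an existing type rather than defining one.

```zig
const std = @import("std");

// Local stand-ins for the helpers used by the quoted code (assumed behavior).
fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

fn extractIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

// Hypothetical replacement: returns a name only for derived-type definitions,
// never for `type(T) :: v` variable declarations.
fn parseFortranTypeDef(raw: []const u8) ?[]const u8 {
    const line = std.mem.trim(u8, raw, " \t\r");
    if (!std.ascii.startsWithIgnoreCase(line, "type")) return null;
    // Reject identifiers that merely begin with "type", e.g. `typeflag = 1`.
    if (line.len > 4 and isIdentChar(line[4])) return null;
    const rest = std.mem.trim(u8, line[4..], " \t");
    // `type(T) :: v` and `type (T) :: v` declare variables of an existing type.
    if (rest.len == 0 or rest[0] == '(') return null;
    if (std.mem.indexOf(u8, rest, "::")) |sep| {
        // Definitions put only attributes (`, extends(Base)`, `, public`) before `::`.
        const attrs = std.mem.trim(u8, rest[0..sep], " \t");
        if (attrs.len != 0 and attrs[0] != ',') return null;
        return extractIdent(std.mem.trim(u8, rest[sep + 2 ..], " \t"));
    }
    // Bare `type Name` form; `type is (T)` is a select-type guard, not a definition.
    const name = extractIdent(rest) orelse return null;
    if (std.ascii.eqlIgnoreCase(name, "is")) return null;
    return name;
}
```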


@github-actions

Benchmark Regression Report

Threshold: 10.00%

| Tool | Base (ns) | Head (ns) | Delta | Status |
|---|---:|---:|---:|---|
| codedb_bundle | 525431 | 540603 | +2.89% | OK |
| codedb_changes | 62158 | 61725 | -0.70% | OK |
| codedb_deps | 10551 | 9928 | -5.90% | OK |
| codedb_edit | 6076 | 6214 | +2.27% | OK |
| codedb_find | 66685 | 65834 | -1.28% | OK |
| codedb_hot | 108835 | 106009 | -2.60% | OK |
| codedb_outline | 274075 | 266398 | -2.80% | OK |
| codedb_read | 92303 | 92881 | +0.63% | OK |
| codedb_search | 181414 | 181700 | +0.16% | OK |
| codedb_snapshot | 2688574 | 2717109 | +1.06% | OK |
| codedb_status | 114812 | 141071 | +22.87% | FAIL |
| codedb_symbol | 63486 | 66041 | +4.02% | OK |
| codedb_tree | 73406 | 76322 | +3.97% | OK |
| codedb_word | 76488 | 76559 | +0.09% | OK |

justrach merged commit 4e5864f into release/0.2.579 on Apr 25, 2026
2 checks passed
justrach deleted the fix/common-extension-parsers branch on April 27, 2026 at 09:23

Labels

None yet

Projects

None yet


1 participant