parse: add lightweight outlines for common extensions #322
justrach merged 1 commit into release/0.2.579
Benchmark Regression Report (threshold: 10.00%)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27b8d81f5e
    if (indexOfCaseInsensitive(line, keyword)) |pos| {
        if (pos > 0 and isIdentChar(line[pos - 1])) return null;
        return extractIdent(std.mem.trimStart(u8, line[pos + keyword.len ..], " \t"));
Ignore Fortran end statements when extracting declarations
The declaration matching for Fortran relies on extractIdentAfterKeywordIgnoreCase, which accepts keyword hits anywhere in a line rather than only at statement start. As a result, named terminators like end module solver and end subroutine step are parsed as fresh declarations, creating duplicate/incorrect symbols in outlines and symbol search for common Fortran codebases.
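One possible way to address this, sketched under assumptions rather than taken from the PR: check for a leading `end` before running any keyword extraction on the line. The helper name `isFortranEndStatement` is hypothetical and does not exist in the reviewed code. Fused forms such as `endsubroutine step` are not matched here because the existing `isIdentChar(line[pos - 1])` guard already rejects them.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the PR): true when the line starts with
/// the keyword `end` (case-insensitive) followed by end-of-line or a
/// character that cannot continue an identifier.
fn isFortranEndStatement(line: []const u8) bool {
    const trimmed = std.mem.trim(u8, line, " \t");
    if (trimmed.len < 3) return false;
    if (!std.ascii.eqlIgnoreCase(trimmed[0..3], "end")) return false;
    if (trimmed.len == 3) return true; // bare `end`
    const next = trimmed[3];
    // Reject identifiers that merely begin with "end", e.g. `endpoint`.
    return !(std.ascii.isAlphanumeric(next) or next == '_');
}

test "named terminators are recognised as end statements" {
    try std.testing.expect(isFortranEndStatement("end module solver"));
    try std.testing.expect(isFortranEndStatement("  END SUBROUTINE step"));
    try std.testing.expect(!isFortranEndStatement("module solver"));
    try std.testing.expect(!isFortranEndStatement("endpoint = 3"));
}
```

A caller could return `null` early when this helper is true, so that `end module solver` never reaches the declaration matcher.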
    fn parseFortranTypeName(line: []const u8) ?[]const u8 {
        if (!startsWithIgnoreCase(line, "type")) return null;
        const sep = std.mem.indexOf(u8, line, "::") orelse return null;
        return extractIdent(std.mem.trimStart(u8, line[sep + 2 ..], " \t"));
Restrict Fortran type parsing to actual type definitions
parseFortranTypeName currently matches any type...:: form, so regular variable declarations such as type(MyType) :: value are emitted as .struct_def symbols named value. This misclassifies non-type symbols and significantly pollutes Fortran outlines in real projects where derived-type variable declarations are common.
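A minimal sketch of a stricter pre-check, assuming the distinguishing feature is the parenthesis that follows `type` in a variable declaration. The name `isFortranTypeDefinition` is hypothetical; the real fix may look different. It is intended only as a filter ahead of the existing `::` lookup, not as a replacement for it.

```zig
const std = @import("std");

/// Hypothetical pre-filter (not part of the PR): accepts lines that begin a
/// derived-type definition, such as `type :: point` or `type, public :: point`,
/// and rejects variable declarations such as `type(MyType) :: value`.
fn isFortranTypeDefinition(line: []const u8) bool {
    const trimmed = std.mem.trim(u8, line, " \t");
    if (trimmed.len < 4) return false;
    if (!std.ascii.eqlIgnoreCase(trimmed[0..4], "type")) return false;
    const after = trimmed[4..];
    if (after.len == 0) return false;
    // `type` must be a whole word, so `typename` is not a match.
    if (std.ascii.isAlphanumeric(after[0]) or after[0] == '_') return false;
    const rest = std.mem.trim(u8, after, " \t");
    if (rest.len == 0) return false;
    // `type(` or `type (` introduces a variable of a derived type.
    return rest[0] != '(';
}

test "variable declarations are not type definitions" {
    try std.testing.expect(isFortranTypeDefinition("type :: point"));
    try std.testing.expect(isFortranTypeDefinition("TYPE, public :: point"));
    try std.testing.expect(!isFortranTypeDefinition("type(MyType) :: value"));
    try std.testing.expect(!isFortranTypeDefinition("type (MyType) :: value"));
}
```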
Summary
Follows #321 by making the newly detected extension families produce useful outlines instead of only language tags.
Adds lightweight line-oriented outline parsing for:
- CREATE TABLE/VIEW/INDEX/FUNCTION/PROCEDURE/TRIGGER/TYPE
- use, types, subroutines, functions, programs
- define/declare, globals, %type = type
- func ops

.cc and .mm continue through the existing C/C++ parser path.

Scope
These are deliberately lightweight outline parsers, not full grammars. The goal is high-signal repo tree/symbol coverage for large corpora without adding parser dependencies.
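To illustrate what "line-oriented" means here, the following is a rough sketch of extracting a name from a single `CREATE TABLE` line without a grammar. It is not the code from this PR: the function name `sqlCreateTableName` and the tokenizing approach are assumptions made for illustration, and it deliberately ignores quoting, schema prefixes, and multi-line statements.

```zig
const std = @import("std");

/// Hypothetical sketch (not the PR's parser): returns the table name from a
/// line such as `CREATE TABLE users (`, or null for any other line.
fn sqlCreateTableName(line: []const u8) ?[]const u8 {
    // Split on whitespace and the punctuation that commonly follows a name.
    var tokens = std.mem.tokenizeAny(u8, line, " \t(;");
    const first = tokens.next() orelse return null;
    if (!std.ascii.eqlIgnoreCase(first, "create")) return null;
    const second = tokens.next() orelse return null;
    if (!std.ascii.eqlIgnoreCase(second, "table")) return null;
    var name = tokens.next() orelse return null;
    // Skip an optional `IF NOT EXISTS` clause.
    if (std.ascii.eqlIgnoreCase(name, "if")) {
        _ = tokens.next(); // NOT
        _ = tokens.next(); // EXISTS
        name = tokens.next() orelse return null;
    }
    return name;
}

test "extracts table names from single lines" {
    try std.testing.expectEqualStrings("users", sqlCreateTableName("CREATE TABLE users (").?);
    try std.testing.expectEqualStrings("orders", sqlCreateTableName("create table if not exists orders(").?);
    try std.testing.expect(sqlCreateTableName("SELECT * FROM users") == null);
}
```

The trade-off matches the scope stated above: one pass per line and no parser dependency, at the cost of missing declarations that a full grammar would catch.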
Validation
- zig build test
- zig build
- bench-regression / bench passed

Benchmark notes
The benchmark check passed. The first run flagged codedb_edit and codedb_status; a rerun cleared codedb_edit but still flagged codedb_status at +22.87%.

The changed code only affects language outline parsing and README text. It does not touch codedb_status or telemetry.approxIndexSizeBytes, so this looks like benchmark noise or unrelated sensitivity in the status case. The rerun showed all parser/search/tree-adjacent tools within threshold:
- codedb_outline -2.80%
- codedb_symbol +4.02%
- codedb_tree +3.97%
- codedb_find -1.28%
- codedb_search +0.16%