parse: add lightweight outlines for common extensions by justrach · Pull Request #322 · justrach/codedb

justrach · 2026-04-25T18:48:31Z

Summary

Follows #321 by making the newly detected extension families produce useful outlines instead of only language tags.

Adds lightweight line-oriented outline parsing for:

Java and Kotlin: imports, classes/interfaces/enums, methods/functions, common vars/constants
Svelte/Vue/Astro: script imports/functions/constants plus simple style selectors
shell: sourced files, functions, assignments
CSS/SCSS: selectors, variables, keyframes, mixins/functions
SQL: CREATE TABLE/VIEW/INDEX/FUNCTION/PROCEDURE/TRIGGER/TYPE
protobuf: imports, messages, enums, services, RPCs
Fortran: modules, use, types, subroutines, functions, programs
LLVM IR: define/declare, globals, %type = type
MLIR: named modules and func ops
TableGen: includes, classes/multiclasses, defs/defms, lets
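To illustrate what a line-oriented outline parser means here, below is a minimal sketch of a protobuf line classifier. It assumes each declaration starts on its own line. `Kind`, `Symbol`, `rules`, and `classifyProtoLine` are hypothetical names, not codedb's code. It ignores comments and declarations split across lines, which is the kind of trade-off a lightweight parser accepts.

```zig
const std = @import("std");

/// Hypothetical outline kinds for this sketch; not codedb's actual symbol enum.
const Kind = enum { import, message, enum_def, service, rpc };

const Symbol = struct { kind: Kind, name: []const u8 };

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Returns the identifier at the start of `s`, or null if there is none.
fn extractIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

/// Leading keyword (including its trailing space) mapped to an outline kind.
const rules = [_]struct { prefix: []const u8, kind: Kind }{
    .{ .prefix = "import ", .kind = .import },
    .{ .prefix = "message ", .kind = .message },
    .{ .prefix = "enum ", .kind = .enum_def },
    .{ .prefix = "service ", .kind = .service },
    .{ .prefix = "rpc ", .kind = .rpc },
};

/// Classifies one .proto line; fields, options, and braces yield null.
fn classifyProtoLine(raw: []const u8) ?Symbol {
    const line = std.mem.trim(u8, raw, " \t\r");
    for (rules) |rule| {
        if (!std.mem.startsWith(u8, line, rule.prefix)) continue;
        const rest = std.mem.trim(u8, line[rule.prefix.len..], " \t");
        if (rule.kind == .import) {
            // `import "a/b.proto";` and `import public "a/b.proto";`: keep the quoted path.
            const open = std.mem.indexOfScalar(u8, rest, '"') orelse return null;
            const close = std.mem.indexOfScalarPos(u8, rest, open + 1, '"') orelse return null;
            return .{ .kind = .import, .name = rest[open + 1 .. close] };
        }
        const name = extractIdent(rest) orelse return null;
        return .{ .kind = rule.kind, .name = name };
    }
    return null;
}
```

For example, `rpc GetUser(GetUserRequest) returns (User);` yields an `.rpc` symbol named `GetUser`, while a field line such as `string name = 1;` yields nothing.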

.cc and .mm continue through the existing C/C++ parser path.
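The routing described above can be pictured as an extension table in which `.cc` and `.mm` (Objective-C++) share the C/C++ entry. This is a hypothetical sketch: `Parser`, `parserForExtension`, and the table contents are illustrative assumptions, since the PR does not show codedb's dispatch code.

```zig
const std = @import("std");

/// Hypothetical parser families; illustrative only, not codedb's actual enum.
const Parser = enum { c_cpp, java_kotlin, protobuf, fortran, unknown };

const Entry = struct { ext: []const u8, parser: Parser };

/// Extension table; `.cc` and `.mm` reuse the existing C/C++ path.
const table = [_]Entry{
    .{ .ext = ".c", .parser = .c_cpp },
    .{ .ext = ".h", .parser = .c_cpp },
    .{ .ext = ".cpp", .parser = .c_cpp },
    .{ .ext = ".cc", .parser = .c_cpp },
    .{ .ext = ".mm", .parser = .c_cpp },
    .{ .ext = ".java", .parser = .java_kotlin },
    .{ .ext = ".kt", .parser = .java_kotlin },
    .{ .ext = ".proto", .parser = .protobuf },
    .{ .ext = ".f90", .parser = .fortran },
};

/// Maps a file extension (including the dot) to a parser, case-insensitively.
fn parserForExtension(ext: []const u8) Parser {
    for (table) |entry| {
        if (std.ascii.eqlIgnoreCase(ext, entry.ext)) return entry.parser;
    }
    return .unknown;
}
```

Routing by table keeps new families to a one-line change each, which fits the PR's goal of broad coverage without new parser dependencies.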

Scope

These are deliberately lightweight outline parsers, not full grammars. The goal is high-signal repo tree/symbol coverage for large corpora without adding parser dependencies.

Validation

zig build test
zig build
Added a cross-extension outline test covering imports and representative symbols for every new family
GitHub bench-regression / bench passed

Benchmark notes

The benchmark check passed. The first run flagged codedb_edit and codedb_status; a rerun cleared codedb_edit but still flagged codedb_status at +22.87%.

The changed code only affects language outline parsing and README text. It does not touch codedb_status or telemetry.approxIndexSizeBytes, so this looks like benchmark noise or unrelated sensitivity in the status case. The rerun showed all parser/search/tree-adjacent tools within threshold:

codedb_outline -2.80%
codedb_symbol +4.02%
codedb_tree +3.97%
codedb_find -1.28%
codedb_search +0.16%

github-actions · 2026-04-25T18:50:10Z

Benchmark Regression Report

Threshold: 10.00%

Tool	Base (ns)	Head (ns)	Delta	Status
`codedb_bundle`	417329	427242	+2.38%	OK
`codedb_changes`	48039	48796	+1.58%	OK
`codedb_deps`	10365	7858	-24.19%	OK
`codedb_edit`	4784	5482	+14.59%	FAIL
`codedb_find`	56002	54918	-1.94%	OK
`codedb_hot`	88980	86865	-2.38%	OK
`codedb_outline`	214612	210840	-1.76%	OK
`codedb_read`	73709	74668	+1.30%	OK
`codedb_search`	149505	145987	-2.35%	OK
`codedb_snapshot`	2060301	2126995	+3.24%	OK
`codedb_status`	98889	132192	+33.68%	FAIL
`codedb_symbol`	66758	53208	-20.30%	OK
`codedb_tree`	60027	59245	-1.30%	OK
`codedb_word`	60529	58743	-2.95%	OK
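The Delta column is consistent with (Head - Base) / Base: for `codedb_status`, (132192 - 98889) / 98889 is about +33.68%. Below is a sketch of the pass/fail rule, assuming (because `codedb_deps` at -24.19% is OK) that only slowdowns above the 10.00% threshold fail. `deltaPercent` and `isRegression` are hypothetical names; the bot's actual implementation is not shown.

```zig
const std = @import("std");

/// Percent change from base to head timing, e.g. 98889 ns -> 132192 ns is about +33.68%.
fn deltaPercent(base_ns: u64, head_ns: u64) f64 {
    const base: f64 = @floatFromInt(base_ns);
    const head: f64 = @floatFromInt(head_ns);
    return (head - base) / base * 100.0;
}

/// Assumed one-sided rule: only slowdowns beyond the threshold fail; speedups never do.
fn isRegression(base_ns: u64, head_ns: u64, threshold_pct: f64) bool {
    return deltaPercent(base_ns, head_ns) > threshold_pct;
}
```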

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27b8d81f5e


chatgpt-codex-connector · 2026-04-25T18:51:59Z

+    if (indexOfCaseInsensitive(line, keyword)) |pos| {
+        if (pos > 0 and isIdentChar(line[pos - 1])) return null;
+        return extractIdent(std.mem.trimStart(u8, line[pos + keyword.len ..], " \t"));


Ignore Fortran end statements when extracting declarations

The declaration matching for Fortran relies on `extractIdentAfterKeywordIgnoreCase`, which accepts keyword hits anywhere in a line rather than only at statement start. As a result, named terminators like `end module solver` and `end subroutine step` are parsed as fresh declarations, creating duplicate or incorrect symbols in outlines and symbol search for common Fortran codebases.
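One way to address this is to reject end statements before the keyword search rather than requiring the keyword at statement start, which would break prefixed forms such as `recursive subroutine step`. The sketch below takes that approach. `declNameAfterKeyword`, `isFortranEndStatement`, and `end_suffixes` are hypothetical names, and the suffix list is an assumption that does not cover every Fortran construct. It also adds a right-hand word boundary, which the quoted snippet lacks.

```zig
const std = @import("std");

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Returns the identifier at the start of `s`, or null if there is none.
fn extractIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

/// Construct keywords Fortran also accepts glued to `end`, e.g. `endmodule` (not exhaustive).
const end_suffixes = [_][]const u8{ "module", "submodule", "subroutine", "function", "program", "type", "interface" };

/// True for `end`, `end module solver`, `END SUBROUTINE step`, `endsubroutine step`, ...
fn isFortranEndStatement(line: []const u8) bool {
    if (!std.ascii.startsWithIgnoreCase(line, "end")) return false;
    const rest = line[3..];
    if (rest.len == 0 or !isIdentChar(rest[0])) return true;
    for (end_suffixes) |kw| {
        if (std.ascii.startsWithIgnoreCase(rest, kw)) return true;
    }
    return false;
}

/// Hypothetical stricter variant of extractIdentAfterKeywordIgnoreCase.
fn declNameAfterKeyword(raw: []const u8, keyword: []const u8) ?[]const u8 {
    const line = std.mem.trim(u8, raw, " \t");
    if (isFortranEndStatement(line)) return null;
    const pos = std.ascii.indexOfIgnoreCase(line, keyword) orelse return null;
    const after = pos + keyword.len;
    // Whole-word match on both sides, so `submodule` does not count as `module`.
    if (pos > 0 and isIdentChar(line[pos - 1])) return null;
    if (after < line.len and isIdentChar(line[after])) return null;
    return extractIdent(std.mem.trim(u8, line[after..], " \t"));
}
```

The keyword can still appear after prefixes like `pure` or `recursive`, so the search stays anywhere-in-line; only the terminators are filtered out.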


chatgpt-codex-connector · 2026-04-25T18:51:59Z

+fn parseFortranTypeName(line: []const u8) ?[]const u8 {
+    if (!startsWithIgnoreCase(line, "type")) return null;
+    const sep = std.mem.indexOf(u8, line, "::") orelse return null;
+    return extractIdent(std.mem.trimStart(u8, line[sep + 2 ..], " \t"));


Restrict Fortran type parsing to actual type definitions

`parseFortranTypeName` currently matches any `type ... ::` form, so regular variable declarations such as `type(MyType) :: value` are emitted as `.struct_def` symbols named `value`. This misclassifies non-type symbols and significantly pollutes Fortran outlines in real projects where derived-type variable declarations are common.
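A sketch of the restriction the review asks for, keeping the quoted function's requirement of a `::` separator: a `(` immediately after the `type` keyword marks a derived-type variable declaration, while definitions use `type ::` or `type, <attributes> ::`. `parseFortranTypeDefName` is a hypothetical name, and old-style `type point` definitions without `::` are still skipped, as in the original.

```zig
const std = @import("std");

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Returns the identifier at the start of `s`, or null if there is none.
fn extractIdent(s: []const u8) ?[]const u8 {
    var end: usize = 0;
    while (end < s.len and isIdentChar(s[end])) : (end += 1) {}
    return if (end == 0) null else s[0..end];
}

/// Hypothetical stricter variant of parseFortranTypeName: only derived-type
/// definitions yield a name; `type(T) :: x` variable declarations yield null.
fn parseFortranTypeDefName(raw: []const u8) ?[]const u8 {
    const line = std.mem.trim(u8, raw, " \t");
    if (!std.ascii.startsWithIgnoreCase(line, "type")) return null;
    const rest = line[4..];
    // Reject longer identifiers that merely start with "type", e.g. `typesize = 3`.
    if (rest.len > 0 and isIdentChar(rest[0])) return null;
    const after = std.mem.trim(u8, rest, " \t");
    // `type(MyType) :: value` and `type (MyType), intent(in) :: arg` declare variables.
    if (after.len > 0 and after[0] == '(') return null;
    const sep = std.mem.indexOf(u8, after, "::") orelse return null;
    return extractIdent(std.mem.trim(u8, after[sep + 2 ..], " \t"));
}
```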


github-actions · 2026-04-25T18:52:43Z

Benchmark Regression Report

Threshold: 10.00%

Tool	Base (ns)	Head (ns)	Delta	Status
`codedb_bundle`	525431	540603	+2.89%	OK
`codedb_changes`	62158	61725	-0.70%	OK
`codedb_deps`	10551	9928	-5.90%	OK
`codedb_edit`	6076	6214	+2.27%	OK
`codedb_find`	66685	65834	-1.28%	OK
`codedb_hot`	108835	106009	-2.60%	OK
`codedb_outline`	274075	266398	-2.80%	OK
`codedb_read`	92303	92881	+0.63%	OK
`codedb_search`	181414	181700	+0.16%	OK
`codedb_snapshot`	2688574	2717109	+1.06%	OK
`codedb_status`	114812	141071	+22.87%	FAIL
`codedb_symbol`	63486	66041	+4.02%	OK
`codedb_tree`	73406	76322	+3.97%	OK
`codedb_word`	76488	76559	+0.09%	OK

Parse outlines for common detected extensions

27b8d81

chatgpt-codex-connector Bot reviewed Apr 25, 2026


justrach merged commit 4e5864f into release/0.2.579 Apr 25, 2026
2 checks passed

This was referenced Apr 25, 2026

Sync main with release/0.2.579 (api.wiki.codes MCP, 1h MCP timeout, C parser, extension parsers, bench/snapshot perf) #315

Open

test: add golden coverage for extension parsers #323

Merged

justrach deleted the fix/common-extension-parsers branch April 27, 2026 09:23

justrach merged 1 commit into release/0.2.579 from fix/common-extension-parsers