| 1 | +# Parser Test Failures - Root Cause Analysis |
| 2 | + |
| 3 | +## Failed Parsers (9 total) |
| 4 | + |
| 5 | +The build failed when each of the following parsers was removed (five of the nine failures are listed here): |
| 6 | + |
| 7 | +1. **tree-sitter-ada** - FAILED |
| 8 | +2. **tree-sitter-c** - FAILED |
| 9 | +3. **tree-sitter-elm** - FAILED initially (a later run succeeded: 0.23 MB) |
| 10 | +4. **tree-sitter-make** - FAILED |
| 11 | +5. **tree-sitter-ocaml** - FAILED |
| 12 | + |
| 13 | +## Root Causes |
| 14 | + |
| 15 | +### 1. Cross-Parser Dependencies (Highlight Queries) |
| 16 | + |
| 17 | +**tree-sitter-c** - Failed because C++ depends on it: |
| 18 | +```rust |
| 19 | +// In CPlusPlus case: |
| 20 | +let mut highlight_query = tree_sitter_c::HIGHLIGHT_QUERY.to_owned(); |
| 21 | +highlight_query.push_str(tree_sitter_cpp::HIGHLIGHT_QUERY); |
| 22 | +``` |
| 23 | +C++ extends the C grammar, so its highlight query is built on top of C's: the C++ case references `tree_sitter_c::HIGHLIGHT_QUERY` directly. Removing `tree-sitter-c` therefore breaks compilation of the C++ parser. |
| 24 | + |
| 25 | +**tree-sitter-javascript** - Would fail if removed because: |
| 26 | +- TypeScript depends on it (both TypeScript and TypeScriptTsx) |
| 27 | +- QML depends on it (and also on TypeScript's `HIGHLIGHTS_QUERY`) |
| 28 | +```rust |
| 29 | +// In TypeScript case: |
| 30 | +let mut highlight_query = tree_sitter_javascript::HIGHLIGHT_QUERY.to_owned(); |
| 31 | +highlight_query.push_str(tree_sitter_typescript::HIGHLIGHTS_QUERY); |
| 32 | + |
| 33 | +// In Qml case: |
| 34 | +let mut highlight_query = tree_sitter_javascript::HIGHLIGHT_QUERY.to_owned(); |
| 35 | +highlight_query.push_str(tree_sitter_typescript::HIGHLIGHTS_QUERY); |
| 36 | +highlight_query.push_str(tree_sitter_qmljs::HIGHLIGHTS_QUERY); |
| 37 | +``` |
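The composition pattern above can be reduced to a self-contained sketch. The two constants are hypothetical stand-ins for `tree_sitter_c::HIGHLIGHT_QUERY` and `tree_sitter_cpp::HIGHLIGHT_QUERY`, and their query text is illustrative only, so the example compiles without the parser crates:

```rust
// Hypothetical stand-ins for the real crate constants (assumption:
// the actual query text is much longer and lives in the parser crates).
const C_HIGHLIGHT_QUERY: &str = "(identifier) @variable\n";
const CPP_HIGHLIGHT_QUERY: &str = "(this) @variable.builtin\n";

/// Builds the C++ highlight query by appending the C++ rules to the C rules.
/// If `C_HIGHLIGHT_QUERY` is removed, this function no longer compiles,
/// which mirrors what happens when `tree-sitter-c` is dropped.
fn cpp_highlight_query() -> String {
    let mut query = C_HIGHLIGHT_QUERY.to_owned();
    query.push_str(CPP_HIGHLIGHT_QUERY);
    query
}
```

Because the dependency is a direct reference to a constant in another crate, it surfaces as a compile error rather than a runtime failure.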
| 38 | + |
| 39 | +### 2. Sub-Language Dependencies |
| 40 | + |
| 41 | +**tree-sitter-make** - Failed because the Make parser parses embedded shell commands as Bash: |
| 42 | +```rust |
| 43 | +sub_languages: vec![TreeSitterSubLanguage { |
| 44 | + query: ts::Query::new(&language, "(shell_function (shell_command) @contents)") |
| 45 | + .unwrap(), |
| 46 | + parse_as: Bash, // ← Requires Bash parser to exist |
| 47 | +}], |
| 48 | +``` |
| 49 | + |
| 50 | +**tree-sitter-html** - Would fail if CSS or JavaScript were removed: |
| 51 | +```rust |
| 52 | +sub_languages: vec![ |
| 53 | + TreeSitterSubLanguage { |
| 54 | + query: ts::Query::new(&language, "(style_element (raw_text) @contents)").unwrap(), |
| 55 | + parse_as: Css, // ← Requires CSS parser |
| 56 | + }, |
| 57 | + TreeSitterSubLanguage { |
| 58 | + query: ts::Query::new(&language, "(script_element (raw_text) @contents)").unwrap(), |
| 59 | + parse_as: JavaScript, // ← Requires JavaScript parser |
| 60 | + }, |
| 61 | +], |
| 62 | +``` |
| 63 | + |
| 64 | +### 3. Multi-Variant Languages |
| 65 | + |
| 66 | +**tree-sitter-ocaml** - Failed because one crate provides two language variants: |
| 67 | +```rust |
| 68 | +OCaml => { |
| 69 | + let language_fn = tree_sitter_ocaml::LANGUAGE_OCAML; |
| 70 | + // ... |
| 71 | +} |
| 72 | +OCamlInterface => { |
| 73 | + let language_fn = tree_sitter_ocaml::LANGUAGE_OCAML_INTERFACE; |
| 74 | + // ... |
| 75 | +} |
| 76 | +``` |
| 77 | +Both OCaml and OCamlInterface come from the same `tree-sitter-ocaml` crate. The testing script only stubbed one variant, causing the other to fail compilation. |
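One way a testing script could avoid this is to look up every language variant provided by a crate before stubbing. The sketch below is an assumption about how such a lookup might look, not the actual script; the table and the `Ada` variant name are hypothetical, while the OCaml names come from the snippet above:

```rust
/// Maps each language variant to the crate that provides it.
/// Illustrative table only; the `Ada` variant name is an assumption.
const VARIANT_CRATES: &[(&str, &str)] = &[
    ("OCaml", "tree-sitter-ocaml"),
    ("OCamlInterface", "tree-sitter-ocaml"),
    ("Ada", "tree-sitter-ada"),
];

/// Returns every variant whose match arm must be stubbed
/// when `crate_name` is removed from the build.
fn variants_to_stub(crate_name: &str) -> Vec<&'static str> {
    VARIANT_CRATES
        .iter()
        .filter(|(_, krate)| *krate == crate_name)
        .map(|(variant, _)| *variant)
        .collect()
}
```

For `tree-sitter-ocaml` this yields both `OCaml` and `OCamlInterface`, so neither arm is left referring to a missing crate.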
| 78 | + |
| 79 | +### 4. Vendored/Build Issues |
| 80 | + |
| 81 | +**tree-sitter-ada** - Cause not confirmed. The likely culprit is its `build.rs` or vendored parser sources: Ada may have compilation requirements or dependencies that the simple stub approach did not handle. |
| 82 | + |
| 83 | +## Dependency Graph |
| 84 | + |
| 85 | +``` |
| 86 | +tree-sitter-c |
| 87 | + └─→ tree-sitter-cpp (uses C's HIGHLIGHT_QUERY) |
| 88 | + |
| 89 | +tree-sitter-javascript |
| 90 | + ├─→ tree-sitter-typescript (uses JS's HIGHLIGHT_QUERY) |
| 91 | + ├─→ tree-sitter-qmljs (uses JS's HIGHLIGHT_QUERY and TypeScript's HIGHLIGHTS_QUERY) |
| 92 | + └─→ tree-sitter-html (sub-language for <script> tags) |
| 93 | + |
| 94 | +tree-sitter-bash |
| 95 | + └─→ tree-sitter-make (sub-language for shell commands) |
| 96 | + |
| 97 | +tree-sitter-css |
| 98 | + └─→ tree-sitter-html (sub-language for <style> tags) |
| 99 | + |
| 100 | +tree-sitter-ocaml (single crate) |
| 101 | + ├─→ OCaml language variant |
| 102 | + └─→ OCamlInterface language variant |
| 103 | +``` |
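The graph can also be encoded as data to answer "what breaks if this parser is removed?". This is a minimal sketch, not part of the project: the edge list is transcribed from the graph and from the Qml snippet in section 1, and the function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// (provider, dependent) edges transcribed from the analysis above.
const EDGES: &[(&str, &str)] = &[
    ("tree-sitter-c", "tree-sitter-cpp"),
    ("tree-sitter-javascript", "tree-sitter-typescript"),
    ("tree-sitter-javascript", "tree-sitter-qmljs"),
    ("tree-sitter-javascript", "tree-sitter-html"),
    // The Qml case also appends TypeScript's query (see section 1).
    ("tree-sitter-typescript", "tree-sitter-qmljs"),
    ("tree-sitter-bash", "tree-sitter-make"),
    ("tree-sitter-css", "tree-sitter-html"),
];

/// Returns every parser that stops compiling when `removed` is taken out,
/// following the dependency edges transitively.
fn broken_by_removing(removed: &str) -> BTreeSet<&'static str> {
    let mut broken = BTreeSet::new();
    let mut pending = vec![removed];
    while let Some(current) = pending.pop() {
        for &(provider, dependent) in EDGES {
            // `insert` returns false for parsers already recorded,
            // so each dependent is queued at most once.
            if provider == current && broken.insert(dependent) {
                pending.push(dependent);
            }
        }
    }
    broken
}
```

Calling `broken_by_removing("tree-sitter-javascript")` returns the HTML, QML, and TypeScript parsers, which matches the web grouping discussed in this document.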
| 104 | + |
| 105 | +## Impact on Analysis |
| 106 | + |
| 107 | +### Parsers We Couldn't Measure Individually |
| 108 | + |
| 109 | +These parsers can't be removed independently without breaking other parsers: |
| 110 | + |
| 111 | +- **C**: Required by C++ |
| 112 | +- **JavaScript**: Required by TypeScript, QML, HTML |
| 113 | +- **Bash**: Required by Make |
| 114 | +- **CSS**: Required by HTML |
| 115 | + |
| 116 | +### What This Means for Size Reduction |
| 117 | + |
| 118 | +The cross-dependencies create "parser bundles" that must be kept together: |
| 119 | + |
| 120 | +1. **C/C++ bundle**: Can't remove C without breaking C++ |
| 121 | +2. **Web bundle**: Can't remove JavaScript without breaking TypeScript, QML, and HTML; can't remove CSS without breaking HTML |
| 122 | +3. **Build-tools bundle**: Can't remove Bash without breaking Make |
| 123 | + |
| 124 | +This is **useful information** for the feature flag design: parsers that depend on each other should be grouped into the same feature tier, since they cannot be enabled separately anyway. |
| 125 | + |
| 126 | +## Recommended Feature Grouping |
| 127 | + |
| 128 | +Based on dependencies: |
| 129 | + |
| 130 | +```toml |
| 131 | +[features] |
| 132 | +# Web development (must stay together) |
| 133 | +web = ["javascript", "typescript", "html", "css"] |
| 134 | +javascript = ["dep:tree-sitter-javascript"] |
| 135 | +typescript = ["dep:tree-sitter-typescript", "javascript"] # depends on JS |
| 136 | +html = ["dep:tree-sitter-html", "javascript", "css"] # depends on both |
| 137 | +css = ["dep:tree-sitter-css"] |
| 138 | + |
| 139 | +# Systems programming (must stay together) |
| 140 | +systems-c = ["c", "cpp"] |
| 141 | +c = ["dep:tree-sitter-c"] |
| 142 | +cpp = ["dep:tree-sitter-cpp", "c"] # depends on C |
| 143 | + |
| 144 | +# Build tools (must stay together) |
| 145 | +build-tools = ["bash", "make"] |
| 146 | +bash = ["dep:tree-sitter-bash"] |
| 147 | +make = ["dep:tree-sitter-make", "bash"] # depends on Bash |
| 148 | +``` |
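To check that a grouping like this pulls in everything it needs, the feature table can be expanded the way Cargo resolves it: enabling a feature enables the features it lists, and a `dep:` entry enables an optional dependency. The sketch below mirrors only the web entries from the TOML above; the function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// Feature table mirroring the web entries of the TOML above. Each feature
/// lists optional dependencies (`dep:` prefix) or other features.
const FEATURES: &[(&str, &[&str])] = &[
    ("javascript", &["dep:tree-sitter-javascript"]),
    ("css", &["dep:tree-sitter-css"]),
    ("typescript", &["dep:tree-sitter-typescript", "javascript"]),
    ("html", &["dep:tree-sitter-html", "javascript", "css"]),
    ("web", &["javascript", "typescript", "html", "css"]),
];

/// Expands `feature` into the set of crates it ultimately enables.
fn crates_enabled_by(feature: &str) -> BTreeSet<&'static str> {
    let mut crates = BTreeSet::new();
    let mut seen = BTreeSet::new();
    let mut pending = vec![feature];
    while let Some(name) = pending.pop() {
        // Skip features that were already expanded.
        if !seen.insert(name) {
            continue;
        }
        for &(feat, items) in FEATURES {
            if feat != name {
                continue;
            }
            for &item in items {
                match item.strip_prefix("dep:") {
                    Some(krate) => {
                        crates.insert(krate);
                    }
                    None => pending.push(item),
                }
            }
        }
    }
    crates
}
```

With this table, enabling `html` alone yields the HTML, JavaScript, and CSS crates, so the sub-language parsers are present even when the `web` umbrella feature is not used.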
| 149 | + |
| 150 | +## Conclusion |
| 151 | + |
| 152 | +Most of the 9 failures stem not from bugs in the testing approach but from **fundamental cross-dependencies** in the parser architecture; the OCaml and Ada failures trace instead to limits of the stub approach described above. The dependency information should inform the feature flag design so that dependent parsers are always included together. |