redo textmate in yaml and remove monarch grammar#3225
redo textmate in yaml and remove monarch grammar#3225codeshaunted wants to merge 1 commit intocanaryfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThis PR migrates BAML syntax grammar from an embedded Monaco Monarch grammar to external TextMate grammar files, introducing a YAML-based source with a build script that generates JSON grammars for both VSCode extension and web-based editor. The MonacoEditor is updated to reference the new grammar location, and a pre-commit hook ensures generated grammars remain synchronized. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
typescript2/app-vscode-ext/package.json (1)
46-55:⚠️ Potential issue | 🟠 MajorGrammars must be bundled into the extension package.
The manifest points
contributes.grammars[*].pathto./node_modules/@b/pkg-textmate-grammar/syntaxes/*.tmLanguage.json, but no grammar files exist underapp-vscode-extandtsup.config.tsdoes not copy them into thedist/directory. The extension will fail to load syntax highlighting when installed outside the monorepo context, since those workspace paths will not exist.Copy the grammar files into the extension (e.g.,
dist/syntaxes/) as part of the build, update the manifest paths to reference them locally (e.g.,./syntaxes/baml.tmLanguage.json), and ensure they are included in the packaged extension via afilesfield inpackage.json.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e1e28a2d-d52d-471c-98f1-75fbb57efc81
⛔ Files ignored due to path filters (1)
typescript2/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (13)
.vscode/settings.jsontypescript2/.gitignoretypescript2/app-promptfiddle/package.jsontypescript2/app-promptfiddle/src/playground/MonacoEditor.tsxtypescript2/app-promptfiddle/src/playground/baml-monarch.tstypescript2/app-promptfiddle/src/playground/baml.tmLanguage.jsontypescript2/app-vscode-ext/package.jsontypescript2/app-vscode-ext/syntaxes/baml.tmLanguage.jsontypescript2/pkg-textmate-grammar/package.jsontypescript2/pkg-textmate-grammar/scripts/build-grammar.tstypescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yamltypescript2/pkg-textmate-grammar/syntaxes/jinja.tmLanguage.jsontypescript2/turbo.json
💤 Files with no reviewable changes (3)
- typescript2/app-promptfiddle/src/playground/baml-monarch.ts
- typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
- typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json
.vscode/settings.json
Outdated
| "yaml.schemas": { | ||
| "https://www.schemastore.org/tmlanguage.json": "file:///home/ghost/Documents/baml/typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.yaml" | ||
| }, |
There was a problem hiding this comment.
Point the schema association at the new workspace location, not a machine-local path.
This setting is checked in with /home/ghost/... and still references typescript2/app-vscode-ext/syntaxes, but the grammar now lives at typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml:1. As written, the YAML schema association will fail for every other checkout and after the old path is removed.
🔧 Proposed fix
"yaml.schemas": {
- "https://www.schemastore.org/tmlanguage.json": "file:///home/ghost/Documents/baml/typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.yaml"
+ "https://www.schemastore.org/tmlanguage.json": "typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml"
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "yaml.schemas": { | |
| "https://www.schemastore.org/tmlanguage.json": "file:///home/ghost/Documents/baml/typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.yaml" | |
| }, | |
| "yaml.schemas": { | |
| "https://www.schemastore.org/tmlanguage.json": "typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml" | |
| }, |
| import path from "path"; | ||
| import yaml from "js-yaml"; | ||
|
|
||
| const __dirname = path.dirname(new URL(import.meta.url).pathname); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Node.js ESM docs: when deriving __dirname from import.meta.url, should fileURLToPath(import.meta.url) be used instead of new URL(import.meta.url).pathname, and what breaks if pathname is used directly?
💡 Result:
Yes—use fileURLToPath() (or import.meta.filename / import.meta.dirname on Node ≥ 20.11.0) rather than using new URL(import.meta.url).pathname as a filesystem path. In Node’s docs, url.fileURLToPath() is explicitly the conversion that produces a platform-correct path and decodes percent-encoding. It also shows multiple cases where pathname is “Incorrect.”[1][2]
What breaks if you use .pathname directly:
- Windows drive-letter paths: you get an extra leading slash (
/C:/...) instead of a Windows path (C:\...).[1] - Windows UNC/network shares:
file://server/share/filebecomes just"/file"(host is lost), instead of\\server\share\file.[1] - Percent-encoded characters stay encoded: spaces and non-ASCII characters remain
%20,%E4..., etc., instead of real characters.[1]
Recommended patterns:
import { fileURLToPath } from 'node:url';
import path from 'node:path';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);(Or on Node ≥ 20.11.0: import.meta.filename / import.meta.dirname.)[2]
Sources:
[1] Node.js url.fileURLToPath() docs (shows pathname pitfalls + correct outputs) (nodejs.org)
[2] Node.js ESM docs (import.meta.filename == url.fileURLToPath(import.meta.url), and import.meta.dirname) (nodejs.org)
Citations:
- 1: https://nodejs.org/docs/latest-v20.x/api/url.html
- 2: https://nodejs.org/download/release/v20.18.0/docs/api/esm.html
Use fileURLToPath(import.meta.url) for __dirname.
new URL(import.meta.url).pathname creates cross-platform issues: on Windows, drive-letter paths get an extra leading slash, UNC network paths lose the host component, and percent-encoded characters stay encoded. Node.js docs recommend fileURLToPath(import.meta.url) for proper platform-aware conversion.
🛠️ Proposed fix
import fs from "fs";
import path from "path";
+import { fileURLToPath } from "node:url";
import yaml from "js-yaml";
-const __dirname = path.dirname(new URL(import.meta.url).pathname);
+const __dirname = path.dirname(fileURLToPath(import.meta.url));| const src = path.resolve(__dirname, "../syntaxes/baml.tmLanguage.yaml"); | ||
| const dest = path.resolve(__dirname, "../syntaxes/baml.tmLanguage.json"); |
There was a problem hiding this comment.
The build only guarantees baml.tmLanguage.json.
All package scripts run this generator, but it hard-codes a single BAML source/output pair even though @b/pkg-textmate-grammar also exports ./jinja.tmLanguage.json and the VS Code extension manifest consumes that asset. One exported runtime file is therefore outside the build and can go stale or disappear without the build failing.
| parameter: | ||
| begin: "(self)|({{identifier}})" | ||
| beginCaptures: | ||
| "1": | ||
| name: variable.language.self.baml | ||
| "2": | ||
| name: variable.parameter.baml | ||
| end: (?=[,)]) | ||
| patterns: | ||
| - include: "#comment" | ||
| - include: "#type-expression" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml | sed -n '355,375p'Repository: BoundaryML/baml
Length of output: 685
🏁 Script executed:
cat -n typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml | grep -A 20 "type-expression:" | head -30Repository: BoundaryML/baml
Length of output: 889
🏁 Script executed:
# Search for how other similar patterns handle colons in this grammar
rg "match.*:|begin.*:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 2 -B 2 | head -50Repository: BoundaryML/baml
Length of output: 1076
🏁 Script executed:
# Check if there are example BAML files in the repository to see parameter syntax
find . -name "*.baml" -type f | head -10Repository: BoundaryML/baml
Length of output: 699
🏁 Script executed:
# Look for function definitions with parameters to understand the expected syntax
rg "function\s+\w+\(" -A 5 . --type baml 2>/dev/null | head -50Repository: BoundaryML/baml
Length of output: 41
🏁 Script executed:
# Search for how parameters are used in test files
rg "parameter" typescript2/pkg-textmate-grammar/ --type yaml -B 3 -A 3 | grep -E "(function|template|parameter:)" -A 3Repository: BoundaryML/baml
Length of output: 1737
🏁 Script executed:
# Look at example BAML files to see the actual syntax
cat baml_language/crates/baml_builtins/baml/llm.baml | head -100Repository: BoundaryML/baml
Length of output: 3685
🏁 Script executed:
# Check another BAML file
cat baml_language/crates/baml_builtins2/baml_std/baml/containers.baml | head -80Repository: BoundaryML/baml
Length of output: 974
🏁 Script executed:
# Check the type-namespaced pattern to see what it matches
rg "type-namespaced:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 10Repository: BoundaryML/baml
Length of output: 281
🏁 Script executed:
# Check literal pattern to understand what it matches
rg "literal:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 15 | head -30Repository: BoundaryML/baml
Length of output: 902
🏁 Script executed:
# Look at how the return-type handles the colon before type-expression
rg "return-type:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 10Repository: BoundaryML/baml
Length of output: 302
🏁 Script executed:
# Verify the parameter rule matches actual BAML syntax by looking for function definitions
rg "function\s+\w+\(" baml_language/crates/baml_builtins/baml/llm.baml -A 2 | head -30Repository: BoundaryML/baml
Length of output: 1234
Consume the parameter colon before matching the type.
BAML function and template_string parameters are written as name: Type (e.g., index: int, client_name: string). The parameter rule matches the identifier but does not consume the colon, leaving the cursor on : when #type-expression patterns are evaluated. Since none of those patterns start with :, the type annotation never receives syntax highlighting.
🛠️ Proposed fix
parameter:
begin: "(self)|({{identifier}})"
beginCaptures:
"1":
name: variable.language.self.baml
"2":
name: variable.parameter.baml
end: (?=[,)])
patterns:
- include: "#comment"
+ - match: ":"
- include: "#type-expression"b722137 to
234e065
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a7b420f9-8cf8-4e25-b77f-d42309b696bd
⛔ Files ignored due to path filters (1)
typescript2/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (11)
.pre-commit-config.yamltypescript2/app-promptfiddle/src/playground/MonacoEditor.tsxtypescript2/app-promptfiddle/src/playground/baml-monarch.tstypescript2/app-promptfiddle/src/playground/baml.tmLanguage.jsontypescript2/app-promptfiddle/syntaxes/baml.tmLanguage.jsontypescript2/app-promptfiddle/syntaxes/jinja.tmLanguage.jsontypescript2/app-vscode-ext/syntaxes/baml.tmLanguage.jsontypescript2/package.jsontypescript2/textmate-grammar/baml.tmLanguage.yamltypescript2/textmate-grammar/build-grammar.tstypescript2/textmate-grammar/jinja.tmLanguage.json
💤 Files with no reviewable changes (2)
- typescript2/app-promptfiddle/src/playground/baml-monarch.ts
- typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
| - id: textmate-grammar | ||
| name: textmate grammar (yaml → json) | ||
| entry: bash -c 'cd typescript2 && pnpm exec tsx textmate-grammar/build-grammar.ts && git diff --exit-code app-vscode-ext/syntaxes/baml.tmLanguage.json app-vscode-ext/syntaxes/jinja.tmLanguage.json app-promptfiddle/syntaxes/baml.tmLanguage.json app-promptfiddle/syntaxes/jinja.tmLanguage.json' | ||
| language: system | ||
| pass_filenames: false | ||
| files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json)$ | ||
| priority: 1 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider including the build script in the trigger pattern.
The hook triggers on changes to baml.tmLanguage.yaml and jinja.tmLanguage.json, but not on changes to build-grammar.ts itself. If someone modifies the build script logic, the hook won't run, and the generated JSONs could become stale.
♻️ Proposed fix
- files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json)$
+ files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json|build-grammar\.ts)$| "template-string-body": { | ||
| "begin": "(#+)\"", | ||
| "beginCaptures": { | ||
| "0": { | ||
| "name": "string.quoted.raw.baml" | ||
| } | ||
| }, | ||
| "end": "\\1", |
There was a problem hiding this comment.
Consume the full raw-string closing delimiter.
Line 531 closes on \1, so any # inside a prompt/template body can terminate the string before the real closing "### delimiter. This should mirror the raw-string rule and include the closing quote as part of the end pattern. Since this file is generated, please apply the fix in the YAML source and regenerate the JSON.
Suggested fix
"template-string-body": {
"begin": "(#+)\"",
@@
- "end": "\\1",
+ "end": "\"\\1",| "for-in-clause": { | ||
| "begin": "(let\\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\\s+(in)\\s+", | ||
| "beginCaptures": { | ||
| "1": { | ||
| "name": "keyword.declaration.baml" | ||
| }, | ||
| "2": { | ||
| "name": "variable.other.baml" | ||
| }, | ||
| "3": { | ||
| "name": "keyword.control.baml" | ||
| } | ||
| }, | ||
| "end": "\\)", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Loop rules in the TextMate source:"
rg -n -C3 'for-loop|for-in-clause|while-loop' typescript2/textmate-grammar/baml.tmLanguage.yaml
echo
echo "Loop syntax in checked-in BAML examples/tests:"
rg -n -C2 --glob '**/*.baml' '\b(for|while)\b'Repository: BoundaryML/baml
Length of output: 50374
🏁 Script executed:
head -n 550 typescript2/textmate-grammar/baml.tmLanguage.yaml | tail -n 70Repository: BoundaryML/baml
Length of output: 2231
🏁 Script executed:
cat typescript2/textmate-grammar/baml.tmLanguage.yaml | grep -A 10 "imperative-block:" | head -20Repository: BoundaryML/baml
Length of output: 318
🏁 Script executed:
grep -B5 -A20 "C-style for" typescript2/textmate-grammar/baml.tmLanguage.yamlRepository: BoundaryML/baml
Length of output: 1152
🏁 Script executed:
sed -n '798,820p' typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.jsonRepository: BoundaryML/baml
Length of output: 571
🏁 Script executed:
sed -n '779,830p' typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.jsonRepository: BoundaryML/baml
Length of output: 1165
🏁 Script executed:
rg -n 'for\s*\(' typescript2/app-vscode-ext/syntaxes/all.test.bamlRepository: BoundaryML/baml
Length of output: 148
🏁 Script executed:
find . -name "*.test.baml" -o -name "all.test.baml" 2>/dev/null | head -5Repository: BoundaryML/baml
Length of output: 110
🏁 Script executed:
sed -n '60,75p' typescript/apps/vscode-ext/syntaxes/all.test.bamlRepository: BoundaryML/baml
Length of output: 331
The for loop header parentheses are indeed mismatched in the grammar rules.
The for-loop rule matches the keyword, and for-in-clause expects to match starting with an optional let or an identifier—but neither rule consumes the opening ( that appears after for in all valid BAML syntax (for (item in [...])). However, for-in-clause requires an end match on ), creating an asymmetry.
Because of this, the opening parenthesis will not be captured or scoped by the grammar, leading to incorrect syntax highlighting. The grammar source (YAML) has the same issue, indicating the problem exists in the original rule design rather than the JSON generation.
| patterns: | ||
| - include: "#for-in-clause" | ||
| - include: "#imperative-block" | ||
| - include: "#statement" # Match any statement to allow for C-style for loops |
There was a problem hiding this comment.
Fix YAMLlint comment spacing issues.
Lines 489 and 513 need two spaces before the trailing # comments per YAML style conventions. Also, remove the extra blank line at the end of the file (line 772).
🛠️ Proposed fix for line 489
- - include: "#statement" # Match any statement to allow for C-style for loops
+ - include: "#statement" # Match any statement to allow for C-style for loopsAlso applies to: 513-513, 772-772
🧰 Tools
🪛 YAMLlint (1.38.0)
[warning] 489-489: too few spaces before comment: expected 2
(comments)
Completely reworks and simplifies the textmate grammar, converting it from the original JSON into a YAML source format that is built into JSON with a script.
Also removes the vestigial Monarch grammar and unifies prompt-fiddle and the VSCode extension onto the same grammar.
Summary by CodeRabbit
New Features
Refactor