
redo textmate in yaml and remove monarch grammar #3225

Open
codeshaunted wants to merge 1 commit into canary from avery/yaml-tm

@codeshaunted
Collaborator

@codeshaunted codeshaunted commented Mar 9, 2026

Completely reworks and simplifies the TextMate grammar, converting it from the original JSON into a YAML source that a script builds into JSON.

Also removes the vestigial Monarch grammar and moves prompt-fiddle and the VS Code extension onto the same grammar.
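The walkthrough below says the YAML source uses variable templating, and the parameter rule quoted later writes its regex as "(self)|({{identifier}})". The generated JSON spells identifiers as [a-zA-Z_][a-zA-Z0-9_]*, which is presumably what {{identifier}} expands to. Here is a minimal sketch of that substitution pass. It assumes a top-level `variables` map, as in Sublime Text's .sublime-syntax format. That key name and the helpers `substitute` / `expandGrammar` are hypothetical, not taken from the PR's build script.

```typescript
// Assumed shape: the parsed YAML has a top-level `variables` map whose values
// are regex fragments (hypothetical; the real source may be organized differently).
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Replace every {{name}} inside every string of the grammar tree.
// Unknown names throw, so a typo cannot silently ship a broken regex.
function substitute(node: Json, vars: Record<string, string>): Json {
  if (typeof node === "string") {
    return node.replace(/\{\{(\w+)\}\}/g, (_whole: string, name: string) => {
      const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
      if (value === undefined) throw new Error(`unknown grammar variable: ${name}`);
      return value;
    });
  }
  if (Array.isArray(node)) return node.map((child) => substitute(child, vars));
  if (node !== null && typeof node === "object") {
    const out: JsonObject = {};
    for (const [key, value] of Object.entries(node)) out[key] = substitute(value, vars);
    return out;
  }
  return node; // numbers, booleans and null pass through unchanged
}

// Drop the `variables` block from the output and expand it everywhere else.
function expandGrammar(grammar: JsonObject): JsonObject {
  const { variables, ...rest } = grammar;
  const vars: Record<string, string> = {};
  if (variables !== null && typeof variables === "object" && !Array.isArray(variables)) {
    for (const [name, value] of Object.entries(variables)) {
      if (typeof value !== "string") throw new Error(`variable ${name} must be a string`);
      vars[name] = value;
    }
  }
  return substitute(rest, vars) as JsonObject;
}
```

In a real build this would run between js-yaml's `load` and `JSON.stringify`. A variable whose value itself contains `{{...}}` would need a second expansion pass, which this sketch does not perform.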

Summary by CodeRabbit

  • New Features

    • Added syntax highlighting support for Jinja template syntax in the editor.
  • Refactor

    • Reorganized BAML language syntax highlighting infrastructure for improved maintainability and consistency across editor environments.

@vercel

vercel bot commented Mar 9, 2026

The latest updates on your projects.

Project        Deployment   Updated (UTC)
beps           Ready        Mar 11, 2026 9:17pm
promptfiddle   Ready        Mar 11, 2026 9:17pm


@codeshaunted changed the title "Avery/yaml tm" to "redo textmate in yaml and remove monarch grammar" Mar 9, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

This PR migrates BAML syntax highlighting from an embedded Monaco Monarch grammar to external TextMate grammar files. It introduces a YAML grammar source and a build script that generates the JSON grammars for both the VS Code extension and the web-based editor. MonacoEditor is updated to load the grammar from its new location, and a pre-commit hook keeps the generated grammars in sync with the source.

Changes

  • BAML Grammar Restructuring
    typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json, typescript2/app-promptfiddle/syntaxes/baml.tmLanguage.json
    Completely restructures the BAML TextMate grammar into a modular layout with expanded patterns for control flow, expressions, and declarations, replacing the rigid Monarch rules with scoped TextMate captures and nested patterns.
  • Grammar Source & Build
    typescript2/textmate-grammar/baml.tmLanguage.yaml, typescript2/textmate-grammar/build-grammar.ts
    Introduces a YAML grammar source with variable templating, plus a TypeScript build script that performs the substitution and writes the JSON output to both the VS Code and Promptfiddle syntax directories.
  • Jinja Template Grammar
    typescript2/textmate-grammar/jinja.tmLanguage.json, typescript2/app-promptfiddle/syntaxes/jinja.tmLanguage.json
    New Jinja2 TextMate grammar covering raw blocks, comments, variables, tags, expressions, strings, and nested structures, with detailed token captures.
  • Removed In-Code Grammars
    typescript2/app-promptfiddle/src/playground/baml-monarch.ts, typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
    Deletes the embedded Monaco Monarch grammar and the local TextMate grammar, consolidating on the external grammar files.
  • MonacoEditor Path Update
    typescript2/app-promptfiddle/src/playground/MonacoEditor.tsx
    Changes the grammar source path from the local './baml.tmLanguage.json' to the external '../../syntaxes/baml.tmLanguage.json'. Also includes minor formatting changes in the disposal and Promise handlers.
  • Infrastructure Changes
    .pre-commit-config.yaml, typescript2/package.json
    Adds a pre-commit hook that checks the committed grammars match the textmate-grammar build output. Also adds the dev dependencies (js-yaml, tsx) the build script needs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • hellovai
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main changes: converting the TextMate grammar to YAML and removing the Monarch grammar, which are the PR's primary objectives.


@codeshaunted
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
typescript2/app-vscode-ext/package.json (1)

46-55: ⚠️ Potential issue | 🟠 Major

Grammars must be bundled into the extension package.

The manifest points contributes.grammars[*].path to ./node_modules/@b/pkg-textmate-grammar/syntaxes/*.tmLanguage.json, but no grammar files exist under app-vscode-ext and tsup.config.ts does not copy them into the dist/ directory. The extension will fail to load syntax highlighting when installed outside the monorepo context, since those workspace paths will not exist.

Copy the grammar files into the extension as part of the build (e.g., into syntaxes/ or dist/syntaxes/), point the manifest paths at that location (e.g., ./syntaxes/baml.tmLanguage.json), and make sure the directory is included in the packaged extension, e.g. via the files field in package.json.
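Here is a minimal sketch of that copy step. It assumes a small Node post-build script where `srcDir` is node_modules/@b/pkg-textmate-grammar/syntaxes and `destDir` is the extension's own syntaxes/ directory. The function name `bundleGrammars` and the file list are hypothetical. The list should mirror whatever contributes.grammars[*].path references.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical list: keep it in sync with contributes.grammars[*].path.
const GRAMMAR_FILES = ["baml.tmLanguage.json", "jinja.tmLanguage.json"];

// Copy grammars out of the workspace package into the extension's own
// directory so they ship inside the packaged extension.
function bundleGrammars(srcDir: string, destDir: string, files: string[] = GRAMMAR_FILES): string[] {
  fs.mkdirSync(destDir, { recursive: true });
  return files.map((file) => {
    const from = path.join(srcDir, file);
    // Fail the build loudly instead of packaging an extension with no grammar.
    if (!fs.existsSync(from)) throw new Error(`missing grammar: ${from}`);
    const to = path.join(destDir, file);
    fs.copyFileSync(from, to);
    return to;
  });
}
```

With the files copied, each manifest entry would point at ./syntaxes/<file>. The syntaxes/ directory must also make it into the package, either through the files field or by not excluding it in .vscodeignore.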


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1e28a2d-d52d-471c-98f1-75fbb57efc81

📥 Commits

Reviewing files that changed from the base of the PR and between 06910c5 and b722137.

⛔ Files ignored due to path filters (1)
  • typescript2/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • .vscode/settings.json
  • typescript2/.gitignore
  • typescript2/app-promptfiddle/package.json
  • typescript2/app-promptfiddle/src/playground/MonacoEditor.tsx
  • typescript2/app-promptfiddle/src/playground/baml-monarch.ts
  • typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
  • typescript2/app-vscode-ext/package.json
  • typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json
  • typescript2/pkg-textmate-grammar/package.json
  • typescript2/pkg-textmate-grammar/scripts/build-grammar.ts
  • typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml
  • typescript2/pkg-textmate-grammar/syntaxes/jinja.tmLanguage.json
  • typescript2/turbo.json
💤 Files with no reviewable changes (3)
  • typescript2/app-promptfiddle/src/playground/baml-monarch.ts
  • typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
  • typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json

Comment on lines +61 to +63
"yaml.schemas": {
"https://www.schemastore.org/tmlanguage.json": "file:///home/ghost/Documents/baml/typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.yaml"
},

⚠️ Potential issue | 🟠 Major

Point the schema association at the new workspace location, not a machine-local path.

This setting is checked in with a machine-local /home/ghost/... path and still references typescript2/app-vscode-ext/syntaxes, but the grammar now lives at typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml:1. As written, the YAML schema association fails in every other checkout, and in this one too once the old path is removed.

🔧 Proposed fix
   "yaml.schemas": {
-    "https://www.schemastore.org/tmlanguage.json": "file:///home/ghost/Documents/baml/typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.yaml"
+    "https://www.schemastore.org/tmlanguage.json": "typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml"
   },

import path from "path";
import yaml from "js-yaml";

const __dirname = path.dirname(new URL(import.meta.url).pathname);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Node.js ESM docs: when deriving __dirname from import.meta.url, should fileURLToPath(import.meta.url) be used instead of new URL(import.meta.url).pathname, and what breaks if pathname is used directly?

💡 Result:

Yes: use fileURLToPath() (or import.meta.filename / import.meta.dirname on Node ≥ 20.11.0) rather than treating new URL(import.meta.url).pathname as a filesystem path. Node's docs present url.fileURLToPath() as the conversion that produces a platform-correct path and decodes percent-encoding, and they show several cases where .pathname gives an incorrect result.[1][2]

What breaks if you use .pathname directly:

  • Windows drive-letter paths: you get an extra leading slash (/C:/...) instead of a Windows path (C:\...).[1]
  • Windows UNC/network shares: file://server/share/file becomes just "/file" (host is lost), instead of \\server\share\file.[1]
  • Percent-encoded characters stay encoded: spaces and non-ASCII characters remain %20, %E4..., etc., instead of real characters.[1]

Recommended patterns:

import { fileURLToPath } from 'node:url';
import path from 'node:path';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

(Or on Node ≥ 20.11.0: import.meta.filename / import.meta.dirname.)[2]

Sources:
[1] Node.js url.fileURLToPath() docs (shows pathname pitfalls + correct outputs) (nodejs.org)
[2] Node.js ESM docs (import.meta.filename == url.fileURLToPath(import.meta.url), and import.meta.dirname) (nodejs.org)


Use fileURLToPath(import.meta.url) for __dirname.

new URL(import.meta.url).pathname creates cross-platform issues: on Windows, drive-letter paths get an extra leading slash, UNC network paths lose the host component, and percent-encoded characters stay encoded. Node.js docs recommend fileURLToPath(import.meta.url) for proper platform-aware conversion.

🛠️ Proposed fix
 import fs from "fs";
 import path from "path";
+import { fileURLToPath } from "node:url";
 import yaml from "js-yaml";
 
-const __dirname = path.dirname(new URL(import.meta.url).pathname);
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
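Here is a concrete instance of the percent-encoding case: a checkout whose path contains a space. The URL below is made up for illustration and is POSIX-style, so the comparison is meant for Linux and macOS. Windows file URLs carry a drive letter.

```typescript
import * as path from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical POSIX file URL; the space in the directory name is encoded as %20.
const scriptUrl = "file:///tmp/my%20checkout/scripts/build-grammar.ts";

// .pathname stays URL-encoded, so this directory does not exist on disk.
const viaPathname = path.dirname(new URL(scriptUrl).pathname);

// fileURLToPath decodes the URL into a real filesystem path.
const viaFileURLToPath = path.dirname(fileURLToPath(scriptUrl));

console.log(viaPathname);      // /tmp/my%20checkout/scripts
console.log(viaFileURLToPath); // /tmp/my checkout/scripts
```

With the unpatched script, path.resolve(__dirname, "../syntaxes/...") would point into a directory literally named my%20checkout, and the YAML read would fail.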

Comment on lines +8 to +9
const src = path.resolve(__dirname, "../syntaxes/baml.tmLanguage.yaml");
const dest = path.resolve(__dirname, "../syntaxes/baml.tmLanguage.json");

⚠️ Potential issue | 🟠 Major

The build only produces baml.tmLanguage.json.

Every package script runs this generator, but it hard-codes a single BAML source/output pair, even though @b/pkg-textmate-grammar also exports ./jinja.tmLanguage.json and the VS Code extension manifest consumes that asset. That exported runtime file is therefore outside the build and can go stale or disappear without the build failing.
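One way to bring the Jinja grammar under the build is a table with one entry per shipped grammar. Here is a minimal sketch under these assumptions: `buildAll`, `GrammarEntry` and `GRAMMARS` are hypothetical names, and the output directories are separate from the source directory. The later revision of the PR uses app-vscode-ext/syntaxes and app-promptfiddle/syntaxes. The YAML parser is injected so the sketch has no third-party import. The real script would pass js-yaml's `load`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical build table: one entry per grammar the package ships.
interface GrammarEntry {
  src: string;
  format: "yaml" | "json";
}

const GRAMMARS: GrammarEntry[] = [
  { src: "baml.tmLanguage.yaml", format: "yaml" },
  { src: "jinja.tmLanguage.json", format: "json" },
];

function buildAll(
  srcDir: string,
  outDirs: string[],
  parseYaml: (text: string) => unknown,
  grammars: GrammarEntry[] = GRAMMARS,
): string[] {
  const written: string[] = [];
  for (const { src, format } of grammars) {
    const text = fs.readFileSync(path.join(srcDir, src), "utf8");
    // JSON sources are parsed too, so a malformed file fails the build
    // instead of being copied verbatim.
    const grammar = format === "yaml" ? parseYaml(text) : JSON.parse(text);
    const outName = src.replace(/\.ya?ml$/, ".json");
    for (const dir of outDirs) {
      fs.mkdirSync(dir, { recursive: true });
      const dest = path.join(dir, outName);
      fs.writeFileSync(dest, JSON.stringify(grammar, null, 2) + "\n");
      written.push(dest);
    }
  }
  return written;
}
```

Because every output is regenerated on each run, a deleted or hand-edited jinja.tmLanguage.json would either be restored or show up in a `git diff --exit-code` check.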

Comment on lines +359 to +369
parameter:
begin: "(self)|({{identifier}})"
beginCaptures:
"1":
name: variable.language.self.baml
"2":
name: variable.parameter.baml
end: (?=[,)])
patterns:
- include: "#comment"
- include: "#type-expression"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml | sed -n '355,375p'

Repository: BoundaryML/baml

Length of output: 685


🏁 Script executed:

cat -n typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml | grep -A 20 "type-expression:" | head -30

Repository: BoundaryML/baml

Length of output: 889


🏁 Script executed:

# Search for how other similar patterns handle colons in this grammar
rg "match.*:|begin.*:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 2 -B 2 | head -50

Repository: BoundaryML/baml

Length of output: 1076


🏁 Script executed:

# Check if there are example BAML files in the repository to see parameter syntax
find . -name "*.baml" -type f | head -10

Repository: BoundaryML/baml

Length of output: 699


🏁 Script executed:

# Look for function definitions with parameters to understand the expected syntax
rg "function\s+\w+\(" -A 5 . --type baml 2>/dev/null | head -50

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Search for how parameters are used in test files
rg "parameter" typescript2/pkg-textmate-grammar/ --type yaml -B 3 -A 3 | grep -E "(function|template|parameter:)" -A 3

Repository: BoundaryML/baml

Length of output: 1737


🏁 Script executed:

# Look at example BAML files to see the actual syntax
cat baml_language/crates/baml_builtins/baml/llm.baml | head -100

Repository: BoundaryML/baml

Length of output: 3685


🏁 Script executed:

# Check another BAML file
cat baml_language/crates/baml_builtins2/baml_std/baml/containers.baml | head -80

Repository: BoundaryML/baml

Length of output: 974


🏁 Script executed:

# Check the type-namespaced pattern to see what it matches
rg "type-namespaced:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 10

Repository: BoundaryML/baml

Length of output: 281


🏁 Script executed:

# Check literal pattern to understand what it matches
rg "literal:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 15 | head -30

Repository: BoundaryML/baml

Length of output: 902


🏁 Script executed:

# Look at how the return-type handles the colon before type-expression
rg "return-type:" typescript2/pkg-textmate-grammar/syntaxes/baml.tmLanguage.yaml -A 10

Repository: BoundaryML/baml

Length of output: 302


🏁 Script executed:

# Verify the parameter rule matches actual BAML syntax by looking for function definitions
rg "function\s+\w+\(" baml_language/crates/baml_builtins/baml/llm.baml -A 2 | head -30

Repository: BoundaryML/baml

Length of output: 1234


Consume the parameter colon before matching the type.

BAML function and template_string parameters are written as name: Type (e.g., index: int, client_name: string). The parameter rule matches the identifier but never consumes the colon, so : gets no scope of its own. Whether the type annotation is also affected depends on the #type-expression patterns. TextMate searches forward for the earliest match, so unanchored patterns will still reach int after the colon. Patterns anchored to the current position (e.g. with \G) will not.

🛠️ Proposed fix
   parameter:
     begin: "(self)|({{identifier}})"
     beginCaptures:
       "1":
         name: variable.language.self.baml
       "2":
         name: variable.parameter.baml
     end: (?=[,)])
     patterns:
       - include: "#comment"
+      - match: ":"
       - include: "#type-expression"
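To see what the added `match: ":"` line changes, here is a simplified model of TextMate's scanning inside a begin/end region. At each position it searches forward with every pattern and takes the earliest match, with ties going to the pattern listed first. The model ignores the end pattern, captures and \G anchors. The scope names below are hypothetical, because the real #type-expression rules are not quoted in this thread.

```typescript
// Simplified model of TextMate pattern selection inside a region.
interface Rule {
  name: string | null; // null models a `match:` entry with no `name:`
  source: string;
}

type Token = [text: string, scope: string];

function scan(line: string, start: number, rules: Rule[]): Token[] {
  const tokens: Token[] = [];
  let pos = start;
  while (pos < line.length) {
    let best: { rule: Rule; index: number; text: string } | null = null;
    for (const rule of rules) {
      const re = new RegExp(rule.source, "g");
      re.lastIndex = pos; // search forward from the current position
      const m = re.exec(line);
      if (m && (best === null || m.index < best.index)) {
        best = { rule, index: m.index, text: m[0] };
      }
    }
    if (best === null) {
      tokens.push([line.slice(pos), "(unscoped)"]);
      break;
    }
    if (best.text === "") throw new Error("zero-length matches are not modeled");
    // Text skipped by the forward search keeps only the enclosing scope.
    if (best.index > pos) tokens.push([line.slice(pos, best.index), "(unscoped)"]);
    tokens.push([best.text, best.rule.name ?? "(consumed, no scope)"]);
    pos = best.index + best.text.length;
  }
  return tokens;
}

// Hypothetical rules standing in for #type-expression and the colon match.
const typeRule: Rule = { name: "storage.type.baml", source: "\\b(?:int|string|bool)\\b" };
const bareColon: Rule = { name: null, source: ":" }; // as in the proposed fix
const namedColon: Rule = { name: "punctuation.separator.colon.baml", source: ":" };

// Scan "index: int" from position 5, just past the parameter name.
for (const rules of [[typeRule], [bareColon, typeRule], [namedColon, typeRule]]) {
  console.log(JSON.stringify(scan("index: int", 5, rules)));
}
```

In this model the type is found in all three cases, because the forward search skips the unclaimed colon. The bare `match: ":"` only changes which rule consumes the colon. Adding a `name` is what actually scopes it. If the real patterns are anchored, this model does not apply.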

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7b420f9-8cf8-4e25-b77f-d42309b696bd

📥 Commits

Reviewing files that changed from the base of the PR and between b722137 and 234e065.

⛔ Files ignored due to path filters (1)
  • typescript2/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • .pre-commit-config.yaml
  • typescript2/app-promptfiddle/src/playground/MonacoEditor.tsx
  • typescript2/app-promptfiddle/src/playground/baml-monarch.ts
  • typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json
  • typescript2/app-promptfiddle/syntaxes/baml.tmLanguage.json
  • typescript2/app-promptfiddle/syntaxes/jinja.tmLanguage.json
  • typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json
  • typescript2/package.json
  • typescript2/textmate-grammar/baml.tmLanguage.yaml
  • typescript2/textmate-grammar/build-grammar.ts
  • typescript2/textmate-grammar/jinja.tmLanguage.json
💤 Files with no reviewable changes (2)
  • typescript2/app-promptfiddle/src/playground/baml-monarch.ts
  • typescript2/app-promptfiddle/src/playground/baml.tmLanguage.json

Comment on lines +104 to +110
- id: textmate-grammar
name: textmate grammar (yaml → json)
entry: bash -c 'cd typescript2 && pnpm exec tsx textmate-grammar/build-grammar.ts && git diff --exit-code app-vscode-ext/syntaxes/baml.tmLanguage.json app-vscode-ext/syntaxes/jinja.tmLanguage.json app-promptfiddle/syntaxes/baml.tmLanguage.json app-promptfiddle/syntaxes/jinja.tmLanguage.json'
language: system
pass_filenames: false
files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json)$
priority: 1

🧹 Nitpick | 🔵 Trivial

Consider including the build script in the trigger pattern.

The hook triggers on changes to baml.tmLanguage.yaml and jinja.tmLanguage.json, but not on changes to build-grammar.ts itself. If someone modifies the build script logic, the hook won't run, and the generated JSONs could become stale.

♻️ Proposed fix
-        files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json)$
+        files: ^typescript2/textmate-grammar/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json|build-grammar\.ts)$
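Here is a quick check of which staged paths each `files` pattern would trigger on. pre-commit evaluates the pattern as a Python regex against repository-relative paths. This particular pattern uses only syntax that behaves the same in JavaScript, apart from escaping `/` in the literal. The staged-path list is made up for illustration.

```typescript
// The hook's `files` pattern before and after the suggested change.
const before = /^typescript2\/textmate-grammar\/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json)$/;
const after = /^typescript2\/textmate-grammar\/(baml\.tmLanguage\.yaml|jinja\.tmLanguage\.json|build-grammar\.ts)$/;

// Hypothetical set of staged files.
const staged = [
  "typescript2/textmate-grammar/baml.tmLanguage.yaml",
  "typescript2/textmate-grammar/build-grammar.ts",
  "typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json", // generated output: matches neither
];

const triggeredBefore = staged.filter((p) => before.test(p));
const triggeredAfter = staged.filter((p) => after.test(p));
console.log(triggeredBefore); // only the .yaml source
console.log(triggeredAfter);  // the .yaml source and build-grammar.ts
```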

Comment on lines +524 to +531
"template-string-body": {
"begin": "(#+)\"",
"beginCaptures": {
"0": {
"name": "string.quoted.raw.baml"
}
},
"end": "\\1",

⚠️ Potential issue | 🟠 Major

Consume the full raw-string closing delimiter.

Line 531 closes on \1 alone, so any occurrence of the opening hash sequence inside a prompt/template body can terminate the string before the real closing delimiter (a quote followed by the same hashes). The end pattern should mirror the raw-string rule and include the closing quote. Since this file is generated, apply the fix in the YAML source and regenerate the JSON.

Suggested fix
     "template-string-body": {
       "begin": "(#+)\"",
@@
-      "end": "\\1",
+      "end": "\"\\1",
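Here is a small model of the region that shows the early close. It begins on (#+)" and then ends on a pattern built from the captured hashes. vscode-textmate resolves \1 in `end` the same way, using the text captured by `begin`. The prompt text is invented for illustration.

```typescript
const BEGIN = /(#+)"/;

// Returns the index just past where the region closes, or -1 if it never does.
function closeIndex(text: string, endIncludesQuote: boolean): number {
  const begin = BEGIN.exec(text);
  if (begin === null) return -1;
  const hashes = begin[1]; // "#" for #"..."#, "##" for ##"..."##
  // '#' is not special in a JavaScript regex, so the capture can be used as-is.
  const end = new RegExp((endIncludesQuote ? '"' : "") + hashes, "g");
  end.lastIndex = begin.index + begin[0].length; // search after the opener
  const m = end.exec(text);
  return m === null ? -1 : m.index + m[0].length;
}

const prompt = '#"Use # for headings"#';
console.log(closeIndex(prompt, false)); // 7: closes on the "#" inside the body
console.log(closeIndex(prompt, true));  // 22: closes on the real delimiter at the end
```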

Comment on lines +798 to +811
"for-in-clause": {
"begin": "(let\\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\\s+(in)\\s+",
"beginCaptures": {
"1": {
"name": "keyword.declaration.baml"
},
"2": {
"name": "variable.other.baml"
},
"3": {
"name": "keyword.control.baml"
}
},
"end": "\\)",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Loop rules in the TextMate source:"
rg -n -C3 'for-loop|for-in-clause|while-loop' typescript2/textmate-grammar/baml.tmLanguage.yaml

echo
echo "Loop syntax in checked-in BAML examples/tests:"
rg -n -C2 --glob '**/*.baml' '\b(for|while)\b'

Repository: BoundaryML/baml

Length of output: 50374


🏁 Script executed:

head -n 550 typescript2/textmate-grammar/baml.tmLanguage.yaml | tail -n 70

Repository: BoundaryML/baml

Length of output: 2231


🏁 Script executed:

cat typescript2/textmate-grammar/baml.tmLanguage.yaml | grep -A 10 "imperative-block:" | head -20

Repository: BoundaryML/baml

Length of output: 318


🏁 Script executed:

grep -B5 -A20 "C-style for" typescript2/textmate-grammar/baml.tmLanguage.yaml

Repository: BoundaryML/baml

Length of output: 1152


🏁 Script executed:

sed -n '798,820p' typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json

Repository: BoundaryML/baml

Length of output: 571


🏁 Script executed:

sed -n '779,830p' typescript2/app-vscode-ext/syntaxes/baml.tmLanguage.json

Repository: BoundaryML/baml

Length of output: 1165


🏁 Script executed:

rg -n 'for\s*\(' typescript2/app-vscode-ext/syntaxes/all.test.baml

Repository: BoundaryML/baml

Length of output: 148


🏁 Script executed:

find . -name "*.test.baml" -o -name "all.test.baml" 2>/dev/null | head -5

Repository: BoundaryML/baml

Length of output: 110


🏁 Script executed:

sed -n '60,75p' typescript/apps/vscode-ext/syntaxes/all.test.baml

Repository: BoundaryML/baml

Length of output: 331


The for-loop header parentheses are mismatched in the grammar rules.

The for-loop rule matches the keyword, and for-in-clause begins with an optional let or an identifier. Neither rule consumes the opening ( that follows for in valid BAML syntax (for (item in [...])), yet for-in-clause ends on ). The two sides of the header are therefore handled asymmetrically.

As a result, the opening parenthesis is never captured or scoped, which produces incorrect syntax highlighting. The YAML grammar source has the same issue, so the problem lies in the original rule design rather than in the JSON generation.
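The asymmetry can be checked directly with the begin regex from the generated JSON. The second regex is a hypothetical variant that also captures the opening parenthesis. It is not a fix proposed in the thread, and it would not cover C-style headers such as for (let i = 0; ...), which have no `in`.

```typescript
// The for-in-clause begin regex as it appears in the generated JSON.
const forInBegin = /(let\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\s+(in)\s+/;

// Hypothetical variant: the region would open on "(" and close on ")".
const forInBeginWithParen = /(\()\s*(let\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\s+(in)\s+/;

const header = "for (item in items) {";

const current = forInBegin.exec(header);
const proposed = forInBeginWithParen.exec(header);

// Index 5: the region starts at "item", leaving "(" (index 4) outside it.
console.log(current?.index, current?.[0]);
// Index 4: "(" is part of the begin match.
console.log(proposed?.index, proposed?.[1]);
```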

patterns:
- include: "#for-in-clause"
- include: "#imperative-block"
- include: "#statement" # Match any statement to allow for C-style for loops

⚠️ Potential issue | 🟡 Minor

Fix yamllint comment-spacing warnings.

Lines 489 and 513 need two spaces before their trailing # comments, per yamllint's comments rule. Also remove the extra blank line at the end of the file (line 772).

🛠️ Proposed fix for line 489
-      - include: "#statement" # Match any statement to allow for C-style for loops
+      - include: "#statement"  # Match any statement to allow for C-style for loops

Also applies to: 513-513, 772-772

🧰 Tools
🪛 YAMLlint (1.38.0)

[warning] 489-489: too few spaces before comment: expected 2

(comments)
