ossuminc
diff --git a/.github/workflows/scala.yml b/.github/workflows/scala.yml
Lines changed: 13 additions & 2 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 21 additions & 0 deletions
diff --git a/NOTEBOOK.md b/NOTEBOOK.md
Lines changed: 117 additions & 1 deletion
diff --git a/commands/input/rbbq.riddl b/commands/input/rbbq.riddl
Lines changed: 1 addition & 1 deletion
diff --git a/language/input/domains/rbbq.riddl b/language/input/domains/rbbq.riddl
Lines changed: 1 addition & 1 deletion
diff --git a/language/jvm/src/test/python/ebnf_preprocessor.py b/language/jvm/src/test/python/ebnf_preprocessor.py
Lines changed: 166 additions & 0 deletions
@@ -176,10 +176,21 @@ jobs:
       run: |
         pip install -r language/jvm/src/test/python/requirements.txt
 
-    - name: Validate EBNF Grammar
+    - name: Validate EBNF Grammar with TatSu (internal test files)
       run: |
         cd language/jvm/src/test/python
-        python ebnf_validator.py --verbose
+        python ebnf_tatsu_validator.py --verbose
+
+    - name: Checkout riddl-examples
+      uses: actions/checkout@v4
+      with:
+        repository: ossuminc/riddl-examples
+        path: riddl-examples
+
+    - name: Validate EBNF Grammar against riddl-examples
+      run: |
+        cd language/jvm/src/test/python
+        python validate_external_riddl.py --repo ${{ github.workspace }}/riddl-examples
 
   dependency-check:
     timeout-minutes: 15
 
@@ -28,3 +28,5 @@ target
 *.bak
 language/input/everything.bast
 .claude/
+language/jvm/src/test/python/.venv/
+__pycache__/
@@ -247,6 +247,27 @@ sed -i 's/scala-3.3.7/scala-3.9.0/g' .github/workflows/*.yml
 
 ## Testing Patterns
 
+### Parser/EBNF Synchronization Requirement
+
+**Any change to the fastparse parser MUST have a corresponding change to the EBNF grammar.**
+
+The EBNF grammar at `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf`
+is the canonical specification of RIDDL syntax. It is validated by a TatSu-based parser
+that runs in CI on all `**/input/**/*.riddl` test files.
+
+When modifying the fastparse parser:
+1. Update the corresponding rule(s) in `ebnf-grammar.ebnf`
+2. Run the EBNF validator locally:
+   ```bash
+   cd language/jvm/src/test/python
+   pip install -r requirements.txt  # first time only
+   python ebnf_tatsu_validator.py
+   ```
+3. Ensure both parsers accept the same inputs
+4. CI will fail if the EBNF parser cannot parse test files that fastparse accepts
+
+This ensures the documented grammar stays in sync with the actual implementation.
+
 ### Compilation After Every Change
 When implementing new code:
 1. Write the code
 
@@ -6,11 +6,26 @@ This is the central engineering notebook for the RIDDL project. It tracks curren
 
 ## Current Status
 
-**Last Updated**: January 31, 2026
+**Last Updated**: February 1, 2026
 
 **Release 1.2.0 Published**: Scala 3.7.4 upgrade complete. All tests pass. Published
 to GitHub Packages.
 
+**In Progress**: TatSu-based EBNF validation (branch: `feature/tatsu-ebnf-validation`).
+Fixing EBNF grammar drift from fastparse implementation.
+
+**EBNF Validation Complete** (Feb 1, 2026):
+- Internal test files: 59/77 passed, 13 include fragments skipped, 5 expected
+  failures (all have bugs that fastparse also rejects)
+- External test files (riddl-examples): 8/9 passed, 1 expected failure (Trello
+  needs AI regeneration)
+- Key fixes made: comment regex tokens (avoid whitespace-skipping issues),
+  statement rule (added morph/become), pseudo_code_contents rule, interactions
+  rule (added comment support)
+- CI updated to validate against riddl-examples repository
+- Tasks #7-10 created for fastparse fixes to match EBNF (hex escapes, cardinality
+  mutual exclusivity, metadata with-block requirement)
+
 The RIDDL project is a mature compiler and toolchain for the Reactive Interface
 to Domain Definition Language. BAST serialization is **complete** (60 tests,
 6-10x speedup). Hugo and diagrams modules moved to another repository.
@@ -35,6 +50,14 @@ AI-friendly validation pass for MCP server integration. See design section below
 
 ---
 
+## Blocked Tasks
+
+| Task | Blocked By | Notes |
+|------|------------|-------|
+| Add EBNF validation for riddl-models repository | riddl-models needs to be populated with RIDDL models | Similar to riddl-examples validation in CI; will validate against external repository once content exists |
+
+---
+
 ## Scheduled Tasks
 
 | Date | Task | Notes |
@@ -116,6 +139,99 @@ The `pseudoCodeBlock` parser now allows comments before and/or after `???`:
 
 ## Session Log
 
+### February 1, 2026 (Cardinality Fix)
+
+**Focus**: Fix cardinality prefix/suffix mutual exclusivity
+
+**Branch**: `feature/parsing-fixes`
+
+**Work Completed**:
+1. ✅ **Updated EBNF grammar** to allow `many optional` as valid prefix combination
+   - `type_expression` and `field_type_expression` now accept `("many" ["optional"] | "optional")`
+2. ✅ **Updated TypeParser.scala** to enforce mutual exclusivity
+   - Allows prefix only: `many` (=+), `optional` (=?), `many optional` (=*)
+   - Allows suffix only: `?`, `+`, `*`
+   - Rejects prefix AND suffix together with clear error message
+3. ✅ **Restored rbbq.riddl** to use `many optional RewardEvent` syntax
+   - Demonstrates valid cardinality prefix usage
+   - Fixes TokenParserTest expected offsets
+
+**Test Results**: All 715 tests pass across all modules
+
+**Task #10 Verification** (metadata with-block requirement):
+- Confirmed fastparse already correctly enforces `with { }` wrapper for metadata
+- `} briefly "..."` (after close, no with) → Rejected ✅
+- `{ briefly "..." }` (inside body) → Rejected ✅
+- `} with { briefly "..." }` (correct syntax) → Accepted ✅
+- No code changes needed - task was already satisfied
+
+**Files Modified**:
+- `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf`
+- `language/shared/src/main/scala/com/ossuminc/riddl/language/parsing/TypeParser.scala`
+- `language/input/rbbq.riddl`
+
+---
+
+### February 1, 2026 (TatSu EBNF Validation - In Progress)
+
+**Focus**: Implement automated EBNF grammar validation in CI using TatSu
+
+**Branch**: `feature/tatsu-ebnf-validation`
+
+**Context**: The EBNF grammar at `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf` documents RIDDL syntax but can drift from the actual fastparse implementation. This work adds CI validation to catch drift.
+
+**Work Completed**:
+1. ✅ **Created TatSu-based validator framework**
+   - `language/jvm/src/test/python/ebnf_preprocessor.py` - Converts EBNF to TatSu format
+   - `language/jvm/src/test/python/ebnf_tatsu_validator.py` - Validates RIDDL files
+   - Updated `requirements.txt` with TatSu>=5.12.0
+
+2. ✅ **Updated CI workflow** (`.github/workflows/scala.yml`)
+   - Changed from Lark-based to TatSu-based validation
+
+3. ✅ **Updated CLAUDE.md**
+   - Added "Parser/EBNF Synchronization Requirement" section
+
+4. ✅ **Fixed EBNF Issue #1: `???` placeholder ordering**
+   - Problem: PEG parsers try alternatives in order; closures matching zero items shadow `???`
+   - Fixed `enumerators` (line 100): `{enumerator [","]} | "???"` → `"???" | {enumerator [","]}`
+   - Fixed `aggregate_definitions` (line 115): same pattern
+
+5. 🚧 **Discovered EBNF Issue #2: `simple_identifier` consuming whitespace**
+   - TatSu skips whitespace between tokens, even inside closures
+   - `simple_identifier = letter { letter | digit | "_" | "-" }` causes "A from context Two" to parse as single identifier
+   - **Proposed fix**: `simple_identifier = /[a-zA-Z][a-zA-Z0-9_-]*/` (regex pattern)
+
+**Current Validation Results**:
+- 14/77 files pass
+- 13 skipped (include fragments)
+- 2 expected failures
+- 48 unexpected failures (EBNF/parser drift to fix)
+
+**Key Learning**: TatSu uses PEG semantics where:
+- Alternatives are tried in order (put specific literals before general patterns)
+- Whitespace is skipped between token matches (lexical rules need regex patterns)
+- Closures matching zero items "succeed" and don't try next alternative
+
+**Files Created**:
+- `language/jvm/src/test/python/ebnf_preprocessor.py`
+- `language/jvm/src/test/python/ebnf_tatsu_validator.py`
+
+**Files Modified**:
+- `language/jvm/src/test/python/requirements.txt`
+- `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf` (2 fixes)
+- `.github/workflows/scala.yml`
+- `.gitignore` (added .venv)
+- `CLAUDE.md`
+
+**Next Steps**:
+1. Fix `simple_identifier` to use regex pattern (prevents whitespace consumption)
+2. Fix `quoted_identifier` similarly
+3. Review and fix other lexical rules (zone, option_name, etc.)
+4. Work through remaining 48 failures systematically
+
+---
+
 ### January 31, 2026 (Scala 3.7.4 Compiler Bug - RESOLVED)
 
 **Focus**: Fix Scala 3.7.4 compiler infinite loop caused by opaque type Contents
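
Review note on the "Key Learning" bullets in the February 1 session log: the `???` ordering fix can be reproduced without TatSu. The sketch below uses a few hand-written PEG combinators (all names here are hypothetical and are not part of the repository or of TatSu's API) to show why a zero-or-more closure placed first in an ordered choice shadows the `"???"` alternative. Whitespace handling is deliberately left out to keep the example small.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def literal(token: str) -> Parser:
    """Match an exact string at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(token) if text.startswith(token, pos) else None
    return parse


def closure(item: Parser) -> Parser:
    """Zero-or-more repetition: always succeeds, possibly consuming nothing."""
    def parse(text: str, pos: int) -> Optional[int]:
        while (nxt := item(text, pos)) is not None and nxt > pos:
            pos = nxt
        return pos
    return parse


def choice(*alternatives: Parser) -> Parser:
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alternatives:
            if (nxt := alt(text, pos)) is not None:
                return nxt
        return None
    return parse


def sequence(*parts: Parser) -> Parser:
    """Match every part in order; fail as soon as one part fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            nxt = part(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return parse


def accepts(parser: Parser, text: str) -> bool:
    """True when the parser consumes the whole input."""
    return parser(text, 0) == len(text)


# Stand-in for the real `enumerator` rule (assumption: one fixed name).
enumerator = literal("Red")

# Original ordering: {enumerator} | "???"
before = sequence(literal("{"), choice(closure(enumerator), literal("???")), literal("}"))
# Fixed ordering: "???" | {enumerator}
after = sequence(literal("{"), choice(literal("???"), closure(enumerator)), literal("}"))

if __name__ == "__main__":
    for text in ("{Red}", "{}", "{???}"):
        print(f"{text!r}: before={accepts(before, text)} after={accepts(after, text)}")
```

With the original ordering the closure succeeds on zero items, the choice commits to it, and the following `}` then fails against `???`; putting the literal first avoids that.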
 
@@ -80,7 +80,7 @@ domain ReactiveBBQ is {
       record fields is {
         id is ReactiveBBQ.CustomerId,
         points is Number,
-        rewardEvents is many optional Loyalty.RewardEvent
+        rewardEvents is Loyalty.RewardEvent*
       }
       state RewardState of RewardsAccount.fields
       handler Inputs is { ??? }
 
@@ -78,7 +78,7 @@ domain ReactiveBBQ is {
       record fields is {
         id is ReactiveBBQ.CustomerId,
         points is Number,
-        rewardEvents is many optional Loyalty.RewardEvent
+        rewardEvents is Loyalty.RewardEvent*
       }
       state RewardState of RewardsAccount.fields
       handler Inputs is { ??? }
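
Review note on the two `rbbq.riddl` hunks: the session log describes the cardinality rule as prefix only (`many` = `+`, `optional` = `?`, `many optional` = `*`) or suffix only, never both. The following is a minimal Python sketch of that rule, not the `TypeParser.scala` implementation; the function name and the simplified identifier pattern are assumptions made for illustration.

```python
import re

# Prefix keywords and their suffix equivalents, as listed in the session log.
_PREFIX_TO_SUFFIX = {"many optional": "*", "many": "+", "optional": "?"}

# Simplified shape: [prefix] dotted.Name [suffix]. Real RIDDL type
# expressions are richer; this pattern is only an illustration.
_TYPE_EXPR = re.compile(
    r"^\s*(?:(many\s+optional|many|optional)\s+)?"
    r"([A-Za-z_][\w.]*)"
    r"([?+*])?\s*$"
)


def normalize_cardinality(expr: str) -> str:
    """Rewrite a prefix cardinality to suffix form; reject prefix plus suffix."""
    match = _TYPE_EXPR.match(expr)
    if match is None:
        raise ValueError(f"not a simple type expression: {expr!r}")
    prefix, name, suffix = match.groups()
    if prefix and suffix:
        raise ValueError("cardinality prefix and suffix are mutually exclusive")
    if prefix:
        # Collapse internal whitespace so "many   optional" still maps to "*".
        suffix = _PREFIX_TO_SUFFIX[" ".join(prefix.split())]
    return name + (suffix or "")


if __name__ == "__main__":
    print(normalize_cardinality("many optional Loyalty.RewardEvent"))
    print(normalize_cardinality("Loyalty.RewardEvent*"))
```

Under this reading, both sides of the `rewardEvents` change normalize to the same expression.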
 
@@ -0,0 +1,166 @@
+#!/usr/bin/env python3
+"""
+EBNF to TatSu Preprocessor
+
+Converts RIDDL's EBNF grammar to TatSu-compatible format.
+TatSu uses PEG (Parsing Expression Grammar) semantics but reads EBNF-like syntax.
+
+Key conversions:
+- Comments: (* comment *) -> # comment
+- Character ranges: "A" | "B" | ... | "Z" -> /[A-Z]/
+- Repetition: {x}+ stays as {x}+, {x} stays as {x}
+- Optional: [x] stays as [x]
+- TatSu doesn't support bare x+ outside braces
+
+TatSu syntax reference: https://tatsu.readthedocs.io/en/stable/syntax.html
+"""
+
+import re
+from typing import List, Tuple
+
+
+def preprocess_for_tatsu(ebnf_content: str) -> str:
+    """
+    Convert RIDDL EBNF to TatSu-compatible format.
+
+    Args:
+        ebnf_content: The original EBNF content from ebnf-grammar.ebnf
+
+    Returns:
+        TatSu-compatible grammar string
+    """
+    result = ebnf_content
+
+    # 1. Convert EBNF comments (* ... *) to TatSu comments # ...
+    # Handle multi-line comments
+    result = re.sub(r'\(\*([^*]*(?:\*(?!\))[^*]*)*)\*\)', _convert_comment, result)
+
+    # 2. Convert ellipsis character ranges to regex patterns
+    # "A" | "B" | ... | "Z" -> /[A-Z]/
+    result = re.sub(
+        r'"([A-Za-z])"\s*\|\s*"([A-Za-z])"\s*\|\s*\.\.\.\s*\|\s*"([A-Za-z])"',
+        lambda m: f'/[{m.group(1)}-{m.group(3)}]/',
+        result
+    )
+
+    # 3. Convert digit ranges if present
+    # "0" | "1" | ... | "9" -> /[0-9]/
+    result = re.sub(
+        r'"(\d)"\s*\|\s*"(\d)"\s*\|\s*\.\.\.\s*\|\s*"(\d)"',
+        lambda m: f'/[{m.group(1)}-{m.group(3)}]/',
+        result
+    )
+
+    # 4. TatSu-specific: function call syntax like cardinality(...) is not valid
+    # Convert cardinality(type_expression) to just the content with cardinality prefix
+    # This is a complex grammar feature - for now, simplify by removing the function syntax
+    result = re.sub(r'cardinality\s*\(\s*\n?', '(', result)
+
+    # 5. TatSu header with start rule and whitespace handling
+    header = """# RIDDL Grammar in TatSu Format
+# AUTO-GENERATED from ebnf-grammar.ebnf by ebnf_preprocessor.py
+# DO NOT EDIT MANUALLY
+
+@@grammar :: RIDDL
+@@whitespace :: /[\\s]+/
+
+# Start rule - uses closure for one-or-more
+start = {root_content}+ $ ;
+
+"""
+
+    # 6. Collapse runs of three or more newlines into a single blank line
+    result = re.sub(r'\n{3,}', '\n\n', result)
+
+    return header + result
+
+
+def _convert_comment(match: re.Match) -> str:
+    """Convert a single EBNF comment to TatSu format."""
+    comment_text = match.group(1).strip()
+    lines = comment_text.split('\n')
+    return '\n'.join(f'# {line.strip()}' for line in lines)
+
+
+def extract_rules(ebnf_content: str) -> List[Tuple[str, str]]:
+    """
+    Extract all rules from EBNF content as (name, body) tuples.
+
+    Args:
+        ebnf_content: EBNF grammar content
+
+    Returns:
+        List of (rule_name, rule_body) tuples
+    """
+    rules = []
+
+    # Remove comments first
+    content = re.sub(r'\(\*[^*]*(?:\*(?!\))[^*]*)*\*\)', '', ebnf_content)
+
+    # Match rules: name = body ;
+    # Handle multi-line rules
+    current_rule = ""
+    for line in content.split('\n'):
+        line = line.strip()
+        if not line:
+            continue
+
+        current_rule += " " + line
+
+        if ';' in current_rule:
+            # Could have multiple rules on one line
+            for rule_match in re.finditer(r'(\w+)\s*=\s*(.+?)\s*;', current_rule):
+                name = rule_match.group(1)
+                body = rule_match.group(2).strip()
+                rules.append((name, body))
+            current_rule = ""
+
+    return rules
+
+
+def find_keywords(ebnf_content: str) -> set:
+    """
+    Find all keyword literals in the grammar.
+
+    Returns set of keyword strings that appear as terminals.
+    """
+    keywords = set()
+
+    # Find all quoted strings that look like keywords (lowercase identifiers)
+    for match in re.finditer(r'"([a-z][a-z_]*)"', ebnf_content):
+        keyword = match.group(1)
+        # Skip single-character matches; the regex already excludes non-alphabetic tokens
+        if len(keyword) > 1:
+            keywords.add(keyword)
+
+    return keywords
+
+
+if __name__ == "__main__":
+    import argparse
+    from pathlib import Path
+
+    parser = argparse.ArgumentParser(description="Convert EBNF to TatSu format")
+    parser.add_argument("input", type=Path, help="Input EBNF file")
+    parser.add_argument("-o", "--output", type=Path, help="Output file")
+    parser.add_argument("--show-keywords", action="store_true",
+                        help="Show keywords found in grammar")
+
+    args = parser.parse_args()
+
+    with open(args.input, 'r', encoding='utf-8') as f:
+        content = f.read()
+
+    if args.show_keywords:
+        keywords = find_keywords(content)
+        print("Keywords found:")
+        for kw in sorted(keywords):
+            print(f"  {kw}")
+    else:
+        result = preprocess_for_tatsu(content)
+        if args.output:
+            with open(args.output, 'w', encoding='utf-8') as f:
+                f.write(result)
+            print(f"Wrote TatSu grammar to {args.output}")
+        else:
+            print(result)
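
Review note on `ebnf_preprocessor.py`: the comment and character-range conversions are easiest to check on a small input. The sketch below copies the comment regex from the file and merges the letter and digit range patterns into one for brevity (that merge is a simplification made here, not what the file does). The sample grammar text is invented for illustration and is not taken from `ebnf-grammar.ebnf`.

```python
import re

# Same comment pattern as in the preprocessor: (* ... *) possibly multi-line.
COMMENT = re.compile(r'\(\*([^*]*(?:\*(?!\))[^*]*)*)\*\)')

# Letter and digit ellipsis ranges, merged into one pattern for this sketch.
RANGE = re.compile(
    r'"([A-Za-z0-9])"\s*\|\s*"([A-Za-z0-9])"\s*\|\s*\.\.\.\s*\|\s*"([A-Za-z0-9])"'
)


def convert_comment(match: re.Match) -> str:
    """Turn one EBNF comment into '#'-prefixed lines."""
    lines = match.group(1).strip().split('\n')
    return '\n'.join(f'# {line.strip()}' for line in lines)


def convert(ebnf: str) -> str:
    """Apply the comment and range conversions to an EBNF fragment."""
    result = COMMENT.sub(convert_comment, ebnf)
    return RANGE.sub(lambda m: f'/[{m.group(1)}-{m.group(3)}]/', result)


SAMPLE = '''(* Lexical
   rules *)
letter = "A" | "B" | ... | "Z" | "a" | "b" | ... | "z" ;
digit = "0" | "1" | ... | "9" ;
'''

if __name__ == "__main__":
    print(convert(SAMPLE))
```

One thing the sketch makes visible: a comment that shares a line with a rule would turn the rest of that line into a `#` comment, so the conversion assumes comments sit on their own lines or at the end of a line.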