Commit b24c8f8

Merge feature/parsing-fixes for release 1.2.1
2 parents 98cb0e0 + bfea9a0 commit b24c8f8

14 files changed

Lines changed: 904 additions & 71 deletions

.github/workflows/scala.yml

Lines changed: 13 additions & 2 deletions
@@ -176,10 +176,21 @@ jobs:
         run: |
           pip install -r language/jvm/src/test/python/requirements.txt

-      - name: Validate EBNF Grammar
+      - name: Validate EBNF Grammar with TatSu (internal test files)
         run: |
           cd language/jvm/src/test/python
-          python ebnf_validator.py --verbose
+          python ebnf_tatsu_validator.py --verbose
+
+      - name: Checkout riddl-examples
+        uses: actions/checkout@v4
+        with:
+          repository: ossuminc/riddl-examples
+          path: riddl-examples
+
+      - name: Validate EBNF Grammar against riddl-examples
+        run: |
+          cd language/jvm/src/test/python
+          python validate_external_riddl.py --repo ${{ github.workspace }}/riddl-examples

   dependency-check:
     timeout-minutes: 15

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -28,3 +28,5 @@ target
 *.bak
 language/input/everything.bast
 .claude/
+language/jvm/src/test/python/.venv/
+__pycache__/

CLAUDE.md

Lines changed: 21 additions & 0 deletions
@@ -247,6 +247,27 @@ sed -i 's/scala-3.3.7/scala-3.9.0/g' .github/workflows/*.yml

 ## Testing Patterns

+### Parser/EBNF Synchronization Requirement
+
+**Any change to the fastparse parser MUST have a corresponding change to the EBNF grammar.**
+
+The EBNF grammar at `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf`
+is the canonical specification of RIDDL syntax. It is validated by a TatSu-based parser
+that runs in CI on all `**/input/**/*.riddl` test files.
+
+When modifying the fastparse parser:
+1. Update the corresponding rule(s) in `ebnf-grammar.ebnf`
+2. Run the EBNF validator locally:
+   ```bash
+   cd language/jvm/src/test/python
+   pip install -r requirements.txt  # first time only
+   python ebnf_tatsu_validator.py
+   ```
+3. Ensure both parsers accept the same inputs
+4. CI will fail if the EBNF parser cannot parse test files that fastparse accepts
+
+This ensures the documented grammar stays in sync with the actual implementation.
+
 ### Compilation After Every Change
 When implementing new code:
 1. Write the code

NOTEBOOK.md

Lines changed: 117 additions & 1 deletion
@@ -6,11 +6,26 @@ This is the central engineering notebook for the RIDDL project. It tracks curren

 ## Current Status

-**Last Updated**: January 31, 2026
+**Last Updated**: February 1, 2026

 **Release 1.2.0 Published**: Scala 3.7.4 upgrade complete. All tests pass. Published
 to GitHub Packages.

+**In Progress**: TatSu-based EBNF validation (branch: `feature/tatsu-ebnf-validation`).
+Fixing EBNF grammar drift from fastparse implementation.
+
+**EBNF Validation Complete** (Feb 1, 2026):
+- Internal test files: 59/77 passed, 13 include fragments skipped, 5 expected
+  failures (all have bugs that fastparse also rejects)
+- External test files (riddl-examples): 8/9 passed, 1 expected failure (Trello
+  needs AI regeneration)
+- Key fixes made: comment regex tokens (avoid whitespace-skipping issues),
+  statement rule (added morph/become), pseudo_code_contents rule, interactions
+  rule (added comment support)
+- CI updated to validate against riddl-examples repository
+- Tasks #7-10 created for fastparse fixes to match EBNF (hex escapes, cardinality
+  mutual exclusivity, metadata with-block requirement)
+
 The RIDDL project is a mature compiler and toolchain for the Reactive Interface
 to Domain Definition Language. BAST serialization is **complete** (60 tests,
 6-10x speedup). Hugo and diagrams modules moved to another repository.
@@ -35,6 +50,14 @@ AI-friendly validation pass for MCP server integration. See design section below

 ---

+## Blocked Tasks
+
+| Task | Blocked By | Notes |
+|------|------------|-------|
+| Add EBNF validation for riddl-models repository | riddl-models needs to be populated with RIDDL models | Similar to riddl-examples validation in CI; will validate against external repository once content exists |
+
+---
+
 ## Scheduled Tasks

 | Date | Task | Notes |
@@ -116,6 +139,99 @@ The `pseudoCodeBlock` parser now allows comments before and/or after `???`:

 ## Session Log

+### February 1, 2026 (Cardinality Fix)
+
+**Focus**: Fix cardinality prefix/suffix mutual exclusivity
+
+**Branch**: `feature/parsing-fixes`
+
+**Work Completed**:
+1. **Updated EBNF grammar** to allow `many optional` as valid prefix combination
+   - `type_expression` and `field_type_expression` now accept `("many" ["optional"] | "optional")`
+2. **Updated TypeParser.scala** to enforce mutual exclusivity
+   - Allows prefix only: `many` (=+), `optional` (=?), `many optional` (=*)
+   - Allows suffix only: `?`, `+`, `*`
+   - Rejects prefix AND suffix together with clear error message
+3. **Restored rbbq.riddl** to use `many optional RewardEvent` syntax
+   - Demonstrates valid cardinality prefix usage
+   - Fixes TokenParserTest expected offsets
+
+**Test Results**: All 715 tests pass across all modules
+
+**Task #10 Verification** (metadata with-block requirement):
+- Confirmed fastparse already correctly enforces `with { }` wrapper for metadata
+- `} briefly "..."` (after close, no with) → Rejected ✅
+- `{ briefly "..." }` (inside body) → Rejected ✅
+- `} with { briefly "..." }` (correct syntax) → Accepted ✅
+- No code changes needed - task was already satisfied
+
+**Files Modified**:
+- `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf`
+- `language/shared/src/main/scala/com/ossuminc/riddl/language/parsing/TypeParser.scala`
+- `language/input/rbbq.riddl`
+
+---
+
+### February 1, 2026 (TatSu EBNF Validation - In Progress)
+
+**Focus**: Implement automated EBNF grammar validation in CI using TatSu
+
+**Branch**: `feature/tatsu-ebnf-validation`
+
+**Context**: The EBNF grammar at `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf` documents RIDDL syntax but can drift from the actual fastparse implementation. This work adds CI validation to catch drift.
+
+**Work Completed**:
+1. **Created TatSu-based validator framework**
+   - `language/jvm/src/test/python/ebnf_preprocessor.py` - Converts EBNF to TatSu format
+   - `language/jvm/src/test/python/ebnf_tatsu_validator.py` - Validates RIDDL files
+   - Updated `requirements.txt` with TatSu>=5.12.0
+
+2. **Updated CI workflow** (`.github/workflows/scala.yml`)
+   - Changed from Lark-based to TatSu-based validation
+
+3. **Updated CLAUDE.md**
+   - Added "Parser/EBNF Synchronization Requirement" section
+
+4. **Fixed EBNF Issue #1: `???` placeholder ordering**
+   - Problem: PEG parsers try alternatives in order; closures matching zero items shadow `???`
+   - Fixed `enumerators` (line 100): `{enumerator [","]} | "???"` → `"???" | {enumerator [","]}`
+   - Fixed `aggregate_definitions` (line 115): same pattern
+
+5. 🚧 **Discovered EBNF Issue #2: `simple_identifier` consuming whitespace**
+   - TatSu skips whitespace between tokens, even inside closures
+   - `simple_identifier = letter { letter | digit | "_" | "-" }` causes "A from context Two" to parse as single identifier
+   - **Proposed fix**: `simple_identifier = /[a-zA-Z][a-zA-Z0-9_-]*/` (regex pattern)
+
+**Current Validation Results**:
+- 14/77 files pass
+- 13 skipped (include fragments)
+- 2 expected failures
+- 48 unexpected failures (EBNF/parser drift to fix)
+
+**Key Learning**: TatSu uses PEG semantics where:
+- Alternatives are tried in order (put specific literals before general patterns)
+- Whitespace is skipped between token matches (lexical rules need regex patterns)
+- Closures matching zero items "succeed" and don't try next alternative
+
+**Files Created**:
+- `language/jvm/src/test/python/ebnf_preprocessor.py`
+- `language/jvm/src/test/python/ebnf_tatsu_validator.py`
+
+**Files Modified**:
+- `language/jvm/src/test/python/requirements.txt`
+- `language/shared/src/main/resources/riddl/grammar/ebnf-grammar.ebnf` (2 fixes)
+- `.github/workflows/scala.yml`
+- `.gitignore` (added .venv)
+- `CLAUDE.md`
+
+**Next Steps**:
+1. Fix `simple_identifier` to use regex pattern (prevents whitespace consumption)
+2. Fix `quoted_identifier` similarly
+3. Review and fix other lexical rules (zone, option_name, etc.)
+4. Work through remaining 48 failures systematically
+
+---
+
 ### January 31, 2026 (Scala 3.7.4 Compiler Bug - RESOLVED)

 **Focus**: Fix Scala 3.7.4 compiler infinite loop caused by opaque type Contents
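
The PEG lessons recorded in the February 1 TatSu session log in NOTEBOOK.md (ordered alternatives, zero-length closures, whitespace skipped between tokens) can be reproduced without TatSu. The sketch below is a hypothetical, stdlib-only illustration: `literal`, `token`, `closure`, `choice`, `accepts`, and `spaced_identifier` are names invented here, and the `enumerator` pattern is a simplified stand-in for the real grammar rule. It is not how TatSu is implemented.

```python
import re


def literal(text):
    """Match an exact string at the current position."""
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse


def token(pattern):
    """Match a regex at the current position (one indivisible token)."""
    rx = re.compile(pattern)
    def parse(s, pos):
        m = rx.match(s, pos)
        return m.end() if m else None
    return parse


def closure(item):
    """Zero-or-more repetition: always succeeds, possibly consuming nothing."""
    def parse(s, pos):
        while True:
            nxt = item(s, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse


def choice(*alternatives):
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(s, pos):
        for alt in alternatives:
            nxt = alt(s, pos)
            if nxt is not None:
                return nxt
        return None
    return parse


def accepts(rule, text):
    """True when the rule consumes the whole input."""
    return rule(text, 0) == len(text)


# Simplified stand-in for `enumerator [","]` (assumption, not the real rule).
enumerator = token(r'[A-Za-z]+\s*,?\s*')
closure_first = choice(closure(enumerator), literal("???"))
literal_first = choice(literal("???"), closure(enumerator))

print(accepts(closure_first, "???"))         # False: empty closure shadows "???"
print(accepts(literal_first, "???"))         # True
print(accepts(literal_first, "Red, Green"))  # True


def spaced_identifier(s):
    """Crude imitation of a char-by-char rule with whitespace skipped between tokens."""
    out = []
    for ch in s:
        if ch.isspace():
            continue
        if ch.isalnum() or ch in "_-":
            out.append(ch)
        else:
            break
    return "".join(out)


IDENT = re.compile(r'[a-zA-Z][a-zA-Z0-9_-]*')  # regex form proposed in the log
print(spaced_identifier("A from context Two"))  # AfromcontextTwo
print(IDENT.match("A from context Two").group(0))  # A
```

Putting the literal first and expressing lexical rules as a single regex are the two fixes the session log describes; the sketch only shows why each one changes the outcome.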

commands/input/rbbq.riddl

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ domain ReactiveBBQ is {
       record fields is {
         id is ReactiveBBQ.CustomerId,
         points is Number,
-        rewardEvents is many optional Loyalty.RewardEvent
+        rewardEvents is Loyalty.RewardEvent*
       }
       state RewardState of RewardsAccount.fields
       handler Inputs is { ??? }

language/input/domains/rbbq.riddl

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ domain ReactiveBBQ is {
       record fields is {
         id is ReactiveBBQ.CustomerId,
         points is Number,
-        rewardEvents is many optional Loyalty.RewardEvent
+        rewardEvents is Loyalty.RewardEvent*
       }
       state RewardState of RewardsAccount.fields
       handler Inputs is { ??? }
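
Both rbbq.riddl hunks swap between the `many optional` prefix and the `*` suffix, which the NOTEBOOK.md session log treats as equivalent (`many` = `+`, `optional` = `?`, `many optional` = `*`) while rejecting a prefix and suffix used together. A minimal Python sketch of that rule follows. It is an illustration only, not the `TypeParser.scala` logic: `normalize_cardinality`, `PREFIX_TO_SUFFIX`, and the simplified type-name pattern are assumptions made here.

```python
import re

# Prefix-to-suffix equivalences as listed in the NOTEBOOK.md session log.
PREFIX_TO_SUFFIX = {"many": "+", "optional": "?", "many optional": "*"}

# Simplified pattern (assumption): optional prefix, a dotted type name,
# optional one-character suffix. The real grammar is richer than this.
_TYPE_EXPR = re.compile(
    r'^\s*(?:(many\s+optional|many|optional)\s+)?'
    r'([A-Za-z_][\w.]*)\s*([?+*])?\s*$'
)


def normalize_cardinality(type_expr):
    """Return (type_name, suffix), rejecting prefix and suffix used together."""
    m = _TYPE_EXPR.match(type_expr)
    if m is None:
        raise ValueError(f"not a recognized type expression: {type_expr!r}")
    prefix, name, suffix = m.groups()
    if prefix and suffix:
        raise ValueError("cardinality prefix and suffix are mutually exclusive")
    if prefix:
        # Collapse internal whitespace so "many   optional" maps correctly.
        suffix = PREFIX_TO_SUFFIX[" ".join(prefix.split())]
    return name, suffix or ""


print(normalize_cardinality("many optional Loyalty.RewardEvent"))
print(normalize_cardinality("Loyalty.RewardEvent*"))
```

Under these assumptions both calls return `('Loyalty.RewardEvent', '*')`, while an input such as `many Loyalty.RewardEvent+` raises `ValueError`.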
Lines changed: 166 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,166 @@
1+
#!/usr/bin/env python3
2+
"""
3+
EBNF to TatSu Preprocessor
4+
5+
Converts RIDDL's EBNF grammar to TatSu-compatible format.
6+
TatSu uses PEG (Parsing Expression Grammar) semantics but reads EBNF-like syntax.
7+
8+
Key conversions:
9+
- Comments: (* comment *) -> # comment
10+
- Character ranges: "A" | "B" | ... | "Z" -> /[A-Z]/
11+
- Repetition: {x}+ stays as {x}+, {x} stays as {x}
12+
- Optional: [x] stays as [x]
13+
- TatSu doesn't support bare x+ outside braces
14+
15+
TatSu syntax reference: https://tatsu.readthedocs.io/en/stable/syntax.html
16+
"""
17+
18+
import re
19+
from typing import List, Tuple
20+
21+
22+
def preprocess_for_tatsu(ebnf_content: str) -> str:
23+
"""
24+
Convert RIDDL EBNF to TatSu-compatible format.
25+
26+
Args:
27+
ebnf_content: The original EBNF content from ebnf-grammar.ebnf
28+
29+
Returns:
30+
TatSu-compatible grammar string
31+
"""
32+
result = ebnf_content
33+
34+
# 1. Convert EBNF comments (* ... *) to TatSu comments # ...
35+
# Handle multi-line comments
36+
result = re.sub(r'\(\*([^*]*(?:\*(?!\))[^*]*)*)\*\)', _convert_comment, result)
37+
38+
# 2. Convert ellipsis character ranges to regex patterns
39+
# "A" | "B" | ... | "Z" -> /[A-Z]/
40+
result = re.sub(
41+
r'"([A-Za-z])"\s*\|\s*"([A-Za-z])"\s*\|\s*\.\.\.\s*\|\s*"([A-Za-z])"',
42+
lambda m: f'/[{m.group(1)}-{m.group(3)}]/',
43+
result
44+
)
45+
46+
# 3. Convert digit ranges if present
47+
# "0" | "1" | ... | "9" -> /[0-9]/
48+
result = re.sub(
49+
r'"(\d)"\s*\|\s*"(\d)"\s*\|\s*\.\.\.\s*\|\s*"(\d)"',
50+
lambda m: f'/[{m.group(1)}-{m.group(3)}]/',
51+
result
52+
)
53+
54+
# 4. TatSu-specific: function call syntax like cardinality(...) is not valid
55+
# Convert cardinality(type_expression) to just the content with cardinality prefix
56+
# This is a complex grammar feature - for now, simplify by removing the function syntax
57+
result = re.sub(r'cardinality\s*\(\s*\n?', '(', result)
58+
59+
# 5. TatSu header with start rule and whitespace handling
60+
header = """# RIDDL Grammar in TatSu Format
61+
# AUTO-GENERATED from ebnf-grammar.ebnf by ebnf_preprocessor.py
62+
# DO NOT EDIT MANUALLY
63+
64+
@@grammar :: RIDDL
65+
@@whitespace :: /[\\s]+/
66+
67+
# Start rule - uses closure for one-or-more
68+
start = {root_content}+ $ ;
69+
70+
"""
71+
72+
# 6. Clean up any empty lines left over
73+
result = re.sub(r'\n{3,}', '\n\n', result)
74+
75+
return header + result
76+
77+
78+
def _convert_comment(match: re.Match) -> str:
79+
"""Convert a single EBNF comment to TatSu format."""
80+
comment_text = match.group(1).strip()
81+
lines = comment_text.split('\n')
82+
return '\n'.join(f'# {line.strip()}' for line in lines)
83+
84+
85+
def extract_rules(ebnf_content: str) -> List[Tuple[str, str]]:
86+
"""
87+
Extract all rules from EBNF content as (name, body) tuples.
88+
89+
Args:
90+
ebnf_content: EBNF grammar content
91+
92+
Returns:
93+
List of (rule_name, rule_body) tuples
94+
"""
95+
rules = []
96+
97+
# Remove comments first
98+
content = re.sub(r'\(\*[^*]*(?:\*(?!\))[^*]*)*\*\)', '', ebnf_content)
99+
100+
# Match rules: name = body ;
101+
# Handle multi-line rules
102+
current_rule = ""
103+
for line in content.split('\n'):
104+
line = line.strip()
105+
if not line:
106+
continue
107+
108+
current_rule += " " + line
109+
110+
if ';' in current_rule:
111+
# Could have multiple rules on one line
112+
for rule_match in re.finditer(r'(\w+)\s*=\s*(.+?)\s*;', current_rule):
113+
name = rule_match.group(1)
114+
body = rule_match.group(2).strip()
115+
rules.append((name, body))
116+
current_rule = ""
117+
118+
return rules
119+
120+
121+
def find_keywords(ebnf_content: str) -> set:
122+
"""
123+
Find all keyword literals in the grammar.
124+
125+
Returns set of keyword strings that appear as terminals.
126+
"""
127+
keywords = set()
128+
129+
# Find all quoted strings that look like keywords (lowercase identifiers)
130+
for match in re.finditer(r'"([a-z][a-z_]*)"', ebnf_content):
131+
keyword = match.group(1)
132+
# Skip single characters and special tokens
133+
if len(keyword) > 1:
134+
keywords.add(keyword)
135+
136+
return keywords
137+
138+
139+
if __name__ == "__main__":
140+
import argparse
141+
from pathlib import Path
142+
143+
parser = argparse.ArgumentParser(description="Convert EBNF to TatSu format")
144+
parser.add_argument("input", type=Path, help="Input EBNF file")
145+
parser.add_argument("-o", "--output", type=Path, help="Output file")
146+
parser.add_argument("--show-keywords", action="store_true",
147+
help="Show keywords found in grammar")
148+
149+
args = parser.parse_args()
150+
151+
with open(args.input, 'r', encoding='utf-8') as f:
152+
content = f.read()
153+
154+
if args.show_keywords:
155+
keywords = find_keywords(content)
156+
print("Keywords found:")
157+
for kw in sorted(keywords):
158+
print(f" {kw}")
159+
else:
160+
result = preprocess_for_tatsu(content)
161+
if args.output:
162+
with open(args.output, 'w', encoding='utf-8') as f:
163+
f.write(result)
164+
print(f"Wrote TatSu grammar to {args.output}")
165+
else:
166+
print(result)
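
To make the character-range step of the preprocessor concrete, the sketch below applies the same regular expression used in step 2 of `preprocess_for_tatsu` to a sample rule. The sample `letter` rule and the helper name `collapse_letter_range` are assumptions for illustration; the actual contents of `ebnf-grammar.ebnf` are not shown in this commit view.

```python
import re

# Same pattern as step 2 of preprocess_for_tatsu in the diff above.
RANGE_PATTERN = (
    r'"([A-Za-z])"\s*\|\s*"([A-Za-z])"\s*\|\s*\.\.\.\s*\|\s*"([A-Za-z])"'
)


def collapse_letter_range(rule_text):
    """Rewrite `"A" | "B" | ... | "Z"` style runs as a regex character class."""
    return re.sub(
        RANGE_PATTERN,
        lambda m: f'/[{m.group(1)}-{m.group(3)}]/',
        rule_text,
    )


# Hypothetical rule text; the real grammar rule may be written differently.
sample = 'letter = "A" | "B" | ... | "Z" | "a" | "b" | ... | "z" ;'
print(collapse_letter_range(sample))
# letter = /[A-Z]/ | /[a-z]/ ;
```

Only the first and last quoted characters end up in the class; the second one merely anchors the ellipsis form, so a run written without `...` is left untouched.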
