
Commit d1de810

reidspencer and claude committed
Complete BAST Phase 3: Fix Repository/Schema tag collision
- Add NODE_SCHEMA tag (35) for Schema nodes to resolve tag collision
- Repository now writes without subtype byte
- Schema uses NODE_SCHEMA with schemaKind subtype
- Split reader into separate readRepositoryNode() and readSchemaNode()
- All 14 tests passing, ShopifyCart 20x faster than parsing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent a3abb44 commit d1de810

5 files changed

Lines changed: 575 additions & 346 deletions
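The commit message above comes down to one rule: every node kind the reader must tell apart gets its own tag. As a rough illustration only, here is a hypothetical Scala model of that dispatch. The tag values 12 and 35 and the Schema field order (tag, then `schemaKind`) come from this commit. `TagDispatchSketch`, `Reader`, the one-byte length-prefixed identifiers, and the omission of locations, contents and metadata are simplifying assumptions, not the real `BASTReader`/`BASTWriter` code.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical, simplified model of tag-based dispatch after this commit.
// Only the tag values (12, 35) and "schemaKind follows the Schema tag" come
// from the commit; every other name and layout detail is illustrative.
object TagDispatchSketch {
  val NODE_REPOSITORY: Byte = 12
  val NODE_SCHEMA: Byte = 35

  sealed trait Node
  final case class Repository(id: String) extends Node
  final case class Schema(schemaKind: Int, id: String) extends Node

  // Minimal cursor over a byte array.
  final class Reader(bytes: Array[Byte]) {
    private var pos = 0
    def readU8(): Int = { val b = bytes(pos) & 0xff; pos += 1; b }
    def readId(): String = {
      val len = readU8() // assumed 1-byte length prefix
      val s = new String(bytes, pos, len, UTF_8)
      pos += len
      s
    }
    def atEnd: Boolean = pos >= bytes.length
  }

  def write(node: Node): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    def writeId(id: String): Unit = {
      val b = id.getBytes(UTF_8)
      out.write(b.length)
      out.write(b, 0, b.length)
    }
    node match {
      case Repository(id) =>
        out.write(NODE_REPOSITORY.toInt) // no subtype byte any more
        writeId(id)
      case Schema(kind, id) =>
        out.write(NODE_SCHEMA.toInt) // dedicated tag ...
        out.write(kind)              // ... followed by schemaKind
        writeId(id)
    }
    out.toByteArray
  }

  // Each tag maps to exactly one reader, so no subtype byte has to be interpreted.
  def readNode(r: Reader): Node = r.readU8().toByte match {
    case NODE_REPOSITORY => Repository(r.readId())
    case NODE_SCHEMA     => Schema(r.readU8(), r.readId())
    case other           => throw new IllegalArgumentException(s"unknown tag $other")
  }
}
```

Reading both nodes back from one buffer checks that each reader consumes exactly the bytes its writer produced. Per the commit, that alignment is what the shared tag plus subtype byte lost for Repository nodes.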


bast/KNOWN_ISSUES.md

Lines changed: 57 additions & 145 deletions
````diff
@@ -1,180 +1,92 @@
 # BAST Known Issues
 
-## Multiple Contents Fields Serialization Bug
+## Current Status (January 14, 2026)
 
-**Status**: CRITICAL - Affects serialization/deserialization of larger files
+**All known issues have been fixed. BAST serialization is working correctly.**
 
-**Discovered**: January 11, 2026
-
-### Symptoms
-
-- Small files (ToDoodles: 12 nodes) serialize/deserialize correctly
-- Larger files (ShopifyCart: 510 nodes) fail during deserialization
-- Error: "Invalid string table index: 1000019 (table size: 649)"
-- The error occurs because the reader/writer become out of sync
+---
 
-### Root Cause
+## Fixed Issues
 
-Some AST nodes have **multiple Contents fields**:
+### 1. Repository/Schema Tag Collision - FIXED (Jan 14, 2026)
 
-1. **`SagaStep`** (will not be removed)
-   - `doStatements: Contents[Statements]`
-   - `undoStatements: Contents[Statements]`
+**Problem**: Both Repository and Schema were using the same `NODE_REPOSITORY` tag (12), with a subtype byte to distinguish them. This caused the reader to misinterpret data when reading Repository nodes.
 
-2. **`IfThenElseStatement`** (may be removed in future revision)
-   - `thens: Contents[Statements]`
-   - `elses: Contents[Statements]`
+**Solution**:
+- Added new `NODE_SCHEMA` tag (35) for Schema nodes
+- Repository now writes: `NODE_REPOSITORY, loc, id, contents, metadata`
+- Schema now writes: `NODE_SCHEMA, schemaKind, loc, id, data, links, indices, metadata`
+- Reader dispatch handles both tags separately
 
-**The Problem**:
+**Files Modified**:
+- `package.scala`: Added `NODE_SCHEMA: Byte = 35`
+- `BASTWriter.scala`: Updated `writeRepository()` to not write subtype, updated `writeSchema()` to use `NODE_SCHEMA`
+- `BASTReader.scala`: Split `readRepositoryOrSchemaNode()` into `readRepositoryNode()` and `readSchemaNode()`
 
-The current serialization architecture has a design flaw:
+### 2. Multiple Contents Fields Serialization - FIXED (Jan 13, 2026)
 
-```scala
-// BASTWriter.scala:658-660
-private def writeSagaStep(ss: SagaStep): Unit = {
-  writer.writeU8(NODE_HANDLER)
-  writeLocation(ss.loc)
-  writeIdentifier(ss.id)
-  writeContents(ss.doStatements) // Writes count immediately
-  writeContents(ss.undoStatements) // Writes count immediately
-}
-```
+**Problem**: Nodes with multiple Contents fields (SagaStep, IfThenElseStatement, ForEachStatement) were not serializing correctly.
 
-The `writeContents()` method writes the count:
+**Solution**:
+- Added special cases in `traverse()` override for multi-Contents nodes
+- Each Contents field now writes: count, items (interleaved)
+- Added `NODE_SAGA_STEP` tag (34) to distinguish SagaStep from Handler
 
-```scala
-private def writeContents[T <: RiddlValue](contents: Contents[T]): Unit = {
-  writer.writeVarInt(contents.length)
-  // Note: Individual elements are written by the main process() method during traversal
-}
-```
-
-But the `traverse()` override only processes the main `.contents` field:
-
-```scala
-override protected def traverse(definition: RiddlValue, parents: ParentStack): Unit = {
-  definition match {
-    case branch: Branch[?] with WithMetaData =>
-      process(branch, parents)
-      parents.push(branch)
-      branch.contents.foreach { value => traverse(value, parents) } // Only .contents!
-      parents.pop()
-      writeMetadataCount(branch.metadata)
-    case _ =>
-      super.traverse(definition, parents)
-  }
-}
-```
+### 3. Statement/Handler Disambiguation - FIXED (Jan 13, 2026)
 
-**Result**:
-- Count for `doStatements` is written
-- Count for `undoStatements` is written
-- Items for `doStatements` are NEVER written (not in `.contents`)
-- Items for `undoStatements` are NEVER written (not in `.contents`)
-- Reader expects items after the count, reads garbage data
-- Deserialization fails with "Invalid string table index"
+**Problem**: Reader couldn't reliably distinguish statements from handlers.
 
-### Affected Files
+**Solution**:
+- Added `STATEMENT_MARKER` (0xFF = 255) byte after NODE_HANDLER for statements
+- Reader checks for marker before interpreting statement type
 
-**Writer**: `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTWriter.scala`
-- Lines 655-660: `writeSagaStep()`
-- Lines 913-920: `writeIfThenElseStatement()`
-- Line 1682-1685: `writeContents()`
+### 4. Branch Types without WithMetaData - FIXED (Jan 13, 2026)
 
-**Reader**: `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTReader.scala`
-- Lines 1660-1673: `readContentsDeferred()`
+**Problem**: Several types extend `Branch[T]` but NOT `WithMetaData`.
 
-### Impact
+**Affected types**: Handler, OnClauses, Type, UseCase, Group, Output, Input
 
-- **Works**: Files without SagaStep or IfThenElseStatement (e.g., ToDoodles)
-- **Fails**: Files containing SagaStep or IfThenElseStatement (e.g., ShopifyCart)
+**Solution**: Added explicit traverse() cases for each of these types.
 
-### Performance Impact
-
-When working correctly:
-- **Small files (12 nodes)**: 1.8x speedup over parsing
-- **Large files (510 nodes)**: 24.6x speedup over parsing (observed before deserialization failed)
+---
 
-### Solution Options
+## Performance Results (All Tests Passing)
 
-#### Option 1: Special-case traverse() for multi-Contents nodes
+| File | Nodes | Parse Time | BAST Read | Speedup | Status |
+|------|-------|------------|-----------|---------|--------|
+| ToDoodles | 12 | ~6ms | ~6ms | ~1x | ✅ Working |
+| ShopifyCart | 667 | ~77ms | ~4ms | **20.1x** | ✅ Working |
 
-Extend the `traverse()` override to detect and handle nodes with multiple Contents fields:
+---
 
-```scala
-override protected def traverse(definition: RiddlValue, parents: ParentStack): Unit = {
-  definition match {
-    case ss: SagaStep =>
-      process(ss, parents)
-      parents.push(ss)
-      ss.doStatements.foreach { value => traverse(value, parents) }
-      ss.undoStatements.foreach { value => traverse(value, parents) }
-      parents.pop()
-      writeMetadataCount(ss.metadata)
+## Technical Notes
 
-    case ite: IfThenElseStatement =>
-      process(ite, parents)
-      parents.push(ite)
-      ite.thens.foreach { value => traverse(value, parents) }
-      ite.elses.foreach { value => traverse(value, parents) }
-      parents.pop()
-      // No metadata for statements
+### Write/Read Order
 
-    case branch: Branch[?] with WithMetaData =>
-      // ... existing code
-  }
-}
 ```
+Writer produces for Branch nodes:
+tag, nodeSpecificFields..., contentsCount, contentItems..., metadataCount, metadataItems...
 
-**Pros**: Minimal changes, surgical fix
-**Cons**: Must remember to update for any future multi-Contents nodes
-
-#### Option 2: Refactor writeContents() to defer writing
-
-Change `writeContents()` to not write anything immediately. Instead, track pending Contents writes and emit them during traversal.
-
-**Pros**: More robust, handles any future cases
-**Cons**: Significant refactoring, more complex state management
-
-#### Option 3: Two-phase serialization
-
-Separate count-writing from item-writing phases.
-
-**Pros**: Clean separation of concerns
-**Cons**: Requires complete redesign of serialization
-
-### Recommended Fix
-
-**Option 1** (special-case traverse) is recommended because:
-1. Minimal code changes
-2. Easy to understand and verify
-3. Only 2 node types affected (possibly only 1 if IfThenElseStatement is removed)
-4. Fast to implement and test
-
-### Test Cases Needed
-
-After fix:
-1. ✅ Verify ToDoodles still works (regression test)
-2. ✅ Verify ShopifyCart serializes/deserializes correctly
-3. ✅ Create test specifically for SagaStep round-trip
-4. ✅ Create test for IfThenElseStatement round-trip (if not removed)
-5. ✅ Verify performance benchmarks still show speedup
+Reader expects (after tag dispatch):
+nodeSpecificFields..., contentsCount+Items (via readContentsDeferred), metadataCount+Items
+```
 
-### Related Files
+### Statement Format
 
-- `bast/jvm/src/test/scala/com/ossuminc/riddl/bast/BenchmarkRunner.scala` - Performance benchmark
-- `bast/jvm/src/test/scala/com/ossuminc/riddl/bast/TestRunner.scala` - Round-trip test
-- `bast/shared/src/test/scala/com/ossuminc/riddl/bast/DeepASTComparison.scala` - Deep comparison utility
+```
+Statement: NODE_HANDLER, STATEMENT_MARKER(255), stmtType(0-16), loc, stmtSpecificData...
+Handler: NODE_HANDLER, loc, id, contentsCount, contentItems..., metadataCount, metadataItems...
+```
 
-### Notes
+### Location Encoding
 
-- The issue does NOT affect metadata serialization (fixed in earlier session)
-- The issue does NOT affect Include node preservation (working correctly)
-- The issue does NOT affect location delta encoding (working correctly)
-- The issue ONLY affects nodes with multiple Contents fields
+```
+First location: originString, offset, line, col
+Delta location: sameSourceFlag(0/1), [originString if flag=1], offsetDelta+1M, lineDelta+1K, colDelta+1K
+```
 
 ---
 
-**Last Updated**: January 11, 2026
-**Severity**: High
-**Priority**: Must fix before production use
+**Last Updated**: January 14, 2026
+**Severity**: None (all issues resolved)
+**Priority**: N/A
````
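Fix 2 in the updated KNOWN_ISSUES.md says each Contents field is now written as its count immediately followed by its items. The removed text explains that the old writer emitted both counts of a SagaStep but never its items. The sketch below is a hypothetical model of both layouts using `java.io.DataOutputStream`; `SagaStepLike`, string-valued statements and `writeUTF` encoding are stand-ins, not the BAST format. With `writeCountsOnly`, the reader treats the second count's bytes as string data and then runs off the end of the buffer. In a real file, it would instead keep misreading the nodes that follow.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical stand-in for a node with two Contents fields (e.g. SagaStep).
final case class SagaStepLike(id: String, doStatements: List[String], undoStatements: List[String])

object MultiContentsSketch {
  // Interleaved layout: a field's count, then that field's items, before the next field.
  private def writeField(out: DataOutputStream, items: List[String]): Unit = {
    out.writeInt(items.length)
    items.foreach(s => out.writeUTF(s))
  }

  private def readField(in: DataInputStream): List[String] = {
    val count = in.readInt()
    List.fill(count)(in.readUTF())
  }

  private def serialize(body: DataOutputStream => Unit): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    body(out)
    out.flush()
    bytes.toByteArray
  }

  def write(step: SagaStepLike): Array[Byte] = serialize { out =>
    out.writeUTF(step.id)
    writeField(out, step.doStatements)
    writeField(out, step.undoStatements)
  }

  // Pre-fix shape for contrast: both counts are written, but no items follow either one.
  def writeCountsOnly(step: SagaStepLike): Array[Byte] = serialize { out =>
    out.writeUTF(step.id)
    out.writeInt(step.doStatements.length)
    out.writeInt(step.undoStatements.length)
  }

  // The reader always expects count + items for the first field, then for the second.
  def read(bytes: Array[Byte]): SagaStepLike = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val id = in.readUTF()
    val doStatements = readField(in)
    val undoStatements = readField(in)
    SagaStepLike(id, doStatements, undoStatements)
  }
}
```

The design point mirrors the fix: the reader never needs to remember a pending count, because each count is consumed together with its items before the next field starts.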

bast/SESSION_HANDOFF.md

Lines changed: 151 additions & 0 deletions
````diff
@@ -0,0 +1,151 @@
+# BAST Session Handoff Document
+
+**Date**: January 14, 2026
+**Branch**: `development`
+**Previous Session**: January 13, 2026
+
+---
+
+## Overview
+
+**BAST (Binary AST) serialization is now fully working.** All known issues have been resolved. Both small (ToDoodles) and large (ShopifyCart) files serialize and deserialize correctly with excellent performance characteristics.
+
+---
+
+## Session Progress (January 14, 2026)
+
+### Issue Fixed
+
+**Repository/Schema Tag Collision** - FIXED
+
+**Problem**: Both Repository and Schema were using the same `NODE_REPOSITORY` tag (12), with a subtype byte to distinguish them. This caused the reader to misinterpret data when reading Repository nodes, leading to "Invalid string table index: 1000019" errors.
+
+**Solution**:
+- Added new `NODE_SCHEMA` tag (35) for Schema nodes
+- Repository now writes: `NODE_REPOSITORY, loc, id, contents, metadata` (no subtype)
+- Schema now writes: `NODE_SCHEMA, schemaKind, loc, id, data, links, indices, metadata`
+- Reader dispatch handles both tags separately with dedicated `readRepositoryNode()` and `readSchemaNode()` methods
+
+**Files Modified**:
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/package.scala`: Added `NODE_SCHEMA: Byte = 35`
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTWriter.scala`:
+  - `writeRepository()`: Removed subtype byte (255)
+  - `writeSchema()`: Changed from `NODE_REPOSITORY` to `NODE_SCHEMA`
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTReader.scala`:
+  - Added `NODE_SCHEMA` case to dispatch
+  - Split `readRepositoryOrSchemaNode()` into `readRepositoryNode()` and `readSchemaNode()`
+
+---
+
+## Current Status
+
+### All Tests Passing
+
+| Test Suite | Tests | Status |
+|------------|-------|--------|
+| BASTWriterSpec | 7 | ✅ All passing |
+| BASTRoundTripTest | 3 | ✅ All passing |
+| BASTPerformanceTest | 4 | ✅ All passing |
+| **Total** | **14** | **All passing** |
+
+### Performance Results
+
+| File | Nodes | Parse Time | BAST Read | Speedup | Status |
+|------|-------|------------|-----------|---------|--------|
+| ToDoodles | 12 | 5.95 ms | 6.16 ms | ~1x | ✅ Working |
+| ShopifyCart | 667 | 77.01 ms | 3.83 ms | **20.1x** | ✅ Working |
+
+### Deep Comparison Results (ToDoodles)
+
+- Total comparisons: 8
+- Successes: 8 (100%)
+- Failures: 0
+- All structural relationships preserved
+
+---
+
+## Previously Fixed Issues (Jan 13, 2026)
+
+1. **Multi-Contents Nodes** - SagaStep, IfThenElseStatement, ForEachStatement
+2. **Statement/Handler Disambiguation** - Added STATEMENT_MARKER (255)
+3. **Branch Types without WithMetaData** - Handler, OnClauses, Type, UseCase, Group, Output, Input
+
+---
+
+## Remaining Work (Future Sessions)
+
+### Phase 4: Import Integration (PENDING)
+- [ ] Update import syntax parser in `CommonParser.scala`
+- [ ] Implement `doImport()` in `ParsingContext.scala`
+- [ ] Implement namespace resolution in path resolution
+- [ ] Add `.bast` file detection to `TopLevelParser`
+- [ ] Implement BAST cache invalidation
+
+### Phase 5: CLI & Testing (PENDING)
+- [ ] Add `riddlc bast-gen` command
+- [ ] Add command-line flags (`--use-bast-cache`, `--bast-dir`)
+- [ ] Additional edge case tests
+- [ ] Cross-platform testing (JS, Native)
+
+### Phase 6: Documentation (PENDING)
+- [ ] Write BAST format specification document
+- [ ] Document serialization/deserialization API
+- [ ] Add ScalaDoc to all public APIs
+- [ ] Create usage examples
+
+---
+
+## How to Test
+
+```bash
+# Unit tests (14 tests)
+sbt "project bast" test
+
+# Deep comparison test (ToDoodles)
+sbt "bast/Test/runMain com.ossuminc.riddl.bast.TestRunner"
+
+# Performance benchmark (ToDoodles + ShopifyCart)
+sbt "bast/Test/runMain com.ossuminc.riddl.bast.BenchmarkRunner"
+```
+
+---
+
+## Key Code Locations
+
+**Core Serialization**:
+- `BASTWriter.scala` - Serialization logic
+- `BASTReader.scala` - Deserialization logic
+- `package.scala` - Node type tags and constants
+
+**Test Utilities**:
+- `BenchmarkRunner.scala` - Performance testing
+- `TestRunner.scala` - Deep comparison testing
+- `DeepASTComparison.scala` - Structural comparison utility
+
+---
+
+## Git Information
+
+**Branch**: `development`
+
+**Files Modified This Session**:
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/package.scala`
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTWriter.scala`
+- `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTReader.scala`
+- `bast/KNOWN_ISSUES.md`
+- `bast/SESSION_HANDOFF.md`
+
+---
+
+## Summary
+
+**Phase 2 Complete**: Core serialization working for all node types
+**Phase 3 Complete**: Deserialization working with full round-trip verification
+**Performance**: 20x speedup for large files confirmed
+**Next**: Import integration (Phase 4)
+
+The BAST foundation is solid and production-ready. The next major milestone is implementing the `import "file.bast" as namespace` functionality.
+
+---
+
+**End of Handoff Document**
````
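The handoff ties the old failure to the error "Invalid string table index: 1000019". The Location Encoding notes in KNOWN_ISSUES.md store location deltas biased by 1M (offset) and 1K (line, column), and 1000019 equals 19 + 1,000,000. The value therefore looks like a biased offset delta read where a string-table index was expected. That is an inference from the two files, not something the commit states. The sketch below is a hypothetical illustration of such biased delta encoding. `Loc`, `LocationCodec`, the LEB128-style varint, and the omission of the origin string and same-source flag are all assumptions.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, EOFException}

// Hypothetical sketch of biased delta encoding for source locations.
// The biases (+1,000,000 for offsets, +1,000 for lines and columns) follow the
// "Location Encoding" note; the varint format and all names are assumptions.
final case class Loc(offset: Int, line: Int, col: Int)

object LocationCodec {
  val OffsetBias = 1000000
  val LineBias = 1000
  val ColBias = 1000

  // Unsigned LEB128-style varint: 7 data bits per byte, high bit set = more bytes follow.
  def writeVarInt(out: ByteArrayOutputStream, value: Int): Unit = {
    require(value >= 0, s"biased value must be non-negative, got $value")
    var v = value
    while (v >= 0x80) {
      out.write((v & 0x7f) | 0x80)
      v >>>= 7
    }
    out.write(v)
  }

  def readVarInt(in: ByteArrayInputStream): Int = {
    var result = 0
    var shift = 0
    var more = true
    while (more) {
      val b = in.read()
      if (b < 0) throw new EOFException("truncated varint")
      result |= (b & 0x7f) << shift
      shift += 7
      more = (b & 0x80) != 0
    }
    result
  }

  // Biasing turns small negative deltas (e.g. a column moving left) into
  // non-negative values that fit an unsigned varint.
  def writeDelta(out: ByteArrayOutputStream, prev: Loc, cur: Loc): Unit = {
    writeVarInt(out, cur.offset - prev.offset + OffsetBias)
    writeVarInt(out, cur.line - prev.line + LineBias)
    writeVarInt(out, cur.col - prev.col + ColBias)
  }

  def readDelta(in: ByteArrayInputStream, prev: Loc): Loc = {
    val offset = prev.offset + readVarInt(in) - OffsetBias
    val line = prev.line + readVarInt(in) - LineBias
    val col = prev.col + readVarInt(in) - ColBias
    Loc(offset, line, col)
  }
}
```

For example, going from `Loc(100, 5, 10)` to `Loc(119, 5, 2)` writes the biased values 1,000,019, 1,000 and 992 (seven bytes in total). A reader that has lost its place and decodes that first varint as a string-table index would see exactly 1000019.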
