|
1 | 1 | # BAST Known Issues |
2 | 2 |
|
3 | | -## Multiple Contents Fields Serialization Bug |
| 3 | +## Current Status (January 14, 2026) |
4 | 4 |
|
5 | | -**Status**: CRITICAL - Affects serialization/deserialization of larger files |
| 5 | +**All known issues have been fixed. BAST serialization is working correctly.** |
6 | 6 |
|
7 | | -**Discovered**: January 11, 2026 |
8 | | - |
9 | | -### Symptoms |
10 | | - |
11 | | -- Small files (ToDoodles: 12 nodes) serialize/deserialize correctly |
12 | | -- Larger files (ShopifyCart: 510 nodes) fail during deserialization |
13 | | -- Error: "Invalid string table index: 1000019 (table size: 649)" |
14 | | -- The error occurs because the reader/writer become out of sync |
| 7 | +--- |
15 | 8 |
|
16 | | -### Root Cause |
| 9 | +## Fixed Issues |
17 | 10 |
|
18 | | -Some AST nodes have **multiple Contents fields**: |
| 11 | +### 1. Repository/Schema Tag Collision - FIXED (Jan 14, 2026) |
19 | 12 |
|
20 | | -1. **`SagaStep`** (will not be removed) |
21 | | - - `doStatements: Contents[Statements]` |
22 | | - - `undoStatements: Contents[Statements]` |
| 13 | +**Problem**: Both Repository and Schema were using the same `NODE_REPOSITORY` tag (12), with a subtype byte to distinguish them. This caused the reader to misinterpret data when reading Repository nodes. |
23 | 14 |
|
24 | | -2. **`IfThenElseStatement`** (may be removed in future revision) |
25 | | - - `thens: Contents[Statements]` |
26 | | - - `elses: Contents[Statements]` |
| 15 | +**Solution**: |
| 16 | +- Added new `NODE_SCHEMA` tag (35) for Schema nodes |
| 17 | +- Repository now writes: `NODE_REPOSITORY, loc, id, contents, metadata` |
| 18 | +- Schema now writes: `NODE_SCHEMA, schemaKind, loc, id, data, links, indices, metadata` |
| 19 | +- Reader dispatch handles both tags separately |
27 | 20 |
|
28 | | -**The Problem**: |
| 21 | +**Files Modified**: |
| 22 | +- `package.scala`: Added `NODE_SCHEMA: Byte = 35` |
| 23 | +- `BASTWriter.scala`: Updated `writeRepository()` to not write subtype, updated `writeSchema()` to use `NODE_SCHEMA` |
| 24 | +- `BASTReader.scala`: Split `readRepositoryOrSchemaNode()` into `readRepositoryNode()` and `readSchemaNode()` |
29 | 25 |
|
30 | | -The current serialization architecture has a design flaw: |
| 26 | +### 2. Multiple Contents Fields Serialization - FIXED (Jan 13, 2026) |
31 | 27 |
|
32 | | -```scala |
33 | | -// BASTWriter.scala:658-660 |
34 | | -private def writeSagaStep(ss: SagaStep): Unit = { |
35 | | - writer.writeU8(NODE_HANDLER) |
36 | | - writeLocation(ss.loc) |
37 | | - writeIdentifier(ss.id) |
38 | | - writeContents(ss.doStatements) // Writes count immediately |
39 | | - writeContents(ss.undoStatements) // Writes count immediately |
40 | | -} |
41 | | -``` |
| 28 | +**Problem**: Nodes with multiple Contents fields (SagaStep, IfThenElseStatement, ForEachStatement) were not serializing correctly. |
42 | 29 |
|
43 | | -The `writeContents()` method writes the count: |
| 30 | +**Solution**: |
| 31 | +- Added special cases in `traverse()` override for multi-Contents nodes |
| 32 | +- Each Contents field now writes: count, items (interleaved) |
| 33 | +- Added `NODE_SAGA_STEP` tag (34) to distinguish SagaStep from Handler |
44 | 34 |
|
45 | | -```scala |
46 | | -private def writeContents[T <: RiddlValue](contents: Contents[T]): Unit = { |
47 | | - writer.writeVarInt(contents.length) |
48 | | - // Note: Individual elements are written by the main process() method during traversal |
49 | | -} |
50 | | -``` |
51 | | - |
52 | | -But the `traverse()` override only processes the main `.contents` field: |
53 | | - |
54 | | -```scala |
55 | | -override protected def traverse(definition: RiddlValue, parents: ParentStack): Unit = { |
56 | | - definition match { |
57 | | - case branch: Branch[?] with WithMetaData => |
58 | | - process(branch, parents) |
59 | | - parents.push(branch) |
60 | | - branch.contents.foreach { value => traverse(value, parents) } // Only .contents! |
61 | | - parents.pop() |
62 | | - writeMetadataCount(branch.metadata) |
63 | | - case _ => |
64 | | - super.traverse(definition, parents) |
65 | | - } |
66 | | -} |
67 | | -``` |
| 35 | +### 3. Statement/Handler Disambiguation - FIXED (Jan 13, 2026) |
68 | 36 |
|
69 | | -**Result**: |
70 | | -- Count for `doStatements` is written |
71 | | -- Count for `undoStatements` is written |
72 | | -- Items for `doStatements` are NEVER written (not in `.contents`) |
73 | | -- Items for `undoStatements` are NEVER written (not in `.contents`) |
74 | | -- Reader expects items after the count, reads garbage data |
75 | | -- Deserialization fails with "Invalid string table index" |
| 37 | +**Problem**: Reader couldn't reliably distinguish statements from handlers. |
76 | 38 |
|
77 | | -### Affected Files |
| 39 | +**Solution**: |
| 40 | +- Added `STATEMENT_MARKER` (0xFF = 255) byte after NODE_HANDLER for statements |
| 41 | +- Reader checks for marker before interpreting statement type |
78 | 42 |
|
79 | | -**Writer**: `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTWriter.scala` |
80 | | -- Lines 655-660: `writeSagaStep()` |
81 | | -- Lines 913-920: `writeIfThenElseStatement()` |
82 | | -- Line 1682-1685: `writeContents()` |
| 43 | +### 4. Branch Types without WithMetaData - FIXED (Jan 13, 2026) |
83 | 44 |
|
84 | | -**Reader**: `bast/shared/src/main/scala/com/ossuminc/riddl/bast/BASTReader.scala` |
85 | | -- Lines 1660-1673: `readContentsDeferred()` |
| 45 | +**Problem**: Several types extend `Branch[T]` but NOT `WithMetaData`. |
86 | 46 |
|
87 | | -### Impact |
| 47 | +**Affected types**: Handler, OnClauses, Type, UseCase, Group, Output, Input |
88 | 48 |
|
89 | | -- ✅ **Works**: Files without SagaStep or IfThenElseStatement (e.g., ToDoodles) |
90 | | -- ❌ **Fails**: Files containing SagaStep or IfThenElseStatement (e.g., ShopifyCart) |
| 49 | +**Solution**: Added explicit traverse() cases for each of these types. |
91 | 50 |
|
92 | | -### Performance Impact |
93 | | - |
94 | | -When working correctly: |
95 | | -- **Small files (12 nodes)**: 1.8x speedup over parsing |
96 | | -- **Large files (510 nodes)**: 24.6x speedup over parsing (observed before deserialization failed) |
| 51 | +--- |
97 | 52 |
|
98 | | -### Solution Options |
| 53 | +## Performance Results (All Tests Passing) |
99 | 54 |
|
100 | | -#### Option 1: Special-case traverse() for multi-Contents nodes |
| 55 | +| File | Nodes | Parse Time | BAST Read | Speedup | Status | |
| 56 | +|------|-------|------------|-----------|---------|--------| |
| 57 | +| ToDoodles | 12 | ~6ms | ~6ms | ~1x | ✅ Working | |
| 58 | +| ShopifyCart | 667 | ~77ms | ~4ms | **20.1x** | ✅ Working | |
101 | 59 |
|
102 | | -Extend the `traverse()` override to detect and handle nodes with multiple Contents fields: |
| 60 | +--- |
103 | 61 |
|
104 | | -```scala |
105 | | -override protected def traverse(definition: RiddlValue, parents: ParentStack): Unit = { |
106 | | - definition match { |
107 | | - case ss: SagaStep => |
108 | | - process(ss, parents) |
109 | | - parents.push(ss) |
110 | | - ss.doStatements.foreach { value => traverse(value, parents) } |
111 | | - ss.undoStatements.foreach { value => traverse(value, parents) } |
112 | | - parents.pop() |
113 | | - writeMetadataCount(ss.metadata) |
| 62 | +## Technical Notes |
114 | 63 |
|
115 | | - case ite: IfThenElseStatement => |
116 | | - process(ite, parents) |
117 | | - parents.push(ite) |
118 | | - ite.thens.foreach { value => traverse(value, parents) } |
119 | | - ite.elses.foreach { value => traverse(value, parents) } |
120 | | - parents.pop() |
121 | | - // No metadata for statements |
| 64 | +### Write/Read Order |
122 | 65 |
|
123 | | - case branch: Branch[?] with WithMetaData => |
124 | | - // ... existing code |
125 | | - } |
126 | | -} |
127 | 66 | ``` |
| 67 | +Writer produces for Branch nodes: |
| 68 | + tag, nodeSpecificFields..., contentsCount, contentItems..., metadataCount, metadataItems... |
128 | 69 |
|
129 | | -**Pros**: Minimal changes, surgical fix |
130 | | -**Cons**: Must remember to update for any future multi-Contents nodes |
131 | | - |
132 | | -#### Option 2: Refactor writeContents() to defer writing |
133 | | - |
134 | | -Change `writeContents()` to not write anything immediately. Instead, track pending Contents writes and emit them during traversal. |
135 | | - |
136 | | -**Pros**: More robust, handles any future cases |
137 | | -**Cons**: Significant refactoring, more complex state management |
138 | | - |
139 | | -#### Option 3: Two-phase serialization |
140 | | - |
141 | | -Separate count-writing from item-writing phases. |
142 | | - |
143 | | -**Pros**: Clean separation of concerns |
144 | | -**Cons**: Requires complete redesign of serialization |
145 | | - |
146 | | -### Recommended Fix |
147 | | - |
148 | | -**Option 1** (special-case traverse) is recommended because: |
149 | | -1. Minimal code changes |
150 | | -2. Easy to understand and verify |
151 | | -3. Only 2 node types affected (possibly only 1 if IfThenElseStatement is removed) |
152 | | -4. Fast to implement and test |
153 | | - |
154 | | -### Test Cases Needed |
155 | | - |
156 | | -After fix: |
157 | | -1. ✅ Verify ToDoodles still works (regression test) |
158 | | -2. ✅ Verify ShopifyCart serializes/deserializes correctly |
159 | | -3. ✅ Create test specifically for SagaStep round-trip |
160 | | -4. ✅ Create test for IfThenElseStatement round-trip (if not removed) |
161 | | -5. ✅ Verify performance benchmarks still show speedup |
| 70 | +Reader expects (after tag dispatch): |
| 71 | + nodeSpecificFields..., contentsCount+Items (via readContentsDeferred), metadataCount+Items |
| 72 | +``` |
162 | 73 |
|
163 | | -### Related Files |
| 74 | +### Statement Format |
164 | 75 |
|
165 | | -- `bast/jvm/src/test/scala/com/ossuminc/riddl/bast/BenchmarkRunner.scala` - Performance benchmark |
166 | | -- `bast/jvm/src/test/scala/com/ossuminc/riddl/bast/TestRunner.scala` - Round-trip test |
167 | | -- `bast/shared/src/test/scala/com/ossuminc/riddl/bast/DeepASTComparison.scala` - Deep comparison utility |
| 76 | +``` |
| 77 | +Statement: NODE_HANDLER, STATEMENT_MARKER(255), stmtType(0-16), loc, stmtSpecificData... |
| 78 | +Handler: NODE_HANDLER, loc, id, contentsCount, contentItems..., metadataCount, metadataItems... |
| 79 | +``` |
168 | 80 |
|
169 | | -### Notes |
| 81 | +### Location Encoding |
170 | 82 |
|
171 | | -- The issue does NOT affect metadata serialization (fixed in earlier session) |
172 | | -- The issue does NOT affect Include node preservation (working correctly) |
173 | | -- The issue does NOT affect location delta encoding (working correctly) |
174 | | -- The issue ONLY affects nodes with multiple Contents fields |
| 83 | +``` |
| 84 | +First location: originString, offset, line, col |
| 85 | +Delta location: sameSourceFlag(0/1), [originString if flag=1], offsetDelta+1M, lineDelta+1K, colDelta+1K |
| 86 | +``` |
175 | 87 |
|
176 | 88 | --- |
177 | 89 |
|
178 | | -**Last Updated**: January 11, 2026 |
179 | | -**Severity**: High |
180 | | -**Priority**: Must fix before production use |
| 90 | +**Last Updated**: January 14, 2026 |
| 91 | +**Severity**: None (all issues resolved) |
| 92 | +**Priority**: N/A |
0 commit comments