Commit 088fd8c

reidspencer and claude committed
Document BAST design decisions and lazy loading analysis
- Compression: Will NOT be implemented (HTTP already provides gzip)
- Incremental updates: Noted as future area of interest
- Lazy loading: Full analysis with recommendation to not implement now

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6dcd7b5 commit 088fd8c

1 file changed

Lines changed: 105 additions & 0 deletions

NOTEBOOK.md

@@ -146,6 +146,111 @@ JVM/Native only (JS returns error message since browser can't do local file I/O)

- ~~How should BAST versioning handle breaking format changes?~~ **Resolved**: Single monotonically incrementing 32-bit integer, stays at 1 during development, increment only after schema finalization for users

### Design Decisions

**Compression: Will NOT be implemented**

The `Flags.COMPRESSED` flag in the header is reserved but will never be used. Rationale:
- BAST's primary use case is HTTP transport (web-based tools, APIs)
- HTTP already provides transparent gzip/brotli compression
- Adding compression at the BAST layer would be redundant
- Would add CPU overhead on both ends for no benefit
- Simpler format = easier debugging and cross-platform compatibility
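
The redundancy argument can be checked with a small experiment. The sketch below is illustrative only: it uses Python's standard `gzip` module and a synthetic payload built by a hypothetical `sample_payload` helper, not real BAST bytes, so the exact sizes will differ from anything a real file would produce.

```python
import gzip
import random


def sample_payload(word_count=20000, seed=0):
    """Build a deterministic stand-in payload (an assumption; not real BAST data)."""
    rng = random.Random(seed)
    vocabulary = [f"token{i}" for i in range(300)]
    words = (rng.choice(vocabulary) for _ in range(word_count))
    return " ".join(words).encode("utf-8")


payload = sample_payload()
once = gzip.compress(payload)   # a single compression pass, e.g. at the HTTP layer
twice = gzip.compress(once)     # a second pass stacked on top of the first

print(f"raw: {len(payload)} bytes")
print(f"compressed once: {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass shrinks the payload substantially, while the second pass has almost nothing left to remove and still costs CPU time on both ends.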

**Incremental Updates: Future consideration**

Supporting partial BAST updates when the source changes only slightly could be valuable for:
- Large projects where only one file changes
- IDE integrations that need fast refresh
- CI/CD pipelines with incremental builds

This is **not currently planned** but noted as an area of potential future interest.

**Lazy Loading: Under evaluation**

See the "Future Considerations: Lazy Loading" section below for the analysis and current recommendation.

---

## Future Considerations: Lazy Loading

### What is Lazy Loading?

Instead of deserializing the entire BAST file into memory on load, lazy loading would:
1. Memory-map the BAST file (or keep bytes in memory)
2. Parse only the header and string table upfront
3. Deserialize individual nodes on-demand when accessed
4. Cache deserialized nodes for subsequent access

### Current Implementation

The current `BASTReader.read()` approach:
```
1. Read header (32 bytes)
2. Validate checksum
3. Load entire string table into memory
4. Recursively deserialize ALL nodes starting from root
5. Return complete Nebula AST in memory
```
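
To make the five steps concrete, here is a minimal sketch of an eager reader over a toy binary format. Everything in it is an assumption made for illustration: the 32-byte header layout, the `TOY1` magic, the CRC-32 checksum, and the `Node`, `write`, and `read` names are hypothetical and do not describe the real BAST format or the real `BASTReader`. Python is used purely for illustration.

```python
import struct
import zlib
from dataclasses import dataclass, field

# Hypothetical 32-byte header: magic, version, flags, string count,
# root offset, checksum, then 8 padding bytes. Not the real BAST layout.
HEADER = struct.Struct("<4sIIIII8x")
MAGIC = b"TOY1"


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def write(root):
    """Encode a tree as header + string table + nodes in pre-order."""
    strings = []

    def encode(node):
        if node.name not in strings:
            strings.append(node.name)
        record = struct.pack("<HH", strings.index(node.name), len(node.children))
        return record + b"".join(encode(child) for child in node.children)

    nodes = encode(root)
    encoded = [s.encode("utf-8") for s in strings]
    table = b"".join(struct.pack("<H", len(b)) + b for b in encoded)
    body = table + nodes
    return HEADER.pack(MAGIC, 1, 0, len(strings), len(table), zlib.crc32(body)) + body


def read(data):
    # 1. Read the fixed-size header.
    magic, _version, _flags, string_count, root_offset, checksum = HEADER.unpack_from(data, 0)
    body = data[HEADER.size:]
    # 2. Validate the checksum before trusting any offsets.
    if magic != MAGIC or zlib.crc32(body) != checksum:
        raise ValueError("bad magic or checksum")
    # 3. Load the entire string table.
    strings, pos = [], 0
    for _ in range(string_count):
        (length,) = struct.unpack_from("<H", body, pos)
        strings.append(body[pos + 2:pos + 2 + length].decode("utf-8"))
        pos += 2 + length

    # 4. Recursively deserialize every node starting from the root.
    def decode(pos):
        name_index, child_count = struct.unpack_from("<HH", body, pos)
        pos += 4
        node = Node(strings[name_index])
        for _ in range(child_count):
            child, pos = decode(pos)
            node.children.append(child)
        return node, pos

    # 5. Return the complete tree.
    root, _ = decode(root_offset)
    return root
```

A round trip such as `read(write(tree))` reconstructs the whole tree in one call, which is the property the eager design trades memory for.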

### Potential Lazy Implementation

```
1. Read header (32 bytes)
2. Keep byte array reference (or mmap file)
3. Load string table (required for any node access)
4. Return LazyNebula proxy with root offset
5. On first access to contents: deserialize children
6. Cache deserialized nodes in WeakHashMap
```
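
Steps 4 to 6 can be sketched as follows, assuming the header and string table have already been loaded. The record layout (name index, child count, then absolute child offsets), and the `encode`, `LazyReader`, and `LazyNode` names are hypothetical and chosen only for this illustration. Python's `weakref.WeakValueDictionary` stands in for the `WeakHashMap` mentioned above; the two are not identical (one holds values weakly, the other keys), so treat this as an analogy rather than a port.

```python
import struct
import weakref


def encode(tree, strings, out):
    """Append tree = (name, [children]) to bytearray `out` in post-order; return its offset."""
    name, children = tree
    child_offsets = [encode(child, strings, out) for child in children]
    if name not in strings:
        strings.append(name)
    offset = len(out)
    out += struct.pack("<HH", strings.index(name), len(child_offsets))
    out += struct.pack(f"<{len(child_offsets)}I", *child_offsets)
    return offset


class LazyNode:
    """Proxy that decodes its own record only when a field is first read."""

    def __init__(self, reader, offset):
        self._reader, self._offset = reader, offset
        self._record = None      # (name, child_offsets) once decoded
        self._children = None

    def _load(self):
        if self._record is None:
            reader = self._reader
            name_index, count = struct.unpack_from("<HH", reader.buf, self._offset)
            offsets = struct.unpack_from(f"<{count}I", reader.buf, self._offset + 4)
            self._record = (reader.strings[name_index], offsets)
            reader.decoded += 1  # counter kept only to make the laziness observable
        return self._record

    @property
    def name(self):
        return self._load()[0]

    @property
    def children(self):
        if self._children is None:
            self._children = [self._reader.node_at(o) for o in self._load()[1]]
        return self._children


class LazyReader:
    def __init__(self, buf, strings, root_offset):
        self.buf, self.strings, self.root_offset = buf, strings, root_offset
        self.decoded = 0
        self._cache = weakref.WeakValueDictionary()  # offset -> live LazyNode

    def node_at(self, offset):
        node = self._cache.get(offset)
        if node is None:
            node = LazyNode(self, offset)
            self._cache[offset] = node
        return node

    @property
    def root(self):
        return self.node_at(self.root_offset)
```

Obtaining `reader.root` decodes nothing; each record is decoded the first time its `name` or `children` is read, and repeated lookups of the same offset return the cached proxy for as long as something still references it.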

### Benefits

| Benefit | Impact | Use Case |
|---------|--------|----------|
| Faster initial load | High | Large BAST files where only part is needed |
| Lower memory peak | Medium | Memory-constrained environments |
| Incremental parsing | Medium | IDE features that only need specific nodes |
| Partial file access | Low | Extracting single definition from large module |

### Drawbacks

| Drawback | Severity | Mitigation / Notes |
|----------|----------|--------------------|
| More complex code | High | Requires significant refactoring of the reader |
| Random access overhead | Medium | Cache frequently accessed nodes |
| Memory mapping complexity | Medium | Platform-specific (JS has no mmap) |
| Debugging difficulty | Medium | Deserialization issues become harder to trace |
| Thread safety concerns | Low | Need synchronization for cache |
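
The "synchronization for cache" note in the last row can be sketched as a lock-guarded memoizing cache. The `SynchronizedNodeCache` class and its `decode` callback are hypothetical names for illustration; a real reader would likely choose a finer-grained scheme, and Python is used here only to show the idea.

```python
import threading


class SynchronizedNodeCache:
    """Decode each offset at most once, even when several threads ask for it."""

    def __init__(self, decode):
        self._decode = decode          # callable: offset -> deserialized node
        self._nodes = {}
        self._lock = threading.Lock()

    def get(self, offset):
        # A single coarse lock keeps the check-then-insert sequence atomic.
        with self._lock:
            if offset not in self._nodes:
                self._nodes[offset] = self._decode(offset)
            return self._nodes[offset]
```

Holding one lock around the whole lookup is the simplest correct option; it serializes decoding, which is part of why the table rates this drawback as low severity rather than free.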

### Performance Analysis

**Current approach** (eager loading):
- large.riddl (43 KB source → 29 KB BAST): ~0.78 ms warm load
- All 1,296 nodes deserialized upfront
- Memory: roughly the full AST held on the heap

**Lazy approach** (estimated):
- Initial load: ~0.1-0.2 ms (header + string table only)
- Per-node access: ~0.001-0.01 ms (amortized with caching)
- Full traversal: similar to eager (~0.8-1.0 ms with cache overhead)
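
Taking the figures above at face value, a short calculation shows how many node accesses the lazy approach could afford before it costs as much as the eager load. The lazy numbers are the notebook's estimates, not measurements, and the `break_even_nodes` helper is a hypothetical name introduced for this sketch.

```python
EAGER_LOAD_MS = 0.78   # measured warm load for large.riddl
NODE_COUNT = 1296      # nodes in that file


def break_even_nodes(initial_ms, per_node_ms):
    """Node accesses at which lazy cost (initial + n * per_node) equals the eager load."""
    return (EAGER_LOAD_MS - initial_ms) / per_node_ms


best_case = break_even_nodes(initial_ms=0.1, per_node_ms=0.001)
worst_case = break_even_nodes(initial_ms=0.2, per_node_ms=0.01)
eager_per_node_us = EAGER_LOAD_MS / NODE_COUNT * 1000

print(f"break-even, best case:  {best_case:.0f} nodes ({best_case / NODE_COUNT:.1%})")
print(f"break-even, worst case: {worst_case:.0f} nodes ({worst_case / NODE_COUNT:.1%})")
print(f"eager cost per node:    {eager_per_node_us:.2f} microseconds")
```

Under these estimates, lazy loading only comes out ahead when a caller touches fewer than about 58 to 680 of the 1,296 nodes (roughly 4.5% to 52%). At the lower per-node estimate, touching every node would cost about 1.4 ms, somewhat above the 0.8-1.0 ms full-traversal estimate, so the estimated figures are best read as rough orders of magnitude.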

### Recommendation

**Do not implement lazy loading at this time.**

Reasons:
1. Current load times are already excellent (sub-millisecond for typical files)
2. RIDDL files are typically small enough to fit entirely in memory
3. Most use cases need the full AST anyway (validation, transformation)
4. Implementation complexity is significant
5. Cross-platform concerns (JS cannot memory-map)

**When to reconsider:**
- If BAST files regularly exceed 10MB
- If use cases emerge that only need partial AST access
- If memory constraints become a real issue

---

## Planned: AsciiDoc Generation Module
