Journal Replay Performance: Instrumentation, Jackson Fast Path, FOAM Parser Optimizations, and Benchmark Suite#4969
Draft
abdelaziz-mahdy wants to merge 13 commits into development from
Conversation
…r, benchmark

Add per-phase timing (getEntry, parse, findMerge, daoWrite) to F3FileJournal and FileJournal replay; each phase emits a PM entry at replay completion.

Add JacksonJournalParser — a Jackson-based parser that handles flat journal entries ~4.5x faster than the FOAM combinator parser. Entries with nested FObjects or FOAM-specific syntax (triple-quoted strings, backtick templates) fall back to the FOAM parser per entry.

Replace the regex comment check with charAt(0) == '/' — 9x faster.

Add JournalReplayBenchmark (1M-entry end-to-end: FOAM vs Jackson) and JacksonParserCorrectnessTest (property-by-property comparison across model scenarios including User, CSpec, Cron, Group, Enum, Reference, StringArray).
Add replayInlined (FOAM parser) and replayJacksonInlined (Jackson primary + FOAM fallback) benchmark methods that replay journals with parse + DAO put inlined directly in the loop -- no SyncAssemblyLine, no Assembly objects, no synchronized blocks. These provide a baseline for comparing against Assembly-wrapped variants to isolate synchronization overhead.
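The shape of these inlined variants can be sketched as below — a minimal, self-contained version of a replay loop with parse and DAO put on the reading thread, no assembly line and no synchronization. The parser function and the `Map`-as-DAO are stand-ins for illustration, not FOAM APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Map;
import java.util.function.Function;

public class InlinedReplaySketch {
    // Replays lines with parse + put inlined directly in the loop:
    // no assembly line, no Assembly objects, no synchronized blocks.
    public static int replayInlined(BufferedReader reader,
                                    Function<String, String[]> parser, // line -> {id, value}
                                    Map<String, String> dao) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty() || line.charAt(0) == '/') continue; // skip blanks/comments
            String[] entry = parser.apply(line);                   // parse inline
            dao.put(entry[0], entry[1]);                           // DAO put inline
            count++;
        }
        return count;
    }
}
```

Because everything runs on one thread, this variant isolates pure parse + put cost; any difference against the Assembly-wrapped variants is synchronization and object overhead.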
…ison table

Three new benchmark methods:
- replayWithPooledPStream: FOAM parser using the PooledStringPStream arena allocator
- replayAllOptimizations: Jackson primary + FOAM+Pool fallback, no AssemblyLine
- replayParallel: N-thread parallel parsing with per-thread parsers and MDAO merge

Replace the text summary with a formatted comparison table showing wall time, entries/sec, and speedup vs the FOAM baseline for all seven variants.
Four independent optimizations to the FOAM combinator parser,
each benchmarked independently against 1M journal entries (981 MB):
1. Inline SkipParser (ModelParserFactory.java)
Replace 5-layer combinator stack (Repeat0→Alt→Seq0→Literal→WS)
with direct char checks in a single while-loop.
Result: +24.8% (37.53s → 28.24s)
2. CharLiteral specialization (new CharLiteral.java + AbstractLiteral.java)
Single-char literals ({, }, :, ,) skip the loop and String.charAt
in AbstractLiteral.parse(). Factory auto-selects for length==1.
Result: +2.6% (28.24s → 27.51s)
3. Remove String.intern() (StringParser.java)
intern() hits the JVM global native string table with C++ sync
~30M times. Most journal strings are unique — pure overhead.
Result: +25.1% (27.51s → 21.99s)
4. Arithmetic DoubleParser (DoubleParser.java)
Replace StringBuilder + Double.valueOf(sb.toString()) with direct
long arithmetic accumulation. Zero heap allocation per number.
Result: +14.4% (21.99s → 18.82s)
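The arithmetic accumulation idea behind optimization 4 can be sketched as follows — a hypothetical standalone version (the real DoubleParser operates on a PStream, and binary floating point means very long fractions may round slightly differently than Double.valueOf):

```java
public class ArithmeticDoubleSketch {
    // Parses a simple decimal like "123.45" with zero heap allocation:
    // digits accumulate into longs, combined once at the end, instead of
    // appending to a StringBuilder and calling Double.valueOf(sb.toString()).
    static double parseSimpleDecimal(String s) {
        int i = 0;
        boolean neg = false;
        if (i < s.length() && s.charAt(i) == '-') { neg = true; i++; }
        long intPart = 0;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            intPart = intPart * 10 + (s.charAt(i) - '0');
            i++;
        }
        long fracPart = 0;
        long fracScale = 1;
        if (i < s.length() && s.charAt(i) == '.') {
            i++;
            while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
                fracPart = fracPart * 10 + (s.charAt(i) - '0');
                fracScale *= 10;
                i++;
            }
        }
        double v = intPart + (double) fracPart / fracScale;
        return neg ? -v : v;
    }
}
```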
Also tested and REJECTED:
- PrefixAlt flat char[] dispatch: -12.3% regression (Character.compare
is already JIT-optimized as an intrinsic)
Cumulative: 37.53s → 18.82s (49.9% faster, ~2x throughput)
FOAM gap to Jackson narrowed from 4.8x to ~2.4x
Tests verified:
- GrammarCombinatorsJavaTest: 81/81 passed
- F3FileJournalTest: 32/32 passed
- JournalReplayBenchmark: all 8 assertions passed
StringParser previously called ps.apply(delimiter, x) on every character
to check for the closing quote — a Literal.parse() virtual dispatch per
char. For double-quoted strings without escape sequences (95%+ of journal
strings), now uses String.indexOf() to find the closing quote in one
native call, then extracts the substring directly.
Follows the same pattern as UntilLiteral (indexOf on StringPStream) but
adds a backslash pre-check: if indexOf('\\') finds an escape before the
closing quote, falls back to the existing char-by-char loop.
Added getString() and createAt() to StringPStream for cross-package
access to the underlying string and position-based construction.
Result: 20.94s → 17.45s (+16.7%), FOAM gap to Jackson: 2.0x
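The fast path described above can be sketched as a standalone method — hypothetical names, since the real change lives in StringParser/StringPStream. `src` holds the journal text and `pos` points just past the opening quote:

```java
public class QuotedStringFastPath {
    static String parseQuoted(String src, int pos) {
        int close = src.indexOf('"', pos);   // one native scan for the closing quote
        if (close < 0) return null;          // unterminated string
        int esc = src.indexOf('\\', pos);
        if (esc >= 0 && esc < close) {
            // Escape sequence before the closing quote: fall back to the
            // char-by-char decode loop.
            return parseQuotedSlow(src, pos);
        }
        return src.substring(pos, close);    // no escapes: direct substring
    }

    // Simplified fallback: decodes \" \\ and \n; the real parser handles
    // the full escape set.
    static String parseQuotedSlow(String src, int pos) {
        StringBuilder sb = new StringBuilder();
        for (int i = pos; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '\\' && i + 1 < src.length()) {
                char n = src.charAt(++i);
                sb.append(n == 'n' ? '\n' : n);
            } else if (c == '"') {
                return sb.toString();
            } else {
                sb.append(c);
            }
        }
        return null;
    }
}
```

The backslash pre-check keeps the fast path correct: it only takes the substring shortcut when no escape can appear before the closing quote.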
…Factory
Replace Seq0(SKIP, Literal(':'), SKIP, valueParser) per property with a
single parser that does direct char checks: inline whitespace skip,
direct ':' comparison, inline whitespace skip, then value parse + set.
Eliminates 4 combinator layers (Seq0 dispatch) x 50 properties x 1M
entries = 200M virtual dispatch calls removed.
Also replace the outer Repeat0(Seq0(SKIP, Alt, SKIP), Literal(',')) with
an inlined property loop: direct whitespace skip, PrefixAlt dispatch,
direct ',' char check. Eliminates Repeat0 + Seq0 + Literal overhead.
Result: 17.45s → 15.53s (+11.0%), FOAM gap to Jackson: 1.7x
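The inlined property loop can be illustrated with a toy flat-object parser over input like `a:1, b:23` — whitespace skip, separator checks, and value parse all done with direct char tests instead of nested combinators. This is a sketch only; the real code dispatches to per-property parsers via PrefixAlt and sets values through PropertyInfo:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InlinePropertyLoopSketch {
    static Map<String, Integer> parseFlat(String s) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int i = 0, n = s.length();
        while (i < n) {
            while (i < n && s.charAt(i) <= ' ') i++;          // inline whitespace skip
            int keyStart = i;
            while (i < n && s.charAt(i) != ':') i++;          // key up to ':'
            String key = s.substring(keyStart, i).trim();
            i++;                                              // direct ':' consume
            while (i < n && s.charAt(i) <= ' ') i++;          // inline whitespace skip
            int v = 0;
            while (i < n && s.charAt(i) >= '0' && s.charAt(i) <= '9')
                v = v * 10 + (s.charAt(i++) - '0');           // inline value parse
            out.put(key, v);
            while (i < n && s.charAt(i) <= ' ') i++;
            if (i < n && s.charAt(i) == ',') i++;             // direct ',' check
            else break;
        }
        return out;
    }
}
```

Each iteration of this loop replaces what was previously several layers of Repeat0/Seq0/Literal virtual dispatch per property.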
Created FastStringPStream, which replaces Reference<CharSequence> with direct char[] access: head() becomes chars_[pos_] instead of str.get().charAt(pos). Hypothesis: removing the volatile read and method indirection on ~700M head() calls per run would yield a measurable speedup.

Result: 23.56s vs PooledPStream's 24.15s — within noise, and no improvement over the 15.87s baseline. The JIT compiler already optimizes Reference.get().charAt() effectively. This confirms the remaining 1.7x gap to Jackson is NOT from Reference indirection — it comes from PStream object allocation (tail/setValue creating new objects) and combinator virtual dispatch.
Benchmark Results (1M entries, 981 MB)
Per-Phase Breakdown
Comparison Table
Key Observations
Extend the replay benchmark with two new paths, kept alongside the existing variants for direct comparison.

SyncAssemblyLine vs SimpleAsyncAssemblyLine: mirrors F3FileJournal's production shape (parse in executeJob, put in endJob). Moving the assembly line from sync to async parallelizes parse across a worker pool while the dedicated end thread keeps install serial. Without touching the FOAM parser, this reaches 2.6x replay throughput with a 10-thread pool.

CircularStringPStream: a 256-char sliding-window PStream added next to FastStringPStream and PooledStringPStream. The ring copy and out-of-window substring fallback outweigh any cache gain; it regresses to 0.5x of baseline, confirming that storage format is not the parse bottleneck.

SimpleAsyncAssemblyLine.java is pulled forward from upstream development so the branch builds standalone.
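The parallel-parse / serial-install shape can be sketched with plain java.util.concurrent primitives — illustrative names only, not FOAM's AssemblyLine API. "Parse" jobs fan out to a worker pool while a single consumer drains futures in submission order, so install stays serial and ordered:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncReplaySketch {
    public static List<String> replayAsync(List<String> lines, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> parsed = new ArrayList<>();
        for (String line : lines) {
            // Stand-in "parse" step runs on the worker pool.
            parsed.add(pool.submit(() -> line.toUpperCase()));
        }
        List<String> installed = new ArrayList<>();
        for (Future<String> f : parsed) {
            installed.add(f.get()); // drained in submission order: install stays serial
        }
        pool.shutdown();
        return installed;
    }
}
```

The key property is that parse parallelism never reorders installs, matching the executeJob/endJob split described above.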
Problem
Journal replay at boot dominates startup time. On deployments with large data journals (8+ GB across multiple DAOs), replay takes ~45 minutes with parse consuming 52% of that time. The FOAM combinator parser processes entries at ~27K entries/sec. Jackson's streaming parser achieves ~116K entries/sec on the same data — a 4.5x gap.
No per-phase timing exists, so it was impossible to identify which replay phase (read, parse, find/merge, DAO write) is the bottleneck.
Solution
Three categories of changes: instrumentation, a Jackson fast-path parser, and six FOAM parser optimizations. A comprehensive benchmark suite validates every change.
1. Per-Phase PM Instrumentation
Added System.nanoTime() accumulators around each phase of F3FileJournal.replay and FileJournal.replay. Each phase emits a PM entry at replay completion:

- replay.<file>:getEntry — BufferedReader line assembly
- replay.<file>:parse — JSON parse only
- replay.<file>:findMerge — dao.find(id) + mergeFObject
- replay.<file>:daoWrite — dao.put / dao.remove

The completion log now includes operation counts (opCreate, opPut, opPutMerged, opRemove, commentsSkipped) and parser usage (Jackson=N / fallback=N).

Replaced COMMENT.matcher(entry).matches() with entry.charAt(0) == '/' — 9x faster on 1M lines.

2. Jackson Fast-Path Parser
JacksonJournalParser uses Jackson's ObjectMapper to tokenize journal JSON and FOAM PropertyInfo.set()/cast() to populate FObjects. Each entry during replay is tokenized into a Map<String, Object> and mapped onto the FObject property by property (enum values resolved via forOrdinal()). Flat data entries — the vast majority of large journals — take the Jackson path; config entries with nested objects fall back to FOAM per entry.
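The per-entry routing decision can be sketched as below. The detection heuristics are illustrative stand-ins, not the actual JacksonJournalParser logic: flat entries go to the fast path, entries with FOAM-specific syntax (triple-quoted strings, backticks) or a nested object fall back to the combinator parser.

```java
public class ParserRouter {
    // Returns true when an entry needs the FOAM combinator parser.
    static boolean needsFoamFallback(String entry) {
        // FOAM-specific syntax Jackson cannot tokenize.
        if (entry.contains("\"\"\"") || entry.indexOf('`') >= 0) return true;
        // Nested FObject heuristic: a second '{' after the entry's own
        // opening brace.
        int first = entry.indexOf('{');
        return first >= 0 && entry.indexOf('{', first + 1) >= 0;
    }
}
```

Because the check is per entry, a journal that is mostly flat pays the fallback cost only on the rare config entries.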
3. FOAM Parser Optimizations
Six changes to the combinator parser internals that benefit all FOAM grammars (JSON, VSS, MCBS, etc.), not just journal replay:
- ModelParserFactory.java — inline SkipParser
- StringParser.java — remove String.intern()
- DoubleParser.java — arithmetic accumulation
- StringParser.java, StringPStream.java — indexOf fast path
- ModelParserFactory.java — inline property loop
- AbstractLiteral.java, CharLiteral.java — single-char literal specialization

Tested and rejected (kept on branch for reproducibility):
4. Parallel File Loading (Experiment)
Benchmark demonstrates that splitting a journal into N chunks and replaying on N threads yields 3.5-4.2x speedup on a 10-core machine. Each thread gets its own parser instances and MDAO. Worth implementing at the EasyDAO/boot level.
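The chunked parallel replay experiment has roughly this shape — a minimal sketch where a per-thread `Map` stands in for the per-thread parser + MDAO, and the merge step combines the thread-local results:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReplaySketch {
    public static Map<Integer, String> replayParallel(List<String> entries, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (entries.size() + threads - 1) / threads;
        List<Future<Map<Integer, String>>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            int start = t * chunk, end = Math.min(entries.size(), start + chunk);
            if (start >= end) break;
            List<String> slice = entries.subList(start, end);
            int base = start;
            parts.add(pool.submit(() -> {
                Map<Integer, String> local = new HashMap<>(); // per-thread "MDAO"
                for (int i = 0; i < slice.size(); i++)
                    local.put(base + i, slice.get(i));        // per-thread parse + put
                return local;
            }));
        }
        Map<Integer, String> merged = new HashMap<>();
        for (Future<Map<Integer, String>> f : parts) merged.putAll(f.get()); // merge
        pool.shutdown();
        return merged;
    }
}
```

Since each thread owns its parser instances and DAO, no parse-time synchronization is needed; only the final merge touches shared state.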
Benchmark Results (1M entries, 981 MB)
FOAM parser: 2.4x faster. Gap to Jackson narrowed from 4.8x to 1.8x.
The remaining 1.8x gap is fundamental combinator overhead — PStream object creation on every tail() and setValue(). Inherent to the functional PStream contract that all FOAM grammars rely on.
Changes
Replay instrumentation:
- F3FileJournal.js — per-phase PM instrumentation, Jackson fast path with FOAM fallback, charAt comment check, operation counts
- FileJournal.js — same instrumentation for the legacy format

Jackson parser:
- JacksonJournalParser.js — new: Jackson ObjectMapper with unquoted field names, Integer-to-Long promotion, nested Map detection, triple-quote/backtick normalization

FOAM parser optimizations:
- ModelParserFactory.java — inline SkipParser, inline property value parser, inline comma separator
- StringParser.java — remove String.intern(), add indexOf fast path for no-escape strings
- DoubleParser.java — arithmetic accumulation replaces StringBuilder + Double.valueOf
- AbstractLiteral.java — factory returns CharLiteral for single-char strings
- CharLiteral.java — new: optimized single-character literal parser
- StringPStream.java — add getString() and createAt() for cross-package indexOf access
- JSONParser.js — add parseStringPooled() and parseStringFast() for benchmark variants

Benchmark infrastructure (experiment playground):
- JournalReplayBenchmark.js — 9 replay variants with comparison table
- BenchmarkModel.js — 50-property model covering all FOAM types
- JacksonParserCorrectnessTest.js — property-by-property FOAM vs Jackson comparison
- PooledStringPStream.java — arena allocator (experiment, -20% regression)
- FastStringPStream.java — char[] direct access (experiment, ~0% change)

Tests verified:
Running