
Journal Replay Performance: Instrumentation, Jackson Fast Path, FOAM Parser Optimizations, and Benchmark Suite#4969

Draft
abdelaziz-mahdy wants to merge 13 commits into development from journal-perf-upstream

Conversation

@abdelaziz-mahdy (Collaborator)

Problem

Journal replay at boot dominates startup time. On deployments with large data journals (8+ GB across multiple DAOs), replay takes ~45 minutes, with parsing consuming 52% of that time. The FOAM combinator parser processes entries at ~27K entries/sec; Jackson's streaming parser achieves ~116K entries/sec on the same data — a ~4.3x gap.

No per-phase timing existed, so it was impossible to identify which replay phase (read, parse, find/merge, DAO write) was the bottleneck.

Solution

Three categories of changes: instrumentation, a Jackson fast-path parser, and six FOAM parser optimizations. A comprehensive benchmark suite validates every change.

1. Per-Phase PM Instrumentation

Added System.nanoTime() accumulators around each phase of F3FileJournal.replay and FileJournal.replay. Each phase emits a PM entry at replay completion:

  • replay.<file>:getEntry — BufferedReader line assembly
  • replay.<file>:parse — JSON parse only
  • replay.<file>:findMerge — dao.find(id) + mergeFObject
  • replay.<file>:daoWrite — dao.put / dao.remove

Completion log now includes operation counts (opCreate, opPut, opPutMerged, opRemove, commentsSkipped) and parser usage (Jackson=N/fallback=N).
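The accumulator pattern, as a minimal sketch (names are illustrative, not the actual F3FileJournal members):

```java
// Minimal sketch of the per-phase accumulator pattern. The trim() call is
// a stand-in for the real parse work; one PM per phase is emitted once at
// replay completion, not per entry.
public class PhaseTimingSketch {
    static long parseNanos = 0;

    static String timedParse(String line) {
        long t0 = System.nanoTime();
        String parsed = line.trim();                 // stand-in for JSON parse
        parseNanos += System.nanoTime() - t0;
        return parsed;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) timedParse("  {\"id\":" + i + "}  ");
        // At completion: emit a PM like "replay.<file>:parse" with the total.
        System.out.println("parse nanos: " + parseNanos);
    }
}
```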

Replaced COMMENT.matcher(entry).matches() with entry.charAt(0) == '/' — 9x faster on 1M lines.
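Side by side, with an illustrative COMMENT pattern (the actual pattern lives in the journal code):

```java
import java.util.regex.Pattern;

// The old regex check vs. the new first-char check. matches() performs a
// full-string regex walk per line; charAt(0) is a single bounds-checked load.
public class CommentCheck {
    static final Pattern COMMENT = Pattern.compile("//.*");   // illustrative

    static boolean isCommentRegex(String entry) {
        return COMMENT.matcher(entry).matches();
    }

    static boolean isCommentCharAt(String entry) {
        return !entry.isEmpty() && entry.charAt(0) == '/';
    }
}
```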

2. Jackson Fast-Path Parser

JacksonJournalParser uses Jackson's ObjectMapper to tokenize journal JSON and FOAM PropertyInfo.set()/cast() to populate FObjects. For each entry during replay:

  1. Jackson parses JSON into Map<String, Object>
  2. If any value is a nested Map or List-of-Maps (nested FObject), returns null — entry falls back to FOAM parser
  3. Otherwise populates the FObject via PropertyInfo with Integer-to-Long promotion (excluding Enum properties which need Integer for forOrdinal())

Flat data entries (vast majority of large journals) take the Jackson path. Config entries with nested objects fall back to FOAM per-entry.
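Step 2 is the gatekeeper. A minimal sketch of that check, with the Jackson call elided (the map is what ObjectMapper.readValue would produce; isFlat is a hypothetical helper name):

```java
import java.util.List;
import java.util.Map;

// Sketch of the fast path's step 2: reject any entry whose parsed map
// contains a nested Map or a List holding a Map, since nested FObjects
// need the full FOAM parser. Flat maps go on to PropertyInfo.set().
public class FastPathSketch {
    @SuppressWarnings("unchecked")
    static boolean isFlat(Map<String, Object> values) {
        for (Object v : values.values()) {
            if (v instanceof Map) return false;            // nested FObject
            if (v instanceof List) {
                for (Object e : (List<Object>) v) {
                    if (e instanceof Map) return false;    // list of FObjects
                }
            }
        }
        return true;
    }
}
```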

3. FOAM Parser Optimizations

Six changes to the combinator parser internals that benefit all FOAM grammars (JSON, VSS, MCBS, etc.), not just journal replay:

| # | Optimization | File(s) | Gain | What it eliminates |
|---|---|---|---|---|
| 1 | Inline SkipParser | ModelParserFactory.java | +24.8% | 5-layer combinator stack (Repeat0→Alt→Seq0→Literal→WS) replaced with direct char loop |
| 2 | Remove String.intern() | StringParser.java | +25.1% | Native string table sync on ~30M unique strings per replay |
| 3 | Arithmetic DoubleParser | DoubleParser.java | +14.4% | StringBuilder + Double.valueOf per number replaced with long arithmetic |
| 4 | Bulk String Scanning | StringParser.java, StringPStream.java | +16.7% | Per-character delimiter check replaced with indexOf — same pattern as UntilLiteral |
| 5 | Inline Property+Comma | ModelParserFactory.java | +11.0% | Seq0 + Literal dispatch per property (200M virtual calls) replaced with direct char checks |
| 6 | CharLiteral | AbstractLiteral.java, CharLiteral.java | +2.6% | Single-char literals skip loop and String.charAt in AbstractLiteral.parse() |
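The direct char loop behind optimization 1 can be sketched with plain position-based scanning (the real SkipParser operates over PStream and also handles comments; this shows only the whitespace case):

```java
// Sketch of the inlined skip: a single loop over the input replaces the
// Repeat0/Alt/Seq0/Literal/WS combinator stack. Position-based scanning
// stands in for the PStream interface.
public class InlineSkip {
    /** Advance past spaces, tabs, and newlines; returns the new position. */
    static int skipWhitespace(String s, int pos) {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') pos++;
            else break;
        }
        return pos;
    }
}
```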

Tested and rejected (kept on branch for reproducibility):

| Experiment | Result | Why |
|---|---|---|
| PooledStringPStream (arena allocator) | -20% slower | JVM young-gen GC handles short-lived objects better than manual pooling |
| FastStringPStream (char[] direct access) | ~0% change | JIT already optimizes Reference.get().charAt() |
| PrefixAlt flat char[] dispatch | -12.3% | Character.compare() is a JIT intrinsic |
| Remove SyncAssemblyLine | ~0% | Overhead is negligible |

4. Parallel File Loading (Experiment)

Benchmark demonstrates that splitting a journal into N chunks and replaying on N threads yields 3.5-4.2x speedup on a 10-core machine. Each thread gets its own parser instances and MDAO. Worth implementing at the EasyDAO/boot level.
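The chunking shape might look like this (parseChunk stands in for the per-thread parser + MDAO work; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the parallel-replay experiment: split entries into N chunks,
// parse each chunk on its own thread, then merge results in submission
// order. The real benchmark gives each thread its own parsers and MDAO.
public class ParallelReplaySketch {
    static List<String> parseChunk(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) out.add(line.trim());  // stand-in for parse + put
        return out;
    }

    static List<String> replayParallel(List<String> entries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (entries.size() + threads - 1) / threads;
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += chunk) {
            List<String> slice = entries.subList(i, Math.min(i + chunk, entries.size()));
            futures.add(pool.submit(() -> parseChunk(slice)));
        }
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : futures) merged.addAll(f.get());  // serial merge
        pool.shutdown();
        return merged;
    }
}
```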

Benchmark Results (1M entries, 981 MB)

| Variant | Wall(s) | Entries/s | vs Original |
|---|---|---|---|
| Original FOAM | 37.53 | 26,642 | 1.0x |
| FOAM + 6 parser optimizations | 15.87 | 63,010 | 2.4x |
| Jackson | 8.64 | 115,718 | 4.3x |
| Parallel (10 threads, Jackson+FOAM) | 3.74 | 267,069 | 10.0x |

FOAM parser: 2.4x faster. The gap to Jackson narrowed from 4.3x to 1.8x.

The remaining 1.8x gap is fundamental combinator overhead: PStream object creation on every tail() and setValue() call, inherent to the functional PStream contract that all FOAM grammars rely on.
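A toy version makes the allocation visible; every tail() call constructs a new stream object (the real PStream also threads parsed values through setValue):

```java
// Miniature of the functional PStream contract: advancing never mutates,
// so each tail() allocates. Over ~700M head()/tail() calls per replay,
// this allocation is the floor the optimizations cannot remove.
public class MiniPStream {
    final String src;
    final int pos;

    MiniPStream(String src, int pos) { this.src = src; this.pos = pos; }

    char head() { return src.charAt(pos); }

    MiniPStream tail() { return new MiniPStream(src, pos + 1); }  // new object per char
}
```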

Changes

Replay instrumentation:

  • F3FileJournal.js — per-phase PM instrumentation, Jackson fast path with FOAM fallback, charAt comment check, operation counts
  • FileJournal.js — same instrumentation for legacy format

Jackson parser:

  • JacksonJournalParser.js — new: Jackson ObjectMapper with unquoted field names, Integer-to-Long promotion, nested Map detection, triple-quote/backtick normalization

FOAM parser optimizations:

  • ModelParserFactory.java — inline SkipParser, inline property value parser, inline comma separator
  • StringParser.java — remove String.intern(), add indexOf fast path for no-escape strings
  • DoubleParser.java — arithmetic accumulation replaces StringBuilder + Double.valueOf
  • AbstractLiteral.java — factory returns CharLiteral for single-char strings
  • CharLiteral.java — new: optimized single-character literal parser
  • StringPStream.java — add getString() and createAt() for cross-package indexOf access
  • JSONParser.js — add parseStringPooled() and parseStringFast() for benchmark variants

Benchmark infrastructure (experiment playground):

  • JournalReplayBenchmark.js — 9 replay variants with comparison table
  • BenchmarkModel.js — 50-property model covering all FOAM types
  • JacksonParserCorrectnessTest.js — property-by-property FOAM vs Jackson comparison
  • PooledStringPStream.java — arena allocator (experiment, -20% regression)
  • FastStringPStream.java — char[] direct access (experiment, ~0% change)

Tests verified:

  • GrammarCombinatorsJavaTest: 81/81 passed
  • F3FileJournalTest: 32/32 passed
  • JournalReplayBenchmark: 9/9 passed

Running

# Full benchmark (1M entries, ~2 minutes)
./build.sh -W9090 --flags:test server-tests:JournalReplayBenchmark

# Correctness test
./build.sh -W9090 --flags:test server-tests:JacksonParserCorrectnessTest

# Parser combinator tests
./build.sh -W9090 --flags:test server-tests:GrammarCombinatorsJavaTest

…r, benchmark

Add per-phase timing (getEntry, parse, findMerge, daoWrite) to F3FileJournal
and FileJournal replay. Each phase emits a PM entry at replay completion.

Add JacksonJournalParser — a Jackson-based parser that handles flat journal
entries ~4.5x faster than the FOAM combinator parser. Entries with nested
FObjects or FOAM-specific syntax (triple-quoted strings, backtick templates)
fall back to the FOAM parser per-entry.

Replace regex comment check with charAt(0)=='/' — 9x faster.

Add JournalReplayBenchmark (1M entry end-to-end: FOAM vs Jackson) and
JacksonParserCorrectnessTest (property-by-property comparison across 6 model
scenarios including User, CSpec, Cron, Group, Enum, Reference, StringArray).
Add replayInlined (FOAM parser) and replayJacksonInlined (Jackson primary
+ FOAM fallback) benchmark methods that replay journals with parse + DAO
put inlined directly in the loop -- no SyncAssemblyLine, no Assembly
objects, no synchronized blocks. These provide a baseline for comparing
against Assembly-wrapped variants to isolate synchronization overhead.
…ison table

Three new benchmark methods:
- replayWithPooledPStream: FOAM parser using PooledStringPStream arena allocator
- replayAllOptimizations: Jackson primary + FOAM+Pool fallback, no AssemblyLine
- replayParallel: N-thread parallel parsing with per-thread parsers and MDAO merge

Replace text summary with formatted comparison table showing wall time, entries/sec,
and speedup vs FOAM baseline for all seven variants.
Four independent optimizations to the FOAM combinator parser,
each benchmarked independently against 1M journal entries (981 MB):

1. Inline SkipParser (ModelParserFactory.java)
   Replace 5-layer combinator stack (Repeat0→Alt→Seq0→Literal→WS)
   with direct char checks in a single while-loop.
   Result: +24.8% (37.53s → 28.24s)

2. CharLiteral specialization (new CharLiteral.java + AbstractLiteral.java)
   Single-char literals ({, }, :, ,) skip the loop and String.charAt
   in AbstractLiteral.parse(). Factory auto-selects for length==1.
   Result: +2.6% (28.24s → 27.51s)

3. Remove String.intern() (StringParser.java)
   intern() hits the JVM global native string table with C++ sync
   ~30M times. Most journal strings are unique — pure overhead.
   Result: +25.1% (27.51s → 21.99s)

4. Arithmetic DoubleParser (DoubleParser.java)
   Replace StringBuilder + Double.valueOf(sb.toString()) with direct
   long arithmetic accumulation. Zero heap allocation per number.
   Result: +14.4% (21.99s → 18.82s)
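The accumulation idea, sketched for plain decimal literals (exponent handling, overflow guards, and the PStream plumbing of the real DoubleParser are elided; parse is a hypothetical name):

```java
// Sketch of arithmetic accumulation: digits build up a long mantissa and
// the fraction is applied with one power-of-ten divide. No StringBuilder,
// no Double.valueOf, no per-number heap allocation.
public class ArithmeticDouble {
    static double parse(String s) {
        int i = 0;
        boolean neg = false;
        if (i < s.length() && s.charAt(i) == '-') { neg = true; i++; }
        long mantissa = 0;
        int fracDigits = 0;
        boolean inFrac = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.') { inFrac = true; continue; }
            mantissa = mantissa * 10 + (c - '0');
            if (inFrac) fracDigits++;
        }
        double value = mantissa / Math.pow(10, fracDigits);
        return neg ? -value : value;
    }
}
```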

Also tested and REJECTED:
- PrefixAlt flat char[] dispatch: -12.3% regression (Character.compare
  is already JIT-optimized as an intrinsic)

Cumulative: 37.53s → 18.82s (49.9% faster, ~2x throughput)
FOAM gap to Jackson narrowed from 4.8x to ~2.4x

Tests verified:
- GrammarCombinatorsJavaTest: 81/81 passed
- F3FileJournalTest: 32/32 passed
- JournalReplayBenchmark: all 8 assertions passed
StringParser previously called ps.apply(delimiter, x) on every character
to check for the closing quote — a Literal.parse() virtual dispatch per
char. For double-quoted strings without escape sequences (95%+ of journal
strings), now uses String.indexOf() to find the closing quote in one
native call, then extracts the substring directly.

Follows the same pattern as UntilLiteral (indexOf on StringPStream) but
adds a backslash pre-check: if indexOf('\\') finds an escape before the
closing quote, falls back to the existing char-by-char loop.

Added getString() and createAt() to StringPStream for cross-package
access to the underlying string and position-based construction.

Result: 20.94s → 17.45s (+16.7%), FOAM gap to Jackson: 2.0x
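A position-based sketch of the shortcut (tryFastScan is a hypothetical name; the real code works over StringPStream via getString()/createAt()):

```java
// Sketch of the bulk-scan fast path: find the closing quote with one
// indexOf, but only take the shortcut when no backslash appears before
// it; escape sequences fall back to the char-by-char loop.
public class BulkStringScan {
    /** start points just past the opening quote.
     *  Returns the string body, or null when the slow path is required. */
    static String tryFastScan(String src, int start) {
        int close = src.indexOf('"', start);
        if (close < 0) return null;                     // unterminated
        int escape = src.indexOf('\\', start);
        if (escape >= 0 && escape < close) return null; // escape: slow path
        return src.substring(start, close);
    }
}
```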
…Factory

Replace Seq0(SKIP, Literal(':'), SKIP, valueParser) per property with a
single parser that does direct char checks: inline whitespace skip,
direct ':' comparison, inline whitespace skip, then value parse + set.
Eliminates 4 combinator layers (Seq0 dispatch) x 50 properties x 1M
entries = 200M virtual dispatch calls removed.

Also replace the outer Repeat0(Seq0(SKIP, Alt, SKIP), Literal(',')) with
an inlined property loop: direct whitespace skip, PrefixAlt dispatch,
direct ',' char check. Eliminates Repeat0 + Seq0 + Literal overhead.

Result: 17.45s → 15.53s (+11.0%), FOAM gap to Jackson: 1.7x
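Sketched as plain position-based scanning (illustrative names; the real code operates on PStream):

```java
// Sketch of the inlined key/value step: skip whitespace, require ':',
// skip whitespace, then the value parse would follow -- all direct char
// checks instead of a Seq0 of sub-parsers per property.
public class InlineProperty {
    static int skipWs(String s, int p) {
        while (p < s.length() && Character.isWhitespace(s.charAt(p))) p++;
        return p;
    }

    /** Returns the position where the value starts, or -1 on mismatch. */
    static int parseColon(String s, int p) {
        p = skipWs(s, p);
        if (p >= s.length() || s.charAt(p) != ':') return -1;
        return skipWs(s, p + 1);
    }
}
```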
Created FastStringPStream that replaces Reference<CharSequence> with
direct char[] access. head() becomes chars_[pos_] instead of
str.get().charAt(pos). Hypothesis: removing the volatile read + method
indirection on 700M head() calls per run would measurably speed up parsing.

Result: 23.56s vs PooledPStream 24.15s — within noise.
No improvement vs baseline 15.87s.

The JIT compiler already optimizes Reference.get().charAt() effectively.
This confirms the remaining 1.7x gap to Jackson is NOT from
Reference indirection — it's from PStream object allocation
(tail/setValue creating new objects) and combinator virtual dispatch.
abdelaziz-mahdy marked this pull request as draft April 15, 2026 01:18

@abdelaziz-mahdy (Collaborator, Author)

Benchmark Results (1M entries, 981 MB)

Per-Phase Breakdown

| Variant | Wall | Read (I/O) | Parse | MDAO Put | Overhead | Entries/sec |
|---|---|---|---|---|---|---|
| FOAM Parser (baseline) | 15.29s | 0.83s (5.4%) | 13.85s (90.6%) | 0.50s (3.3%) | 0.11s (0.7%) | 65,397 |
| Jackson (ceiling) | 7.11s | 0.74s (10.4%) | 5.79s (81.5%) | 0.44s (6.2%) | 0.14s (1.9%) | 140,648 |
| FOAM Inlined (no AssemblyLine) | 15.35s | 0.73s (4.8%) | 14.03s (91.4%) | 0.47s (3.1%) | 0.12s (0.8%) | 65,138 |
| Jackson Inlined | 7.07s | 0.71s (10.0%) | 5.76s (81.5%) | 0.50s (7.0%) | 0.10s (1.5%) | 141,360 |
| FOAM + Pooled PStream | 23.96s | 0.84s (3.5%) | 22.47s (93.8%) | 0.51s (2.1%) | 0.14s (0.6%) | 41,737 |
| FOAM + FastPStream (char[]) | 20.92s | 0.71s (3.4%) | 19.65s (93.9%) | 0.49s (2.3%) | 0.08s (0.4%) | 47,805 |
| All Optimizations Combined | 7.16s | 0.71s (9.9%) | 5.90s (82.4%) | 0.45s (6.3%) | 0.10s (1.5%) | 139,609 |
| Parallel (10 threads) | 2.69s | -- | 1.98s (parse) | 0.72s (merge) | -- | 371,124 |

Comparison Table

| Variant | Wall(s) | Entries/s | vs FOAM |
|---|---|---|---|
| FOAM Parser (baseline) | 15.29 | 65,397 | 1.0x |
| Jackson (ceiling) | 7.11 | 140,648 | 2.2x |
| FOAM Inlined (no AssemblyLine) | 15.35 | 65,138 | 1.0x |
| Jackson Inlined | 7.07 | 141,360 | 2.2x |
| FOAM + Pooled PStream | 23.96 | 41,737 | 0.6x |
| FOAM + FastPStream (char[]) | 20.92 | 47,805 | 0.7x |
| All Optimizations Combined | 7.16 | 139,609 | 2.1x |
| Parallel (10 threads) | 2.69 | 371,124 | 5.7x |

Key Observations

  • Parse dominates -- 81-94% of wall time across all variants. I/O is 3-10%, MDAO put is 2-7%.
  • SyncAssemblyLine overhead: ~0% -- 15.29s vs 15.35s, not a bottleneck.
  • Pooled/FastPStream regressed -- JVM young-gen GC handles short-lived objects better than manual pooling.
  • Comment check: regex 73.1ms vs charAt 4.7ms = 15.4x speedup.
  • Parallel (10 threads, same file): 2.69s total (1.98s parse + 0.72s merge) = 5.7x vs baseline.

Extend the replay benchmark with two new paths kept alongside the existing
variants for direct comparison.

SyncAssemblyLine vs SimpleAsyncAssemblyLine: mirror F3FileJournal's
production shape (parse in executeJob, put in endJob). Moving the
assembly line from sync to async parallelizes parse across a worker pool
while the dedicated end thread keeps install serial. Without touching
the FOAM parser, this reaches 2.6x replay throughput at a 10-thread pool.
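The shape can be sketched with a plain executor (SimpleAsyncAssemblyLine itself works differently; this shows only the parallel-parse, serial-install split):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the sync-to-async split: workers run the parse step in
// parallel; the loop below plays the role of the dedicated end thread,
// installing results serially and in submission order.
public class AsyncLineSketch {
    static String run(int entries, int threads) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        List<Future<String>> parsed = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            final int n = i;
            parsed.add(workers.submit(() -> "e" + n));   // parse in parallel
        }
        StringBuilder installed = new StringBuilder();
        for (Future<String> f : parsed) {
            installed.append(f.get()).append(';');       // install serially
        }
        workers.shutdown();
        return installed.toString();
    }
}
```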

CircularStringPStream: 256-char sliding window PStream added next to
FastStringPStream and PooledStringPStream. Ring copy and out-of-window
substring fallback outweigh any cache gain; regresses to 0.5x of
baseline, confirming storage format is not the parse bottleneck.

SimpleAsyncAssemblyLine.java is pulled forward from upstream
development so the branch builds standalone.