
Journal Replay Performance: Instrumentation, Jackson Fast Path, FOAM Parser Optimizations, and Benchmark Suite#4969

Draft
abdelaziz-mahdy wants to merge 13 commits into development from journal-perf-upstream

Conversation

@abdelaziz-mahdy (Collaborator)

Problem

Journal replay at boot dominates startup time. On deployments with large data journals (8+ GB across multiple DAOs), replay takes ~45 minutes, with parsing consuming 52% of that time. The FOAM combinator parser processes entries at ~27K entries/sec; Jackson's streaming parser achieves ~116K entries/sec on the same data — a ~4.3x gap.

No per-phase timing existed, so it was impossible to identify which replay phase (read, parse, find/merge, DAO write) was the bottleneck.

Solution

Three categories of changes: instrumentation, a Jackson fast-path parser, and six FOAM parser optimizations. A comprehensive benchmark suite validates every change.

1. Per-Phase PM Instrumentation

Added System.nanoTime() accumulators around each phase of F3FileJournal.replay and FileJournal.replay. Each phase emits a PM entry at replay completion:

  • replay.<file>:getEntry — BufferedReader line assembly
  • replay.<file>:parse — JSON parse only
  • replay.<file>:findMerge — dao.find(id) + mergeFObject
  • replay.<file>:daoWrite — dao.put / dao.remove

Completion log now includes operation counts (opCreate, opPut, opPutMerged, opRemove, commentsSkipped) and parser usage (Jackson=N/fallback=N).
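The accumulator pattern, as a minimal sketch (names are illustrative, not the actual F3FileJournal members):

```java
// Minimal sketch of the per-phase accumulator pattern. The trim() call is
// a stand-in for the real parse work; one PM per phase is emitted once at
// replay completion, not per entry.
public class PhaseTimingSketch {
    static long parseNanos = 0;

    static String timedParse(String line) {
        long t0 = System.nanoTime();
        String parsed = line.trim();                 // stand-in for JSON parse
        parseNanos += System.nanoTime() - t0;
        return parsed;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) timedParse("  {\"id\":" + i + "}  ");
        // At completion: emit a PM like "replay.<file>:parse" with the total.
        System.out.println("parse nanos: " + parseNanos);
    }
}
```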

Replaced COMMENT.matcher(entry).matches() with entry.charAt(0) == '/' — 9x faster on 1M lines.
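Side by side, with an illustrative COMMENT pattern (the actual pattern lives in the journal code):

```java
import java.util.regex.Pattern;

// The old regex check vs. the new first-char check. matches() performs a
// full-string regex walk per line; charAt(0) is a single bounds-checked load.
public class CommentCheck {
    static final Pattern COMMENT = Pattern.compile("//.*");   // illustrative

    static boolean isCommentRegex(String entry) {
        return COMMENT.matcher(entry).matches();
    }

    static boolean isCommentCharAt(String entry) {
        return !entry.isEmpty() && entry.charAt(0) == '/';
    }
}
```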

2. Jackson Fast-Path Parser

JacksonJournalParser uses Jackson's ObjectMapper to tokenize journal JSON and FOAM PropertyInfo.set()/cast() to populate FObjects. For each entry during replay:

  1. Jackson parses JSON into Map<String, Object>
  2. If any value is a nested Map or List-of-Maps (nested FObject), returns null — entry falls back to FOAM parser
  3. Otherwise populates the FObject via PropertyInfo with Integer-to-Long promotion (excluding Enum properties which need Integer for forOrdinal())

Flat data entries (vast majority of large journals) take the Jackson path. Config entries with nested objects fall back to FOAM per-entry.
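Step 2 is the gatekeeper. A minimal sketch of that check, with the Jackson call elided (the map is what ObjectMapper.readValue would produce; isFlat is a hypothetical helper name):

```java
import java.util.List;
import java.util.Map;

// Sketch of the fast path's step 2: reject any entry whose parsed map
// contains a nested Map or a List holding a Map, since nested FObjects
// need the full FOAM parser. Flat maps go on to PropertyInfo.set().
public class FastPathSketch {
    @SuppressWarnings("unchecked")
    static boolean isFlat(Map<String, Object> values) {
        for (Object v : values.values()) {
            if (v instanceof Map) return false;            // nested FObject
            if (v instanceof List) {
                for (Object e : (List<Object>) v) {
                    if (e instanceof Map) return false;    // list of FObjects
                }
            }
        }
        return true;
    }
}
```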

3. FOAM Parser Optimizations

Six changes to the combinator parser internals that benefit all FOAM grammars (JSON, VSS, MCBS, etc.), not just journal replay:

| # | Optimization | File(s) | Gain | What it eliminates |
|---|---|---|---|---|
| 1 | Inline SkipParser | ModelParserFactory.java | +24.8% | 5-layer combinator stack (Repeat0→Alt→Seq0→Literal→WS) replaced with direct char loop |
| 2 | Remove String.intern() | StringParser.java | +25.1% | Native string table sync on ~30M unique strings per replay |
| 3 | Arithmetic DoubleParser | DoubleParser.java | +14.4% | StringBuilder + Double.valueOf per number replaced with long arithmetic |
| 4 | Bulk String Scanning | StringParser.java, StringPStream.java | +16.7% | Per-character delimiter check replaced with indexOf — same pattern as UntilLiteral |
| 5 | Inline Property+Comma | ModelParserFactory.java | +11.0% | Seq0 + Literal dispatch per property (200M virtual calls) replaced with direct char checks |
| 6 | CharLiteral | AbstractLiteral.java, CharLiteral.java | +2.6% | Single-char literals skip loop and String.charAt in AbstractLiteral.parse() |
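The direct char loop behind optimization 1 can be sketched with plain position-based scanning (the real SkipParser operates over PStream and also handles comments; this shows only the whitespace case):

```java
// Sketch of the inlined skip: a single loop over the input replaces the
// Repeat0/Alt/Seq0/Literal/WS combinator stack. Position-based scanning
// stands in for the PStream interface.
public class InlineSkip {
    /** Advance past spaces, tabs, and newlines; returns the new position. */
    static int skipWhitespace(String s, int pos) {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') pos++;
            else break;
        }
        return pos;
    }
}
```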

Tested and rejected (kept on branch for reproducibility):

| Experiment | Result | Why |
|---|---|---|
| PooledStringPStream (arena allocator) | -20% slower | JVM young-gen GC handles short-lived objects better than manual pooling |
| FastStringPStream (char[] direct access) | ~0% change | JIT already optimizes Reference.get().charAt() |
| PrefixAlt flat char[] dispatch | -12.3% | Character.compare() is a JIT intrinsic |
| Remove SyncAssemblyLine | ~0% | Overhead is negligible |

4. Parallel File Loading (Experiment)

Benchmark demonstrates that splitting a journal into N chunks and replaying on N threads yields 3.5-4.2x speedup on a 10-core machine. Each thread gets its own parser instances and MDAO. Worth implementing at the EasyDAO/boot level.
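The chunking shape might look like this (parseChunk stands in for the per-thread parser + MDAO work; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the parallel-replay experiment: split entries into N chunks,
// parse each chunk on its own thread, then merge results in submission
// order. The real benchmark gives each thread its own parsers and MDAO.
public class ParallelReplaySketch {
    static List<String> parseChunk(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) out.add(line.trim());  // stand-in for parse + put
        return out;
    }

    static List<String> replayParallel(List<String> entries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (entries.size() + threads - 1) / threads;
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += chunk) {
            List<String> slice = entries.subList(i, Math.min(i + chunk, entries.size()));
            futures.add(pool.submit(() -> parseChunk(slice)));
        }
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : futures) merged.addAll(f.get());  // serial merge
        pool.shutdown();
        return merged;
    }
}
```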

Benchmark Results (1M entries, 981 MB)

| Variant | Wall(s) | Entries/s | vs Original |
|---|---|---|---|
| Original FOAM | 37.53 | 26,642 | 1.0x |
| FOAM + 6 parser optimizations | 15.87 | 63,010 | 2.4x |
| Jackson | 8.64 | 115,718 | 4.3x |
| Parallel (10 threads, Jackson+FOAM) | 3.74 | 267,069 | 10.0x |

FOAM parser: 2.4x faster. The gap to Jackson narrowed from 4.3x to 1.8x.

The remaining 1.8x gap is fundamental combinator overhead: PStream object creation on every tail() and setValue() call, inherent to the functional PStream contract that all FOAM grammars rely on.
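A toy version makes the allocation visible; every tail() call constructs a new stream object (the real PStream also threads parsed values through setValue):

```java
// Miniature of the functional PStream contract: advancing never mutates,
// so each tail() allocates. Over ~700M head()/tail() calls per replay,
// this allocation is the floor the optimizations cannot remove.
public class MiniPStream {
    final String src;
    final int pos;

    MiniPStream(String src, int pos) { this.src = src; this.pos = pos; }

    char head() { return src.charAt(pos); }

    MiniPStream tail() { return new MiniPStream(src, pos + 1); }  // new object per char
}
```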

Changes

Replay instrumentation:

  • F3FileJournal.js — per-phase PM instrumentation, Jackson fast path with FOAM fallback, charAt comment check, operation counts
  • FileJournal.js — same instrumentation for legacy format

Jackson parser:

  • JacksonJournalParser.js — new: Jackson ObjectMapper with unquoted field names, Integer-to-Long promotion, nested Map detection, triple-quote/backtick normalization

FOAM parser optimizations:

  • ModelParserFactory.java — inline SkipParser, inline property value parser, inline comma separator
  • StringParser.java — remove String.intern(), add indexOf fast path for no-escape strings
  • DoubleParser.java — arithmetic accumulation replaces StringBuilder + Double.valueOf
  • AbstractLiteral.java — factory returns CharLiteral for single-char strings
  • CharLiteral.java — new: optimized single-character literal parser
  • StringPStream.java — add getString() and createAt() for cross-package indexOf access
  • JSONParser.js — add parseStringPooled() and parseStringFast() for benchmark variants

Benchmark infrastructure (experiment playground):

  • JournalReplayBenchmark.js — 9 replay variants with comparison table
  • BenchmarkModel.js — 50-property model covering all FOAM types
  • JacksonParserCorrectnessTest.js — property-by-property FOAM vs Jackson comparison
  • PooledStringPStream.java — arena allocator (experiment, -20% regression)
  • FastStringPStream.java — char[] direct access (experiment, ~0% change)

Tests verified:

  • GrammarCombinatorsJavaTest: 81/81 passed
  • F3FileJournalTest: 32/32 passed
  • JournalReplayBenchmark: 9/9 passed

Running

# Full benchmark (1M entries, ~2 minutes)
./build.sh -W9090 --flags:test server-tests:JournalReplayBenchmark

# Correctness test
./build.sh -W9090 --flags:test server-tests:JacksonParserCorrectnessTest

# Parser combinator tests
./build.sh -W9090 --flags:test server-tests:GrammarCombinatorsJavaTest

…r, benchmark

Add per-phase timing (getEntry, parse, findMerge, daoWrite) to F3FileJournal
and FileJournal replay. Each phase emits a PM entry at replay completion.

Add JacksonJournalParser — a Jackson-based parser that handles flat journal
entries ~4.5x faster than the FOAM combinator parser. Entries with nested
FObjects or FOAM-specific syntax (triple-quoted strings, backtick templates)
fall back to the FOAM parser per-entry.

Replace regex comment check with charAt(0)=='/' — 9x faster.

Add JournalReplayBenchmark (1M entry end-to-end: FOAM vs Jackson) and
JacksonParserCorrectnessTest (property-by-property comparison across 6 model
scenarios including User, CSpec, Cron, Group, Enum, Reference, StringArray).
Add replayInlined (FOAM parser) and replayJacksonInlined (Jackson primary
+ FOAM fallback) benchmark methods that replay journals with parse + DAO
put inlined directly in the loop -- no SyncAssemblyLine, no Assembly
objects, no synchronized blocks. These provide a baseline for comparing
against Assembly-wrapped variants to isolate synchronization overhead.
…ison table

Three new benchmark methods:
- replayWithPooledPStream: FOAM parser using PooledStringPStream arena allocator
- replayAllOptimizations: Jackson primary + FOAM+Pool fallback, no AssemblyLine
- replayParallel: N-thread parallel parsing with per-thread parsers and MDAO merge

Replace text summary with formatted comparison table showing wall time, entries/sec,
and speedup vs FOAM baseline for all seven variants.
Four independent optimizations to the FOAM combinator parser,
each benchmarked independently against 1M journal entries (981 MB):

1. Inline SkipParser (ModelParserFactory.java)
   Replace 5-layer combinator stack (Repeat0→Alt→Seq0→Literal→WS)
   with direct char checks in a single while-loop.
   Result: +24.8% (37.53s → 28.24s)

2. CharLiteral specialization (new CharLiteral.java + AbstractLiteral.java)
   Single-char literals ({, }, :, ,) skip the loop and String.charAt
   in AbstractLiteral.parse(). Factory auto-selects for length==1.
   Result: +2.6% (28.24s → 27.51s)

3. Remove String.intern() (StringParser.java)
   intern() hits the JVM global native string table with C++ sync
   ~30M times. Most journal strings are unique — pure overhead.
   Result: +25.1% (27.51s → 21.99s)

4. Arithmetic DoubleParser (DoubleParser.java)
   Replace StringBuilder + Double.valueOf(sb.toString()) with direct
   long arithmetic accumulation. Zero heap allocation per number.
   Result: +14.4% (21.99s → 18.82s)
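The accumulation idea, sketched for plain decimal literals (exponent handling, overflow guards, and the PStream plumbing of the real DoubleParser are elided; parse is a hypothetical name):

```java
// Sketch of arithmetic accumulation: digits build up a long mantissa and
// the fraction is applied with one power-of-ten divide. No StringBuilder,
// no Double.valueOf, no per-number heap allocation.
public class ArithmeticDouble {
    static double parse(String s) {
        int i = 0;
        boolean neg = false;
        if (i < s.length() && s.charAt(i) == '-') { neg = true; i++; }
        long mantissa = 0;
        int fracDigits = 0;
        boolean inFrac = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.') { inFrac = true; continue; }
            mantissa = mantissa * 10 + (c - '0');
            if (inFrac) fracDigits++;
        }
        double value = mantissa / Math.pow(10, fracDigits);
        return neg ? -value : value;
    }
}
```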

Also tested and REJECTED:
- PrefixAlt flat char[] dispatch: -12.3% regression (Character.compare
  is already JIT-optimized as an intrinsic)

Cumulative: 37.53s → 18.82s (49.9% faster, ~2x throughput)
FOAM gap to Jackson narrowed from 4.8x to ~2.4x

Tests verified:
- GrammarCombinatorsJavaTest: 81/81 passed
- F3FileJournalTest: 32/32 passed
- JournalReplayBenchmark: all 8 assertions passed
StringParser previously called ps.apply(delimiter, x) on every character
to check for the closing quote — a Literal.parse() virtual dispatch per
char. For double-quoted strings without escape sequences (95%+ of journal
strings), now uses String.indexOf() to find the closing quote in one
native call, then extracts the substring directly.

Follows the same pattern as UntilLiteral (indexOf on StringPStream) but
adds a backslash pre-check: if indexOf('\\') finds an escape before the
closing quote, falls back to the existing char-by-char loop.

Added getString() and createAt() to StringPStream for cross-package
access to the underlying string and position-based construction.

Result: 20.94s → 17.45s (+16.7%), FOAM gap to Jackson: 2.0x
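A position-based sketch of the shortcut (tryFastScan is a hypothetical name; the real code works over StringPStream via getString()/createAt()):

```java
// Sketch of the bulk-scan fast path: find the closing quote with one
// indexOf, but only take the shortcut when no backslash appears before
// it; escape sequences fall back to the char-by-char loop.
public class BulkStringScan {
    /** start points just past the opening quote.
     *  Returns the string body, or null when the slow path is required. */
    static String tryFastScan(String src, int start) {
        int close = src.indexOf('"', start);
        if (close < 0) return null;                     // unterminated
        int escape = src.indexOf('\\', start);
        if (escape >= 0 && escape < close) return null; // escape: slow path
        return src.substring(start, close);
    }
}
```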
…Factory

Replace Seq0(SKIP, Literal(':'), SKIP, valueParser) per property with a
single parser that does direct char checks: inline whitespace skip,
direct ':' comparison, inline whitespace skip, then value parse + set.
Eliminates 4 combinator layers (Seq0 dispatch) x 50 properties x 1M
entries = 200M virtual dispatch calls removed.

Also replace the outer Repeat0(Seq0(SKIP, Alt, SKIP), Literal(',')) with
an inlined property loop: direct whitespace skip, PrefixAlt dispatch,
direct ',' char check. Eliminates Repeat0 + Seq0 + Literal overhead.

Result: 17.45s → 15.53s (+11.0%), FOAM gap to Jackson: 1.7x
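Sketched as plain position-based scanning (illustrative names; the real code operates on PStream):

```java
// Sketch of the inlined key/value step: skip whitespace, require ':',
// skip whitespace, then the value parse would follow -- all direct char
// checks instead of a Seq0 of sub-parsers per property.
public class InlineProperty {
    static int skipWs(String s, int p) {
        while (p < s.length() && Character.isWhitespace(s.charAt(p))) p++;
        return p;
    }

    /** Returns the position where the value starts, or -1 on mismatch. */
    static int parseColon(String s, int p) {
        p = skipWs(s, p);
        if (p >= s.length() || s.charAt(p) != ':') return -1;
        return skipWs(s, p + 1);
    }
}
```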
Created FastStringPStream that replaces Reference<CharSequence> with
direct char[] access. head() becomes chars_[pos_] instead of
str.get().charAt(pos). Hypothesis: removing the volatile read + method
indirection on 700M head() calls per run would measurably speed up parsing.

Result: 23.56s vs PooledPStream 24.15s — within noise.
No improvement vs baseline 15.87s.

The JIT compiler already optimizes Reference.get().charAt() effectively.
This confirms the remaining 1.7x gap to Jackson is NOT from
Reference indirection — it's from PStream object allocation
(tail/setValue creating new objects) and combinator virtual dispatch.
abdelaziz-mahdy marked this pull request as draft April 15, 2026 01:18

@abdelaziz-mahdy (Collaborator, Author)

Benchmark Results (1M entries, 981 MB)

Per-Phase Breakdown

| Variant | Wall | Read (I/O) | Parse | MDAO Put | Overhead | Entries/sec |
|---|---|---|---|---|---|---|
| FOAM Parser (baseline) | 15.29s | 0.83s (5.4%) | 13.85s (90.6%) | 0.50s (3.3%) | 0.11s (0.7%) | 65,397 |
| Jackson (ceiling) | 7.11s | 0.74s (10.4%) | 5.79s (81.5%) | 0.44s (6.2%) | 0.14s (1.9%) | 140,648 |
| FOAM Inlined (no AssemblyLine) | 15.35s | 0.73s (4.8%) | 14.03s (91.4%) | 0.47s (3.1%) | 0.12s (0.8%) | 65,138 |
| Jackson Inlined | 7.07s | 0.71s (10.0%) | 5.76s (81.5%) | 0.50s (7.0%) | 0.10s (1.5%) | 141,360 |
| FOAM + Pooled PStream | 23.96s | 0.84s (3.5%) | 22.47s (93.8%) | 0.51s (2.1%) | 0.14s (0.6%) | 41,737 |
| FOAM + FastPStream (char[]) | 20.92s | 0.71s (3.4%) | 19.65s (93.9%) | 0.49s (2.3%) | 0.08s (0.4%) | 47,805 |
| All Optimizations Combined | 7.16s | 0.71s (9.9%) | 5.90s (82.4%) | 0.45s (6.3%) | 0.10s (1.5%) | 139,609 |
| Parallel (10 threads) | 2.69s | -- | 1.98s (parse) | 0.72s (merge) | -- | 371,124 |

Comparison Table

| Variant | Wall(s) | Entries/s | vs FOAM |
|---|---|---|---|
| FOAM Parser (baseline) | 15.29 | 65,397 | 1.0x |
| Jackson (ceiling) | 7.11 | 140,648 | 2.2x |
| FOAM Inlined (no AssemblyLine) | 15.35 | 65,138 | 1.0x |
| Jackson Inlined | 7.07 | 141,360 | 2.2x |
| FOAM + Pooled PStream | 23.96 | 41,737 | 0.6x |
| FOAM + FastPStream (char[]) | 20.92 | 47,805 | 0.7x |
| All Optimizations Combined | 7.16 | 139,609 | 2.1x |
| Parallel (10 threads) | 2.69 | 371,124 | 5.7x |

Key Observations

  • Parse dominates -- 81-94% of wall time across all variants. I/O is 3-10%, MDAO put is 2-7%.
  • SyncAssemblyLine overhead: ~0% -- 15.29s vs 15.35s, not a bottleneck.
  • Pooled/FastPStream regressed -- JVM young-gen GC handles short-lived objects better than manual pooling.
  • Comment check: regex 73.1ms vs charAt 4.7ms = 15.4x speedup.
  • Parallel (10 threads, same file): 2.69s total (1.98s parse + 0.72s merge) = 5.7x vs baseline.

Extend the replay benchmark with two new paths kept alongside the existing
variants for direct comparison.

SyncAssemblyLine vs SimpleAsyncAssemblyLine: mirror F3FileJournal's
production shape (parse in executeJob, put in endJob). Moving the
assembly line from sync to async parallelizes parse across a worker pool
while the dedicated end thread keeps install serial. Without touching
the FOAM parser, this reaches 2.6x replay throughput at a 10-thread pool.
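The shape can be sketched with a plain executor (SimpleAsyncAssemblyLine itself works differently; this shows only the parallel-parse, serial-install split):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the sync-to-async split: workers run the parse step in
// parallel; the loop below plays the role of the dedicated end thread,
// installing results serially and in submission order.
public class AsyncLineSketch {
    static String run(int entries, int threads) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        List<Future<String>> parsed = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            final int n = i;
            parsed.add(workers.submit(() -> "e" + n));   // parse in parallel
        }
        StringBuilder installed = new StringBuilder();
        for (Future<String> f : parsed) {
            installed.append(f.get()).append(';');       // install serially
        }
        workers.shutdown();
        return installed.toString();
    }
}
```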

CircularStringPStream: 256-char sliding window PStream added next to
FastStringPStream and PooledStringPStream. Ring copy and out-of-window
substring fallback outweigh any cache gain; regresses to 0.5x of
baseline, confirming storage format is not the parse bottleneck.

SimpleAsyncAssemblyLine.java is pulled forward from upstream
development so the branch builds standalone.