Refactor json nodes lazy deserialization #782

JohannesLichtenberger · 2025-10-07T22:02:40Z

No description provided.

- Updated all JSON node types (OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, etc.) to use a uniform MemorySegment-based deserialization pattern
- Implemented lazy loading for all value types (strings, numbers, booleans, nulls)
- Nodes now deserialize using layout-based slicing for better performance
- Removed ~100 lines of unused helper methods from NodeKind
- Fixed AbstractStringNode hash computation to use toByteArray() instead of getDestination()
- All JSON nodes now follow the same pattern as OBJECT and ARRAY for consistency
- Build verified successful with no compilation errors
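The lazy loading described above can be illustrated with a minimal sketch. It is not the Sirix implementation: the class name `LazyStringNodeSketch`, its fields, and the use of `java.nio.ByteBuffer` in place of `MemorySegment` are all assumptions made for illustration. The idea is that a node keeps a slice of the serialized bytes and decodes the value only on first access.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical lazily decoded string node; names are illustrative only. */
final class LazyStringNodeSketch {
  private final ByteBuffer slice; // raw UTF-8 bytes, not yet decoded
  private String value;           // cached after the first access
  int decodeCount;                // exposed only to demonstrate laziness

  LazyStringNodeSketch(ByteBuffer slice) {
    this.slice = slice;
  }

  /** Decodes the string on first call and returns the cached value afterwards. */
  String getValue() {
    if (value == null) {
      byte[] bytes = new byte[slice.remaining()];
      slice.duplicate().get(bytes); // duplicate() leaves the original position untouched
      value = new String(bytes, StandardCharsets.UTF_8);
      decodeCount++;
    }
    return value;
  }
}
```

Constructing the node costs nothing beyond holding the slice, so nodes that are read structurally (for sibling or parent links) never pay for decoding their value.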

…ialization

- Add size prefix (4 bytes) after NodeKind byte to avoid reading variable-sized data
- Use 8-byte aligned headers (NodeKind + size + 3-byte padding) for proper alignment
- Add end padding to ensure each node's total size is a multiple of 8
- Switch all JSON nodes to UNALIGNED VarHandles for compatibility with factory-created nodes
- Fix ObjectKeyNode to include 4-byte internal padding before hash field
- Fix JsonNodeFactoryImpl to write internal padding when creating ObjectKeyNode
- Fix setBooleanValue to handle both BooleanNode and ObjectBooleanNode types
- Remove complex size calculation methods (calculateStopBitDataSize, calculateNumberDataSize)

Benefits:

- No double-reading of variable-sized content (strings, numbers)
- Faster deserialization with direct MemorySegment slicing
- Simpler, more maintainable code
- Tests: PathSummaryTest and JsonNodeTrxGetPreviousRevisionNumberTest passing
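The header and padding scheme listed above can be sketched roughly as follows. This is a hedged illustration, not the code in this PR: the class `NodeLayoutSketch` is hypothetical, `java.nio.ByteBuffer` stands in for `MemorySegment`, the field order (kind, size, padding) follows the wording of the commit message, and whether the real size prefix counts the end padding is an assumption.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the header-plus-padding layout; not the actual Sirix classes. */
final class NodeLayoutSketch {
  static final int HEADER_SIZE = 8; // 1 kind byte + 4 size bytes + 3 padding bytes

  /** Number of zero bytes needed to round size up to a multiple of 8. */
  static int calculateEndPadding(int size) {
    return (8 - (size % 8)) % 8;
  }

  /** Writes kind, size prefix, header padding, payload, and end padding. */
  static ByteBuffer serialize(byte kind, byte[] payload) {
    int endPadding = calculateEndPadding(payload.length);
    int paddedSize = payload.length + endPadding;
    ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + paddedSize);
    buf.put(kind);
    buf.putInt(paddedSize);         // assumed: size prefix counts payload plus end padding
    buf.put(new byte[3]);           // header padding up to 8 bytes
    buf.put(payload);
    buf.put(new byte[endPadding]);  // end padding keeps the next node 8-byte aligned
    buf.flip();
    return buf;
  }

  /** Reads the size prefix once and slices out the node body without parsing it. */
  static ByteBuffer sliceBody(ByteBuffer buf, int nodeStart) {
    int paddedSize = buf.getInt(nodeStart + 1);
    return buf.slice(nodeStart + HEADER_SIZE, paddedSize);
  }
}
```

Because the reader learns the body length from the prefix, it can slice the node in one step instead of walking variable-sized fields to find where the node ends.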

…ules

The net.openhft.hashing library needs access to sun.nio.ch.DirectBuffer when hashing DirectByteBuffer instances created from MemorySegments. Without these --add-opens flags, tests fail with IllegalAccessError.

This fix allows:

- Access to sun.nio.ch for DirectBuffer operations
- Access to java.nio for ByteBuffer operations

Tests now pass successfully.

…dding format

- Add NodeKind byte before size prefix
- Use 3 bytes padding (total 8 bytes with NodeKind)
- Skip NodeKind byte before deserialize
- Tests now pass with proper 8-byte alignment

…adding format

- Fixed StringNodeTest, NumberNodeTest, BooleanNodeTest, NullNodeTest
- Fixed ObjectNumberNodeTest, ObjectStringNodeTest, ObjectBooleanNodeTest, ObjectNullNodeTest, ObjectKeyNodeTest
- Corrected serialization order for value nodes (siblings before/after value depending on node type)
- All JSON node tests now pass with proper 8-byte alignment

- Created JsonNodeTestHelper with writeHeader(), writeEndPadding(), updateSizePrefix(), and finalizeSerialization() methods
- Updated all 11 JSON node tests to use the helper methods
- Reduced ~20 lines of duplicated code per test to 1-2 lines
- Tests remain fully passing

…izer class

- Created JsonNodeSerializer in main source with writeSizePrefix(), readSizePrefix(), writeEndPadding(), updateSizePrefix(), and calculateEndPadding()
- Removed duplicate private methods from NodeKind.java
- Updated NodeKind.java to use JsonNodeSerializer methods
- Updated JsonNodeTestHelper to delegate to JsonNodeSerializer
- Eliminated code duplication between production and test code
- All tests still pass

- Added NodeKind byte before serialization in all 4 round-trip tests
- Added bytesIn.readByte() to skip NodeKind byte before deserialization
- Ensures proper 8-byte alignment for MemorySegment access
- All 17 tests now pass

- Added serializeNumber() and deserializeNumber() static methods to NodeKind
- Added helper methods serializeBigInteger() and deserializeBigInteger()
- Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods
- Removed duplicate serialization/deserialization code from NumberNode
- Removed duplicate serialization/deserialization code from ObjectNumberNode
- Both node types now use centralized logic from NodeKind for consistency
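A shared number codec of the kind described above might look like the following sketch. The method names mirror the ones in the commit message, but everything else is an assumption: the class `NumberCodecSketch`, the type tags, the byte layout, and the use of `DataOutputStream`/`DataInputStream` in place of the project's own byte sinks and sources. The `roundTrip` helper is hypothetical and exists only to exercise the two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical tagged number codec; tags and layout are illustrative assumptions. */
final class NumberCodecSketch {
  static void serializeNumber(Number value, DataOutputStream out) throws IOException {
    if (value instanceof Integer) {
      out.writeByte(0);
      out.writeInt(value.intValue());
    } else if (value instanceof Long) {
      out.writeByte(1);
      out.writeLong(value.longValue());
    } else if (value instanceof Double) {
      out.writeByte(2);
      out.writeDouble(value.doubleValue());
    } else if (value instanceof BigInteger) {
      out.writeByte(3);
      serializeBigInteger((BigInteger) value, out);
    } else if (value instanceof BigDecimal) {
      BigDecimal decimal = (BigDecimal) value;
      out.writeByte(4);
      serializeBigInteger(decimal.unscaledValue(), out);
      out.writeInt(decimal.scale());
    } else {
      throw new IllegalArgumentException("Unsupported number type: " + value.getClass());
    }
  }

  static Number deserializeNumber(DataInputStream in) throws IOException {
    byte tag = in.readByte();
    switch (tag) {
      case 0: return in.readInt();
      case 1: return in.readLong();
      case 2: return in.readDouble();
      case 3: return deserializeBigInteger(in);
      case 4: return new BigDecimal(deserializeBigInteger(in), in.readInt());
      default: throw new IOException("Unknown number tag: " + tag);
    }
  }

  /** Writes a length-prefixed two's-complement representation. */
  static void serializeBigInteger(BigInteger value, DataOutputStream out) throws IOException {
    byte[] bytes = value.toByteArray();
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  static BigInteger deserializeBigInteger(DataInputStream in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return new BigInteger(bytes);
  }

  /** Convenience helper for trying the codec in memory. */
  static Number roundTrip(Number value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      serializeNumber(value, new DataOutputStream(bytes));
      return deserializeNumber(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Keeping both directions in one place means NumberNode and ObjectNumberNode cannot drift apart in how they tag or lay out a value.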

…obal()

- Updated both constructors to use Arena.ofAuto() for automatic memory management
- Arena.ofAuto() automatically releases memory when no longer reachable
- Improves memory management by allowing automatic cleanup instead of global lifetime

…rializeNumber()

- Changed NumberNode.serializeNumber() to NodeKind.serializeNumber()
- Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber()
- Fixes compilation errors after refactoring number serialization to NodeKind

…y offset

- Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong
- Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong
- This fixes JsonRedBlackTreeIntegrationTest failures
- RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding for efficient storage since parent key offsets are typically small values
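To show why variable-length encoding pays off for small offsets, here is a generic LEB128-style sketch. The exact wire format used by putVarLong and getVarLong in this project is not shown in the PR text, so the encoding below, the class `VarLongSketch`, and the use of `java.nio.ByteBuffer` are assumptions for illustration only.

```java
import java.nio.ByteBuffer;

/** Generic 7-bits-per-byte variable-length long encoding; not the project's exact format. */
final class VarLongSketch {
  /** Writes 7 bits per byte, setting the high bit while more bytes follow. */
  static void putVarLong(ByteBuffer out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.put((byte) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.put((byte) value);
  }

  /** Reads bytes until one arrives without the continuation bit. */
  static long getVarLong(ByteBuffer in) {
    long result = 0;
    int shift = 0;
    byte b;
    do {
      b = in.get();
      result |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return result;
  }

  /** Number of bytes putVarLong would write for the given value. */
  static int encodedSize(long value) {
    int size = 1;
    while ((value & ~0x7FL) != 0) {
      size++;
      value >>>= 7;
    }
    return size;
  }
}
```

With this scheme an offset below 128 takes one byte instead of the eight a fixed-width long needs. Negative values take the full ten bytes here, so a real format would typically add a sign transformation if negative offsets are common.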

- Revert GrowingMemorySegment to use Arena.ofAuto() by default
  * Nodes store MemorySegment references that outlive BytesOut instances
  * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable
  * Prevents premature deallocation bugs
- Add Arena parameter constructors for explicit arena control
  * GrowingMemorySegment(Arena, int) for custom arena
  * MemorySegmentBytesOut(Arena, int) for custom arena
  * Enables using confined arenas for temporary buffers with clear lifecycles
- Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined()
  * Use confined arena for temporary serialization buffers
  * Normal records: data copied to slotMemory, temp buffer freed immediately
  * Overflow records: explicitly copied to Arena.global() for persistence
  * Provides immediate memory cleanup for ~99% of serialization operations

This hybrid approach balances manual control (where beneficial) with automatic management (where lifecycles are complex). All tests pass.
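The difference between the two arena kinds can be sketched as follows, assuming JDK 22 or later, where `java.lang.foreign` is a final API. The class `ArenaLifetimeSketch` and its methods are hypothetical and do not reproduce KeyValueLeafPage.processEntries(); they only show a confined arena freeing a scratch buffer at the end of a try block while a copy in an automatic arena stays usable.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/** Hypothetical illustration of confined versus automatic arena lifetimes. */
final class ArenaLifetimeSketch {
  /** Serializes into a confined scratch buffer, then copies into a GC-managed segment. */
  static MemorySegment serializeAndKeep(long value) {
    MemorySegment kept;
    try (Arena temp = Arena.ofConfined()) {
      MemorySegment scratch = temp.allocate(8, 8);
      scratch.set(ValueLayout.JAVA_LONG, 0, value);
      kept = Arena.ofAuto().allocate(8, 8); // lives as long as the segment is reachable
      kept.copyFrom(scratch);
    } // scratch memory is released here, deterministically
    return kept;
  }

  /** Shows that a confined segment is no longer alive once its arena is closed. */
  static boolean scratchIsFreedAfterClose() {
    MemorySegment scratch;
    try (Arena temp = Arena.ofConfined()) {
      scratch = temp.allocate(8, 8);
    }
    return !scratch.scope().isAlive();
  }
}
```

The copy is what makes the confined arena safe: anything that must outlive the try block has to be moved to a longer-lived arena before the block ends.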