-
-
Notifications
You must be signed in to change notification settings - Fork 248
Refactor json nodes lazy deserialization #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor json nodes lazy deserialization #782
Conversation
- Updated all JSON node types (OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, etc.) to use uniform MemorySegment-based deserialization pattern - Implemented lazy loading for all value types (strings, numbers, booleans, nulls) - Nodes now deserialize using layout-based slicing for better performance - Removed ~100 lines of unused helper methods from NodeKind - Fixed AbstractStringNode hash computation to use toByteArray() instead of getDestination() - All JSON nodes now follow the same pattern as OBJECT and ARRAY for consistency - Build verified successful with no compilation errors
…ialization - Add size prefix (4 bytes) after NodeKind byte to avoid reading variable-sized data - Use 8-byte aligned headers (NodeKind + size + 3-byte padding) for proper alignment - Add end padding to ensure each node's total size is multiple of 8 - Switch all JSON nodes to UNALIGNED VarHandles for compatibility with factory-created nodes - Fix ObjectKeyNode to include 4-byte internal padding before hash field - Fix JsonNodeFactoryImpl to write internal padding when creating ObjectKeyNode - Fix setBooleanValue to handle both BooleanNode and ObjectBooleanNode types - Remove complex size calculation methods (calculateStopBitDataSize, calculateNumberDataSize) Benefits: - No double-reading of variable-sized content (strings, numbers) - Faster deserialization with direct MemorySegment slicing - Simpler, more maintainable code - Tests: PathSummaryTest and JsonNodeTrxGetPreviousRevisionNumberTest passing
…ules The net.openhft.hashing library needs access to sun.nio.ch.DirectBuffer when hashing DirectByteBuffer instances created from MemorySegments. Without these --add-opens flags, tests fail with IllegalAccessError. This fix allows: - Access to sun.nio.ch for DirectBuffer operations - Access to java.nio for ByteBuffer operations Tests now pass successfully.
…dding format - Add NodeKind byte before size prefix - Use 3 bytes padding (total 8 bytes with NodeKind) - Skip NodeKind byte before deserialize - Tests now pass with proper 8-byte alignment
…adding format - Fixed StringNodeTest, NumberNodeTest, BooleanNodeTest, NullNodeTest - Fixed ObjectNumberNodeTest, ObjectStringNodeTest, ObjectBooleanNodeTest, ObjectNullNodeTest, ObjectKeyNodeTest - Corrected serialization order for value nodes (siblings before/after value depending on node type) - All JSON node tests now pass with proper 8-byte alignment
- Created JsonNodeTestHelper with writeHeader(), writeEndPadding(), updateSizePrefix(), and finalizeSerialization() methods - Updated all 11 JSON node tests to use the helper methods - Reduced ~20 lines of duplicated code per test to 1-2 lines - Tests remain fully passing
…izer class - Created JsonNodeSerializer in main source with writeSizePrefix(), readSizePrefix(), writeEndPadding(), updateSizePrefix(), and calculateEndPadding() - Removed duplicate private methods from NodeKind.java - Updated NodeKind.java to use JsonNodeSerializer methods - Updated JsonNodeTestHelper to delegate to JsonNodeSerializer - Eliminated code duplication between production and test code - All tests still pass
- Added NodeKind byte before serialization in all 4 round-trip tests - Added bytesIn.readByte() to skip NodeKind byte before deserialization - Ensures proper 8-byte alignment for MemorySegment access - All 17 tests now pass
- Added serializeNumber() and deserializeNumber() static methods to NodeKind - Added helper methods serializeBigInteger() and deserializeBigInteger() - Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods - Removed duplicate serialization/deserialization code from NumberNode - Removed duplicate serialization/deserialization code from ObjectNumberNode - Both node types now use centralized logic from NodeKind for consistency
…obal() - Updated both constructors to use Arena.ofAuto() for automatic memory management - Arena.ofAuto() automatically releases memory when no longer reachable - Improves memory management by allowing automatic cleanup instead of global lifetime
…rializeNumber() - Changed NumberNode.serializeNumber() to NodeKind.serializeNumber() - Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber() - Fixes compilation errors after refactoring number serialization to NodeKind
…y offset - Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong - Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong - This fixes JsonRedBlackTreeIntegrationTest failures - RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding for efficient storage since parent key offsets are typically small values
- Revert GrowingMemorySegment to use Arena.ofAuto() by default * Nodes store MemorySegment references that outlive BytesOut instances * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable * Prevents premature deallocation bugs - Add Arena parameter constructors for explicit arena control * GrowingMemorySegment(Arena, int) for custom arena * MemorySegmentBytesOut(Arena, int) for custom arena * Enables using confined arenas for temporary buffers with clear lifecycles - Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined() * Use confined arena for temporary serialization buffers * Normal records: data copied to slotMemory, temp buffer freed immediately * Overflow records: explicitly copied to Arena.global() for persistence * Provides immediate memory cleanup for ~99% of serialization operations This hybrid approach balances manual control (where beneficial) with automatic management (where lifecycles are complex). All tests pass.
- Updated all JSON node types (OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, etc.) to use uniform MemorySegment-based deserialization pattern - Implemented lazy loading for all value types (strings, numbers, booleans, nulls) - Nodes now deserialize using layout-based slicing for better performance - Removed ~100 lines of unused helper methods from NodeKind - Fixed AbstractStringNode hash computation to use toByteArray() instead of getDestination() - All JSON nodes now follow the same pattern as OBJECT and ARRAY for consistency - Build verified successful with no compilation errors
…ialization - Add size prefix (4 bytes) after NodeKind byte to avoid reading variable-sized data - Use 8-byte aligned headers (NodeKind + size + 3-byte padding) for proper alignment - Add end padding to ensure each node's total size is multiple of 8 - Switch all JSON nodes to UNALIGNED VarHandles for compatibility with factory-created nodes - Fix ObjectKeyNode to include 4-byte internal padding before hash field - Fix JsonNodeFactoryImpl to write internal padding when creating ObjectKeyNode - Fix setBooleanValue to handle both BooleanNode and ObjectBooleanNode types - Remove complex size calculation methods (calculateStopBitDataSize, calculateNumberDataSize) Benefits: - No double-reading of variable-sized content (strings, numbers) - Faster deserialization with direct MemorySegment slicing - Simpler, more maintainable code - Tests: PathSummaryTest and JsonNodeTrxGetPreviousRevisionNumberTest passing
…ules The net.openhft.hashing library needs access to sun.nio.ch.DirectBuffer when hashing DirectByteBuffer instances created from MemorySegments. Without these --add-opens flags, tests fail with IllegalAccessError. This fix allows: - Access to sun.nio.ch for DirectBuffer operations - Access to java.nio for ByteBuffer operations Tests now pass successfully.
…dding format - Add NodeKind byte before size prefix - Use 3 bytes padding (total 8 bytes with NodeKind) - Skip NodeKind byte before deserialize - Tests now pass with proper 8-byte alignment
…adding format - Fixed StringNodeTest, NumberNodeTest, BooleanNodeTest, NullNodeTest - Fixed ObjectNumberNodeTest, ObjectStringNodeTest, ObjectBooleanNodeTest, ObjectNullNodeTest, ObjectKeyNodeTest - Corrected serialization order for value nodes (siblings before/after value depending on node type) - All JSON node tests now pass with proper 8-byte alignment
- Created JsonNodeTestHelper with writeHeader(), writeEndPadding(), updateSizePrefix(), and finalizeSerialization() methods - Updated all 11 JSON node tests to use the helper methods - Reduced ~20 lines of duplicated code per test to 1-2 lines - Tests remain fully passing
…izer class - Created JsonNodeSerializer in main source with writeSizePrefix(), readSizePrefix(), writeEndPadding(), updateSizePrefix(), and calculateEndPadding() - Removed duplicate private methods from NodeKind.java - Updated NodeKind.java to use JsonNodeSerializer methods - Updated JsonNodeTestHelper to delegate to JsonNodeSerializer - Eliminated code duplication between production and test code - All tests still pass
- Added NodeKind byte before serialization in all 4 round-trip tests - Added bytesIn.readByte() to skip NodeKind byte before deserialization - Ensures proper 8-byte alignment for MemorySegment access - All 17 tests now pass
- Added serializeNumber() and deserializeNumber() static methods to NodeKind - Added helper methods serializeBigInteger() and deserializeBigInteger() - Updated NUMBER_VALUE and OBJECT_NUMBER_VALUE serialization to use shared methods - Removed duplicate serialization/deserialization code from NumberNode - Removed duplicate serialization/deserialization code from ObjectNumberNode - Both node types now use centralized logic from NodeKind for consistency
…obal() - Updated both constructors to use Arena.ofAuto() for automatic memory management - Arena.ofAuto() automatically releases memory when no longer reachable - Improves memory management by allowing automatic cleanup instead of global lifetime
…rializeNumber() - Changed NumberNode.serializeNumber() to NodeKind.serializeNumber() - Changed ObjectNumberNode.serializeNumber() to NodeKind.serializeNumber() - Fixes compilation errors after refactoring number serialization to NodeKind
…y offset - Changed serializeDelegateWithoutIDs to use putVarLong instead of writeLong - Changed deserializeNodeDelegateWithoutIDs to use getVarLong instead of readLong - This fixes JsonRedBlackTreeIntegrationTest failures - RB nodes (CASRB, PATHRB, NAMERB, RB_NODE_VALUE) need variable-length encoding for efficient storage since parent key offsets are typically small values
- Revert GrowingMemorySegment to use Arena.ofAuto() by default * Nodes store MemorySegment references that outlive BytesOut instances * Arena.ofAuto() allows GC to manage cleanup when segments become unreachable * Prevents premature deallocation bugs - Add Arena parameter constructors for explicit arena control * GrowingMemorySegment(Arena, int) for custom arena * MemorySegmentBytesOut(Arena, int) for custom arena * Enables using confined arenas for temporary buffers with clear lifecycles - Optimize KeyValueLeafPage.processEntries() with Arena.ofConfined() * Use confined arena for temporary serialization buffers * Normal records: data copied to slotMemory, temp buffer freed immediately * Overflow records: explicitly copied to Arena.global() for persistence * Provides immediate memory cleanup for ~99% of serialization operations This hybrid approach balances manual control (where beneficial) with automatic management (where lifecycles are complex). All tests pass.
…lization' into refactor-json-nodes-lazy-deserialization # Conflicts: # JsonShredderTest_testChicagoDescendantAxisParallel_2024_08_12_221741.jfr.zip # analysis-5-trxs.jfr
…alization - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode) - Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash) - Add size prefix and padding for proper 8-byte alignment - Update ELEMENT serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl to create MemorySegment-based ElementNode instances - Fix GrowingMemorySegment.grow() buffer overflow bug (copy only valid bytes) - Update ElementNodeTest and NodePageTest for new implementation All XML tests passing (271 tests, 47 skipped)
…rialization - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 36 bytes (16B NodeDelegate + 20B NameNode) - Value bytes stored separately (variable length) with compressed flag - Remove typeKey from serialized data (computed on-the-fly as 'xs:untyped' hash) - Add size prefix and padding for proper 8-byte alignment - Update ATTRIBUTE serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl to create MemorySegment-based AttributeNode instances - Update AttributeNodeTest for new implementation All XML tests passing (271 tests, 47 skipped)
…rialization - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 36 bytes (16B NodeDelegate + 20B NameNode) - Add size prefix and padding for proper 8-byte alignment - Update NAMESPACE serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl to create MemorySegment-based NamespaceNode instances - Update NamespaceNodeTest for new implementation All XML tests passing
…alization - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys) - Value data stored separately (not in MemorySegment) - Update COMMENT serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl.createCommentNode for MemorySegment creation - Update CommentNodeTest for new implementation All XML tests passing (271 tests, 47 skipped)
…tion - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 68 bytes (16B NodeDelegate + 32B StructNode + 20B NameNode) - Optional fields: childCount (8B), hash (8B), descendantCount (8B) - Value data stored separately (not in MemorySegment) - Update PROCESSING_INSTRUCTION serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl.createPINode for MemorySegment creation - Update PINodeTest for new implementation All XML tests passing (271 tests, 47 skipped)
…zation - Replace delegate pattern with MemorySegment-backed storage using VarHandles - Core layout: 32 bytes (16B NodeDelegate + 16B sibling keys) - Text nodes cannot have children, so no child-related fields - Value data stored separately (not in MemorySegment) - Update TEXT serialization/deserialization in NodeKind - Update XmlNodeFactoryImpl.createTextNode for MemorySegment creation - Update TextNodeTest for new implementation All XML tests passing (271 tests, 47 skipped)
- Store compressed bytes directly when value is compressed - Decompress only when getRawValue() is called - Clear compressed data after decompression to save memory - Maintain backward compatibility with uncompressed values Benefits: - Reduces memory pressure for compressed text values - Defers decompression cost until value is actually accessed - Improves deserialization performance for nodes that are never read All XML tests passing (271 tests, 47 skipped)
… package - Renamed JsonNodeTestHelper to NodeTestHelper for broader applicability - Moved from io.sirix.node.json to io.sirix.node package - Now can be used by both JSON and XML node tests - Updated all references in JSON test files (11 files) - Updated all references in XML test files (ElementNodeTest, NamespaceNodeTest) - Added missing import statements to all affected test files Benefits: - Better code organization - test helper is now in the parent package - Can be reused across JSON and XML node tests - Reduces code duplication All tests passing
- PINode: Removed unused QNm import (method returns null anyway, but QNm is needed for return type) - CommentNode: Compression import is actually used for decompression, kept it - TextNode: All imports are used - ElementNode: All imports are used Note: During refactoring to MemorySegment-backed storage, some imports became unnecessary but most are still required for the new implementation. All tests passing
…torage - Remove childCount and descendantCount serialization/deserialization from all value nodes (StringNode, NumberNode, BooleanNode, NullNode, ObjectStringNode, ObjectNumberNode, ObjectBooleanNode, ObjectNullNode) as these are always leaf nodes - Fix ObjectKeyNode to properly store and retrieve descendantCount from MemorySegment instead of returning hardcoded value 1, enabling support for complex subtrees as values - Update memory layouts to place fixed-length fields before variable-length content (StringNode, ObjectStringNode, NumberNode, ObjectNumberNode) - Fix VarHandle offset calculation for descendantCount in ObjectKeyNode - Make increment/decrement methods for childCount and descendantCount no-ops in value nodes - Update JsonNodeFactoryImpl and NodeKind serialization to match new layouts - Update test data creation to match new serialization format - All 852 tests passing
} | ||
|
||
@Override | ||
public boolean hasParent() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Memory Layout Mismatch and Redundant Offset
In ObjectKeyNode
, the memory layout comments for the hash and descendant count fields (offsets 52 and 60) are inconsistent with the calculated hashOffset
(56). This results in a 4-byte shift for these fields in the MemorySegment
. The descendantCountOffset
variable is also redundant, as VarHandles
correctly apply internal offsets from a single base address.
Additional Locations (1)
// Lazy decompress the value | ||
value = Compression.decompress(compressedValue); | ||
// Clear compressed data to save memory | ||
compressedValue = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Null Handling Issue in Node Value Retrieval
The getRawValue()
method in CommentNode
and TextNode
can cause a NullPointerException
. When the node's value is compressed and not yet loaded, the method tries to decompress a null
value
field instead of the stored compressed bytes.
No description provided.