-
Notifications
You must be signed in to change notification settings - Fork 311
HPCC-35567 Fix invalid JSON spray split points on small files. #20799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: candidate-9.12.x
Are you sure you want to change the base?
HPCC-35567 Fix invalid JSON spray split points on small files. #20799
Conversation
|
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35567 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in JSON spray split point calculation for small files where unsigned integer underflow was causing trailing empty split points to have incorrectly large lengths. The root cause was the JSON partitioner incorrectly resetting the cursor.inputOffset to 0, which was then used to calculate split point lengths.
Key Changes
- Modified the conditional logic for when
cursor.inputOffsetis reset to 0 to preserve its value when EOF is encountered - Added defensive
assertexcheck to guard against length underflow - Improved code clarity by introducing an intermediate variable for the adjusted start offset calculation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| dali/ft/daftformat.ipp | Modified findSplitPoint to conditionally reset cursor.inputOffset only for first split or before EOF check, preserving the offset value when EOF is reached |
| dali/ft/daftformat.cpp | Added defensive assertion and intermediate variable to prevent unsigned underflow when calculating partition point lengths |
ghalliday
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith 1 question
dali/ft/daftformat.ipp
Outdated
| if (eof) // leave cursor.inputOffset untouched | ||
| return; | ||
|
|
||
| cursor.inputOffset = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct for the case !foundRowEnd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I agree it is not.
I don't think correct if !checkFoundRowStartRes either..
The creation of split points for small JSON files was causing what should have been trailing empty split points to contain huge lengths due to unsigned underflow. This was caused by the JSOM partitioner incorrectly resetting the cursor inputOffset which in turn was used to calculate length on all furture splits. Signed-off-by: Jake Smith <[email protected]>
4c509dc to
193056e
Compare
The creation of split points for small JSON files was causing what should have been trailing empty split points to contain huge lengths due to unsigned underflow.
This was caused by the JSOM partitioner incorrectly resetting the cursor inputOffset which in turn was used to calculate length on all furture splits.
Type of change:
Checklist:
Smoketest:
Testing: