[Bug] [Zeta] Fix negative array size exception #10827
lm-ylj wants to merge 6 commits into apache:dev
Conversation
@lm-ylj Thank you for your submission. This is a solid fix, but one compatibility concern remains with the current change. Legacy data with arity <= 127 is encoded as 1 byte, but the new code reads it as 4 bytes using readInt(). This shifts the stream position by 3 bytes and breaks deserialization of all subsequent fields. Impact:
Fix scope:
DanielLeens
left a comment
Thanks for chasing the negative-array-size issue. I pulled the latest head locally and rechecked the real cross-task-group row transport path through RecordSerializer.
Runtime path:
Source / Transform / Sink row transport
-> RecordSerializer.write()
-> writes type, tableId, rowKind, arity, fields
-> Hazelcast byte transport
-> RecordSerializer.read()
-> reconstructs SeaTunnelRow
The bug is real, but the current fix still changes the on-wire row layout in a backward-incompatible way. Before this PR, arity was encoded in 1 byte. In the current head, the serializer switches to writeInt() / readInt(), so a new reader will consume 4 bytes from an old stream and shift the remaining field offsets. That makes rolling-upgrade / mixed-version clusters unsafe even for valid legacy rows with arity <= 127.
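The offset shift described above can be reproduced with plain java.io streams. This is a standalone demonstration, not SeaTunnel code: an "old writer" emits arity as 1 byte followed by a field, and a "new reader" calls readInt() and swallows the first 3 bytes of the field.

```java
import java.io.*;

// Demonstration only (not SeaTunnel code): shows how readInt() on a stream
// whose writer emitted a 1-byte arity shifts every following field boundary.
public class WireShiftDemo {
    // Returns {decoded arity, bytes left in the stream}.
    public static int[] oldWriteNewRead() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(3);   // old writer: arity encoded in 1 byte
        out.writeInt(42);   // first field value

        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        int arity = in.readInt(); // new reader: consumes 4 bytes, not 1
        return new int[] {arity, in.available()};
    }

    public static void main(String[] args) throws IOException {
        int[] r = oldWriteNewRead();
        // decoded arity = 50331648 (0x03000000), 1 stray byte left
        System.out.println("decoded arity = " + r[0] + ", bytes left = " + r[1]);
    }
}
```

The legacy arity byte 0x03 becomes the high byte of the misread int, so the "arity" decodes as 50331648 and the remaining field bytes are misaligned.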
Blocking items:
- Please keep backward-compatible row decoding/encoding. The safer shape is to preserve the legacy 1-byte path for legacy-compatible arity values and use a sentinel + int only for larger arity.
- Please add compatibility coverage for old-writer -> new-reader, especially around 127 / 128, not only a new-format round trip.
Because this touches seatunnel-engine serialization, I do not recommend merging the current head until the compatibility path is fixed.
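A minimal sketch of the sentinel-based encoding suggested above, kept separate from the real RecordSerializer. The class and constant names here (RowArityCodec, EXTENDED_ROW_ARITY_MARKER) are hypothetical placeholders, not names from the SeaTunnel codebase:

```java
import java.io.*;

// Sketch only: legacy-compatible arity values (0..127) keep the original
// 1-byte layout; larger values use {sentinel byte, int}. Old streams never
// contain a negative arity byte by construction, so the sentinel is safe.
public class RowArityCodec {
    static final byte EXTENDED_ROW_ARITY_MARKER = -1; // hypothetical name

    public static void writeRowArity(DataOutput out, int arity) throws IOException {
        if (arity <= Byte.MAX_VALUE) {
            out.writeByte(arity);               // legacy 1-byte path
        } else {
            out.writeByte(EXTENDED_ROW_ARITY_MARKER);
            out.writeInt(arity);                // extended path for arity > 127
        }
    }

    public static int readRowArity(DataInput in) throws IOException {
        byte b = in.readByte();
        return b >= 0 ? b : in.readInt();       // negative byte => extended form
    }

    // Round-trip helper for exercising the codec in isolation.
    public static int roundTrip(int arity) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeRowArity(new DataOutputStream(buf), arity);
        return readRowArity(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    }
}
```

With this shape, bytes written by an old node still decode through the non-negative branch, and only new, large-arity rows take the 5-byte form.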
I appreciate your feedback and suggestions. I will address this backward-compatibility issue accordingly.
DanielLeens
left a comment
Hi @lm-ylj, thanks for the quick follow-up, and thanks for confirming you plan to address the compatibility concern.
I re-checked the current head locally after your latest reply. There is still no new code on top of commit 9b710c2fe, so the technical conclusion remains unchanged for now.
What this PR is trying to fix is real:
- User pain: when a SeaTunnelRow has more than 127 fields, the current engine serializer stores arity in a signed byte, and deserialization can end up constructing new SeaTunnelRow(negativeValue), which triggers a NegativeArraySizeException.
- Fix approach in the current head: switch arity from writeByte/readByte to writeInt/readInt, and add a regression test for arity = 128.
- One-line summary: the PR fixes the overflow case for single-version clusters, but it currently does so by changing the engine wire format in a backward-incompatible way.
Runtime chain I re-verified locally:
Source / Transform produces SeaTunnelRow
-> SeaTunnelSourceCollector.collect() [SeaTunnelSourceCollector.java:93-112]
-> sendRecordToNext(new Record<>(row))
-> task-group queue publish
-> RecordEventProducer.onData() [RecordEventProducer.java:29-53]
-> Hazelcast serializer hook
-> RecordSerializerHook.createSerializer() [RecordSerializerHook.java:32-35]
-> RecordSerializer.write() [RecordSerializer.java:40-58]
-> remote task-group receives bytes
-> RecordSerializer.read() [RecordSerializer.java:66-92]
-> RecordEventHandler.handleRecord() [RecordEventHandler.java:52-66]
Because this serializer sits on the real Zeta cross-task-group transport path, compatibility is the blocking concern here.
The main blocker is still:
- The current head changes the on-wire layout from 1 byte to 4 bytes for arity, but the new reader does not preserve any legacy decode path. That means a new node reading bytes written by an old node will consume 4 bytes where the old stream only wrote 1, shifting all following field boundaries. This makes rolling upgrades / mixed-version clusters unsafe.
What I recommend:
- Preferred option: keep the legacy 1-byte encoding for compatible values, and only use a sentinel + int path for larger arity values.
- Please also add compatibility coverage for old writer -> new reader, especially around the 127 / 128 boundary, not only a new-format round trip.
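The old-writer -> new-reader check asked for above could look roughly like the sketch below. The legacy bytes are produced exactly as the pre-PR writer did (a single signed byte), and the reader mirrors the sentinel decode; class and method names are illustrative, not the actual test code:

```java
import java.io.*;

// Sketch of an old-writer -> new-reader compatibility check around the
// 127 / 128 boundary. Names are hypothetical.
public class LegacyArityCompatCheck {
    // Pre-PR layout: arity written as one signed byte. Note that 128 already
    // overflowed to -128 here, which is the original bug being fixed.
    static byte[] legacyWriterEncode(int arity) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeByte(arity);
        return buf.toByteArray();
    }

    // New-reader shape: non-negative byte is a legacy arity, negative byte
    // is the sentinel for an extended int encoding.
    static int newReaderDecode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte b = in.readByte();
        return b >= 0 ? b : in.readInt();
    }

    public static boolean legacyBytesDecodeUnchanged(int arity) throws IOException {
        return newReaderDecode(legacyWriterEncode(arity)) == arity;
    }
}
```

The interesting assertions are that every valid legacy arity (0..127) still decodes to the same value through the new reader; 128 has no valid legacy encoding, so it only needs the new-format round trip.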
CI note:
- I did not see a code-path failure from this PR itself in the current status rollup. The visible failing item is a PR labeler workflow, which does not change the engine compatibility conclusion above.
Conclusion: merge after fixes
Blocking items:
- Fix the serializer to keep backward-compatible decoding/encoding for historical row bytes.
- Add old-writer -> new-reader compatibility tests for the boundary cases.
Non-blocking suggestions:
- No extra non-blocking asks from my side right now. Once the compatibility path is fixed, I’m happy to re-check.
Overall, this is still a worthwhile fix to continue, but because it touches seatunnel-engine serialization, I do not recommend merging the current head until the compatibility path is corrected.
@lm-ylj Thank you for the quick fix. The write path is already deterministic: writeRowArity only ever emits a single non-negative byte, or the extended form {EXTENDED_ROW_ARITY_MARKER, EXTENDED_ROW_ARITY_MAGIC, int}.
Given that contract, both checks below can never trigger on bytes produced by the writer:
if (encodedArity != EXTENDED_ROW_ARITY_MARKER) { ... }
if (extensionMagic != EXTENDED_ROW_ARITY_MAGIC) { ... }
Also, pre-fix records with a 1-byte arity are still handled by the non-negative branch.
Suggested simplification:
private int readRowArity(ObjectDataInput in) throws IOException {
    byte b = in.readByte();
    if (b >= 0) {
        return b;
    }
    // Extended encoding written by writeRowArity: {-1, MAGIC, int}
    in.readInt(); // magic
    return in.readInt(); // arity
}
This keeps behavior aligned with the writer invariant and makes the decoding path easier to read and maintain.
Updated
+1 LGTM
Purpose of this pull request
close #10826
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
incompatible-changes.md to describe the incompatibility caused by this PR.