@nozjkoitop
Contributor

This PR is a refreshed version of the previously closed PR: #16607

It fixes a failure when MSQ queries run with "includeSegmentSource": "REALTIME" while a supervisor is ingesting and the query reads realtime data.

Before Druid 31, the UI surfaced a ClassCastException (stack trace below). In newer versions it appears as an NPE, but the underlying cause is the same.

Caused by: java.lang.ClassCastException: class [B cannot be cast to class org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder ([B is in module java.base of loader 'bootstrap'; org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder is in unnamed module of loader java.net.URLClassLoader @38792286)
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolderObjectStrategy.toBytes(HllSketchHolderObjectStrategy.java:31)
	at org.apache.druid.segment.serde.ComplexMetricSerde.toBytes(ComplexMetricSerde.java:119)
	at org.apache.druid.frame.field.ComplexFieldWriter.writeTo(ComplexFieldWriter.java:65)
	at org.apache.druid.frame.write.RowBasedFrameWriter.writeDataUsingFieldWriters(RowBasedFrameWriter.java:291)
	at org.apache.druid.frame.write.RowBasedFrameWriter.writeData(RowBasedFrameWriter.java:246)
	at org.apache.druid.frame.write.RowBasedFrameWriter.addSelection(RowBasedFrameWriter.java:122)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.populateFrameWriterAndFlushIfNeeded(ScanQueryFrameProcessor.java:348)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.populateFrameWriterAndFlushIfNeededWithExceptionHandling(ScanQueryFrameProcessor.java:329)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runWithLoadedSegment(ScanQueryFrameProcessor.java:231)
	at org.apache.druid.msq.querykit.BaseLeafFrameProcessor.runIncrementally(BaseLeafFrameProcessor.java:87)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runIncrementally(ScanQueryFrameProcessor.java:158)
	at org.apache.druid.frame.processor.FrameProcessors$1FrameProcessorWithBaggage.runIncrementally(FrameProcessors.java:75)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:230)
	... 8 more

On the realtime segment path, the selector’s getObject() (used by ComplexFieldWriter) can return a byte[] that is already the serialized form of an HLL sketch.

However, the serde code that runs afterwards assumes the complex metric value is in object form (HllSketchHolder) and passes it to HllSketchHolderObjectStrategy.toBytes(), which expects an HllSketchHolder. When it receives a byte[] instead, the result is the ClassCastException shown above.
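To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the Druid classes: `SketchHolder`, `strictToBytes`, and `tolerantToBytes` are hypothetical stand-ins invented for illustration, and the tolerant variant is only one possible way to handle the mismatch, not necessarily what this PR does.

```java
import java.util.Arrays;

// Hypothetical stand-ins; none of these names come from the Druid code base.
final class ComplexValueSerdeSketch {

    // Plays the role of the object form of a complex metric (e.g. a sketch holder).
    static final class SketchHolder {
        private final byte[] payload;

        SketchHolder(byte[] payload) {
            this.payload = payload.clone();
        }

        byte[] toByteArray() {
            return payload.clone();
        }
    }

    // Mirrors the failing pattern: the value is cast unconditionally,
    // so an already-serialized byte[] triggers a ClassCastException.
    static byte[] strictToBytes(Object value) {
        return ((SketchHolder) value).toByteArray();
    }

    // One possible defensive variant (an assumption, not the PR's actual fix):
    // pass already-serialized bytes through and serialize only the object form.
    static byte[] tolerantToBytes(Object value) {
        if (value == null) {
            return new byte[0]; // assumption: treat null as an empty value
        }
        if (value instanceof byte[]) {
            return (byte[]) value;
        }
        if (value instanceof SketchHolder) {
            return ((SketchHolder) value).toByteArray();
        }
        throw new IllegalArgumentException(
            "Unsupported complex value type: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        byte[] alreadySerialized = {1, 2, 3};

        try {
            strictToBytes(alreadySerialized);
        } catch (ClassCastException e) {
            System.out.println("strict path failed: " + e.getClass().getSimpleName());
        }

        System.out.println("tolerant path: "
            + Arrays.toString(tolerantToBytes(alreadySerialized)));
    }
}
```

The strict path fails as soon as the selector hands over a byte[], which matches the shape of the stack trace above; the tolerant path accepts both representations.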

How to reproduce

Run an MSQ SELECT over realtime data with "includeSegmentSource": "REALTIME" in the query context while a supervisor is ingesting into the datasource.
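As a rough illustration of the reproduction step, the sketch below builds a JSON request body that carries the context flag. The class name, the helper `buildPayload`, and the datasource `my_realtime_datasource` are hypothetical; the body is assumed to be submitted to the SQL task endpoint (`/druid/v2/sql/task`) of a cluster where a supervisor is actively ingesting.

```java
// Hypothetical helper; only the "includeSegmentSource": "REALTIME" context
// entry is taken from the PR description.
final class MsqRealtimeQueryPayload {

    // Builds a minimal JSON body with the query and the realtime context flag.
    // Backslashes and double quotes in the SQL are escaped for JSON.
    static String buildPayload(String sql) {
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":\"" + escaped + "\","
            + "\"context\":{\"includeSegmentSource\":\"REALTIME\"}}";
    }

    public static void main(String[] args) {
        // Placeholder datasource name; replace with one that has realtime segments
        // and a complex (e.g. HLL sketch) column.
        System.out.println(buildPayload("SELECT * FROM \"my_realtime_datasource\""));
    }
}
```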

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added the labels Area - Batch Ingestion, Area - Segment Format and Ser/De, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) Dec 19, 2025
@nozjkoitop nozjkoitop changed the title from "Feature fix msq with hllsketch in realtime" to "Fixed ClassCastException during Multi-Stage Queries on real-time segments" Dec 19, 2025
@nozjkoitop
Contributor Author

Hey @LakshSingla @adarshsanjeev, could you help review this? You already participated in my previous attempt to fix this :)
