Realtime scans from MSQ cannot reliably read complex types #18340

@gianm

When ScanQueryFrameProcessor delegates work to a realtime server, it sends a scan query and writes the response into a frame. When complex-typed columns are selected in a scan query, their values are serialized by Jackson, and it is common for such objects to have "one-way" serialization (they can be serialized using Jackson, but cannot be deserialized).

For aggregating queries, these objects are typically deserialized by the associated aggregator's AggregatorFactory#deserialize method rather than by Jackson. Scan queries don't use AggregatorFactory, so nothing converts the Jackson output back into the original object. This leads to errors when attempting to write the resulting objects to a frame, because they aren't the expected type.
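The failure mode can be sketched in isolation. The snippet below is a toy illustration, not Druid code: `ToySketch`, `serializeOneWay`, and `writeToFrame` are hypothetical names, and base64 stands in for whatever Jackson emits for a given complex type. It assumes only that the receiving side ends up with a generic value (here a String) where the frame writer expects the in-memory complex object.

```java
import java.util.Base64;

public class OneWaySerializationSketch {
    /** Hypothetical stand-in for a complex object such as a sketch. */
    static final class ToySketch {
        final byte[] state;

        ToySketch(byte[] state) {
            this.state = state;
        }
    }

    /** Mimics a one-way serializer: produces a string, with no matching deserializer. */
    static String serializeOneWay(ToySketch sketch) {
        return Base64.getEncoder().encodeToString(sketch.state);
    }

    /** Hypothetical frame writer that only accepts the in-memory complex type. */
    static void writeToFrame(Object value) {
        if (!(value instanceof ToySketch)) {
            throw new IllegalArgumentException(
                "Expected ToySketch but got " + value.getClass().getSimpleName());
        }
        // A real writer would copy the object's bytes into the frame here.
    }

    public static void main(String[] args) {
        // What the reader sees after the response has crossed the wire.
        Object received = serializeOneWay(new ToySketch(new byte[]{1, 2, 3}));
        try {
            writeToFrame(received);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In an aggregating query, the equivalent of AggregatorFactory#deserialize would sit between `received` and `writeToFrame` and turn the generic value back into the complex object. The scan path has no such step.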

To fix it, we need to serialize and deserialize these complex objects in a coherent way. A couple of options come to mind:

  • We could add a new parameter to scan that causes it to serialize complex objects using TypeStrategy rather than allowing Jackson to handle them. MSQ realtime delegation would use this parameter. Then, when reading, we use the same TypeStrategy to deserialize the objects.
  • We could rewrite scan queries in MSQ's realtime delegation code to wrap all complex types in complex_encode_base64, which would accomplish more or less the same thing: it would use the TypeStrategy to serialize the object.
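Both options come down to using one symmetric binary codec on the writing and reading sides. A minimal sketch of that idea follows, under stated assumptions: `ToyTypeStrategy` is a simplified, hypothetical interface and not Druid's actual TypeStrategy; `ToySketch`, `encodeForWire`, and `decodeFromWire` are illustrative names; and base64 is used only to show how bytes could travel inside a JSON response.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

public class TypeStrategyRoundTripSketch {
    /** Hypothetical, simplified per-type binary codec (not Druid's TypeStrategy). */
    interface ToyTypeStrategy<T> {
        byte[] toBytes(T value);

        T fromBytes(ByteBuffer buffer);
    }

    /** Hypothetical complex object. */
    static final class ToySketch {
        final long count;
        final byte[] registers;

        ToySketch(long count, byte[] registers) {
            this.count = count;
            this.registers = registers;
        }
    }

    /** Layout: 8-byte count, 4-byte register length, then the register bytes. */
    static final ToyTypeStrategy<ToySketch> STRATEGY = new ToyTypeStrategy<ToySketch>() {
        @Override
        public byte[] toBytes(ToySketch value) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + value.registers.length);
            buf.putLong(value.count).putInt(value.registers.length).put(value.registers);
            return buf.array();
        }

        @Override
        public ToySketch fromBytes(ByteBuffer buffer) {
            long count = buffer.getLong();
            byte[] registers = new byte[buffer.getInt()];
            buffer.get(registers);
            return new ToySketch(count, registers);
        }
    };

    /** Writer side: codec bytes wrapped in base64 so they survive a JSON response. */
    static String encodeForWire(ToySketch value) {
        return Base64.getEncoder().encodeToString(STRATEGY.toBytes(value));
    }

    /** Reader side: the same codec restores the original type before frame writing. */
    static ToySketch decodeFromWire(String wire) {
        return STRATEGY.fromBytes(ByteBuffer.wrap(Base64.getDecoder().decode(wire)));
    }

    public static void main(String[] args) {
        ToySketch original = new ToySketch(42L, new byte[]{7, 8, 9});
        ToySketch decoded = decodeFromWire(encodeForWire(original));
        System.out.println(decoded.count == original.count
            && Arrays.equals(decoded.registers, original.registers));
    }
}
```

The two options differ mainly in where the encoding is requested (a scan query parameter versus a rewritten column expression). In either case the reader must know the column's complex type so it can pick the matching codec.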

This bug affects all MSQ queries that use realtime data and delegate scans involving complex types. An example query:

SELECT "hllSketch"
FROM tbl
WHERE "foo" = 'bar'

Queries that involve aggregations on complex types wouldn't be affected.

Labels: Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262), Bug, MSQ
