Skip to content

TTL-based Result Level Caching For Queries Touching Realtime Segments #18602

@jtuglu1

Description

@jtuglu1

Description

Want to support result-set caching of queries hitting realtime data nodes. Want to create a way to partition the result set of a query (from either realtime/historical data nodes) into cacheable granular intervals that can either pulled from cache and stitched into the query result, or issued as a query to data nodes.

Providing a TTL query context header would dictate how "recent" of an interval we'd want to serve from cache, and otherwise query from data nodes. Something like cacheTTL: "PT1M" would tell the brokers to serve from cache all results that were > PT1M ago, and issue queries to data nodes for data <PT1M.

Motivation

This would allow for result-set caching of queries against realtime segments, significantly boosting performance in exchange for staler data, configurable by the user. For longer-running stream ingestion jobs (e.g. where a realtime segment contains last 1h of data) the "staleness" imposed by this feature would likely be negligible, assuming low-to-no late records.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions