Result Rows Caching #57

Draft: wants to merge 4 commits into main

Conversation

@winsmith (Contributor) commented Apr 25, 2025

This PR experiments with various ways of caching individual result rows of query results.

  • Time series query results have a granularity as defined by the query, and each row has a timestamp
  • Top N query results have a granularity as defined by the query, and each row has a timestamp
  • GroupBy query results have a granularity as defined by the query, and each row has a timestamp

These result types are therefore candidates for a caching approach where we cache individual rows and only calculate the ones that are missing or outdated. (The Druid server could tell more accurately which rows are outdated, but we're ignoring that for the sake of simplicity in this experiment.)
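
To make the per-row idea concrete, here is a minimal sketch of what a cache entry could look like. The names (`RowCacheKey`, `CachedRow`) and their fields are assumptions for illustration only, not existing types:

```swift
import Foundation

/// Hypothetical shape of a per-row cache entry. Neither type exists yet;
/// they only illustrate how a row could be keyed by the interval-independent
/// query hash, the granularity, and the row's own timestamp.
struct RowCacheKey: Hashable, Codable {
    let intervalIndependentHash: String
    let granularity: String   // e.g. "hour", "day"
    let timestamp: Date       // the row's bucket start, ISO 8601 in Druid results
}

struct CachedRow: Codable {
    let key: RowCacheKey
    let payload: Data         // the encoded result row
    let cachedAt: Date        // lets us expire or recompute volatile rows later
}
```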

Process

  1. A query comes in; it contains at least one relative or absolute interval.
  2. We generate an IntervalIndependentHash: a hash of a copy of the query with all intervals removed, since intervals are irrelevant for this type of caching.
  3. We generate a list of all time segments needed to fulfill the query within its intervals.
  4. For each time segment, we query the cache under IntervalIndependentHash + granularity + window + ISO 8601 date for an existing row.
  5. We generate new intervals covering all missing rows.
  6. We run a query with these intervals.
  7. We store all rows that we don't deem volatile in the cache.
  8. We build and return a full query result (see the sketch after this list).
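
A rough sketch of how these steps could fit together. Every type and method name here (`Query`, `RowCache`, `DruidClient`, `timeSegments(for:)`, etc.) is a stand-in invented for this sketch, not the project's real API; the tasks below cover the pieces that would actually need to be implemented:

```swift
import Foundation

// Stand-in types so the sketch is self-contained; none of these are real
// project types, and the placeholder bodies below are not implementations.

struct TimeSegment: Hashable { let start: Date; let end: Date }

struct ResultRow { let timestamp: Date; let values: [String: Double] }

protocol RowCache {
    func row(hash: String, granularity: String, segmentStart: Date) -> ResultRow?
    func store(_ row: ResultRow, hash: String, granularity: String)
}

protocol DruidClient {
    /// Runs the query restricted to the given intervals and returns its rows.
    func run(_ query: Query, intervals: [DateInterval]) async throws -> [ResultRow]
}

struct Query {
    var granularity: String
    var intervals: [DateInterval]
    /// Step 2: hash of the query with all intervals stripped (task below).
    var intervalIndependentHash: String { "" /* to be implemented */ }
    /// Step 3 helper: splits an interval into granularity-sized segments (task below).
    func timeSegments(for interval: DateInterval) -> [TimeSegment] { [] /* to be implemented */ }
}

/// Orchestrates steps 1–8 of the process described above.
func cachedResult(for query: Query, cache: RowCache, druid: DruidClient) async throws -> [ResultRow] {
    let hash = query.intervalIndependentHash

    // Step 3: every time segment needed to cover the query's intervals.
    let needed = query.intervals.flatMap { query.timeSegments(for: $0) }

    // Step 4: split segments into cached rows and segments we still have to compute.
    var cached: [ResultRow] = []
    var missing: [TimeSegment] = []
    for segment in needed {
        if let row = cache.row(hash: hash, granularity: query.granularity, segmentStart: segment.start) {
            cached.append(row)
        } else {
            missing.append(segment)
        }
    }

    // Steps 5 + 6: query Druid only for the intervals covering the missing segments.
    let missingIntervals = missing.map { DateInterval(start: $0.start, end: $0.end) }
    var fresh: [ResultRow] = []
    if !missingIntervals.isEmpty {
        fresh = try await druid.run(query, intervals: missingIntervals)
    }

    // Step 7: cache fresh rows whose bucket has already closed (non-volatile).
    // For simplicity this assumes one row per missing segment, returned in order.
    let now = Date()
    for (segment, row) in zip(missing, fresh) where segment.end <= now {
        cache.store(row, hash: hash, granularity: query.granularity)
    }

    // Step 8: combine cached and fresh rows into one result, ordered by timestamp.
    return (cached + fresh).sorted { $0.timestamp < $1.timestamp }
}
```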

This can be enhanced later with windowed caching, where we cache complete results for fixed, non-overlapping time windows (e.g. per-day or per-week blocks).

Tasks

  • implement Query.intervalIndependentHash
  • implement TimeInterval.timeSegments(with: granularity) (a possible starting point is sketched below)
  • implement a way of generating new time intervals from old TimeIntervals minus time segments (should time segments be their own struct?)
  • implement combining of query results
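
For the timeSegments task, a possible starting point using `Calendar`. The `Granularity` enum and the free-standing function are assumptions for this sketch; the real implementation would presumably live on the project's `TimeInterval` type and use its own granularity representation:

```swift
import Foundation

/// Hypothetical granularity cases; the real query type has its own representation.
enum Granularity {
    case hour, day, week, month

    var calendarComponent: Calendar.Component {
        switch self {
        case .hour: return .hour
        case .day: return .day
        case .week: return .weekOfYear
        case .month: return .month
        }
    }
}

/// Splits `interval` into consecutive segments aligned to granularity buckets,
/// clamped to the interval (step 3 of the process above). Partial buckets at
/// either edge would typically be treated as volatile and not cached.
func timeSegments(in interval: DateInterval,
                  granularity: Granularity,
                  calendar: Calendar = .current) -> [DateInterval] {
    var segments: [DateInterval] = []
    // Align the cursor to the start of the bucket containing the interval's start.
    var cursor = calendar.dateInterval(of: granularity.calendarComponent, for: interval.start)?.start ?? interval.start
    while cursor < interval.end {
        guard let next = calendar.date(byAdding: granularity.calendarComponent, value: 1, to: cursor) else { break }
        segments.append(DateInterval(start: max(cursor, interval.start), end: min(next, interval.end)))
        cursor = next
    }
    return segments
}
```

Whether segments end up as plain `DateInterval`s or their own struct (as the third task asks) is still open; the key point is that each segment maps to exactly one cacheable row.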
