Result Rows Caching #57
Draft
This is an attempt to try out various ways of caching individual result rows of query results. Each result row carries a timestamp, so these results are good candidates for a caching approach where we cache individual rows and only calculate the rows that are missing or outdated. (The Druid server could more accurately tell which rows are outdated, but we're ignoring that for the sake of experiment and simplicity.)
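A minimal sketch of what that row-level reuse could look like, assuming a hypothetical `ResultRow` type and a hypothetical `RowCache` keyed by the row's timestamp (neither exists yet; all names here are placeholders):

```swift
import Foundation

/// Hypothetical shape of a single result row: one timestamp plus its aggregated values.
struct ResultRow {
    let timestamp: Date
    let values: [String: Double]
}

/// Hypothetical in-memory cache of individual rows, keyed by the row's timestamp.
final class RowCache {
    private var storage: [Date: ResultRow] = [:]

    func row(for timestamp: Date) -> ResultRow? { storage[timestamp] }
    func insert(_ row: ResultRow) { storage[row.timestamp] = row }
}

/// Sketch: given the timestamps a query is expected to produce, return the rows
/// we already have and the timestamps that still need to be calculated.
func splitCachedAndMissing(expectedTimestamps: [Date], cache: RowCache)
    -> (cached: [ResultRow], missing: [Date]) {
    var cached: [ResultRow] = []
    var missing: [Date] = []
    for timestamp in expectedTimestamps {
        if let row = cache.row(for: timestamp) {
            cached.append(row)
        } else {
            missing.append(timestamp)
        }
    }
    return (cached, missing)
}
```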
Process
- Calculate `IntervalIndependentHash`, a hash from a copy of the query where we remove all intervals, because intervals are irrelevant for this type of caching.
- Build the cache key for existing rows from `IntervalIndependentHash` + granularity + window + ISO 8601 date (see the sketch after this list).

This can be enhanced later with windowed caching where we cache complete results for fixed, non-overlapping time windows (e.g. per-day or per-week blocks).
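A minimal sketch of how such a cache key could be assembled. Only `intervalIndependentHash` is named in this PR; the parameter types and the `-` separator are assumptions:

```swift
import Foundation

/// Sketch: one cache key per row, combining the interval-independent query hash,
/// the granularity, the query window, and the ISO 8601 date of the row itself.
func rowCacheKey(intervalIndependentHash: String,
                 granularity: String,
                 window: String,
                 rowDate: Date) -> String {
    let formatter = ISO8601DateFormatter()
    return [intervalIndependentHash, granularity, window, formatter.string(from: rowDate)]
        .joined(separator: "-")
}

// Example with made-up values:
// rowCacheKey(intervalIndependentHash: "a1b2c3", granularity: "day",
//             window: "P30D", rowDate: someRowDate)
```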
Tasks
- [ ] `Query.intervalIndependentHash`
- [ ] `TimeInterval.timeSegments(with: granularity)`
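Rough sketches of what these two tasks could amount to. `Query` here is a heavily simplified stand-in for the real query type, `TimeInterval` stands for the project's own interval type (not Foundation's `Double` typealias), and using `Calendar.Component` as the granularity type is an assumption:

```swift
import Foundation
import CryptoKit

/// Heavily simplified stand-in for the real query type.
struct Query: Codable {
    var intervals: [String]?
    var granularity: String
    var aggregations: [String]

    /// Sketch of `intervalIndependentHash`: hash a copy of the query with its
    /// intervals removed, since intervals are irrelevant for this type of caching.
    var intervalIndependentHash: String {
        var copy = self
        copy.intervals = nil
        let encoder = JSONEncoder()
        encoder.outputFormatting = [.sortedKeys] // stable encoding → stable hash
        let data = (try? encoder.encode(copy)) ?? Data()
        return SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
    }
}

/// Stand-in for the project's own time interval type (shadows Foundation's alias in this sketch).
struct TimeInterval {
    let start: Date
    let end: Date

    /// Sketch of `timeSegments(with:)`: split the interval into the row-sized
    /// segments a query with the given granularity would produce.
    func timeSegments(with granularity: Calendar.Component) -> [Date] {
        var segments: [Date] = []
        let calendar = Calendar(identifier: .gregorian)
        var current = start
        while current < end {
            segments.append(current)
            guard let next = calendar.date(byAdding: granularity, value: 1, to: current) else { break }
            current = next
        }
        return segments
    }
}
```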