continuous-test: Add a handful of verification queries for OOO data#14394
continuous-test: Add a handful of verification queries for OOO data#14394
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| flagext.DefaultValues(&cfg) | ||
| cfg.MaxQueryAge = 2 * 24 * time.Hour | ||
|
|
||
| now := time.Unix(int64((10*24*time.Hour)+(2*time.Second)), 0) |
There was a problem hiding this comment.
Test timestamp computed from nanoseconds instead of seconds
Low Severity
time.Unix(int64((10*24*time.Hour)+(2*time.Second)), 0) converts a time.Duration (which is nanoseconds) to int64 and passes it to time.Unix which expects seconds. This creates a timestamp ~27 million years in the future instead of the intended "10 days + 2 seconds." The pre-existing test on line 35 uses the correct pattern: time.Unix(10*86400, 0). The tests still pass because all assertions use relative time comparisons, but the now value is wildly different from what was intended.
Additional Locations (1)
There was a problem hiding this comment.
Haha, good catch, but all these tests are very careful to not depend on an overly specific definition of now. So, it's of little consequence other than aesthetic.
There was a problem hiding this comment.
I think, this suggestion was quite good, actually. Without a code comment, that mentions that this now value is meaningless, it could be confusing for a future code reader, who may think it's an honest bug.
Could we move the duration to nsec argument, to keep it both readable and sound:
// 10d and 10s (864002e9 nanos) after epoch
now := time.Unix(0, int64(10*24*time.Hour + 2*time.Second))
narqo
left a comment
There was a problem hiding this comment.
I've left one nit, but the changes work for me, overall 🔥
| flagext.DefaultValues(&cfg) | ||
| cfg.MaxQueryAge = 2 * 24 * time.Hour | ||
|
|
||
| now := time.Unix(int64((10*24*time.Hour)+(2*time.Second)), 0) |
There was a problem hiding this comment.
I think, this suggestion was quite good, actually. Without a code comment, that mentions that this now value is meaningless, it could be confusing for a future code reader, who may think it's an honest bug.
Could we move the duration to nsec argument, to keep it both readable and sound:
// 10d and 10s (864002e9 nanos) after epoch
now := time.Unix(0, int64(10*24*time.Hour + 2*time.Second))

What this PR does
This PR adds several queries that assert on the new OOO data that the continuous tester exercises.
We assert:
We never enable results cache, and we minimize assertions on data that the regular continuous test might catch.
These queries have been tested on and off in a dev cell since into last week. This seemed to be a good balance of a small set of queries, that asserts the mixture of inorder and out-of-order samples in the same series.
Remember that this test is still disabled by default, as we harden it further.
Which issue(s) this PR fixes or relates to
Contrib https://github.com/grafana/mimir-squad/issues/3373
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Low Risk
Changes are isolated to the optional continuous test and mostly add extra query/verification logic and unit tests; main risk is increased query load when the test is enabled.
Overview
Adds post-write verification to the
WriteReadOOOTestby issuing additional instant and range queries against the OOO sine-wave metric and validating results viaverifySamplesSum(always with results cache disabled).Introduces helpers to compute query windows for both in-order (last 24h + instants) and out-of-order dense regions (24h window ending at the OOO lag border + instants near/before the border), including clamping to
MaxQueryAgeand step-alignment to avoid false positives; adds span-based logging for query executions.Expands tests to cover the new time-range selection behavior and to assert that
Runperforms the expected write(s) and query calls under empty and partial history scenarios.Written by Cursor Bugbot for commit 6d929f4. This will update automatically on new commits. Configure here.