continuous-test: Add a handful of verification queries for OOO data#14394

Open
alexweav wants to merge 13 commits into main from alexweav/ooo-cont-test-query

Conversation

Contributor

@alexweav alexweav commented Feb 17, 2026

What this PR does

This PR adds several queries that assert on the new OOO data that the continuous tester exercises.
We assert:

  • A range query and a mixture of instant queries on points written in order, spanning our OOO window.
  • A range query with a finer step, asserting on the dense region that was partially written out of order.
  • A few instant queries on the "border" between in-order and out-of-order data.

We never enable the results cache, and we minimize assertions on data that the regular continuous test might catch.

These queries have been tested on and off in a dev cell since last week. This seemed to be a good balance: a small set of queries that asserts on the mixture of in-order and out-of-order samples in the same series.

Remember that this test is still disabled by default while we harden it further.

Which issue(s) this PR fixes or relates to

Contrib https://github.com/grafana/mimir-squad/issues/3373

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Low Risk
Changes are isolated to the optional continuous test and mostly add extra query/verification logic and unit tests; main risk is increased query load when the test is enabled.

Overview
Adds post-write verification to the WriteReadOOOTest by issuing additional instant and range queries against the OOO sine-wave metric and validating results via verifySamplesSum (always with results cache disabled).

Introduces helpers to compute query windows for both in-order (last 24h + instants) and out-of-order dense regions (24h window ending at the OOO lag border + instants near/before the border), including clamping to MaxQueryAge and step-alignment to avoid false positives; adds span-based logging for query executions.

Expands tests to cover the new time-range selection behavior and to assert that Run performs the expected write(s) and query calls under empty and partial history scenarios.

Written by Cursor Bugbot for commit 6d929f4. This will update automatically on new commits.

@alexweav added the changelog-not-needed label (PRs that don't need a CHANGELOG.md entry) Feb 17, 2026
@alexweav requested a review from a team as a code owner February 17, 2026 22:55
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

flagext.DefaultValues(&cfg)
cfg.MaxQueryAge = 2 * 24 * time.Hour

now := time.Unix(int64((10*24*time.Hour)+(2*time.Second)), 0)

Test timestamp computed from nanoseconds instead of seconds

Low Severity

time.Unix(int64((10*24*time.Hour)+(2*time.Second)), 0) converts a time.Duration (which is nanoseconds) to int64 and passes it to time.Unix which expects seconds. This creates a timestamp ~27 million years in the future instead of the intended "10 days + 2 seconds." The pre-existing test on line 35 uses the correct pattern: time.Unix(10*86400, 0). The tests still pass because all assertions use relative time comparisons, but the now value is wildly different from what was intended.

Contributor Author

@alexweav alexweav Feb 17, 2026

Haha, good catch, but all these tests are very careful to not depend on an overly specific definition of now. So, it's of little consequence other than aesthetic.

Contributor

I think this suggestion was quite good, actually. Without a code comment mentioning that this now value is meaningless, it could confuse a future reader, who may think it's an honest bug.

Could we move the duration to the nsec argument, to keep it both readable and sound:

// 10d and 2s (864002e9 nanos) after epoch
now := time.Unix(0, int64(10*24*time.Hour + 2*time.Second))

Contributor

@narqo narqo left a comment

I've left one nit, but the changes work for me, overall 🔥
