feat: implement cache rollup task using TA timeseries models #1135

Merged (1 commit), Mar 25, 2025

joseph-sentry

we want the cache rollup task to be able to read test analytics data from the timeseries db

this also changes the format of the cached dataframe, so the format of the path where the cached dataframe is stored changes as well

the logic for reading from the timeseries db is:

  • if no branch is specified -> read from the repo-wide continuous aggregates
  • if a branch is specified
    • if it's one of the more popular main branch names -> read from the branch-scoped continuous aggregates
    • else, aggregate directly from the individual testruns
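
The branching above can be sketched as a small dispatch function. This is only an illustration, not the code in this PR: `pick_source`, the `Source` enum, and its member names are hypothetical, and only the branch set `{"main", "master", "develop"}` is taken from the diff under review.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    # Hypothetical labels for the three read paths described above.
    REPO_AGGREGATES = "repo-wide continuous aggregates"
    BRANCH_AGGREGATES = "branch-scoped continuous aggregates"
    RAW_TESTRUNS = "individual testruns"


def pick_source(branch: Optional[str]) -> Source:
    """Return which timeseries source a rollup for `branch` would read from."""
    if not branch:
        # No branch given: use the repo-wide aggregates.
        return Source.REPO_AGGREGATES
    if branch in {"main", "master", "develop"}:
        # Popular main branch names have branch-scoped aggregates.
        return Source.BRANCH_AGGREGATES
    # Any other branch: aggregate the individual testruns directly.
    return Source.RAW_TESTRUNS
```

For example, `pick_source("feature/x")` falls through to `Source.RAW_TESTRUNS`, which is likely the slowest path since nothing is pre-aggregated.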

joseph-sentry requested a review from a team March 12, 2025 15:36
sentry-autofix bot commented Mar 12, 2025

✅ Sentry found no issues in your recent changes ✅

codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.73%. Comparing base (c9ed88b) to head (36e33aa).
Report is 4 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1135   +/-   ##
=======================================
  Coverage   97.72%   97.73%           
=======================================
  Files         449      451    +2     
  Lines       36866    36947   +81     
=======================================
+ Hits        36028    36110   +82     
+ Misses        838      837    -1     
Flag          Coverage Δ
integration   42.86% <32.09%> (-0.03%) ⬇️
unit          90.46% <100.00%> (+0.03%) ⬆️

CacheTestRollupsTask().run_impl(
    _db_session=None,
    repoid=1,
    branch="main",
Contributor

you forgot the test assertions for this case :-)

serialized_table: BytesIO

if branch:
    if branch in {"main", "master", "develop"}:
Contributor

The Repository has a dedicated field for the main branch; we should probably use that instead of hardcoding a set here.
If you still want to hardcode the list, it would be better to define it as a top-level const so it's more discoverable.
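
As a rough sketch of the second option, the set could be hoisted to a module-level constant. The names `MAIN_BRANCH_NAMES` and `uses_branch_aggregates` are made up for illustration; only the three branch names come from the diff.

```python
# Module-level constant: one place to find and update the hardcoded names.
MAIN_BRANCH_NAMES = frozenset({"main", "master", "develop"})


def uses_branch_aggregates(branch: str) -> bool:
    """True when `branch` is one of the hardcoded popular main branch names."""
    return branch in MAIN_BRANCH_NAMES
```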

@joseph-sentry joseph-sentry Mar 13, 2025

> The Repository has a dedicated field for the main branch, we should probably use that instead of hardcoding a set here.

that would make sense, except we want to use the continuous aggregates, and for that we would need access to repo.branch from timescale, which isn't possible right now. I have ideas on how to do this in the future, but for now I think the hardcoded set will be good enough for most of our users.

Contributor

that does make sense, yes.
maybe add a boolean like is_main_branch that you feed in when processing, since at that point you have access to the repo metadata.
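
A minimal sketch of that idea, assuming the flag is computed once at processing time and stored alongside the testrun. `ProcessedTestrun`, `build_row`, and the `repo_main_branch` parameter are hypothetical; the real timeseries model and processing code are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessedTestrun:
    # Hypothetical row shape, for illustration only.
    repo_id: int
    branch: str
    is_main_branch: bool


def build_row(repo_id: int, branch: str, repo_main_branch: str) -> ProcessedTestrun:
    """Stamp the row with whether `branch` is the repo's configured main branch."""
    return ProcessedTestrun(
        repo_id=repo_id,
        branch=branch,
        # Repo metadata is available here, so no hardcoded name set is needed.
        is_main_branch=(branch == repo_main_branch),
    )
```

With a flag like this stored on each row, an aggregate could in principle filter on `is_main_branch` instead of matching branch names, so a repo whose main branch is named `trunk` would still be covered.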


codecov-notifications bot commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

joseph-sentry added this pull request to the merge queue Mar 24, 2025
github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2025
joseph-sentry added this pull request to the merge queue Mar 25, 2025
Merged via the queue into main with commit 6196550 Mar 25, 2025
28 of 29 checks passed
joseph-sentry deleted the joseph/cache-rollup branch March 25, 2025 15:23