3210 batch uploads creation #1006

adrian-codecov · 2025-01-13T23:00:53Z

This is the more complete version of batch insertion for uploads, flags + measurements. It consolidates all the efforts + changes related to fixing N+1 issues. This change adds a fn that inserts uploads, flags + measurements when handling a v4 upload. All of these inserts should now be batched, which should reduce lock contention and improve performance.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

codecov · 2025-01-13T23:07:06Z

Codecov Report

Attention: Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.79%. Comparing base (ac302e7) to head (0555c32).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
tasks/upload.py	98.18%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1006      +/-   ##
==========================================
- Coverage   97.79%   97.79%   -0.01%     
==========================================
  Files         447      447              
  Lines       36175    36207      +32     
==========================================
+ Hits        35376    35407      +31     
- Misses        799      800       +1

Flag	Coverage Δ
integration	`42.19% <98.18%> (+0.04%)`	⬆️
unit	`90.39% <18.18%> (-0.08%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


⚠️ Impact Analysis from Codecov is deprecated and will be sunset on Jan 31 2025. See more

codecov-staging · 2025-01-13T23:07:16Z

Codecov Report

Attention: Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
tasks/upload.py	98.18%	1 Missing ⚠️

codecov-qa · 2025-01-13T23:07:25Z

❌ 1 Tests Failed:

Tests completed	Failed	Passed	Skipped
1772	1	1771	4

View the top 1 failed tests by shortest run time

services/tests/test_report.py::TestReportService::test_create_report_upload

Stack Traces | 0.036s run time

self = <worker.services.tests.test_report.TestReportService object at 0x7fd85ed76270>
dbsession = <sqlalchemy.orm.session.Session object at 0x7fd85536ea80>

    @pytest.mark.django_db
    def test_create_report_upload(self, dbsession):
        arguments = {
            "branch": "master",
            "build": "646048900",
            "build_url": "http://github..../actions/runs/646048900",
            "cmd_args": "n,F,Q,C",
            "commit": "1280bf4b8d596f41b101ac425758226c021876da",
            "job": "thisjob",
            "flags": ["unittest"],
            "name": "this name contains more than 100 chars 1111111111111111111111111111111111111111111111111111111111111this is more than 100",
            "owner": "greenlantern",
            "package": "github-action-20210309-2b87ace",
            "pr": "33",
            "repo": "reponame",
            "reportid": "6e2b6449-4e60-43f8-80ae-2c03a5c03d92",
            "service": "github-actions",
            "slug": "greenlantern/reponame",
            "url": ".../C00AE6C87E34AF41A6D38D154C609782/1280bf4b8d596f41b101ac425758226c021876da/6e2b6449-4e60-43f8-80ae-2c03a5c03d92.txt",
            "using_global_token": "false",
            "version": "v4",
        }
        commit = CommitFactory.create()
        dbsession.add(commit)
        dbsession.flush()
        current_report_row = CommitReport(commit_id=commit.id_)
        dbsession.add(current_report_row)
        dbsession.flush()
        report_service = ReportService({})
        res = report_service.create_report_upload(arguments, current_report_row)
        dbsession.flush()
        assert res.build_code == "646048900"
        assert (
            res.build_url
            == "http://github..../actions/runs/646048900"
        )
        assert res.env is None
        assert res.job_code == "thisjob"
        assert (
            res.name
            == "this name contains more than 100 chars 1111111111111111111111111111111111111111111111111111111111111"
        )
        assert res.provider == "github-actions"
        assert res.report_id == current_report_row.id_
        assert res.state == "started"
        assert (
            res.storage_path
            == ".../C00AE6C87E34AF41A6D38D154C609782/1280bf4b8d596f41b101ac425758226c021876da/6e2b6449-4e60-43f8-80ae-2c03a5c03d92.txt"
        )
        assert res.order_number is None
        assert res.totals is None
        assert res.upload_extras == {}
        assert res.upload_type == "uploaded"
>       assert len(res.flags) == 1
E       assert 0 == 1
E        +  where 0 = len([])
E        +    where [] = <database.models.reports.Upload object at 0x7fd855b0fd50>.flags

services/tests/test_report.py:3730: AssertionError


codecov-public-qa · 2025-01-13T23:07:34Z

❌ 1 Tests Failed:

Tests completed	Failed	Passed	Skipped
1772	1	1771	4

View the top 1 failed tests by shortest run time

services/tests/test_report.py::TestReportService::test_create_report_upload


github-actions · 2025-01-13T23:07:52Z

✅ All tests successful. No failed tests were found.


adrian-codecov · 2025-01-13T23:27:15Z

services/tests/test_report.py

@@ -3727,10 +3727,6 @@ def test_create_report_upload(self, dbsession):
        assert res.totals is None
        assert res.upload_extras == {}
        assert res.upload_type == "uploaded"
-        assert len(res.flags) == 1


This no longer gets evaluated by this test/method since we got rid of the override

giovanni-guidini · 2025-01-14T09:29:29Z

tasks/upload.py

+        report_service: ReportService,
+    ) -> list[UploadArguments]:
+        """
+        This method possibly batch inserts uploads, flags and user measurements.


We know it "possibly" does something from the name. Under what conditions will the upload be inserted or not?

It's described in the sentence after that, https://github.com/codecov/worker/pull/1006/files#diff-7ea195247ce946074a2441630db2be4bda78feed427fc0a1f1bd170cfc84a7a7R557 😅: it only happens for v4 uploads, since CLI uploads are created in API. Should I rephrase it a bit so it's clearer?

giovanni-guidini · 2025-01-14T09:34:46Z

tasks/upload.py

+        self,
+        db_session: Session,
+        repoid: int,
+        upload_flag_map: dict[Upload, list | str | None] = {},


Using a mutable structure as a default argument value is not a good idea. The same dict instance will be shared across all calls to the function.

It's better to make the default None, check for that, and then assign the real default of {} if needed:

    upload_flag_map: dict[Upload, list | str | None] | None = None,
    ...
    if upload_flag_map is None:
        upload_flag_map = {}
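
A minimal sketch of the pitfall described above. The two `register_flags_*` functions are hypothetical and not part of this PR, and plain strings stand in for `Upload` objects:

```python
def register_flags_shared(upload_name, flags, upload_flag_map={}):
    # The default dict is created once, when the function is defined,
    # so every call that omits the argument mutates the same object.
    upload_flag_map[upload_name] = flags
    return upload_flag_map


def register_flags_safe(upload_name, flags, upload_flag_map=None):
    # A fresh dict is created on each call that omits the argument.
    if upload_flag_map is None:
        upload_flag_map = {}
    upload_flag_map[upload_name] = flags
    return upload_flag_map


first = register_flags_shared("upload-a", ["unit"])
second = register_flags_shared("upload-b", ["integration"])
# `first` and `second` are the same dict and hold both entries.
shared_leaks = first is second and len(second) == 2

safe_first = register_flags_safe("upload-a", ["unit"])
safe_second = register_flags_safe("upload-b", ["integration"])
# Each call got its own dict with a single entry.
safe_is_isolated = safe_first is not safe_second and len(safe_second) == 1
```

With the shared default, state from one call leaks into the next, which is why the `None` sentinel is the safer pattern here.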

Thanks! Good suggestion, and my bad, you'd given me this protip some time ago as well 🫡

giovanni-guidini · 2025-01-14T09:35:06Z

tasks/upload.py

@@ -215,6 +215,13 @@ def _should_debounce_processing(upload_context: UploadContext) -> Optional[float
    return None


+class CreateUploadResponse(TypedDict):


How's the naming here? I was not amazed by it 😂

Swatinem

personal preference:
I would like us to move the core business logic out of "celery tasks" to better decouple the thing we are doing from the specific task scheduling framework. This should help with whatever "platform revamp" we are aiming towards.

Swatinem · 2025-01-14T15:49:10Z

tasks/upload.py

+        # List + helper mapping to track possible upload + flags to insert later
+        uploads_list: list[Upload] = []
+        upload_flag_map: dict[Upload, list | str | None] = {}
+
        for arguments in upload_context.arguments_list():
            arguments = upload_context.normalize_arguments(commit, arguments)
            if "upload_id" not in arguments:
                upload = report_service.create_report_upload(arguments, commit_report)


if I remember correctly, create_report_upload internally creates the Upload model, so I’m not sure the bulk_save_objects based on the uploads_list is effective at all?

Yeah you are right, I was doing that from a previous implementation.

I'd like to use the bulk_save, but I need the upload's primary key for the "arguments["upload_id"] = upload.id_" line, which is why we're doing the add/flush (and why the bulk_save is kinda redundant). I'll get rid of it for now, but can you think of a way to make it work?

I've been giving it some thought, and the closest thing I've come up with is to skip the add/flush, build an arguments-to-upload mapping (without the primary key), do the bulk_save, then requery the db for the uploads we just created and add the upload_id to the arguments based on the mapping, then continue execution. But that requerying seems inefficient, and I feel it makes this whole thing a bit more complicated.

I could just get rid of the bulk_save and still benefit from the other changes, though.
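
The requery idea described above could look roughly like this. It is only a sketch: it uses the standard-library `sqlite3` module rather than the SQLAlchemy session the worker uses, the `uploads` table, its columns, and `bulk_create_uploads` are hypothetical, and it assumes `reportid` is unique per upload.

```python
import sqlite3


def bulk_create_uploads(conn, arguments_list):
    """Insert all new uploads in one batch, then requery to recover their ids."""
    pending = [args for args in arguments_list if "upload_id" not in args]
    if not pending:
        return arguments_list

    # One batched INSERT instead of an add/flush per upload.
    conn.executemany(
        "INSERT INTO uploads (reportid, build_code) VALUES (?, ?)",
        [(args["reportid"], args["build"]) for args in pending],
    )

    # The batch insert does not hand back primary keys, so look them up
    # in a single extra query keyed on the (assumed unique) reportid.
    placeholders = ", ".join("?" for _ in pending)
    rows = conn.execute(
        f"SELECT reportid, id FROM uploads WHERE reportid IN ({placeholders})",
        [args["reportid"] for args in pending],
    ).fetchall()
    id_by_reportid = dict(rows)

    for args in pending:
        args["upload_id"] = id_by_reportid[args["reportid"]]
    return arguments_list


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    "id INTEGER PRIMARY KEY, reportid TEXT UNIQUE, build_code TEXT)"
)
arguments_list = [
    {"reportid": "report-a", "build": "1"},
    {"reportid": "report-b", "build": "2"},
    {"reportid": "report-c", "build": "3", "upload_id": 99},  # already created
]
result = bulk_create_uploads(conn, arguments_list)
```

This trades N flushes for one insert plus one extra SELECT per batch, which is the inefficiency and added complexity weighed in the comment above.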

Another idea here, since getting arguments without an upload_id is one of the legacy upload paths:
we could just patch that legacy upload path to create the Upload object and trigger a PreProcess to do the carry-forwarding, and then delete the code responsible for this here.
It's not ideal either :-(

Yeah, this is another option. I think the counterargument is that we couldn't batch insert, since it would be done at the upload level. That being said, we already do this with the CLI, so arguably that's another point for performance there.

To me, though, it seems the responsibility for creating an upload shouldn't be in API (although I get why we currently do it this way); it should instead belong to a nonexistent "upload service" that would come with the new services. I think worker is the closest thing to that atm, but yeah, I preferred to do it here since most of the code was here.

adrian-codecov · 2025-01-14T18:09:38Z

I would like us to move the core business logic out of "celery tasks" to better decouple the thing we are doing from the specific task scheduling framework. This should help with whatever "platform revamp" we are aiming towards.

I agree this is the general direction; I just didn't consider it part of the scope for this ticket. My compromise was to wrap the logic in more readable fns, thinking it would make the copy-pasting easier when the time comes to refactor the worker logic.

Where would you suggest this could live, or how would you suggest doing this otherwise? Curious about your thoughts 👀

Swatinem · 2025-01-15T08:15:09Z

I believe this could live in services.processing, maybe in a new file called orchestration.py or some such?

There is a bunch more code cleanup to be done here, related to some unification of how the uploads are being handled in API, like the normalize_arguments code can probably be removed completely by now, based on this comment:

worker/services/processing/types.py

Lines 24 to 26 in 5634caa

    
    # TODO(swatinem): remove these fields completely being passed from API:
    # `redis_key` being removed in https://github.com/codecov/codecov-api/pull/960
    redis_key: NotRequired[str]
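
The suggestion to move this logic into something like `services.processing` could be sketched as follows. Everything here is hypothetical: `create_uploads`, `UploadTask`, and the argument shape are stand-ins, and a real Celery task would subclass Celery's task base class rather than a plain class.

```python
def create_uploads(arguments_list: list[dict]) -> list[dict]:
    """Framework-agnostic logic that could live outside the task module."""
    created = []
    for index, arguments in enumerate(arguments_list):
        if "upload_id" not in arguments:
            # Stand-in for inserting an Upload row and reading back its id.
            arguments = {**arguments, "upload_id": index + 1}
        created.append(arguments)
    return created


class UploadTask:
    """Thin wrapper: only scheduling concerns belong here."""

    def run(self, arguments_list: list[dict]) -> dict:
        uploads = create_uploads(arguments_list)
        return {"scheduled": len(uploads)}


# The logic can be exercised without any task framework involved.
uploads = create_uploads([{"build": "1"}, {"build": "2", "upload_id": 7}])
task_result = UploadTask().run([{"build": "1"}])
```

Keeping the task class this thin means the same function can be tested directly or called from a different scheduler later.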

adrian-codecov · 2025-01-15T16:34:12Z

Yeah, that makes sense. I want to keep the code as close as possible to what it was in terms of functionality and structure, and I'd rather not move it for now to avoid over-abstracting, since we haven't really defined what the new services architecture will look like, unless you feel strongly about it. I just don't want to bloat the scope of the work here.

I can leave a comment in the code or something to make sure we capture moving this into a "processing"-type service eventually, if that makes sense.

adrian-codecov added 4 commits January 10, 2025 22:22

Batching uploads + flag creation

ebec451

adjust logic wip

fd63531

Merge branch 'main' of github.com:codecov/worker into 3210-batch-uplo…

5275394

…ads-creation

Add changes to make it a bit more readable

4c5b5de

adrian-codecov added 2 commits January 13, 2025 17:26

Adjust test

be467b5

Merge branch 'main' of github.com:codecov/worker into 3210-batch-uplo…

6dadb74

…ads-creation

adrian-codecov commented Jan 13, 2025

adrian-codecov mentioned this pull request Jan 14, 2025

bulkc reate flags #994

Closed

giovanni-guidini reviewed Jan 14, 2025

adrian-codecov added 2 commits January 14, 2025 09:44

Add logic to conditionally create dict rather than default its value

b91fc5c

Merge branch 'main' of github.com:codecov/worker into 3210-batch-uplo…

3034d71

…ads-creation

Swatinem reviewed Jan 14, 2025

Address feedback

0555c32
