[WIP] feat: Filter copr_build jobs based on paths and files changed #2780

Draft: wants to merge 3 commits into base: main
Conversation

LecrisUT
Contributor

@LecrisUT LecrisUT commented Apr 16, 2025

TODO:

  • Skip the check if /packit command is used
  • Actually check the files changed
  • Write new tests or update the old ones to cover new functionality.
  • Update or write new documentation in packit/packit.dev.

Depends-on: packit/ogr#921

Fixes packit/packit#1997
Fixes #2006

RELEASE NOTES BEGIN

copr_build jobs are triggered only if there are changed files under the paths field

RELEASE NOTES END

Comment on lines 170 to 174
# FIXME: Implement the relevant git diff
return []
Contributor Author

Currently stuck on this part. Isn't the git repo cloned somewhere when the package_config was evaluated? I was hoping to use that clone for the git diff, but I haven't found where that happens. Should this be an interface in ogr instead?

Member

LocalProjectMixin should give you self.local_project.git_repo, which is a git.Repo.

class LocalProjectMixin(Config):
    _local_project: Optional[LocalProject] = None

    @property
    def local_project(self) -> LocalProject:
        if not self._local_project:
            builder = LocalProjectBuilder(
                cache=(
                    RepositoryCache(
                        cache_path=self.service_config.repository_cache,
                        add_new=self.service_config.add_repositories_to_repository_cache,
                    )
                    if self.service_config.repository_cache
                    else None
                ),
            )
            working_dir = Path(
                Path(self.service_config.command_handler_work_dir) / SANDCASTLE_LOCAL_PROJECT_DIR,
            )
            kwargs = {
                "repo_name": CALCULATE,
                "full_name": CALCULATE,
                "namespace": CALCULATE,
                "working_dir": working_dir,
                "git_repo": CALCULATE,
            }
            if self.project:
                kwargs["git_project"] = self.project
            else:
                kwargs["git_url"] = self.project_url
            self._local_project = builder.build(**kwargs)
        return self._local_project
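With a git.Repo in hand, the changed files could be listed via `git diff --name-only`. A minimal sketch of that idea, assuming the clone already contains both refs; `base_ref`/`head_ref` and the helper names are illustrative, not existing packit API:

```python
from pathlib import Path


def parse_name_only_diff(diff_output: str) -> list[Path]:
    # Turn `git diff --name-only` output (one path per line) into Path objects.
    return [Path(line) for line in diff_output.splitlines() if line.strip()]


def get_files_changed(repo, base_ref: str, head_ref: str) -> list[Path]:
    # `repo` is assumed to be a git.Repo (e.g. self.local_project.git_repo);
    # repo.git.diff shells out to `git diff --name-only <base>..<head>`.
    output = repo.git.diff("--name-only", f"{base_ref}..{head_ref}")
    return parse_name_only_diff(output)
```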

Member

At our arch meeting we discussed that we could instead get this via the API, which would mean implementing a method for getting a commit in ogr.

Contributor Author

Can go either way on LocalProject vs ogr.

ogr

pro

  • easier interface
  • don't have to deal with fork fetches

con

  • it downloads the diff content that is then ignored
  • may be slower and rate-limited because it goes through the web API

LocalProject

pro

  • already used by OSH (didn't check whether we would be reusing it, though)
  • still downloads the diffs, but is more efficient at it
  • no rate/file limits

con

  • handling PRs is a bit clunky
  • if the repo is not cached, re-cloning the project every time could be expensive

I am leaning towards the ogr approach. What are your thoughts on these two options?
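For context, a sketch of what the "ogr/API" option looks like underneath, shown with PyGithub directly (the ogr wrapper from packit/ogr#921 would abstract the forge-specific part). `github_client`, `owner`, `repo`, and `pr_number` are placeholders:

```python
from pathlib import Path


def changed_paths_from_filenames(filenames: list[str]) -> list[Path]:
    # Pure helper: map API-reported filenames to Path objects for matching.
    return [Path(name) for name in filenames]


def pr_changed_files(github_client, owner: str, repo: str, pr_number: int) -> list[Path]:
    # One paginated API call; note it also downloads per-file patch data we
    # never look at, which is the "con" mentioned in the list above.
    pull = github_client.get_repo(f"{owner}/{repo}").get_pull(pr_number)
    return changed_paths_from_filenames([f.filename for f in pull.get_files()])
```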

Member
@lbarcziova lbarcziova Apr 17, 2025

+1 for the ogr/API approach. We would be checking this only when it is configured in the config, so I believe the additional API load would not be that big in the end, and it sounds more straightforward; I don't like cloning the repo as early as the pre-check.

Contributor

This change depends on a change that failed to merge.

Change packit/ogr#921 is needed.

Comment on lines +159 to +161
elif self.job_config.trigger == JobConfigTriggerType.commit:
branch = self.job_config.branch or self.project.default_branch
changes = self.copr_build_helper.project.get_commit(branch)
Contributor Author

Something to note here: if a PR is merged as a rebase, only the top commit will be checked for changes.

Member

that's a good point. Could we get and check all the commits?

Contributor Author

🤔 Maybe it's possible. At least with Pagure we know there is a link between a commit and its PR, and maybe there is one for GitHub as well. Is there a PR merged as a rebase (didn't find better wording) in https://github.com/packit/hello-world ?

Member

I think we could use the commits directly from the webhook payload here, wdyt? It would require changing the parsing and extending the class. Then in the checker we would check this based on the event type.

Contributor Author

I would prefer to handle it in ogr because it generalizes better to GitLab and other forges. We do have a hook we can use there:
https://github.com/PyGithub/PyGithub/blob/3657eeb9a002ccf90f4e86755e29345ba484369c/github/Commit.py#L282-L292

I managed to confirm that it indeed works for rebase commits on GH.
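A sketch of how that hook (PyGithub's Commit.get_pulls()) could be used to map a pushed commit back to its PR(s) and collect the changed files across the whole PR rather than just the top rebase commit. Network access and a PyGithub Repository are assumed; the function names are illustrative:

```python
def union_filenames(file_lists: list[list[str]]) -> set[str]:
    # Pure helper: union of per-PR changed-file lists.
    return {name for names in file_lists for name in names}


def files_changed_for_pushed_commit(repo, sha: str) -> set[str]:
    # `repo` is assumed to be a PyGithub Repository. Commit.get_pulls() may
    # return zero or several PRs (it is not deterministic, see below).
    commit = repo.get_commit(sha)
    pulls = list(commit.get_pulls())
    if not pulls:
        # Direct push with no associated PR: fall back to the commit's own files.
        return {f.filename for f in commit.files}
    return union_filenames([[f.filename for f in pull.get_files()] for pull in pulls])
```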

Member

The same can also be done for GitLab. Another benefit is having the information directly, without additional API calls.

We do have a hook we can use there:

Is this deterministic, though? (e.g. one commit associated with multiple PRs)

Contributor Author

same can also be done for Gitlab. Another benefit is having the information directly, without additional API calls.

True, but at the end of the day we are still making calls to either the PRs or the commits to get the list of changed files.

Oh, I just realized that the push hook would catch multiple commits being pushed 🤔. Maybe we do need it.

is this deterministic though? (e.g. one commit associated to multiple PRs)

No, but it gives you all PRs associated with the commit (I think it's linked only if a PR closes multiple PRs as merged)
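If the push-hook route is taken, the changed files can be aggregated straight from the webhook payload: GitHub's push payload carries `added`/`modified`/`removed` lists per commit, so multi-commit pushes need no extra API calls. A minimal sketch (function name is illustrative):

```python
def files_changed_in_push(payload: dict) -> set[str]:
    # Union of all files touched by any commit in a GitHub push payload.
    changed: set[str] = set()
    for commit in payload.get("commits", []):
        for key in ("added", "modified", "removed"):
            changed.update(commit.get(key, []))
    return changed
```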

Comment on lines +180 to +184
for changed_file in self.get_files_changed():
# Check if any of the files changed are under the paths that are being tracked
if any(changed_file.is_relative_to(p) for p in paths):
return True
return False
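A runnable illustration of the matching logic above: a job configured with paths like ["src/pkg-a", "docs"] reacts only to files under those prefixes (pathlib.Path.is_relative_to requires Python 3.9+; the helper name is illustrative):

```python
from pathlib import Path


def any_file_under_paths(changed_files: list[Path], paths: list[Path]) -> bool:
    # True if any changed file falls under one of the tracked path prefixes.
    return any(f.is_relative_to(p) for f in changed_files for p in paths)
```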
Contributor Author

Actually, what do we do if we have an empty commit, like the Fedora rebuild commits? Should it be a special case to skip this check?

Member

like the Fedora rebuild commits?

those wouldn't be happening in upstream, so I don't think we have to think about that, or am I missing something?

Contributor Author

I mean, if they want to have that behaviour. I occasionally do that 1, but as long as it's documented whether it is supported or not, that's fine as well.

Footnotes

  1. https://github.com/LecrisUT/FedoraRPM-atuin/commit/99555ccec9f6ed94fe474767e3b88f18aa835387

@@ -15,6 +15,7 @@ def __init__(
project_url: str,
commit_sha: str,
commit_sha_before: str,
commits: list[str],
Contributor Author

Playing around with the idea. It would probably be better to make this a list[dataclass] and store the changed files specifically? The same data seems to be there for both GitHub and GitLab. But don't we have a third-party dataclass for the webhook payload type that we could use?
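A sketch of the list[dataclass] idea, assuming the GitHub/GitLab push-payload per-commit file lists as the data source; the class and field names are illustrative, not an existing packit type:

```python
from dataclasses import dataclass, field


@dataclass
class PushCommit:
    # Mirrors the per-commit fields of GitHub/GitLab push payloads.
    sha: str
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def files_changed(self) -> set[str]:
        return set(self.added) | set(self.modified) | set(self.removed)
```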

@lbarcziova
Member

I briefly looked into the code and I am thinking if it wouldn't be more fitting to have the logic for checking this in this method, which would mean it would work generally, not just for Copr build jobs.

@LecrisUT
Contributor Author

I briefly looked into the code and I am thinking if it wouldn't be more fitting to have the logic for checking this in this method, which would mean it would work generally, not just for Copr build jobs.

Probably, but it's not as extendable as the Checker approach; another place would need to be aware that such checks are being done. I see there already are checks like pr_labels_match_configuration. What about extending the Checker design (or reimplementing a similar/simpler class), or at least aggregating the checks in a list of lambdas?

@lbarcziova lbarcziova moved this from new to in-progress in Packit Kanban Board May 29, 2025
@lbarcziova
Member

@LecrisUT can you elaborate more on what you are suggesting? I agree the code I linked could be refactored, but it is something different from the Checker's logic, as that is tied to the handlers, while the checks I linked are meant for actually getting the relevant job configurations for the particular event.

@LecrisUT
Contributor Author

LecrisUT commented May 30, 2025

@LecrisUT can you elaborate more on what you are suggesting? I agree the code I linked could be refactored, but it is something different from the Checker's logic, as that is tied to the handlers, while the checks I linked are meant for actually getting the relevant job configurations for the particular event.

Indeed, not using the same Checker, but mainly not having all the checks implemented inline in that function, i.e. something like:

    def get_jobs_matching_event(self) -> list[JobConfig]:
        jobs_matching_trigger = []
        checkers = JobChecker.get_all_checkers()
        for job in self.event.packages_config.get_job_views():
            if all(c.check(job) for c in checkers):
                jobs_matching_trigger.append(job)
        return jobs_matching_trigger

The Checker design that I like is being able to just create a class and boom a new check is introduced, but I would design it a bit simpler, like

class JobChecker:
    all_checkers: ClassVar[list[type["JobChecker"]]] = []

    def __init_subclass__(cls):
        # Register every checker subclass when its class is created
        JobChecker.all_checkers.append(cls)

    @classmethod
    def get_all_checkers(cls) -> list["JobChecker"]:
        return [checker_cls() for checker_cls in cls.all_checkers]

    def check(self, job: JobView) -> bool: ...

(Of course with a better interface, constructor, etc.)
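To make the registration pattern concrete, a self-contained demo where defining a subclass is enough to make it show up in get_all_checkers(); JobView is stubbed as a plain dict and PathsChecker is a toy check invented for illustration:

```python
from typing import ClassVar


class JobChecker:
    all_checkers: ClassVar[list[type["JobChecker"]]] = []

    def __init_subclass__(cls, **kwargs):
        # __init_subclass__ runs at class-definition time, so subclasses
        # register themselves without any explicit registration call.
        super().__init_subclass__(**kwargs)
        JobChecker.all_checkers.append(cls)

    @classmethod
    def get_all_checkers(cls) -> list["JobChecker"]:
        return [checker_cls() for checker_cls in cls.all_checkers]

    def check(self, job: dict) -> bool:
        return True


class PathsChecker(JobChecker):
    def check(self, job: dict) -> bool:
        # Toy rule: pass only when no paths filter is configured.
        return not job.get("paths")
```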

@lbarcziova
Member

@LecrisUT understood, thanks for the explanation! Yes, that sounds like it could improve the readability and extensibility.

Successfully merging this pull request may close these issues.

[RFE] Filter copr builds on path change
When using monorepo, react only on changes done to configured paths
3 participants