[WIP] feat: Filter copr_build jobs based on paths and files changed #2780
# FIXME: Implement the relevant git diff
return []
Currently stuck on this part. Isn't the git repo cloned somewhere when the package_config was evaluated? I was hoping to use that to do the git diff, but I haven't found where that happens. Should that be an interface in ogr instead?
LocalProjectMixin should give you self.local_project.git_repo, which is a git.Repo.
packit-service/packit_service/worker/mixin.py
Lines 212 to 246 in 11d4722
class LocalProjectMixin(Config):
    _local_project: Optional[LocalProject] = None

    @property
    def local_project(self) -> LocalProject:
        if not self._local_project:
            builder = LocalProjectBuilder(
                cache=(
                    RepositoryCache(
                        cache_path=self.service_config.repository_cache,
                        add_new=self.service_config.add_repositories_to_repository_cache,
                    )
                    if self.service_config.repository_cache
                    else None
                ),
            )
            working_dir = Path(
                Path(self.service_config.command_handler_work_dir) / SANDCASTLE_LOCAL_PROJECT_DIR,
            )
            kwargs = {
                "repo_name": CALCULATE,
                "full_name": CALCULATE,
                "namespace": CALCULATE,
                "working_dir": working_dir,
                "git_repo": CALCULATE,
            }
            if self.project:
                kwargs["git_project"] = self.project
            else:
                kwargs["git_url"] = self.project_url

            self._local_project = builder.build(**kwargs)

        return self._local_project
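As a rough illustration of the git diff that the FIXME asks for, here is a minimal sketch that assumes `repo` behaves like GitPython's `git.Repo`, whose `repo.git.diff(...)` runs `git diff` with the given arguments and returns its output as a string. The function name `changed_files` and the `base`/`head` parameters are hypothetical and not part of this PR; GitPython itself is deliberately not imported, so the snippet also runs against a stand-in object.

```python
from pathlib import PurePosixPath


def changed_files(repo, base: str, head: str) -> list[PurePosixPath]:
    """List the files that differ between the merge base of base/head and head.

    Assumption: `repo` is a GitPython `git.Repo` (or anything exposing the
    same `repo.git.diff(...)` call that returns the command output as text).
    """
    # The three-dot form compares `head` against its merge base with `base`,
    # which is usually what "files changed by this PR" means.
    output = repo.git.diff("--name-only", f"{base}...{head}")
    return [PurePosixPath(line) for line in output.splitlines() if line.strip()]
```

With a real clone this would be called as `changed_files(self.local_project.git_repo, "main", "HEAD")`, provided both refs have already been fetched, which is the clunky part for PRs from forks.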
Can go either way on LocalProject vs ogr:
ogr
pro
- easier interface
- don't have to deal with fork fetches
con
- it downloads the diff content that is ignored
- may be slower and rate limited because it goes through web api
LocalProject
pro
- already used by osh (didn't check if we would be reusing it though)
- still downloads the diffs, but it's more efficient at that
- no rate/file limits
con
- handling PRs is a bit clunky
- if the git is not cached it could be more expensive to re-clone the project every time
I am leaning towards the ogr approach. What are your thoughts on the choice between these two?
+1 for the ogr/API approach, we would be checking this only if this is configured in the config, so I believe it would not be such a big additional API load in the end, and sounds more straightforward; I don't like cloning the repo already in the pre-check that much
Build failed. ❌ pre-commit FAILURE in 1m 48s
Build failed. ✔️ pre-commit SUCCESS in 1m 52s
Signed-off-by: Cristian Le <[email protected]>
This change depends on a change that failed to merge. Change packit/ogr#921 is needed.
elif self.job_config.trigger == JobConfigTriggerType.commit:
    branch = self.job_config.branch or self.project.default_branch
    changes = self.copr_build_helper.project.get_commit(branch)
Something to note here: if a PR is merged as a rebase, only the top commit will be checked for changes.
That's a good point. Could we get and check all the commits?
🤔 Maybe it's possible. At least with Pagure we know that there is a link between commit and PR, and maybe there is one for Github as well. Is there a PR merged as rebase (didn't find better wording) in https://github.com/packit/hello-world ?
I would prefer to handle it in ogr because it should be more generalized for Gitlab and other forges. We do have a hook we can use there:
https://github.com/PyGithub/PyGithub/blob/3657eeb9a002ccf90f4e86755e29345ba484369c/github/Commit.py#L282-L292
I managed to confirm that it does indeed work for rebase commits on GH.
same can also be done for Gitlab. Another benefit is having the information directly, without additional API calls.
> We do have a hook we can use there:
is this deterministic though? (e.g. one commit associated to multiple PRs)
> same can also be done for Gitlab. Another benefit is having the information directly, without additional API calls.
True, but at the end of the day we are still making the calls to either the PRs or the commits to get the list of changed files.
Oh, I just realized that the push hook would catch multiple commits being pushed 🤔. Maybe we do need it.
> is this deterministic though? (e.g. one commit associated to multiple PRs)
No, but it gives you all PRs associated with the commit (I think it's linked only if a PR closes multiple PRs as merged)
for changed_file in self.get_files_changed():
    # Check if any of the files changed are under the paths that are being tracked
    if any(changed_file.is_relative_to(p) for p in paths):
        return True
return False
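To make the matching rule in this hunk concrete, here is a small standalone sketch of the same idea, assuming changed files and tracked paths are both plain repository-relative strings. The helper name `touches_tracked_paths` is hypothetical; `PurePath.is_relative_to` needs Python 3.9 or newer.

```python
from pathlib import PurePosixPath
from typing import Iterable


def touches_tracked_paths(changed: Iterable[str], paths: Iterable[str]) -> bool:
    """Return True if any changed file lives under one of the tracked paths."""
    tracked = [PurePosixPath(p) for p in paths]
    # is_relative_to compares whole path components, so "src2/a.py" is not
    # considered to be under "src" even though the strings share a prefix.
    return any(
        PurePosixPath(changed_file).is_relative_to(path)
        for changed_file in changed
        for path in tracked
    )
```

Note that with this logic a commit that changes no files never matches, which is the empty-commit case raised in the comment that follows.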
Actually, what do we do if we have an empty commit, like the Fedora rebuild commits? Should it be a special case to skip this check?
> like the Fedora rebuild commits?
Those wouldn't be happening in upstream, so I don't think we have to think about that, or am I missing something?
I mean if they want to have that behaviour. I occasionally do that, but if it's documented that it is or is not supported, then that's fine as well.
Signed-off-by: Cristian Le <[email protected]>
@@ -15,6 +15,7 @@ def __init__(
        project_url: str,
        commit_sha: str,
        commit_sha_before: str,
        commits: list[str],
Playing around with the idea. Probably will be better to make this a list[dataclass] and save the changed files specifically? Seems the same data is there both for Github and Gitlab. But do we not have a dataclass of the webhook data type from a third-party that we could use?
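A minimal sketch of what such a `list[dataclass]` could look like, assuming the push webhook payload carries a `commits` array whose entries have `id`, `added`, `modified` and `removed` keys (the shape of both GitHub and GitLab push events). The names `PushedCommit`, `from_payload` and `all_changed_files` are hypothetical and not taken from packit-service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PushedCommit:
    """One commit from a push webhook, reduced to what path filtering needs."""

    sha: str
    changed_files: frozenset[str]

    @classmethod
    def from_payload(cls, commit: dict) -> "PushedCommit":
        # Added, modified and removed files all count as "changed".
        files: set[str] = set()
        for key in ("added", "modified", "removed"):
            files.update(commit.get(key) or [])
        return cls(sha=commit["id"], changed_files=frozenset(files))


def all_changed_files(commits: list[PushedCommit]) -> frozenset[str]:
    """Union of the files touched by every commit in the push."""
    result: set[str] = set()
    for commit in commits:
        result |= commit.changed_files
    return frozenset(result)
```

Taking the union over all pushed commits would also cover the rebase-merge case discussed earlier, where checking only the top commit misses changes made by the commits below it.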
Build failed. ✔️ pre-commit SUCCESS in 1m 54s
I briefly looked into the code and I am wondering whether it wouldn't be more fitting to have the logic for checking this in this method, which would mean it would work generally, not just for Copr build jobs.
Probably, but it's not as extendable as with the Checker, since another place would need to be aware that checks are being done. I see that there already are checks like
@LecrisUT can you elaborate more on what you are suggesting? I agree the code I linked could be refactored, but it is something different from the Checker's logic, as that is tied to the handlers, while the checks I linked are meant for actually getting the relevant job configurations for the particular event.
Indeed, not using the same
def get_jobs_matching_event(self) -> list[JobConfig]:
    jobs_matching_trigger = []
    checkers = JobChecker.get_all_checkers()
    for job in self.event.packages_config.get_job_views():
        if all(c.check(job) for c in checkers):
            jobs_matching_trigger.append(job)
    return jobs_matching_trigger
The
class JobChecker:
    all_checkers: Final[ClassVar[list[type[JobChecker]]]] = []

    def __init_subclass__(cls):
        # Register all checkers when the class is created
        cls.all_checkers.append(cls)

    @classmethod
    def get_all_checkers(cls) -> list[JobChecker]:
        return [checker_cls() for checker_cls in cls.all_checkers]

    def check(self, job: JobView) -> bool: ...
(Of course with a better interface, constructor, etc.)
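For readers who want to try the self-registering checker idea in isolation, here is a runnable sketch under simplifying assumptions: jobs and events are plain dicts, and the `TriggerChecker` and `PathsChecker` subclasses, the dict keys, and the two-argument `check(job, event)` signature are all hypothetical, not the interface proposed for packit-service.

```python
from pathlib import PurePosixPath
from typing import ClassVar


class JobChecker:
    """Base class; every subclass registers itself when it is defined."""

    all_checkers: ClassVar[list[type["JobChecker"]]] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Always append to the list on the base class, so that a subclass
        # shadowing the attribute cannot split the registry.
        JobChecker.all_checkers.append(cls)

    @classmethod
    def get_all_checkers(cls) -> list["JobChecker"]:
        return [checker_cls() for checker_cls in cls.all_checkers]

    def check(self, job: dict, event: dict) -> bool:
        raise NotImplementedError


class TriggerChecker(JobChecker):
    def check(self, job: dict, event: dict) -> bool:
        return job["trigger"] == event["trigger"]


class PathsChecker(JobChecker):
    def check(self, job: dict, event: dict) -> bool:
        paths = job.get("paths")
        if not paths:
            # No path filter configured: the job always matches.
            return True
        return any(
            PurePosixPath(changed_file).is_relative_to(path)
            for changed_file in event["changed_files"]
            for path in paths
        )


def get_jobs_matching_event(jobs: list[dict], event: dict) -> list[dict]:
    """Keep only the jobs that pass every registered checker."""
    checkers = JobChecker.get_all_checkers()
    return [job for job in jobs if all(c.check(job, event) for c in checkers)]
```

The appeal of this layout is that adding a new filter means defining one more subclass, with no change to `get_jobs_matching_event`. The cost is implicit registration at import time, so a checker module that is never imported is silently skipped.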
@LecrisUT understood, thanks for the explanation! Yes, that sounds like it could improve the readability and extensibility.
TODO:
/packit command is used
packit/packit.dev.
Depends-on: packit/ogr#921
Fixes packit/packit#1997
Fixes #2006
RELEASE NOTES BEGIN
copr_build jobs are triggered only if there are changed files under the paths field
RELEASE NOTES END