[RHAIENG-4591] ci(piplock-renewal): update piplock-renewal to skip PR if changes are only timestamps #3419
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: (none). The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by commenting.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

The pull request modifies the piplock-renewal workflow so the renewal job skips PR creation when the only changes are timestamps.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Actionable Issues:
- Token exposure risk in workflow logs: review whether the access token can surface in job output.
- Diff filtering logic validation: verify the regex or pattern matching for detecting timestamp-only changes.
- Silent exit behavior: confirm the conditional exit-before-PR-creation doesn't mask legitimate failures. Add explicit logging when the workflow exits due to filtered-out diffs for visibility.

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed f2dd98c to 7d2321d
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@           Coverage Diff            @@
##             main    #3419   +/-   ##
=======================================
  Coverage    3.59%    3.59%
=======================================
  Files          29       29
  Lines        3310     3310
  Branches      527      527
=======================================
  Hits          119      119
  Misses       3189     3189
  Partials        2        2

Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review the full report in Codecov by Sentry.
Force-pushed 7d2321d to 1154df8
Actionable comments posted: 2
🧹 Nitpick comments (1)
manifests/tools/update_imagestream_annotations_from_pylock.py (1)
430-435: Resolve the canonical full SHA before checking local availability. Line 433 checks the original short prefix first. A local object with the same 7+ char prefix can make this skip the canonical fetch and later read annotations from the wrong commit. Resolve against the canonical GitHub repo first, then use that resolved revision for both `cat-file` and `git show`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines 430 - 435, In _ensure_commit_from_canonical_upstream, resolve the canonical full SHA from the remote before checking local availability: call _git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id (detail), use that resolved_rev for subsequent checks and for any git show/cat-file operations instead of the original short prefix; only then call _git_commit_exists(resolved_rev) and return (True, None) if present locally, otherwise return the fetch result/detail so callers use the canonical full SHA. Ensure you reference _ensure_commit_from_canonical_upstream, _git_fetch_commit_from, and _git_commit_exists when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/piplock-renewal.yaml:
- Around line 118-121: The workflow currently restores only the worktree with
the command referencing git restore --source=HEAD --worktree . Add a step to
also reset the index/staging area so staged regenerated lockfiles are cleared
(e.g., run the equivalent of git restore --source=HEAD --staged . or git reset
-- .) after the worktree restore in the block that follows the successful
lockfile metadata-only check (the block that calls
scripts/ci/lockfile_renewal_diff_is_metadata_only.py) so both worktree and index
are returned to HEAD.
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 150-152: The current code unconditionally accepts a 40-char SHA
from body.get("sha") but doesn't verify it matches the requested short SHA/ref;
update the logic in the function that reads sha from body (the body.get("sha") /
sha variable) to also check that the returned full SHA startswith the requested
short SHA or ref (e.g., compare sha.lower().startswith(requested_ref.lower() or
short_sha.lower())), and only return sha.lower() when that prefix check passes;
otherwise treat it as a mismatch (raise/return None or handle as existing error
path).
---
Nitpick comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 430-435: In _ensure_commit_from_canonical_upstream, resolve the
canonical full SHA from the remote before checking local availability: call
_git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id
(detail), use that resolved_rev for subsequent checks and for any git
show/cat-file operations instead of the original short prefix; only then call
_git_commit_exists(resolved_rev) and return (True, None) if present locally,
otherwise return the fetch result/detail so callers use the canonical full SHA.
Ensure you reference _ensure_commit_from_canonical_upstream,
_git_fetch_commit_from, and _git_commit_exists when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 61a539fd-1849-41df-a53e-6447130375c1
📒 Files selected for processing (4)
- .github/workflows/piplock-renewal.yaml
- manifests/tools/update_imagestream_annotations_from_pylock.py
- scripts/ci/lockfile_renewal_diff_is_metadata_only.py
- tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py
Force-pushed 1154df8 to cf369cb
Force-pushed cf369cb to 7aa0560
@coderabbitai can we do this in fewer lines of code? read suggestion below; then suggest 3 more alternatives
Proposed fix: git diff -I regex
Git (2.36+) supports -I <regex> which ignores hunks where all removed/added lines match the regex. ubuntu-latest (24.04) ships Git 2.43+, so this is available.
Replace the current check in the "Create Pull Request" step with:
- name: Create Pull Request
env:
GH_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}
run: |
git add .
if git diff --cached --quiet; then
echo "No changes to commit."
exit 0
fi
# Skip PR when the only changes are --exclude-newer timestamps in pylock headers
if git diff --cached -I '^#.*--exclude-newer' --quiet; then
echo "Only --exclude-newer timestamp changes detected, skipping PR."
exit 0
fi
BRANCH_NAME="lockfile-update-$(date +%Y%m%d-%H%M)"
  ...

The logic becomes:

1. `git diff --cached --quiet`: if no changes at all, bail (existing check)
2. `git diff --cached -I '^#.*--exclude-newer' --quiet`: if changes exist but ALL of them are the timestamp comment line, bail (new check)
3. Otherwise, real changes exist; proceed with commit + PR
Why this works
- The `--exclude-newer` timestamp only appears in a `#`-prefixed comment line in each `pylock.toml`
- If actual packages changed, there will be additional hunks with `[[packages]]` / hash / URL changes that do NOT match the regex
- ImageStream annotations, Dockerfiles, and kustomization only change when actual package versions change, so those would also survive the filter
- No changes needed to the lock generator itself
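The filtering behavior is easy to sanity-check in a throwaway repo. A minimal sketch, assuming Git ≥ 2.36 and GNU sed; the lockfile content is made up for illustration:

```shell
# Demonstrate `git diff -I`: a change touching only the timestamp comment
# is invisible to the filtered diff; a real pin change is not.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '# uv pip compile --exclude-newer=2026-01-01T00:00:00Z\npkg = "1.0"\n' > pylock.toml
git add pylock.toml
git -c user.email=ci@example.com -c user.name=ci commit -q -m lock

# 1) Bump only the timestamp comment -> filtered diff is empty
sed -i 's/2026-01-01/2026-02-01/' pylock.toml
git diff -I '^#.*--exclude-newer' --quiet -- pylock.toml && echo "timestamp-only: skipped"

# 2) Also bump the pin -> filtered diff is non-empty
sed -i 's/"1.0"/"1.1"/' pylock.toml
git diff -I '^#.*--exclude-newer' --quiet -- pylock.toml || echo "substantive: detected"
```

The same two exit-code checks map directly onto the two guards in the proposed workflow step.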
@jiridanek, the git diff -I approach is the cleanest. Here are three more alternatives, ordered from most to least concise:
Alt 1 — grep pipeline on the patch (no git-version dependency)
if git diff --cached | grep -E '^[+-][^+-]' | grep -qvE '^[+-][[:space:]]*#.*--exclude-newer'; then
echo "Substantive changes detected, creating PR."
else
echo "Only timestamp comments changed, skipping PR."
exit 0
fi

How it works: selects only added/removed content lines from the diff (`^[+-][^+-]` skips the `---`/`+++` header lines), then checks whether any of them do not match the timestamp comment. Pure POSIX shell, works on any git version. The double negative (`grep -qv`) short-circuits on the first mismatch.
Risk: the regex must be tight enough; a line like `+# --exclude-newer is ignored` would be a false negative. Combine with `-E '^[+-][[:space:]]*#[[:space:]]*uv pip compile.*--exclude-newer'` to be more precise.
Alt 2 — sed normalize-in-place → git diff → restore
find . -name '*.toml' -exec sed -i \
's/--exclude-newer=[^[:space:]]*/--exclude-newer=PLACEHOLDER/g' {} +
if git diff --cached --quiet; then
echo "Only timestamp changes, skipping PR."
git restore --staged --worktree . # undo the sed mutation
exit 0
fi
git restore --staged --worktree .  # restore before real commit path

How it works: mutates the files in place via sed, re-runs `git diff --cached` (which now sees normalized content), then restores unconditionally. Any real package change survives normalization and makes the diff non-empty.
Risk: The git restore must happen on both the exit-0 and the fall-through path, or you'll commit the sed-mutated files. A trap 'git restore --staged --worktree .' EXIT at the top of the step is safer.
Alt 3 — diff with process substitution (no file mutation)
all_metadata=true
while IFS= read -r f; do
if ! diff -q \
<(git show HEAD:"$f" | sed 's/--exclude-newer=[^[:space:]]*/TS/g') \
<(sed 's/--exclude-newer=[^[:space:]]*/TS/g' "$f") \
> /dev/null 2>&1; then
all_metadata=false
break
fi
done < <(git diff --cached --name-only)
if $all_metadata; then
echo "Only timestamp changes, skipping PR."
exit 0
fi

How it works: for each staged file, streams both the HEAD blob and the worktree file through an inline sed normalization and compares them with `diff -q`. Nothing is written back to disk, so no restore step is needed.
Risk: Process substitution (<(…)) is bash-specific (not POSIX sh), but ubuntu-latest runners use bash by default. New files (not present in HEAD) will cause git show HEAD:"$f" to fail; add a guard if new lockfiles can appear.
Comparison
| Approach | Lines in YAML | Git version | Modifies files | Handles created-at |
|---|---|---|---|---|
| `git diff -I` (proposed) | ~3 | ≥ 2.36 | No | Needs second `-I` flag |
| grep pipeline | ~4 | Any | No | Needs second `grep -vE` clause |
| sed + normalize-in-place | ~7 | Any | Yes (restored) | Extend the sed expression |
| diff + process substitution | ~10 | Any | No | Extend the sed expression |
The -I proposal wins on brevity. To also handle created-at headers you'd chain a second flag:
git diff --cached \
-I '^#.*--exclude-newer' \
-I '^created-at[[:space:]]*=' \
  --quiet

All four approaches let you delete scripts/ci/lockfile_renewal_diff_is_metadata_only.py and its test module entirely.
@coderabbitai the thing is, two files may have substantive updates, three others just modify timestamp; what we want is undo the three other files and commit the two with substantive update
@jiridanek, good clarification — the per-file mixed case is exactly why the whole-diff boolean isn't enough. Here are the approaches, starting with the most concise.
Primary: per-file git diff -I loop (replaces the Python script entirely)
- name: Discard ephemeral-only lockfile changes
run: |
while IFS= read -r f; do
if git diff HEAD \
-I '^#.*--exclude-newer' \
-I '^created-at[[:space:]]*=' \
--quiet -- "$f"; then
git restore --source=HEAD -- "$f"
echo "Restored (metadata-only): $f"
else
echo "Keeping (substantive): $f"
fi
done < <(git diff --name-only HEAD)

`git diff -I <pattern> --quiet -- "$f"` exits 0 when every modified hunk in that file has all its changed lines matching the pattern (i.e. the file is effectively unchanged after ignoring those lines). Files with real package changes will have `[[packages]]`/hash hunks that don't match, so they survive. Metadata-only files get individually restored before the PR step.
Alt: modify the existing Python script for per-file restore
If you want to keep the Python+tests approach (for correctness guarantees on the normalization logic), change diff_vs_head_is_metadata_only to restore as it goes:
def restore_ephemeral_only_files(*, repo_root: Path) -> bool | None:
    """Restore files whose only changes are ephemeral metadata. Returns True if any were restored."""
    paths = _changed_paths_vs_head(repo_root)
    if paths is None:
        return None
    restored_any = False
    for path in paths:
        if _is_binary_change(repo_root, path):
            continue  # leave binary changes alone
        head_bytes = _blob_at_head(repo_root, path)
        if head_bytes is None:
            continue
        work_path = repo_root / path
        if not work_path.is_file():
            continue
        try:
            head_text = head_bytes.decode("utf-8")
            work_text = work_path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue
        if normalize_ephemeral_lockfile_text(head_text) == normalize_ephemeral_lockfile_text(work_text):
            subprocess.run(["git", "restore", "--source=HEAD", "--worktree", "--", path], cwd=repo_root, check=True)
            print(f"Restored (metadata-only): {path}")
            restored_any = True
        else:
            print(f"Keeping (substantive): {path}")
    return restored_any

The workflow step then unconditionally calls the script (no if/then branching needed) and lets the subsequent `git add . && git diff --cached --quiet` guard handle the "nothing left to commit" case.
Comparison
| Approach | Lines in YAML | Python script needed | Per-file restore | Testable normalization logic |
|---|---|---|---|---|
| `git diff -I` loop | ~10 | No | ✅ | ❌ (regex in YAML) |
| Modified Python script | ~3 | Yes (modified) | ✅ | ✅ |
The git diff -I loop is the most concise for the workflow, but if the created-at / --exclude-newer normalization patterns are likely to evolve, keeping them in the tested Python script is safer long-term.
Yeah agreed with @jiridanek, using git diff is good enough to find the diff, make code changes smaller.
@jiridanek, good instinct to reconsider. The existing issue #3179 in this repo already documents the same root cause precisely: uv pip compile without --upgrade is no longer a stable no-op in CI — when the AIPCC index publishes new rebuild wheels (e.g. pyarrow-23.0.1-9 alongside the existing -8), uv picks them up.
The issue already identified the correct fix: "always pass --exclude-newer to uv pip compile, in both CI checks and fresh locking" — for CI checks, reuse the timestamp from when the lockfile was last intentionally locked. The open question was where to store the timestamp — extracted from the lockfile header comment, embedded in a sidecar file, or passed as an env variable.
Here's how each of your proposed alternatives stacks up:
Option A — --custom-compile-command (suppress the header churn)
--custom-compile-command sets the header comment at the top of the output file generated by uv pip compile and is intended to reflect custom build scripts that wrap it. In practice, using it would replace the entire # uv pip compile ... --exclude-newer=<timestamp> line with a fixed string — the header becomes deterministic and stops churning. But it also means the timestamp is no longer stored in the lockfile at all, so CI has no way to read it back. You'd need a second storage location anyway.
Option B — .timestamp sidecar file (recommended)
Store a single exclude-newer = <iso-timestamp> value in a committed sidecar (e.g. pylock.timestamp or a repo-level uv-lock-timestamp.env). Both jobs read from it:
- `check-generated-code`: reads the sidecar and passes `--exclude-newer=$(cat pylock.timestamp)` → deterministic, no spurious wheel pickups.
- `make refresh-lock-files`: updates the sidecar to now, then compiles → only the sidecar file and actual package changes appear in the renewal PR.
Combined with `--custom-compile-command` during `make refresh-lock-files` (to make the header a fixed string), the pylock.toml files become fully stable between renewals that don't change pins. This eliminates the need for the metadata-only diff detection script entirely.
Option C — exclude-newer in pyproject.toml
exclude-newer is also supported in pyproject.toml under [tool.uv], though when specified in persistent configuration, local date times are not allowed. This approach works for a single global cutoff but is awkward if different notebooks need different timestamps (e.g. after a partial upgrade).
Option D — UV_EXCLUDE_NEWER environment variable
UV_EXCLUDE_NEWER is equivalent to the --exclude-newer command-line argument. The sidecar approach can write to this env var rather than a CLI flag — either works.
Recommendation
The sidecar file (Option B) combined with --custom-compile-command is the most robust:
- `pylock.timestamp` (one per lockfile group, or a single repo-level one) — committed, human-readable, machine-parseable.
- CI check reads it: `--exclude-newer=$(cat pylock.timestamp)` — no new wheels, idempotent.
- Renewal job sets it to now: `date -u +%Y-%m-%dT%H:%M:%SZ > pylock.timestamp` — only changes when a renewal actually runs.
- `--custom-compile-command` strips the ephemeral timestamp from the `pylock.toml` header line — no more header churn.
This makes the entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script unnecessary.
@coderabbitai the timestamp written to file should not be the timestamp when the locking happened, but actually the timestamp when the most recent package in the lockfile was published
that's enough to keep the churn out; and that means we actually don't need to store the timestamp; it's enough if the ci script extracts/computes it and then uses it
I remember that lockfiles did not have the timestamps before I added the --before flag... but maybe that's actually something that has to do with the url, upload-time = 2026-04-01T00:22:14Z, that stuff was missing, maybe it's because now we say we want json in pylocks_generator.py.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find pylocks_generator.py
fd pylocks_generator.py
# Check for upload-time in any pylock files
rg "upload.time" --type toml -l | head -5
# Check the pylocks_generator for JSON/format flags
fd pylocks_generator.py --exec cat {}
# Check uv pip compile invocations for relevant flags
rg "exclude.newer|upload.time|--emit-index-annotation|--emit-find-links|--output-format" -n -C2 | head -60
Repository: opendatahub-io/notebooks
Length of output: 34638
🏁 Script executed:
#!/bin/bash
# Check if pylock.toml files have upload-time fields
rg "upload.time" --iglob "*.toml" -l | head -10
rg "upload.time" --iglob "*.toml" -m 3 | head -20
# Check a sample pylock.toml for its structure
fd "pylock.cpu.toml" | head -3 | xargs -I{} head -30 {}
# Check if uv pip compile pylock.toml format includes upload-time
rg "upload.time|upload_time" scripts/pylocks_generator.py
# Check for any max upload-time computation or related logic
rg "upload.time|max_upload|newest" scripts/ -n -C2
Repository: opendatahub-io/notebooks
Length of output: 16529
@jiridanek, confirmed — upload-time is already present in every wheel entry in the committed pylock.toml files (it appeared because of the ?format=json addition that forces Pulp to return PEP 691 JSON):
{ url = "...aiohttp-3.13.5-2-cp312-cp312-linux_x86_64.whl", upload-time = 2026-04-01T00:22:14Z, ... }

Your instinct is exactly right. The current --exclude-newer header value is the wall-clock at lock time (2026-04-16T12:35:57Z), while the actual newest package in that lock uploaded on 2026-04-01T00:30:57Z — a ~15-day window where AIPCC can publish new rebuild wheels (e.g., -3, -4) that cause spurious CI diffs with no real dependency changes.
Using max(upload-time) across all entries is the tightest stable cutoff: it's derived from content, not from when you ran the command, and it only advances when a genuinely newer package lands in the lockfile.
Concrete change to pylocks_generator.py
Add a parser and wire it into resolve_exclude_newer:
_UPLOAD_TIME_RE = re.compile(r"\bupload-time\s*=\s*\"?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\"?")
def parse_max_upload_time_from_lockfile(path: Path) -> str | None:
    """Return the latest upload-time found in a pylock.toml, or None."""
    if not path.is_file():
        return None
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return None
    times = _UPLOAD_TIME_RE.findall(text)
    return max(times) if times else None

def resolve_exclude_newer(lockfile: Path, *, ci_check: bool, live_timestamp: str) -> str:
    if not ci_check:
        return live_timestamp
    # Prefer max(upload-time) from content — stable and content-derived.
    # Fall back to the header comment timestamp, then to live_timestamp.
    return (
        parse_max_upload_time_from_lockfile(lockfile)
        or parse_exclude_newer_from_lockfile(lockfile)
        or live_timestamp
    )

The regex handles both formats found in the repo: pylock.toml uses `upload-time = 2026-04-01T00:22:14Z` (unquoted), while uv.lock uses `upload-time = "2026-04-01T00:22:14Z"` (quoted).
What this eliminates
| Problem | Today | With max(upload-time) |
|---|---|---|
| Header `--exclude-newer` churn | Triggers `lockfile_renewal_diff_is_metadata_only.py` | Header still changes, but CI never reads it |
| Rebuild wheels picked up in CI | Yes, within the ~15-day gap | No — cutoff is tight to actual content |
| Sidecar timestamp file needed | No | No |
| `scripts/ci/lockfile_renewal_diff_is_metadata_only.py` | Required | Can be deleted |
The entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script and its test module become unnecessary once CI uses max(upload-time) as the cutoff. The --exclude-newer header comment in the generated file will still change on each renewal run (since the locking timestamp updates), but that's benign — no CI check reads it anymore.
One small caveat: public-index pylock.toml files (e.g., runtimes/rocm-tensorflow/ubi9-python-3.12/pylock.toml) also have upload-time from PyPI, so parse_max_upload_time_from_lockfile works for both RH-index and public-index paths.
@coderabbitai any reason not to load the lockfile as toml? why parse regex?
@jiridanek, there's no good reason — TOML parsing is strictly better here. I defaulted to regex out of caution about dependencies, but that's unnecessary since tomllib is stdlib from Python 3.11+.
With TOML parsing the upload-time values come back as proper datetime objects, so max() is a direct comparison rather than a lexicographic string sort (which happens to work for ISO-8601 but is fragile):
import tomllib
from datetime import datetime, timezone
from pathlib import Path
def parse_max_upload_time_from_lockfile(path: Path) -> datetime | None:
    """Return the latest upload-time across all wheel entries in a pylock.toml."""
    if not path.is_file():
        return None
    try:
        with path.open("rb") as f:
            data = tomllib.load(f)
    except (OSError, tomllib.TOMLDecodeError):
        return None
    times: list[datetime] = []
    for pkg in data.get("packages", []):
        for wheel in pkg.get("wheels", []):
            t = wheel.get("upload-time")
            if isinstance(t, datetime):
                times.append(t)
    return max(times) if times else None

Then at the call site:
max_t = parse_max_upload_time_from_lockfile(lockfile)
exclude_newer = max_t.strftime("%Y-%m-%dT%H:%M:%SZ") if max_t else live_timestampThe only edge case worth noting: tomllib requires the file to be opened in binary mode ("rb"), which the snippet above already does. For Python < 3.11 the drop-in backport is tomli (pip install tomli), but given this repo uses uv with modern runtimes, 3.11+ is safe to assume.
Force-pushed 7aa0560 to 3551576
Force-pushed 3551576 to 82cc4e7
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py`:
- Around line 76-86: Add a third param case to test_main_exit_code that mocks
md.diff_vs_head_is_metadata_only to return None and asserts md.main() returns
the error exit code; update the param list to include pytest.param(None, 2,
id="git-or-read-error") and change the monkeypatch lambda in the test to return
None when main_code is None (use lambda repo_root: None) while keeping the
existing metadata-only and substantive cases and the md.ROOT patch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 73c1df7f-1187-4533-b159-11e149c6a74e
📒 Files selected for processing (4)
- .github/workflows/piplock-renewal.yaml
- manifests/tools/update_imagestream_annotations_from_pylock.py
- scripts/ci/lockfile_renewal_diff_is_metadata_only.py
- tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py
✅ Files skipped from review due to trivial changes (1)
- scripts/ci/lockfile_renewal_diff_is_metadata_only.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/piplock-renewal.yaml
- manifests/tools/update_imagestream_annotations_from_pylock.py
@pytest.mark.parametrize(
    ("main_code", "expected"),
    [
        pytest.param(0, 0, id="metadata-only"),
        pytest.param(1, 1, id="substantive"),
    ],
)
def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None:
    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0)
    monkeypatch.setattr(md, "ROOT", Path("/tmp"))
    assert md.main() == expected
Cover the None result from diff_vs_head_is_metadata_only.
Line 84 collapses the mocked result to True/False, so the git/read-error branch in main() is untested. Add the None case directly.
Proposed test update
 @pytest.mark.parametrize(
-    ("main_code", "expected"),
+    ("diff_result", "expected"),
     [
-        pytest.param(0, 0, id="metadata-only"),
-        pytest.param(1, 1, id="substantive"),
+        pytest.param(True, 0, id="metadata-only"),
+        pytest.param(False, 1, id="substantive"),
+        pytest.param(None, 1, id="git-error"),
     ],
 )
-def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None:
-    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0)
+def test_main_exit_code(
+    monkeypatch: pytest.MonkeyPatch, diff_result: bool | None, expected: int
+) -> None:
+    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: diff_result)
     monkeypatch.setattr(md, "ROOT", Path("/tmp"))
     assert md.main() == expected

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py` around
lines 76 - 86, Add a third param case to test_main_exit_code that mocks
md.diff_vs_head_is_metadata_only to return None and asserts md.main() returns
the error exit code; update the param list to include pytest.param(None, 2,
id="git-or-read-error") and change the monkeypatch lambda in the test to return
None when main_code is None (use lambda repo_root: None) while keeping the
existing metadata-only and substantive cases and the md.ROOT patch.
Force-pushed 8ad4dec to 93008d5
Actionable comments posted: 1
♻️ Duplicate comments (2)
manifests/tools/update_imagestream_annotations_from_pylock.py (2)
150-152: ⚠️ Potential issue | 🟠 Major: Reject GitHub refs that do not expand the requested SHA prefix.

`commits/{ref}` may resolve a SHA, branch, or tag ref, so a hex-like branch/tag can return an unrelated commit. Only accept `body["sha"]` when it is 40 hex chars and starts with `rev_clean`; GitHub documents `ref` as accepting commit SHA, branch, or tag refs: https://docs.github.com/en/rest/commits/commits#get-a-commit.

Proposed fix:

     sha = body.get("sha")
-    if isinstance(sha, str) and len(sha) == 40:
-        return sha.lower()
+    if isinstance(sha, str):
+        sha = sha.lower()
+        if len(sha) == 40 and sha.startswith(rev_clean):
+            return sha

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines 150 - 152, The code currently accepts any 40-char string from body.get("sha"); change the validation to also ensure the returned SHA is hex and begins with the expected prefix variable rev_clean: after obtaining sha = body.get("sha"), confirm isinstance(sha, str), len(sha) == 40, sha.lower().startswith(rev_clean), and that sha consists only of hex characters (e.g., via a regex or all(c in hexdigits)), then return sha.lower(); otherwise reject/return None so refs that resolve to unrelated branch/tag SHAs are not accepted.
430-438: ⚠️ Potential issue | 🟠 Major: Expand short SHA to full OID before existence checks and git operations.

Short SHA collision risk is real. While `_git_fetch_commit_from` internally calls `_rev_for_git_fetch` to resolve short SHAs to full OIDs, `_git_commit_exists(rev)` at line 434 checks with the short SHA directly, and `_git_show_first_existing(sha, ...)` at line 742 uses the unexpanded short ref. If the short SHA collides with a different local object, `cat-file -e` and `git show` may match the wrong commit. Resolve once via `_rev_for_git_fetch`, capture the full OID, and propagate it through all git operations.

Proposed fix:

-def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str | None]:
+def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str, str | None]:
     """Ensure ``rev`` is available for ``git show`` via ``git fetch`` from the variant canonical repo."""
     url = _CANONICAL_REPO_URL[variant]
-    if _git_commit_exists(rev):
-        return True, None
-    ok, detail = _git_fetch_commit_from(url, rev)
+    resolved_rev = _rev_for_git_fetch(url, rev)
+    if _git_commit_exists(resolved_rev):
+        return True, resolved_rev, None
+    ok, detail = _git_fetch_commit_from(url, resolved_rev)
     if ok:
-        return True, None
-    return False, detail
+        return True, resolved_rev, None
+    return False, resolved_rev, detail
@@
-    ok_upstream, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
+    ok_upstream, git_rev, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
@@
-    print(f"ok {path.name} tag {idx}: commit {sha} is available locally", file=sys.stderr)
-    shown = _git_show_first_existing(sha, rel_paths)
+    print(f"ok {path.name} tag {idx}: commit {git_rev} is available locally", file=sys.stderr)
+    shown = _git_show_first_existing(git_rev, rel_paths)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines 430 - 438, The function _ensure_commit_from_canonical_upstream must expand a possibly short rev to a full OID first using _rev_for_git_fetch (or the same resolver used by _git_fetch_commit_from), then use that resolved full OID for existence checks and fetches; change _ensure_commit_from_canonical_upstream to call _rev_for_git_fetch(url, rev) to get full_rev, then call _git_commit_exists(full_rev) and _git_fetch_commit_from(url, full_rev) and return/propagate the full OID to callers so _git_show_first_existing and any other git operations always receive the expanded full OID.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 416-427: The git fetch subprocess call in
update_imagestream_annotations_from_pylock.py can hang; modify the
subprocess.run invocation (the call that builds cmd from
_rev_for_git_fetch/fetch_rev and runs git fetch) to pass a reasonable timeout
value and wrap the call to catch subprocess.TimeoutExpired; on timeout return
False with an error string that indicates a timeout and includes the quoted
command (same format as current error path), and preserve existing handling for
non-zero returncodes and stdout/stderr.
---
Duplicate comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 150-152: The code currently accepts any 40-char string from
body.get("sha"); change the validation to also ensure the returned SHA is hex
and begins with the expected prefix variable rev_clean: after obtaining sha =
body.get("sha"), confirm isinstance(sha, str), len(sha) == 40,
sha.lower().startswith(rev_clean), and that sha consists only of hex characters
(e.g., via a regex or all(c in hexdigits)), then return sha.lower(); otherwise
reject/return None so refs that resolve to unrelated branch/tag SHAs are not
accepted.
- Around line 430-438: The function _ensure_commit_from_canonical_upstream must
expand a possibly short rev to a full OID first using _rev_for_git_fetch (or the
same resolver used by _git_fetch_commit_from), then use that resolved full OID
for existence checks and fetches; change _ensure_commit_from_canonical_upstream
to call _rev_for_git_fetch(url, rev) to get full_rev, then call
_git_commit_exists(full_rev) and _git_fetch_commit_from(url, full_rev) and
return/propagate the full OID to callers so _git_show_first_existing and any
other git operations always receive the expanded full OID.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 111b4c2f-c328-49b7-b94d-4460f6007cc6
📒 Files selected for processing (2)
- .github/workflows/piplock-renewal.yaml
- manifests/tools/update_imagestream_annotations_from_pylock.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/piplock-renewal.yaml
Force-pushed 93008d5 to 1915616
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: b930fa68-87aa-400e-8236-75f3c5de3d52
📒 Files selected for processing (2)
- .github/workflows/piplock-renewal.yaml
- manifests/tools/update_imagestream_annotations_from_pylock.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/piplock-renewal.yaml
Force-pushed 1915616 to 4eb4e52
@mtchoum1 I am not sure if you are still working on this PR, but for this PR it shouldn't be this complicated, here is my untested version.
Force-pushed 4eb4e52 to 7635098
Force-pushed 7635098 to b48f64c
Force-pushed b48f64c to 0e6e33c

Description
How Has This Been Tested?
./uv run pytest tests/unit/scripts/ci/test_lockfile_renewal_diff_is_metadata_only.py (unit coverage for metadata-only detection)
GitHub Actions: Ran Lock Files Renewal Action on the fork; job refresh-lock-files completed successfully — run log.
Self checklist (all need to be checked):
- Run `make test` (`gmake` on macOS) before asking for review.
- `Dockerfile.konflux` files should be done in `odh/notebooks` and automatically synced to `rhds/notebooks`. For Konflux-specific changes, modify `Dockerfile.konflux` files directly in `rhds/notebooks` as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:
Summary by CodeRabbit