[RHAIENG-4591] ci(piplock-renewal) updates piplock-renewal to skip pr it changes are only timestamps by mtchoum1 · Pull Request #3419 · opendatahub-io/notebooks

mtchoum1 · 2026-04-21T19:15:14Z

Description

piplock-renewal workflow 1: After make test, detect when the working tree differs from HEAD only because of ephemeral UV/pylock metadata (e.g. --exclude-newer=… timestamps). If so, restore the worktree and do not open a PR, avoiding noisy multi-file renewals and rebase friction when package pins are unchanged.
scripts/ci/lockfile_renewal_diff_is_metadata_only.py (+ unit tests): Implements that normalization/compare logic and documents exit codes for shell use.
manifests/tools/update_imagestream_annotations_from_pylock.py: Expand abbreviated commit SHAs to full OIDs via the GitHub REST API when fetching from github.com, so git fetch does not fail treating short hex as a ref name; wire GITHUB_TOKEN into the ImageStream refresh step for API auth/rate limits. Documentation updated for fork CI and RHDS-only commits.

How Has This Been Tested?

./uv run pytest tests/unit/scripts/ci/test_lockfile_renewal_diff_is_metadata_only.py (unit coverage for metadata-only detection)

GitHub Actions: Ran Lock Files Renewal Action on the fork; job refresh-lock-files completed successfully — run log.

Self checklist (all need to be checked):

Ensure that you have run make test (gmake on macOS) before asking for review
Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

Chores
- Enhanced automated workflow authentication and filtering to prevent unnecessary pull requests with only metadata changes, reducing noise in the development process.

openshift-ci · 2026-04-21T19:15:20Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiridanek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-21T19:15:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

The pull request modifies the piplock-renewal.yaml GitHub Actions workflow to add authentication and filtering capabilities. It exports GITHUB_TOKEN for GitHub API access during the ImageStream annotation refresh step. A new pre-PR filtering step detects and removes diffs caused by uv pip compile's ephemeral header metadata (--exclude-newer lines and created-at timestamps), restoring affected files to HEAD when only non-substantive changes remain. The workflow exits before PR creation if no meaningful diffs persist after filtering.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Actionable Issues

Token exposure risk in workflow logs: Review whether GITHUB_TOKEN export could leak into workflow logs or artifact outputs. Ensure the token is marked as a secret in any subprocess output or logging.

Diff filtering logic validation: Verify the regex or pattern matching for detecting --exclude-newer and created-at lines is exhaustive and won't incorrectly filter legitimate changes across different uv pip compile versions.

Silent exit behavior: Confirm the conditional exit-before-PR-creation doesn't mask legitimate failures. Add explicit logging when the workflow exits due to filtered-out diffs for visibility.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	Title uses imperative mood and includes RHAIENG-4591 ticket reference with proper format, but contains a typo: 'it' should be 'if'.	Correct the typo in the title: change 'it changes' to 'if changes' so it reads 'ci(piplock-renewal) updates piplock-renewal to skip pr if changes are only timestamps'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	Description covers all required template sections with detailed implementation details, testing evidence, and merge criteria checkmarks; second self-checklist item is intentionally unchecked due to inapplicability to this repo (no downstream sync for workflows).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Branch Prefix Policy	✅ Passed	PR title starts with [RHAIENG-4591], a JIRA reference explicitly permitted for main branch; no branch prefix violation detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2026-04-21T19:17:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.59%. Comparing base (0214631) to head (0e6e33c).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #3419   +/-   ##
=====================================
  Coverage   3.59%   3.59%           
=====================================
  Files         29      29           
  Lines       3310    3310           
  Branches     527     527           
=====================================
  Hits         119     119           
  Misses      3189    3189           
  Partials       2       2

Flag	Coverage Δ
python	`3.59% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0214631...0e6e33c. Read the comment docs.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

manifests/tools/update_imagestream_annotations_from_pylock.py (1)
430-435: Resolve the canonical full SHA before checking local availability.

Line 433 checks the original short prefix first. A local object with the same 7+ char prefix can make this skip the canonical fetch and later read annotations from the wrong commit. Resolve against the canonical GitHub repo first, then use that resolved revision for both cat-file and git show.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
430 - 435, In _ensure_commit_from_canonical_upstream, resolve the canonical full
SHA from the remote before checking local availability: call
_git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id
(detail), use that resolved_rev for subsequent checks and for any git
show/cat-file operations instead of the original short prefix; only then call
_git_commit_exists(resolved_rev) and return (True, None) if present locally,
otherwise return the fetch result/detail so callers use the canonical full SHA.
Ensure you reference _ensure_commit_from_canonical_upstream,
_git_fetch_commit_from, and _git_commit_exists when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/piplock-renewal.yaml:
- Around line 118-121: The workflow currently restores only the worktree with
the command referencing git restore --source=HEAD --worktree . Add a step to
also reset the index/staging area so staged regenerated lockfiles are cleared
(e.g., run the equivalent of git restore --source=HEAD --staged . or git reset
-- .) after the worktree restore in the block that follows the successful
lockfile metadata-only check (the block that calls
scripts/ci/lockfile_renewal_diff_is_metadata_only.py) so both worktree and index
are returned to HEAD.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 150-152: The current code unconditionally accepts a 40-char SHA
from body.get("sha") but doesn't verify it matches the requested short SHA/ref;
update the logic in the function that reads sha from body (the body.get("sha") /
sha variable) to also check that the returned full SHA startswith the requested
short SHA or ref (e.g., compare sha.lower().startswith(requested_ref.lower() or
short_sha.lower())), and only return sha.lower() when that prefix check passes;
otherwise treat it as a mismatch (raise/return None or handle as existing error
path).

---

Nitpick comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 430-435: In _ensure_commit_from_canonical_upstream, resolve the
canonical full SHA from the remote before checking local availability: call
_git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id
(detail), use that resolved_rev for subsequent checks and for any git
show/cat-file operations instead of the original short prefix; only then call
_git_commit_exists(resolved_rev) and return (True, None) if present locally,
otherwise return the fetch result/detail so callers use the canonical full SHA.
Ensure you reference _ensure_commit_from_canonical_upstream,
_git_fetch_commit_from, and _git_commit_exists when making the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 61a539fd-1849-41df-a53e-6447130375c1

📥 Commits

Reviewing files that changed from the base of the PR and between c8a27dd and 1154df8.

📒 Files selected for processing (4)

.github/workflows/piplock-renewal.yaml
manifests/tools/update_imagestream_annotations_from_pylock.py
scripts/ci/lockfile_renewal_diff_is_metadata_only.py
tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py

jiridanek · 2026-04-22T16:50:21Z

@coderabbitai can we do this in fewer lines of code? read suggestion below; then suggest 3 more alternatives

Proposed fix: git diff -I regex

Git (2.36+) supports -I <regex> which ignores hunks where all removed/added lines match the regex. ubuntu-latest (24.04) ships Git 2.43+, so this is available.

Replace the current check in the "Create Pull Request" step with:

- name: Create Pull Request env: GH_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }} run: | git add . if git diff --cached --quiet; then echo "No changes to commit." exit 0 fi # Skip PR when the only changes are --exclude-newer timestamps in pylock headers if git diff --cached -I '^#.*--exclude-newer' --quiet; then echo "Only --exclude-newer timestamp changes detected, skipping PR." exit 0 fi BRANCH_NAME="lockfile-update-$(date +%Y%m%d-%H%M)" ...

The logic becomes:

git diff --cached --quiet -- if no changes at all, bail (existing check)

git diff --cached -I '^#.*--exclude-newer' --quiet -- if changes exist but ALL of them are the timestamp comment line, bail (new check)

Otherwise, real changes exist, proceed with commit + PR

Why this works

The --exclude-newer timestamp only appears in a #-prefixed comment line in each pylock.toml

If actual packages changed, there will be additional hunks with [[packages]] / hash / URL changes that do NOT match the regex

ImageStream annotations, Dockerfiles, and kustomization only change when actual package versions change, so those would also survive the filter

No changes needed to the lock generator itself

@jiridanek, the git diff -I approach is the cleanest. Here are three more alternatives, ordered from most to least concise:

Alt 1 — grep pipeline on the patch (no git-version dependency)

if git diff --cached | grep -E '^[+-][^+-]' | grep -qvE '^[+-][[:space:]]*#.*--exclude-newer'; then echo "Substantive changes detected, creating PR." else echo "Only timestamp comments changed, skipping PR." exit 0 fi

How it works: Selects only added/removed content lines from the diff (^[+-][^+-] skips the ---/+++ header lines), then checks whether any of them do not match the timestamp comment. Pure POSIX shell, works on any git version. The double-negative (grep -qv) short-circuits on the first mismatch.

Risk: Regex must be tight enough; a line like +# --exclude-newer is ignored would be a false negative. Combine with -E '^[+-][[:space:]]*#[[:space:]]*uv pip compile.*--exclude-newer' to be more precise.

Alt 2 — sed normalize-in-place → git diff → restore

find . -name '*.toml' -exec sed -i \ 's/--exclude-newer=[^[:space:]]*/--exclude-newer=PLACEHOLDER/g' {} + if git diff --cached --quiet; then echo "Only timestamp changes, skipping PR." git restore --staged --worktree . # undo the sed mutation exit 0 fi git restore --staged --worktree . # restore before real commit path

How it works: Mutates the files in-place via sed, re-runs git diff --cached (which now sees normalized content), then restores unconditionally. Any real package change survives normalization and makes the diff non-empty.

Risk: The git restore must happen on both the exit-0 and the fall-through path, or you'll commit the sed-mutated files. A trap 'git restore --staged --worktree .' EXIT at the top of the step is safer.

Alt 3 — diff with process substitution (no file mutation)

all_metadata=true while IFS= read -r f; do if ! diff -q \ <(git show HEAD:"$f" | sed 's/--exclude-newer=[^[:space:]]*/TS/g') \ <(sed 's/--exclude-newer=[^[:space:]]*/TS/g' "$f") \ > /dev/null 2>&1; then all_metadata=false break fi done < <(git diff --cached --name-only) if $all_metadata; then echo "Only timestamp changes, skipping PR." exit 0 fi

How it works: For each staged file, streams both the HEAD blob and the worktree file through an inline sed normalization and compares them with diff -q. Nothing is written back to disk, so no restore step is needed.

Risk: Process substitution (<(…)) is bash-specific (not POSIX sh), but ubuntu-latest runners use bash by default. New files (not present in HEAD) will cause git show HEAD:"$f" to fail; add a guard if new lockfiles can appear.

Comparison

Approach Lines in YAML Git version Modifies files Handles created-at

git diff -I (proposed) ~3 ≥ 2.36 No Needs second -I flag

grep pipeline ~4 Any No Needs second grep -vE clause

sed + normalize-in-place ~7 Any Yes (restored) Extend the sed expression

diff + process substitution ~10 Any No Extend the sed expression

The -I proposal wins on brevity. To also handle created-at headers you'd chain a second flag:

git diff --cached \ -I '^#.*--exclude-newer' \ -I '^created-at[[:space:]]*=' \ --quiet

All four approaches let you delete scripts/ci/lockfile_renewal_diff_is_metadata_only.py and its test module entirely.

@coderabbitai the thing is, two files may have substantive updates, three others just modify timestamp; what we want is undo the three other files and commit the two with substantive update

@jiridanek, good clarification — the per-file mixed case is exactly why the whole-diff boolean isn't enough. Here are the approaches, starting with the most concise.

Primary: per-file git diff -I loop (replaces the Python script entirely)

- name: Discard ephemeral-only lockfile changes run: | while IFS= read -r f; do if git diff HEAD \ -I '^#.*--exclude-newer' \ -I '^created-at[[:space:]]*=' \ --quiet -- "$f"; then git restore --source=HEAD -- "$f" echo "Restored (metadata-only): $f" else echo "Keeping (substantive): $f" fi done < <(git diff --name-only HEAD)

git diff -I <pattern> --quiet -- "$f" exits 0 when every modified hunk in that file has all its changed lines matching the pattern (i.e. the file is effectively unchanged after ignoring those lines). Files with real package changes will have [[packages]]/hash hunks that don't match, so they survive. Metadata-only files get individually restored before the PR step.

Alt: modify the existing Python script for per-file restore

If you want to keep the Python+tests approach (for correctness guarantees on the normalization logic), change diff_vs_head_is_metadata_only to restore as it goes:

def restore_ephemeral_only_files(*, repo_root: Path) -> bool | None: """Restore files whose only changes are ephemeral metadata. Returns True if any were restored.""" paths = _changed_paths_vs_head(repo_root) if paths is None: return None restored_any = False for path in paths: if _is_binary_change(repo_root, path): continue # leave binary changes alone head_bytes = _blob_at_head(repo_root, path) if head_bytes is None: continue work_path = repo_root / path if not work_path.is_file(): continue try: head_text = head_bytes.decode("utf-8") work_text = work_path.read_text(encoding="utf-8") except UnicodeDecodeError: continue if normalize_ephemeral_lockfile_text(head_text) == normalize_ephemeral_lockfile_text(work_text): subprocess.run(["git", "restore", "--source=HEAD", "--worktree", "--", path], cwd=repo_root, check=True) print(f"Restored (metadata-only): {path}") restored_any = True else: print(f"Keeping (substantive): {path}") return restored_any

The workflow step then unconditionally calls the script (no if/then branching needed) and lets the subsequent git add . && git diff --cached --quiet guard handle the "nothing left to commit" case.

Comparison

Approach Lines in YAML Python script needed Per-file restore Testable normalization logic

git diff -I loop ~10 No ✅ ❌ (regex in YAML)

Modified Python script ~3 Yes (modified) ✅ ✅

The git diff -I loop is the most concise for the workflow, but if the created-at / --exclude-newer normalization patterns are likely to evolve, keeping them in the tested Python script is safer long-term.

Yeah agreed with @jiridanek, using git diff is good enough to find the diff, make code changes smaller.

@jiridanek, good instinct to reconsider. The existing issue #3179 in this repo already documents the same root cause precisely: uv pip compile without --upgrade is no longer a stable no-op in CI — when the AIPCC index publishes new rebuild wheels (e.g. pyarrow-23.0.1-9 alongside the existing -8), uv picks them up.

The issue already identified the correct fix: "always pass --exclude-newer to uv pip compile, in both CI checks and fresh locking" — for CI checks, reuse the timestamp from when the lockfile was last intentionally locked. The open question was where to store the timestamp — extracted from the lockfile header comment, embedded in a sidecar file, or passed as an env variable.

Here's how each of your proposed alternatives stacks up:

Option A — --custom-compile-command (suppress the header churn)

--custom-compile-command sets the header comment at the top of the output file generated by uv pip compile and is intended to reflect custom build scripts that wrap it. In practice, using it would replace the entire # uv pip compile ... --exclude-newer=<timestamp> line with a fixed string — the header becomes deterministic and stops churning. But it also means the timestamp is no longer stored in the lockfile at all, so CI has no way to read it back. You'd need a second storage location anyway.

Option B — .timestamp sidecar file (recommended)

Store a single exclude-newer = <iso-timestamp> value in a committed sidecar (e.g. pylock.timestamp or a repo-level uv-lock-timestamp.env). Both jobs read from it:

check-generated-code: reads the sidecar and passes --exclude-newer=$(cat pylock.timestamp) → deterministic, no spurious wheel pickups.

make refresh-lock-files: updates the sidecar to now, then compiles → only the sidecar file and actual package changes appear in the renewal PR.

Combined with --custom-compile-command make refresh-lock-files (to make the header a fixed string), the pylock.toml files become fully stable between renewals that don't change pins. This eliminates the need for the metadata-only diff detection script entirely.

Option C — exclude-newer in pyproject.toml

exclude-newer is also supported in pyproject.toml under [tool.uv], though when specified in persistent configuration, local date times are not allowed. This approach works for a single global cutoff but is awkward if different notebooks need different timestamps (e.g. after a partial upgrade).

Option D — UV_EXCLUDE_NEWER environment variable

UV_EXCLUDE_NEWER is equivalent to the --exclude-newer command-line argument. The sidecar approach can write to this env var rather than a CLI flag — either works.

Recommendation

The sidecar file (Option B) combined with --custom-compile-command is the most robust:

pylock.timestamp (one per lockfile group, or a single repo-level one) — committed, human-readable, machine-parseable.

CI check reads it: --exclude-newer=$(cat pylock.timestamp) — no new wheels, idempotent.

Renewal job sets it to now: date -u +%Y-%m-%dT%H:%M:%SZ > pylock.timestamp — only changes when a renewal actually runs.

--custom-compile-command strips the ephemeral timestamp from the pylock.toml header line — no more header churn.

This makes the entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script unnecessary.

@coderabbitai the timestamp written to file should not be the timestamp when the locking happened, but actually the timestamp when the most recent package in the lockfile was published

that's enough to keep the churn out; and that means we actually don't need to store the timestamp; it's enough if the ci script extracts/computes it and then uses it

I remember that lockfiles did not have the timestamps before I added the --before flag... but maybe that's actually something that has to do with the url, upload-time = 2026-04-01T00:22:14Z, that stuff was missing, maybe it's because now we say we want json in pylocks_generator.py.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find pylocks_generator.py fd pylocks_generator.py # Check for upload-time in any pylock files rg "upload.time" --type toml -l | head -5 # Check the pylocks_generator for JSON/format flags fd pylocks_generator.py --exec cat {} # Check uv pip compile invocations for relevant flags rg "exclude.newer|upload.time|--emit-index-annotation|--emit-find-links|--output-format" -n -C2 | head -60

Repository: opendatahub-io/notebooks
Length of output: 34638

🏁 Script executed:

#!/bin/bash # Check if pylock.toml files have upload-time fields rg "upload.time" --iglob "*.toml" -l | head -10 rg "upload.time" --iglob "*.toml" -m 3 | head -20 # Check a sample pylock.toml for its structure fd "pylock.cpu.toml" | head -3 | xargs -I{} head -30 {} # Check if uv pip compile pylock.toml format includes upload-time rg "upload.time|upload_time" scripts/pylocks_generator.py # Check for any max upload-time computation or related logic rg "upload.time|max_upload|newest" scripts/ -n -C2

Repository: opendatahub-io/notebooks
Length of output: 16529

@jiridanek, confirmed — upload-time is already present in every wheel entry in the committed pylock.toml files (it appeared because of the ?format=json addition that forces Pulp to return PEP 691 JSON):

{ url = "...aiohttp-3.13.5-2-cp312-cp312-linux_x86_64.whl", upload-time = 2026-04-01T00:22:14Z, ... }

Your instinct is exactly right. The current --exclude-newer header value is the wall-clock at lock time (2026-04-16T12:35:57Z), while the actual newest package in that lock uploaded on 2026-04-01T00:30:57Z — a ~15-day window where AIPCC can publish new rebuild wheels (e.g., -3, -4) that cause spurious CI diffs with no real dependency changes.

Using max(upload-time) across all entries is the tightest stable cutoff: it's derived from content, not from when you ran the command, and it only advances when a genuinely newer package lands in the lockfile.

Concrete change to pylocks_generator.py

Add a parser and wire it into resolve_exclude_newer:

_UPLOAD_TIME_RE = re.compile(r"\bupload-time\s*=\s*\"?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\"?") def parse_max_upload_time_from_lockfile(path: Path) -> str | None: """Return the latest upload-time found in a pylock.toml, or None.""" if not path.is_file(): return None try: text = path.read_text(encoding="utf-8", errors="replace") except OSError: return None times = _UPLOAD_TIME_RE.findall(text) return max(times) if times else None def resolve_exclude_newer(lockfile: Path, *, ci_check: bool, live_timestamp: str) -> str: if not ci_check: return live_timestamp # Prefer max(upload-time) from content — stable and content-derived. # Fall back to the header comment timestamp, then to live_timestamp. return ( parse_max_upload_time_from_lockfile(lockfile) or parse_exclude_newer_from_lockfile(lockfile) or live_timestamp )

The regex handles both formats found in the repo: pylock.toml uses upload-time = 2026-04-01T00:22:14Z (unquoted), while uv.lock uses upload-time = "2026-04-01T00:22:14Z" (quoted).

What this eliminates

Problem Today With max(upload-time)

Header --exclude-newer churn Triggers lockfile_renewal_diff_is_metadata_only.py Header still changes, but CI never reads it

Rebuild wheels picked up in CI Yes, within the ~15-day gap No — cutoff is tight to actual content

Sidecar timestamp file needed No No

scripts/ci/lockfile_renewal_diff_is_metadata_only.py Required Can be deleted

The entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script and its test module become unnecessary once CI uses max(upload-time) as the cutoff. The --exclude-newer header comment in the generated file will still change on each renewal run (since the locking timestamp updates), but that's benign — no CI check reads it anymore.

One small caveat: public-index pylock.toml files (e.g., runtimes/rocm-tensorflow/ubi9-python-3.12/pylock.toml) also have upload-time from PyPI, so parse_max_upload_time_from_lockfile works for both RH-index and public-index paths.

@coderabbitai any reason not to load the lockfile as toml? why parse regex?

@jiridanek, there's no good reason — TOML parsing is strictly better here. I defaulted to regex out of caution about dependencies, but that's unnecessary since tomllib is stdlib from Python 3.11+.

With TOML parsing the upload-time values come back as proper datetime objects, so max() is a direct comparison rather than a lexicographic string sort (which happens to work for ISO-8601 but is fragile):

import tomllib from datetime import datetime, timezone from pathlib import Path def parse_max_upload_time_from_lockfile(path: Path) -> datetime | None: """Return the latest upload-time across all wheel entries in a pylock.toml.""" if not path.is_file(): return None try: with path.open("rb") as f: data = tomllib.load(f) except (OSError, tomllib.TOMLDecodeError): return None times: list[datetime] = [] for pkg in data.get("packages", []): for wheel in pkg.get("wheels", []): t = wheel.get("upload-time") if isinstance(t, datetime): times.append(t) return max(times) if times else None

Then at the call site:

max_t = parse_max_upload_time_from_lockfile(lockfile) exclude_newer = max_t.strftime("%Y-%m-%dT%H:%M:%SZ") if max_t else live_timestamp

The only edge case worth noting: tomllib requires the file to be opened in binary mode ("rb"), which the snippet above already does. For Python < 3.11 the drop-in backport is tomli (pip install tomli), but given this repo uses uv with modern runtimes, 3.11+ is safe to assume.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py`:
- Around line 76-86: Add a third param case to test_main_exit_code that mocks
md.diff_vs_head_is_metadata_only to return None and asserts md.main() returns
the error exit code; update the param list to include pytest.param(None, 2,
id="git-or-read-error") and change the monkeypatch lambda in the test to return
None when main_code is None (use lambda repo_root: None) while keeping the
existing metadata-only and substantive cases and the md.ROOT patch.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 73c1df7f-1187-4533-b159-11e149c6a74e

📥 Commits

Reviewing files that changed from the base of the PR and between 1154df8 and 3551576.

📒 Files selected for processing (4)

.github/workflows/piplock-renewal.yaml
manifests/tools/update_imagestream_annotations_from_pylock.py
scripts/ci/lockfile_renewal_diff_is_metadata_only.py
tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py

✅ Files skipped from review due to trivial changes (1)

scripts/ci/lockfile_renewal_diff_is_metadata_only.py

🚧 Files skipped from review as they are similar to previous changes (2)

.github/workflows/piplock-renewal.yaml
manifests/tools/update_imagestream_annotations_from_pylock.py

coderabbitai · 2026-04-22T18:12:01Z

+@pytest.mark.parametrize(
+    ("main_code", "expected"),
+    [
+        pytest.param(0, 0, id="metadata-only"),
+        pytest.param(1, 1, id="substantive"),
+    ],
+)
+def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None:
+    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0)
+    monkeypatch.setattr(md, "ROOT", Path("/tmp"))
+    assert md.main() == expected


⚠️ Potential issue | 🟡 Minor

Cover the None result from diff_vs_head_is_metadata_only.

Line 84 collapses the mocked result to True/False, so the git/read-error branch in main() is untested. Add the None case directly.

Proposed test update

`@pytest.mark.parametrize`( - ("main_code", "expected"), + ("diff_result", "expected"), [ - pytest.param(0, 0, id="metadata-only"), - pytest.param(1, 1, id="substantive"), + pytest.param(True, 0, id="metadata-only"), + pytest.param(False, 1, id="substantive"), + pytest.param(None, 1, id="git-error"), ], ) -def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None: - monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0) +def test_main_exit_code( + monkeypatch: pytest.MonkeyPatch, diff_result: bool | None, expected: int +) -> None: + monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: diff_result) monkeypatch.setattr(md, "ROOT", Path("/tmp")) assert md.main() == expected

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py` around lines 76 - 86, Add a third param case to test_main_exit_code that mocks md.diff_vs_head_is_metadata_only to return None and asserts md.main() returns the error exit code; update the param list to include pytest.param(None, 2, id="git-or-read-error") and change the monkeypatch lambda in the test to return None when main_code is None (use lambda repo_root: None) while keeping the existing metadata-only and substantive cases and the md.ROOT patch.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

manifests/tools/update_imagestream_annotations_from_pylock.py (2)

150-152: ⚠️ Potential issue | 🟠 Major

Reject GitHub refs that do not expand the requested SHA prefix.

commits/{ref} may resolve a SHA, branch, or tag ref, so a hex-like branch/tag can return an unrelated commit. Only accept body["sha"] when it is 40 hex chars and starts with rev_clean; GitHub documents ref as accepting commit SHA, branch, or tag refs: https://docs.github.com/en/rest/commits/commits#get-a-commit.

Proposed fix

         sha = body.get("sha")
-        if isinstance(sha, str) and len(sha) == 40:
-            return sha.lower()
+        if isinstance(sha, str):
+            sha = sha.lower()
+            if len(sha) == 40 and sha.startswith(rev_clean):
+                return sha

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
150 - 152, The code currently accepts any 40-char string from body.get("sha");
change the validation to also ensure the returned SHA is hex and begins with the
expected prefix variable rev_clean: after obtaining sha = body.get("sha"),
confirm isinstance(sha, str), len(sha) == 40, sha.lower().startswith(rev_clean),
and that sha consists only of hex characters (e.g., via a regex or all(c in
hexdigits)), then return sha.lower(); otherwise reject/return None so refs that
resolve to unrelated branch/tag SHAs are not accepted.

430-438: ⚠️ Potential issue | 🟠 Major

Expand short SHA to full OID before existence checks and git operations.

Short SHA collision risk is real. While _git_fetch_commit_from internally calls _rev_for_git_fetch to resolve short SHAs to full OIDs, _git_commit_exists(rev) at line 434 checks with the short SHA directly, and _git_show_first_existing(sha, ...) at line 742 uses the unexpanded short ref. If the short SHA collides with a different local object, cat-file -e and git show may match the wrong commit. Resolve once via _rev_for_git_fetch, capture the full OID, and propagate it through all git operations.

Proposed fix

-def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str | None]:
+def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str, str | None]:
     """Ensure ``rev`` is available for ``git show`` via ``git fetch`` from the variant canonical repo."""
     url = _CANONICAL_REPO_URL[variant]
-    if _git_commit_exists(rev):
-        return True, None
-    ok, detail = _git_fetch_commit_from(url, rev)
+    resolved_rev = _rev_for_git_fetch(url, rev)
+    if _git_commit_exists(resolved_rev):
+        return True, resolved_rev, None
+    ok, detail = _git_fetch_commit_from(url, resolved_rev)
     if ok:
-        return True, None
-    return False, detail
+        return True, resolved_rev, None
+    return False, resolved_rev, detail
@@
-                ok_upstream, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
+                ok_upstream, git_rev, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
@@
-                print(f"ok {path.name} tag {idx}: commit {sha} is available locally", file=sys.stderr)
-                shown = _git_show_first_existing(sha, rel_paths)
+                print(f"ok {path.name} tag {idx}: commit {git_rev} is available locally", file=sys.stderr)
+                shown = _git_show_first_existing(git_rev, rel_paths)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
430 - 438, The function _ensure_commit_from_canonical_upstream must expand a
possibly short rev to a full OID first using _rev_for_git_fetch (or the same
resolver used by _git_fetch_commit_from), then use that resolved full OID for
existence checks and fetches; change _ensure_commit_from_canonical_upstream to
call _rev_for_git_fetch(url, rev) to get full_rev, then call
_git_commit_exists(full_rev) and _git_fetch_commit_from(url, full_rev) and
return/propagate the full OID to callers so _git_show_first_existing and any
other git operations always receive the expanded full OID.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 416-427: The git fetch subprocess call in
update_imagestream_annotations_from_pylock.py can hang; modify the
subprocess.run invocation (the call that builds cmd from
_rev_for_git_fetch/fetch_rev and runs git fetch) to pass a reasonable timeout
value and wrap the call to catch subprocess.TimeoutExpired; on timeout return
False with an error string that indicates a timeout and includes the quoted
command (same format as current error path), and preserve existing handling for
non-zero returncodes and stdout/stderr.

---

Duplicate comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 150-152: The code currently accepts any 40-char string from
body.get("sha"); change the validation to also ensure the returned SHA is hex
and begins with the expected prefix variable rev_clean: after obtaining sha =
body.get("sha"), confirm isinstance(sha, str), len(sha) == 40,
sha.lower().startswith(rev_clean), and that sha consists only of hex characters
(e.g., via a regex or all(c in hexdigits)), then return sha.lower(); otherwise
reject/return None so refs that resolve to unrelated branch/tag SHAs are not
accepted.
- Around line 430-438: The function _ensure_commit_from_canonical_upstream must
expand a possibly short rev to a full OID first using _rev_for_git_fetch (or the
same resolver used by _git_fetch_commit_from), then use that resolved full OID
for existence checks and fetches; change _ensure_commit_from_canonical_upstream
to call _rev_for_git_fetch(url, rev) to get full_rev, then call
_git_commit_exists(full_rev) and _git_fetch_commit_from(url, full_rev) and
return/propagate the full OID to callers so _git_show_first_existing and any
other git operations always receive the expanded full OID.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 111b4c2f-c328-49b7-b94d-4460f6007cc6

📥 Commits

Reviewing files that changed from the base of the PR and between 3551576 and 93008d5.

📒 Files selected for processing (2)

.github/workflows/piplock-renewal.yaml
manifests/tools/update_imagestream_annotations_from_pylock.py

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/piplock-renewal.yaml

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b930fa68-87aa-400e-8236-75f3c5de3d52

📥 Commits

Reviewing files that changed from the base of the PR and between 93008d5 and 1915616.

📒 Files selected for processing (2)

.github/workflows/piplock-renewal.yaml
manifests/tools/update_imagestream_annotations_from_pylock.py

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/piplock-renewal.yaml

ysok · 2026-04-23T18:58:38Z

@mtchoum1 I am not sure if you are still working on this PR, but for this PR it shouldn't be this complicated, here is my untested version. My Cursor prompt: Read the Jira https://redhat.atlassian.net/browse/RHAIENG-4591 and propose code changes.

openshift-ci Bot requested review from dibryant and jiridanek April 21, 2026 19:15

openshift-ci Bot added the size/l label Apr 21, 2026

github-actions Bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Apr 21, 2026

mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from f2dd98c to 7d2321d Compare April 21, 2026 19:15

openshift-ci Bot added size/l and removed size/l labels Apr 21, 2026

mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 7d2321d to 1154df8 Compare April 22, 2026 13:03