[RHAIENG-4591] ci(piplock-renewal) updates piplock-renewal to skip pr it changes are only timestamps #3419

Open
mtchoum1 wants to merge 1 commit into main from mtchoum1/timestamp-skip

Conversation

Contributor

@mtchoum1 mtchoum1 commented Apr 21, 2026

Description

  • piplock-renewal workflow 1: After make test, detect when the working tree differs from HEAD only because of ephemeral UV/pylock metadata (e.g. --exclude-newer=… timestamps). If so, restore the worktree and do not open a PR, avoiding noisy multi-file renewals and rebase friction when package pins are unchanged.
  • scripts/ci/lockfile_renewal_diff_is_metadata_only.py (+ unit tests): Implements that normalization/compare logic and documents exit codes for shell use.
  • manifests/tools/update_imagestream_annotations_from_pylock.py: Expand abbreviated commit SHAs to full OIDs via the GitHub REST API when fetching from github.com, so git fetch does not fail treating short hex as a ref name; wire GITHUB_TOKEN into the ImageStream refresh step for API auth/rate limits. Documentation updated for fork CI and RHDS-only commits.
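A minimal sketch of how a metadata-only comparison like the one described above might work, assuming the only ephemeral content is the `--exclude-newer=…` value in the header comment and a `created-at` line. The names `normalize_ephemeral` and `is_metadata_only` and both patterns are illustrative assumptions, not the actual contents of scripts/ci/lockfile_renewal_diff_is_metadata_only.py:

```python
import re

# Assumed patterns for ephemeral metadata; the real script may differ.
_EXCLUDE_NEWER = re.compile(r"--exclude-newer[= ]\S+")
_CREATED_AT = re.compile(r"^(created-at\s*=\s*).*$", re.MULTILINE)


def normalize_ephemeral(text: str) -> str:
    """Replace timestamp values with a fixed placeholder."""
    text = _EXCLUDE_NEWER.sub("--exclude-newer=<TS>", text)
    return _CREATED_AT.sub(r"\g<1><TS>", text)


def is_metadata_only(head_text: str, work_text: str) -> bool:
    """True when the two texts differ, but only in ephemeral timestamps."""
    if head_text == work_text:
        return False  # nothing changed at all
    return normalize_ephemeral(head_text) == normalize_ephemeral(work_text)
```

Comparing normalized text rather than raw diffs keeps the check independent of how Git groups hunks; any change to a pin, hash, or URL survives normalization and makes the two sides differ.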

How Has This Been Tested?

./uv run pytest tests/unit/scripts/ci/test_lockfile_renewal_diff_is_metadata_only.py (unit coverage for metadata-only detection)

GitHub Actions: Ran Lock Files Renewal Action on the fork; job refresh-lock-files completed successfully — run log.

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Chores
    • Enhanced automated workflow authentication and filtering to prevent unnecessary pull requests with only metadata changes, reducing noise in the development process.

Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiridanek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from dibryant and jiridanek April 21, 2026 19:15
@openshift-ci openshift-ci Bot added the size/l label Apr 21, 2026
@github-actions github-actions Bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Apr 21, 2026
Contributor

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The pull request modifies the piplock-renewal.yaml GitHub Actions workflow to add authentication and filtering capabilities. It exports GITHUB_TOKEN for GitHub API access during the ImageStream annotation refresh step. A new pre-PR filtering step detects and removes diffs caused by uv pip compile's ephemeral header metadata (--exclude-newer lines and created-at timestamps), restoring affected files to HEAD when only non-substantive changes remain. The workflow exits before PR creation if no meaningful diffs persist after filtering.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Actionable Issues

Token exposure risk in workflow logs: Review whether GITHUB_TOKEN export could leak into workflow logs or artifact outputs. Ensure the token is marked as a secret in any subprocess output or logging.

Diff filtering logic validation: Verify the regex or pattern matching for detecting --exclude-newer and created-at lines is exhaustive and won't incorrectly filter legitimate changes across different uv pip compile versions.

Silent exit behavior: Confirm the conditional exit-before-PR-creation doesn't mask legitimate failures. Add explicit logging when the workflow exits due to filtered-out diffs for visibility.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | Title uses imperative mood and includes RHAIENG-4591 ticket reference with proper format, but contains a typo: 'it' should be 'if'. | Correct the typo in the title: change 'it changes' to 'if changes' so it reads 'ci(piplock-renewal) updates piplock-renewal to skip pr if changes are only timestamps'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Description covers all required template sections with detailed implementation details, testing evidence, and merge criteria checkmarks; second self-checklist item is intentionally unchecked due to inapplicability to this repo (no downstream sync for workflows). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Branch Prefix Policy | ✅ Passed | PR title starts with [RHAIENG-4591], a JIRA reference explicitly permitted for main branch; no branch prefix violation detected. |


@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from f2dd98c to 7d2321d Compare April 21, 2026 19:15
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 21, 2026

codecov-commenter commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.59%. Comparing base (0214631) to head (0e6e33c).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #3419   +/-   ##
=====================================
  Coverage   3.59%   3.59%           
=====================================
  Files         29      29           
  Lines       3310    3310           
  Branches     527     527           
=====================================
  Hits         119     119           
  Misses      3189    3189           
  Partials       2       2           
Flag Coverage Δ
python 3.59% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 7d2321d to 1154df8 Compare April 22, 2026 13:03
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 22, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
manifests/tools/update_imagestream_annotations_from_pylock.py (1)

430-435: Resolve the canonical full SHA before checking local availability.

Line 433 checks the original short prefix first. A local object with the same 7+ char prefix can make this skip the canonical fetch and later read annotations from the wrong commit. Resolve against the canonical GitHub repo first, then use that resolved revision for both cat-file and git show.
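One way to guard against the prefix-collision risk described here is to validate whatever full SHA the remote lookup returns before trusting it. The sketch below is illustrative only: `accept_resolved_sha` is a hypothetical helper, not a function in update_imagestream_annotations_from_pylock.py, and it assumes the requested revision is a hex abbreviation rather than a branch or tag name.

```python
import re
from typing import Optional

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def accept_resolved_sha(requested: str, resolved: Optional[str]) -> Optional[str]:
    """Return the full SHA only if it is well-formed and extends the requested prefix."""
    if not resolved:
        return None
    full = resolved.strip().lower()
    if not _FULL_SHA.fullmatch(full):
        return None  # not a 40-character hex object id
    if not full.startswith(requested.strip().lower()):
        return None  # remote answered with a different commit
    return full
```

A caller would then use the returned value, rather than the original abbreviation, for both the local existence check and any later `git show`.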

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
430 - 435, In _ensure_commit_from_canonical_upstream, resolve the canonical full
SHA from the remote before checking local availability: call
_git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id
(detail), use that resolved_rev for subsequent checks and for any git
show/cat-file operations instead of the original short prefix; only then call
_git_commit_exists(resolved_rev) and return (True, None) if present locally,
otherwise return the fetch result/detail so callers use the canonical full SHA.
Ensure you reference _ensure_commit_from_canonical_upstream,
_git_fetch_commit_from, and _git_commit_exists when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/piplock-renewal.yaml:
- Around line 118-121: The workflow currently restores only the worktree with
the command referencing git restore --source=HEAD --worktree . Add a step to
also reset the index/staging area so staged regenerated lockfiles are cleared
(e.g., run the equivalent of git restore --source=HEAD --staged . or git reset
-- .) after the worktree restore in the block that follows the successful
lockfile metadata-only check (the block that calls
scripts/ci/lockfile_renewal_diff_is_metadata_only.py) so both worktree and index
are returned to HEAD.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 150-152: The current code unconditionally accepts a 40-char SHA
from body.get("sha") but doesn't verify it matches the requested short SHA/ref;
update the logic in the function that reads sha from body (the body.get("sha") /
sha variable) to also check that the returned full SHA startswith the requested
short SHA or ref (e.g., compare sha.lower().startswith(requested_ref.lower() or
short_sha.lower())), and only return sha.lower() when that prefix check passes;
otherwise treat it as a mismatch (raise/return None or handle as existing error
path).

---

Nitpick comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 430-435: In _ensure_commit_from_canonical_upstream, resolve the
canonical full SHA from the remote before checking local availability: call
_git_fetch_commit_from(url, rev) first and, if it returns a resolved commit id
(detail), use that resolved_rev for subsequent checks and for any git
show/cat-file operations instead of the original short prefix; only then call
_git_commit_exists(resolved_rev) and return (True, None) if present locally,
otherwise return the fetch result/detail so callers use the canonical full SHA.
Ensure you reference _ensure_commit_from_canonical_upstream,
_git_fetch_commit_from, and _git_commit_exists when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 61a539fd-1849-41df-a53e-6447130375c1

📥 Commits

Reviewing files that changed from the base of the PR and between c8a27dd and 1154df8.

📒 Files selected for processing (4)
  • .github/workflows/piplock-renewal.yaml
  • manifests/tools/update_imagestream_annotations_from_pylock.py
  • scripts/ci/lockfile_renewal_diff_is_metadata_only.py
  • tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py

Comment thread .github/workflows/piplock-renewal.yaml Outdated
Comment thread manifests/tools/update_imagestream_annotations_from_pylock.py Outdated
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 1154df8 to cf369cb Compare April 22, 2026 13:22
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 22, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from cf369cb to 7aa0560 Compare April 22, 2026 13:23
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 22, 2026
Member

@coderabbitai can we do this in fewer lines of code? read suggestion below; then suggest 3 more alternatives

Proposed fix: git diff -I regex

Git (2.36+) supports -I <regex> which ignores hunks where all removed/added lines match the regex. ubuntu-latest (24.04) ships Git 2.43+, so this is available.

Replace the current check in the "Create Pull Request" step with:

      - name: Create Pull Request
        env:
          GH_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}
        run: |
          git add .
          if git diff --cached --quiet; then
            echo "No changes to commit."
            exit 0
          fi

          # Skip PR when the only changes are --exclude-newer timestamps in pylock headers
          if git diff --cached -I '^#.*--exclude-newer' --quiet; then
            echo "Only --exclude-newer timestamp changes detected, skipping PR."
            exit 0
          fi

          BRANCH_NAME="lockfile-update-$(date +%Y%m%d-%H%M)"
          ...

The logic becomes:

  1. git diff --cached --quiet -- if no changes at all, bail (existing check)
  2. git diff --cached -I '^#.*--exclude-newer' --quiet -- if changes exist but ALL of them are the timestamp comment line, bail (new check)
  3. Otherwise, real changes exist, proceed with commit + PR

Why this works

  • The --exclude-newer timestamp only appears in a #-prefixed comment line in each pylock.toml
  • If actual packages changed, there will be additional hunks with [[packages]] / hash / URL changes that do NOT match the regex
  • ImageStream annotations, Dockerfiles, and kustomization only change when actual package versions change, so those would also survive the filter
  • No changes needed to the lock generator itself
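To make the "all changed lines match the regex" condition concrete, here is a rough Python approximation of that check. It is a sketch under stated assumptions, not a reimplementation of Git: it uses `difflib` rather than Git's diff algorithm, and `only_ignorable_changes` is a hypothetical name.

```python
import difflib
import re


def only_ignorable_changes(old: str, new: str, ignore: list[str]) -> bool:
    """True when every added or removed line matches at least one ignore pattern."""
    patterns = [re.compile(p) for p in ignore]
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Removed lines come from `a`, added lines from `b`.
        for line in a[i1:i2] + b[j1:j2]:
            if not any(p.search(line) for p in patterns):
                return False  # a substantive change survives the filter
    return True
```

With the pattern `^#.*--exclude-newer`, a lockfile whose header timestamp changed passes the check, while one that also changed a package version does not.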

Contributor

@jiridanek, the git diff -I approach is the cleanest. Here are three more alternatives, ordered from most to least concise:


Alt 1 — grep pipeline on the patch (no git-version dependency)

if git diff --cached | grep -E '^[+-][^+-]' | grep -qvE '^[+-][[:space:]]*#.*--exclude-newer'; then
  echo "Substantive changes detected, creating PR."
else
  echo "Only timestamp comments changed, skipping PR."
  exit 0
fi

How it works: Selects only added/removed content lines from the diff (^[+-][^+-] skips the ---/+++ header lines), then checks whether any of them do not match the timestamp comment. Pure POSIX shell, works on any git version. The double-negative (grep -qv) short-circuits on the first mismatch.

Risk: Regex must be tight enough; a line like +# --exclude-newer is ignored would be a false negative. Combine with -E '^[+-][[:space:]]*#[[:space:]]*uv pip compile.*--exclude-newer' to be more precise.


Alt 2 — sed normalize-in-place → git diff → restore

find . -name '*.toml' -exec sed -i \
  's/--exclude-newer=[^[:space:]]*/--exclude-newer=PLACEHOLDER/g' {} +
if git diff --cached --quiet; then
  echo "Only timestamp changes, skipping PR."
  git restore --staged --worktree .  # undo the sed mutation
  exit 0
fi
git restore --staged --worktree .   # restore before real commit path

How it works: Mutates the files in-place via sed, re-runs git diff --cached (which now sees normalized content), then restores unconditionally. Any real package change survives normalization and makes the diff non-empty.

Risk: The git restore must happen on both the exit-0 and the fall-through path, or you'll commit the sed-mutated files. A trap 'git restore --staged --worktree .' EXIT at the top of the step is safer.


Alt 3 — diff with process substitution (no file mutation)

all_metadata=true
while IFS= read -r f; do
  if ! diff -q \
       <(git show HEAD:"$f" | sed 's/--exclude-newer=[^[:space:]]*/TS/g') \
       <(sed 's/--exclude-newer=[^[:space:]]*/TS/g' "$f") \
       > /dev/null 2>&1; then
    all_metadata=false
    break
  fi
done < <(git diff --cached --name-only)

if $all_metadata; then
  echo "Only timestamp changes, skipping PR."
  exit 0
fi

How it works: For each staged file, streams both the HEAD blob and the worktree file through an inline sed normalization and compares them with diff -q. Nothing is written back to disk, so no restore step is needed.

Risk: Process substitution (<(…)) is bash-specific (not POSIX sh), but ubuntu-latest runners use bash by default. New files (not present in HEAD) will cause git show HEAD:"$f" to fail; add a guard if new lockfiles can appear.


Comparison

| Approach | Lines in YAML | Git version | Modifies files | Handles created-at |
| --- | --- | --- | --- | --- |
| git diff -I (proposed) | ~3 | ≥ 2.36 | No | Needs second -I flag |
| grep pipeline | ~4 | Any | No | Needs second grep -vE clause |
| sed + normalize-in-place | ~7 | Any | Yes (restored) | Extend the sed expression |
| diff + process substitution | ~10 | Any | No | Extend the sed expression |

The -I proposal wins on brevity. To also handle created-at headers you'd chain a second flag:

git diff --cached \
  -I '^#.*--exclude-newer' \
  -I '^created-at[[:space:]]*=' \
  --quiet

All four approaches let you delete scripts/ci/lockfile_renewal_diff_is_metadata_only.py and its test module entirely.

Member

@coderabbitai the thing is, two files may have substantive updates while three others only modify the timestamp; what we want is to undo the three other files and commit the two with substantive updates

Contributor

@jiridanek, good clarification — the per-file mixed case is exactly why the whole-diff boolean isn't enough. Here are the approaches, starting with the most concise.


Primary: per-file git diff -I loop (replaces the Python script entirely)

- name: Discard ephemeral-only lockfile changes
  run: |
    while IFS= read -r f; do
      if git diff HEAD \
           -I '^#.*--exclude-newer' \
           -I '^created-at[[:space:]]*=' \
           --quiet -- "$f"; then
        git restore --source=HEAD -- "$f"
        echo "Restored (metadata-only): $f"
      else
        echo "Keeping (substantive): $f"
      fi
    done < <(git diff --name-only HEAD)

git diff -I <pattern> --quiet -- "$f" exits 0 when every modified hunk in that file has all its changed lines matching the pattern (i.e. the file is effectively unchanged after ignoring those lines). Files with real package changes will have [[packages]]/hash hunks that don't match, so they survive. Metadata-only files get individually restored before the PR step.


Alt: modify the existing Python script for per-file restore

If you want to keep the Python+tests approach (for correctness guarantees on the normalization logic), change diff_vs_head_is_metadata_only to restore as it goes:

def restore_ephemeral_only_files(*, repo_root: Path) -> bool | None:
    """Restore files whose only changes are ephemeral metadata. Returns True if any were restored."""
    paths = _changed_paths_vs_head(repo_root)
    if paths is None:
        return None
    restored_any = False
    for path in paths:
        if _is_binary_change(repo_root, path):
            continue  # leave binary changes alone
        head_bytes = _blob_at_head(repo_root, path)
        if head_bytes is None:
            continue
        work_path = repo_root / path
        if not work_path.is_file():
            continue
        try:
            head_text = head_bytes.decode("utf-8")
            work_text = work_path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue
        if normalize_ephemeral_lockfile_text(head_text) == normalize_ephemeral_lockfile_text(work_text):
            subprocess.run(["git", "restore", "--source=HEAD", "--worktree", "--", path], cwd=repo_root, check=True)
            print(f"Restored (metadata-only): {path}")
            restored_any = True
        else:
            print(f"Keeping (substantive): {path}")
    return restored_any

The workflow step then unconditionally calls the script (no if/then branching needed) and lets the subsequent git add . && git diff --cached --quiet guard handle the "nothing left to commit" case.


Comparison

| Approach | Lines in YAML | Python script needed | Per-file restore | Testable normalization logic |
| --- | --- | --- | --- | --- |
| git diff -I loop | ~10 | No | ✅ | ❌ (regex in YAML) |
| Modified Python script | ~3 | Yes (modified) | ✅ | ✅ |

The git diff -I loop is the most concise for the workflow, but if the created-at / --exclude-newer normalization patterns are likely to evolve, keeping them in the tested Python script is safer long-term.

Contributor

Yeah, agreed with @jiridanek: using git diff is good enough to find the diff, and it makes the code changes smaller.

Contributor

@jiridanek, good instinct to reconsider. The existing issue #3179 in this repo already documents the same root cause precisely: uv pip compile without --upgrade is no longer a stable no-op in CI — when the AIPCC index publishes new rebuild wheels (e.g. pyarrow-23.0.1-9 alongside the existing -8), uv picks them up.

The issue already identified the correct fix: "always pass --exclude-newer to uv pip compile, in both CI checks and fresh locking" — for CI checks, reuse the timestamp from when the lockfile was last intentionally locked. The open question was where to store the timestamp — extracted from the lockfile header comment, embedded in a sidecar file, or passed as an env variable.

Here's how each of your proposed alternatives stacks up:


Option A — --custom-compile-command (suppress the header churn)

--custom-compile-command sets the header comment at the top of the output file generated by uv pip compile and is intended to reflect custom build scripts that wrap it. In practice, using it would replace the entire # uv pip compile ... --exclude-newer=<timestamp> line with a fixed string — the header becomes deterministic and stops churning. But it also means the timestamp is no longer stored in the lockfile at all, so CI has no way to read it back. You'd need a second storage location anyway.


Option B — .timestamp sidecar file (recommended)

Store a single exclude-newer = <iso-timestamp> value in a committed sidecar (e.g. pylock.timestamp or a repo-level uv-lock-timestamp.env). Both jobs read from it:

  • check-generated-code: reads the sidecar and passes --exclude-newer=$(cat pylock.timestamp) → deterministic, no spurious wheel pickups.
  • make refresh-lock-files: updates the sidecar to now, then compiles → only the sidecar file and actual package changes appear in the renewal PR.

Combined with --custom-compile-command make refresh-lock-files (to make the header a fixed string), the pylock.toml files become fully stable between renewals that don't change pins. This eliminates the need for the metadata-only diff detection script entirely.


Option C — exclude-newer in pyproject.toml

exclude-newer is also supported in pyproject.toml under [tool.uv], though when specified in persistent configuration, local date times are not allowed. This approach works for a single global cutoff but is awkward if different notebooks need different timestamps (e.g. after a partial upgrade).


Option D — UV_EXCLUDE_NEWER environment variable

UV_EXCLUDE_NEWER is equivalent to the --exclude-newer command-line argument. The sidecar approach can write to this env var rather than a CLI flag — either works.


Recommendation

The sidecar file (Option B) combined with --custom-compile-command is the most robust:

  1. pylock.timestamp (one per lockfile group, or a single repo-level one) — committed, human-readable, machine-parseable.
  2. CI check reads it: --exclude-newer=$(cat pylock.timestamp) — no new wheels, idempotent.
  3. Renewal job sets it to now: date -u +%Y-%m-%dT%H:%M:%SZ > pylock.timestamp — only changes when a renewal actually runs.
  4. --custom-compile-command strips the ephemeral timestamp from the pylock.toml header line — no more header churn.

This makes the entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script unnecessary.

Member

@coderabbitai the timestamp written to file should not be the timestamp when the locking happened, but actually the timestamp when the most recent package in the lockfile was published

that's enough to keep the churn out; and that means we actually don't need to store the timestamp; it's enough if the ci script extracts/computes it and then uses it

I remember that lockfiles did not have the timestamps before I added the --before flag... but maybe that's actually something that has to do with the url, upload-time = 2026-04-01T00:22:14Z, that stuff was missing, maybe it's because now we say we want json in pylocks_generator.py.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find pylocks_generator.py
fd pylocks_generator.py

# Check for upload-time in any pylock files
rg "upload.time" --type toml -l | head -5

# Check the pylocks_generator for JSON/format flags
fd pylocks_generator.py --exec cat {}

# Check uv pip compile invocations for relevant flags
rg "exclude.newer|upload.time|--emit-index-annotation|--emit-find-links|--output-format" -n -C2 | head -60

Repository: opendatahub-io/notebooks
Length of output: 34638


🏁 Script executed:

#!/bin/bash
# Check if pylock.toml files have upload-time fields
rg "upload.time" --iglob "*.toml" -l | head -10
rg "upload.time" --iglob "*.toml" -m 3 | head -20

# Check a sample pylock.toml for its structure
fd "pylock.cpu.toml" | head -3 | xargs -I{} head -30 {}

# Check if uv pip compile pylock.toml format includes upload-time
rg "upload.time|upload_time" scripts/pylocks_generator.py

# Check for any max upload-time computation or related logic
rg "upload.time|max_upload|newest" scripts/ -n -C2

Repository: opendatahub-io/notebooks
Length of output: 16529


@jiridanek, confirmed — upload-time is already present in every wheel entry in the committed pylock.toml files (it appeared because of the ?format=json addition that forces Pulp to return PEP 691 JSON):

{ url = "...aiohttp-3.13.5-2-cp312-cp312-linux_x86_64.whl", upload-time = 2026-04-01T00:22:14Z, ... }

Your instinct is exactly right. The current --exclude-newer header value is the wall-clock time at lock time (2026-04-16T12:35:57Z), while the newest package in that lock was uploaded on 2026-04-01T00:30:57Z. That leaves a ~15-day window in which AIPCC can publish new rebuild wheels (e.g., -3, -4) that cause spurious CI diffs with no real dependency changes.

Using max(upload-time) across all entries is the tightest stable cutoff: it's derived from content, not from when you ran the command, and it only advances when a genuinely newer package lands in the lockfile.

Concrete change to pylocks_generator.py

Add a parser and wire it into resolve_exclude_newer:

_UPLOAD_TIME_RE = re.compile(r"\bupload-time\s*=\s*\"?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\"?")

def parse_max_upload_time_from_lockfile(path: Path) -> str | None:
    """Return the latest upload-time found in a pylock.toml, or None."""
    if not path.is_file():
        return None
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return None
    times = _UPLOAD_TIME_RE.findall(text)
    return max(times) if times else None


def resolve_exclude_newer(lockfile: Path, *, ci_check: bool, live_timestamp: str) -> str:
    if not ci_check:
        return live_timestamp
    # Prefer max(upload-time) from content — stable and content-derived.
    # Fall back to the header comment timestamp, then to live_timestamp.
    return (
        parse_max_upload_time_from_lockfile(lockfile)
        or parse_exclude_newer_from_lockfile(lockfile)
        or live_timestamp
    )

The regex handles both formats found in the repo: pylock.toml uses upload-time = 2026-04-01T00:22:14Z (unquoted), while uv.lock uses upload-time = "2026-04-01T00:22:14Z" (quoted).
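As a quick check of that claim, the snippet below runs the same pattern over one unquoted and one quoted entry. The two sample lines and their file names are made up for illustration; only the pattern comes from the suggestion above.

```python
import re

# Same pattern as in the suggestion, written with single quotes for readability.
_UPLOAD_TIME_RE = re.compile(
    r'\bupload-time\s*=\s*"?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)"?'
)

# Hypothetical sample entries: one pylock.toml style, one uv.lock style.
unquoted = '{ url = "a.whl", upload-time = 2026-04-01T00:22:14Z }'
quoted = '{ url = "b.whl", upload-time = "2026-04-01T00:30:57Z" }'

found = _UPLOAD_TIME_RE.findall(unquoted + "\n" + quoted)
latest = max(found)  # string comparison; valid here because both share one fixed-width UTC format
```

Both timestamps are captured without their quotes, and `max` picks the later one because same-format ISO-8601 UTC strings sort chronologically.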

What this eliminates

| Problem | Today | With max(upload-time) |
| --- | --- | --- |
| Header --exclude-newer churn | Triggers lockfile_renewal_diff_is_metadata_only.py | Header still changes, but CI never reads it |
| Rebuild wheels picked up in CI | Yes, within the ~15-day gap | No; cutoff is tight to actual content |
| Sidecar timestamp file needed | No | No |
| scripts/ci/lockfile_renewal_diff_is_metadata_only.py | Required | Can be deleted |

The entire scripts/ci/lockfile_renewal_diff_is_metadata_only.py script and its test module become unnecessary once CI uses max(upload-time) as the cutoff. The --exclude-newer header comment in the generated file will still change on each renewal run (since the locking timestamp updates), but that's benign — no CI check reads it anymore.

One small caveat: public-index pylock.toml files (e.g., runtimes/rocm-tensorflow/ubi9-python-3.12/pylock.toml) also have upload-time from PyPI, so parse_max_upload_time_from_lockfile works for both RH-index and public-index paths.

Member

@coderabbitai any reason not to load the lockfile as toml? why parse regex?

Contributor

@jiridanek, there's no good reason — TOML parsing is strictly better here. I defaulted to regex out of caution about dependencies, but that's unnecessary since tomllib is stdlib from Python 3.11+.

With TOML parsing the upload-time values come back as proper datetime objects, so max() is a direct comparison rather than a lexicographic string sort (which happens to work for ISO-8601 but is fragile):

import tomllib
from datetime import datetime, timezone
from pathlib import Path

def parse_max_upload_time_from_lockfile(path: Path) -> datetime | None:
    """Return the latest upload-time across all wheel entries in a pylock.toml."""
    if not path.is_file():
        return None
    try:
        with path.open("rb") as f:
            data = tomllib.load(f)
    except (OSError, tomllib.TOMLDecodeError):
        return None

    times: list[datetime] = []
    for pkg in data.get("packages", []):
        for wheel in pkg.get("wheels", []):
            t = wheel.get("upload-time")
            if isinstance(t, datetime):
                times.append(t)
    return max(times) if times else None

Then at the call site:

max_t = parse_max_upload_time_from_lockfile(lockfile)
exclude_newer = max_t.strftime("%Y-%m-%dT%H:%M:%SZ") if max_t else live_timestamp

The only edge case worth noting: tomllib requires the file to be opened in binary mode ("rb"), which the snippet above already does. For Python < 3.11 the drop-in backport is tomli (pip install tomli), but given this repo uses uv with modern runtimes, 3.11+ is safe to assume.

@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 7aa0560 to 3551576 Compare April 22, 2026 18:09
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 22, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 3551576 to 82cc4e7 Compare April 22, 2026 18:10
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 22, 2026
@openshift-ci openshift-ci Bot added the size/l label Apr 22, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py`:
- Around line 76-86: Add a third param case to test_main_exit_code that mocks
md.diff_vs_head_is_metadata_only to return None and asserts md.main() returns
the error exit code; update the param list to include pytest.param(None, 2,
id="git-or-read-error") and change the monkeypatch lambda in the test to return
None when main_code is None (use lambda repo_root: None) while keeping the
existing metadata-only and substantive cases and the md.ROOT patch.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 73c1df7f-1187-4533-b159-11e149c6a74e

📥 Commits

Reviewing files that changed from the base of the PR and between 1154df8 and 3551576.

📒 Files selected for processing (4)
  • .github/workflows/piplock-renewal.yaml
  • manifests/tools/update_imagestream_annotations_from_pylock.py
  • scripts/ci/lockfile_renewal_diff_is_metadata_only.py
  • tests/unit/scripts/test_lockfile_renewal_diff_is_metadata_only.py
✅ Files skipped from review due to trivial changes (1)
  • scripts/ci/lockfile_renewal_diff_is_metadata_only.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/piplock-renewal.yaml
  • manifests/tools/update_imagestream_annotations_from_pylock.py

Comment on lines +76 to +86
@pytest.mark.parametrize(
    ("main_code", "expected"),
    [
        pytest.param(0, 0, id="metadata-only"),
        pytest.param(1, 1, id="substantive"),
    ],
)
def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None:
    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0)
    monkeypatch.setattr(md, "ROOT", Path("/tmp"))
    assert md.main() == expected

⚠️ Potential issue | 🟡 Minor

Cover the None result from diff_vs_head_is_metadata_only.

Line 84 collapses the mocked result to True/False, so the git/read-error branch in main() is untested. Add the None case directly.

Proposed test update
 @pytest.mark.parametrize(
-    ("main_code", "expected"),
+    ("diff_result", "expected"),
     [
-        pytest.param(0, 0, id="metadata-only"),
-        pytest.param(1, 1, id="substantive"),
+        pytest.param(True, 0, id="metadata-only"),
+        pytest.param(False, 1, id="substantive"),
+        pytest.param(None, 1, id="git-error"),
     ],
 )
-def test_main_exit_code(monkeypatch: pytest.MonkeyPatch, main_code: int, expected: int) -> None:
-    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: main_code == 0)
+def test_main_exit_code(
+    monkeypatch: pytest.MonkeyPatch, diff_result: bool | None, expected: int
+) -> None:
+    monkeypatch.setattr(md, "diff_vs_head_is_metadata_only", lambda repo_root: diff_result)
     monkeypatch.setattr(md, "ROOT", Path("/tmp"))
     assert md.main() == expected

@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 8ad4dec to 93008d5 Compare April 23, 2026 15:26
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
manifests/tools/update_imagestream_annotations_from_pylock.py (2)

150-152: ⚠️ Potential issue | 🟠 Major

Reject GitHub refs that do not expand the requested SHA prefix.

commits/{ref} may resolve a SHA, branch, or tag ref, so a hex-like branch/tag can return an unrelated commit. Only accept body["sha"] when it is 40 hex chars and starts with rev_clean; GitHub documents ref as accepting commit SHA, branch, or tag refs: https://docs.github.com/en/rest/commits/commits#get-a-commit.

Proposed fix
         sha = body.get("sha")
-        if isinstance(sha, str) and len(sha) == 40:
-            return sha.lower()
+        if isinstance(sha, str):
+            sha = sha.lower()
+            if len(sha) == 40 and sha.startswith(rev_clean):
+                return sha
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
150 - 152, The code currently accepts any 40-char string from body.get("sha");
change the validation to also ensure the returned SHA is hex and begins with the
expected prefix variable rev_clean: after obtaining sha = body.get("sha"),
confirm isinstance(sha, str), len(sha) == 40, sha.lower().startswith(rev_clean),
and that sha consists only of hex characters (e.g., via a regex or all(c in
hexdigits)), then return sha.lower(); otherwise reject/return None so refs that
resolve to unrelated branch/tag SHAs are not accepted.

430-438: ⚠️ Potential issue | 🟠 Major

Expand short SHA to full OID before existence checks and git operations.

Short SHA collision risk is real. While _git_fetch_commit_from internally calls _rev_for_git_fetch to resolve short SHAs to full OIDs, _git_commit_exists(rev) at line 434 checks with the short SHA directly, and _git_show_first_existing(sha, ...) at line 742 uses the unexpanded short ref. If the short SHA collides with a different local object, cat-file -e and git show may match the wrong commit. Resolve once via _rev_for_git_fetch, capture the full OID, and propagate it through all git operations.

Proposed fix
-def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str | None]:
+def _ensure_commit_from_canonical_upstream(variant: str, rev: str) -> tuple[bool, str, str | None]:
     """Ensure ``rev`` is available for ``git show`` via ``git fetch`` from the variant canonical repo."""
     url = _CANONICAL_REPO_URL[variant]
-    if _git_commit_exists(rev):
-        return True, None
-    ok, detail = _git_fetch_commit_from(url, rev)
+    resolved_rev = _rev_for_git_fetch(url, rev)
+    if _git_commit_exists(resolved_rev):
+        return True, resolved_rev, None
+    ok, detail = _git_fetch_commit_from(url, resolved_rev)
     if ok:
-        return True, None
-    return False, detail
+        return True, resolved_rev, None
+    return False, resolved_rev, detail
@@
-                ok_upstream, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
+                ok_upstream, git_rev, fetch_err = _ensure_commit_from_canonical_upstream(variant, sha)
@@
-                print(f"ok {path.name} tag {idx}: commit {sha} is available locally", file=sys.stderr)
-                shown = _git_show_first_existing(sha, rel_paths)
+                print(f"ok {path.name} tag {idx}: commit {git_rev} is available locally", file=sys.stderr)
+                shown = _git_show_first_existing(git_rev, rel_paths)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/tools/update_imagestream_annotations_from_pylock.py` around lines
430 - 438, The function _ensure_commit_from_canonical_upstream must expand a
possibly short rev to a full OID first using _rev_for_git_fetch (or the same
resolver used by _git_fetch_commit_from), then use that resolved full OID for
existence checks and fetches; change _ensure_commit_from_canonical_upstream to
call _rev_for_git_fetch(url, rev) to get full_rev, then call
_git_commit_exists(full_rev) and _git_fetch_commit_from(url, full_rev) and
return/propagate the full OID to callers so _git_show_first_existing and any
other git operations always receive the expanded full OID.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manifests/tools/update_imagestream_annotations_from_pylock.py`:
- Around line 416-427: The git fetch subprocess call in
update_imagestream_annotations_from_pylock.py can hang; modify the
subprocess.run invocation (the call that builds cmd from
_rev_for_git_fetch/fetch_rev and runs git fetch) to pass a reasonable timeout
value and wrap the call to catch subprocess.TimeoutExpired; on timeout return
False with an error string that indicates a timeout and includes the quoted
command (same format as current error path), and preserve existing handling for
non-zero returncodes and stdout/stderr.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 111b4c2f-c328-49b7-b94d-4460f6007cc6

📥 Commits

Reviewing files that changed from the base of the PR and between 3551576 and 93008d5.

📒 Files selected for processing (2)
  • .github/workflows/piplock-renewal.yaml
  • manifests/tools/update_imagestream_annotations_from_pylock.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/piplock-renewal.yaml

Comment thread manifests/tools/update_imagestream_annotations_from_pylock.py Outdated
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 93008d5 to 1915616 Compare April 23, 2026 17:13
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b930fa68-87aa-400e-8236-75f3c5de3d52

📥 Commits

Reviewing files that changed from the base of the PR and between 93008d5 and 1915616.

📒 Files selected for processing (2)
  • .github/workflows/piplock-renewal.yaml
  • manifests/tools/update_imagestream_annotations_from_pylock.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/piplock-renewal.yaml

Comment thread manifests/tools/update_imagestream_annotations_from_pylock.py Outdated
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 1915616 to 4eb4e52 Compare April 23, 2026 17:32
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 23, 2026

ysok commented Apr 23, 2026

@mtchoum1 I am not sure whether you are still working on this PR, but the change shouldn't need to be this complicated. Here is my untested version. My Cursor prompt: Read the Jira https://redhat.atlassian.net/browse/RHAIENG-4591 and propose code changes.

image

@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 4eb4e52 to 7635098 Compare April 24, 2026 15:09
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from 7635098 to b48f64c Compare April 24, 2026 15:12
@openshift-ci openshift-ci Bot added size/xs and removed size/m labels Apr 24, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/timestamp-skip branch from b48f64c to 0e6e33c Compare April 24, 2026 15:19
@openshift-ci openshift-ci Bot added size/xs and removed size/xs labels Apr 24, 2026

Labels

review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/xs


4 participants