Skip to content

adding a script that calls aipcc index and parses hashes for a given …#76

Open
nsingla wants to merge 1 commit into
opendatahub-io:mainfrom
nsingla:fix_multi_arch_hashes
Open

adding a script that calls aipcc index and parses hashes for a given …#76
nsingla wants to merge 1 commit into
opendatahub-io:mainfrom
nsingla:fix_multi_arch_hashes

Conversation

@nsingla
Copy link
Copy Markdown

@nsingla nsingla commented May 15, 2026

…list of arches

Description of your changes:

Checklist:

Pre-Submission Checklist

Additional Checklist Items for New or Updated Components/Pipelines

  • metadata.yaml includes fresh lastVerified timestamp
  • All required files
    are present and complete
  • OWNERS file lists appropriate maintainers
  • README provides clear documentation with usage examples
  • Component follows snake_case naming convention
  • No security vulnerabilities in dependencies
  • Containerfile included if using a custom base image

Summary by CodeRabbit

  • Chores
    • Updated CI/CD workflow trigger conditions to detect changes in build dependencies
    • Enhanced lockfile verification to validate both requirements files remain synchronized
    • Refactored requirements compilation into a standalone script for improved build system maintainability

…list of arches

Signed-off-by: Nelesh Singla <117123879+nsingla@users.noreply.github.com>
@openshift-ci openshift-ci Bot requested review from HumairAK and hbelmiro May 15, 2026 19:42
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign humairak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

This PR consolidates requirement generation from inline Makefile shell commands into a dedicated Python script. The new scripts/compile_requirements.py generates both requirements.txt and requirements-build.txt by invoking uv pip compile against the AIPCC index, then fetches wheel SHA256 hashes for multiple architectures (x86_64, aarch64, ppc64le, s390x) plus universal and ABI3 wheels from the index's simple HTML pages. The Makefile target is simplified to call this script. CI validation is updated to trigger on changes to the new script and build requirements file, and to verify that both lockfiles remain in sync after running make requirements.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Security observations

  • Hash-fetch error handling (CWE-252: Unchecked Return Value): The fetch_hashes_from_index function logs warnings on index fetch failures but continues with empty hashes. Verify that missing hashes do not create a silent downgrade path where unsigned/unverified packages are installed if the index becomes temporarily unavailable; consider failing hard on index fetch errors for packages unless a fallback cache is intentionally implemented.

  • Subprocess execution with user input (CWE-78: OS Command Injection): The resolve_packages function passes extra_args directly to the uv pip compile subprocess. Audit the caller (main) to confirm extra_args is never populated from untrusted sources; currently only hardcoded flags appear to be used, but this pattern should be restricted or validated to prevent command injection.

  • Regex-based HTML parsing for SHA extraction (CWE-400: Uncontrolled Resource Consumption): The wheel filename regex in fetch_hashes_from_index extracts hashes from raw HTML pages without length limits or complexity controls. Large or malformed index responses could cause ReDoS or memory exhaustion; consider validating response size or switching to an HTML parser.

  • Missing timeout on index fetch: fetch_hashes_from_index uses requests.get() without an explicit timeout parameter. Add a timeout to prevent indefinite hangs on network issues.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning Description is largely empty—only the incomplete title continuation is provided, with all checklist items unmarked and no explanation of changes, rationale, or implementation details. Fill in 'Description of your changes' with details on what the script does, why it was created, and how it integrates with the build system. Mark completed checklist items accordingly.
Title check ❓ Inconclusive Title is incomplete and vague, cutting off mid-sentence with 'given …' instead of properly completing the thought about parsing hashes for architectures. Complete the title with a clear, specific description. Example: 'feat(scripts): add requirements compilation script with multi-arch hash resolution' or similar.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
.github/workflows/sync-requirements.yml (1)

30-31: ⚡ Quick win

Split the failure branch to satisfy workflow lint limits.

Line 31 exceeds the configured YAML line-length limit reported by static analysis.

Proposed fix
-          git diff --exit-code requirements.txt requirements-build.txt \
-            || { echo ""; echo "Lockfiles are out of sync."; echo "Run 'make requirements' and commit the result."; exit 1; }
+          git diff --exit-code requirements.txt requirements-build.txt || {
+            echo ""
+            echo "Lockfiles are out of sync."
+            echo "Run 'make requirements' and commit the result."
+            exit 1
+          }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/sync-requirements.yml around lines 30 - 31, The long
one-liner that runs git diff and an inline failure block (the command starting
with "git diff --exit-code requirements.txt requirements-build.txt || { echo
\"\"; echo \"Lockfiles are out of sync.\"; echo \"Run 'make requirements' and
commit the result.\"; exit 1; }") exceeds YAML line-length limits; replace it
with a short multiline shell block: run the git diff command on its own, then
use an if/then/else or explicit check of its exit status to echo the three
messages and exit nonzero in the else branch (for example: run git diff
--exit-code ...; if [ $? -ne 0 ]; then echo ""; echo "Lockfiles are out of
sync."; echo "Run 'make requirements' and commit the result."; exit 1; fi) so
each line is short and the workflow linter is satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/sync-requirements.yml:
- Around line 6-10: The workflow's pull_request path filter is missing Makefile,
so add "Makefile" to the paths array under the paths: key in the
sync-requirements.yml diff (i.e., update the existing paths list that contains
pyproject.toml, requirements.txt, requirements-build.txt,
scripts/compile_requirements.py) so changes to Makefile also trigger the job;
ensure the entry is exactly "Makefile" (case-sensitive) and commit the updated
YAML.

In `@Makefile`:
- Line 110: The Makefile invokes the system python directly ("python
scripts/compile_requirements.py"), which can produce inconsistent environments;
update that line to call the project-managed runtime instead (for example
replace "python" with the Makefile Python variable used elsewhere like
"$(PYTHON)" or "$(VENV_PYTHON)", or use the project tool wrapper such as "poetry
run python scripts/compile_requirements.py") so lockfile generation runs in the
same managed environment as the rest of the build.

In `@scripts/compile_requirements.py`:
- Around line 148-151: The current code writes requirements without hashes when
fetch_hashes_from_index returns empty (the block that checks "if not hashes" and
appends f"{name}=={version}{marker_part}"), which breaks integrity checks;
change this to fail fast by raising an exception or exiting with a non-zero
status unless an explicit opt-in CLI flag (e.g., --allow-missing-hashes) is
provided; if the flag is set, still record the package into a separate
"missing-hashes" report file or log entry for manual review instead of silently
writing an un-hashed requirement, and update the CLI parsing and relevant
function (e.g., compile_requirements / the caller that invokes
fetch_hashes_from_index) to support and document the new flag.
- Around line 25-31: The _parse_build_requires function currently uses a fragile
regex; replace it with proper TOML parsing using tomllib (or tomli as a backport
fallback): read the pyproject.toml text, parse it with tomllib.loads (or
tomli.loads if tomllib import fails), then extract requires =
parsed["build-system"]["requires"], validate presence and type, and return that
list (raising SystemExit with the existing message if the keys are missing);
ensure you do not perform manual quote stripping and that the function returns a
list[str] of the parsed entries.
- Around line 110-112: The except Exception block that swallows all errors
around the index fetch (the block that logs "WARNING: could not fetch index for
{name}: {e}" and returns []) should be replaced with granular handlers: catch
HTTP 404 responses specifically (treat as "no hashes" and return [] for the
package name), catch other HTTP/transport errors
(requests.exceptions.RequestException / urllib.error.URLError / TimeoutError)
and JSON/decoding errors separately and log the full traceback (use
traceback.format_exc()) instead of a simple message, and for unexpected critical
errors either re-raise after logging or allow the caller to handle them; update
the handler(s) around the fetch logic that reference the variable name and the
log(...) call and add an import for traceback as needed.

---

Nitpick comments:
In @.github/workflows/sync-requirements.yml:
- Around line 30-31: The long one-liner that runs git diff and an inline failure
block (the command starting with "git diff --exit-code requirements.txt
requirements-build.txt || { echo \"\"; echo \"Lockfiles are out of sync.\"; echo
\"Run 'make requirements' and commit the result.\"; exit 1; }") exceeds YAML
line-length limits; replace it with a short multiline shell block: run the git
diff command on its own, then use an if/then/else or explicit check of its exit
status to echo the three messages and exit nonzero in the else branch (for
example: run git diff --exit-code ...; if [ $? -ne 0 ]; then echo ""; echo
"Lockfiles are out of sync."; echo "Run 'make requirements' and commit the
result."; exit 1; fi) so each line is short and the workflow linter is
satisfied.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6cdbc020-72de-4233-a9f3-b31c40c34361

📥 Commits

Reviewing files that changed from the base of the PR and between 66be6f9 and d77424a.

📒 Files selected for processing (3)
  • .github/workflows/sync-requirements.yml
  • Makefile
  • scripts/compile_requirements.py

Comment on lines 6 to +10
paths:
- pyproject.toml
- uv.lock
- requirements.txt
- requirements-build.txt
- scripts/compile_requirements.py
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include Makefile in PR path filters for this check.

Lockfile generation behavior can change via Makefile; without it in pull_request.paths, this job may not run for relevant changes.

Proposed fix
   pull_request:
     paths:
       - pyproject.toml
       - requirements.txt
       - requirements-build.txt
       - scripts/compile_requirements.py
+      - Makefile
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
paths:
- pyproject.toml
- uv.lock
- requirements.txt
- requirements-build.txt
- scripts/compile_requirements.py
paths:
- pyproject.toml
- requirements.txt
- requirements-build.txt
- scripts/compile_requirements.py
- Makefile
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/sync-requirements.yml around lines 6 - 10, The workflow's
pull_request path filter is missing Makefile, so add "Makefile" to the paths
array under the paths: key in the sync-requirements.yml diff (i.e., update the
existing paths list that contains pyproject.toml, requirements.txt,
requirements-build.txt, scripts/compile_requirements.py) so changes to Makefile
also trigger the job; ensure the entry is exactly "Makefile" (case-sensitive)
and commit the updated YAML.

Comment thread Makefile
printf 'setuptools\nwheel\n' | uv pip compile --generate-hashes --no-header --no-annotate \
--python-version 3.12 \
--index-url $(AIPCC_INDEX_URL) - >> requirements-build.txt
python scripts/compile_requirements.py
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the project-managed runtime for lockfile generation.

Line 110 invokes python directly, which can resolve to a different interpreter/environment than the rest of this Makefile and produce inconsistent lockfiles.

Proposed fix
 requirements:
-	python scripts/compile_requirements.py
+	$(UVRUN) python scripts/compile_requirements.py
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
python scripts/compile_requirements.py
$(UVRUN) python scripts/compile_requirements.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 110, The Makefile invokes the system python directly
("python scripts/compile_requirements.py"), which can produce inconsistent
environments; update that line to call the project-managed runtime instead (for
example replace "python" with the Makefile Python variable used elsewhere like
"$(PYTHON)" or "$(VENV_PYTHON)", or use the project tool wrapper such as "poetry
run python scripts/compile_requirements.py") so lockfile generation runs in the
same managed environment as the rest of the build.

Comment on lines +25 to +31
def _parse_build_requires() -> list[str]:
"""Extract [build-system] requires from pyproject.toml."""
content = (REPO_ROOT / "pyproject.toml").read_text()
match = re.search(r"\[build-system\].*?requires\s*=\s*\[(.*?)\]", content, re.DOTALL)
if not match:
raise SystemExit("Could not find [build-system] requires in pyproject.toml")
return [dep.strip().strip("\"'") for dep in match.group(1).split(",") if dep.strip().strip("\"'")]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Regex-based TOML parsing will break on valid syntax. (CWE-1286: Improper Validation of Syntactic Correctness)

The regex pattern r"\[build-system\].*?requires\s*=\s*\[(.*?)\]" with re.DOTALL will fail on:

  • Multi-line arrays with TOML formatting
  • Comments within the requires array
  • Trailing commas
  • Any nested TOML structures between [build-system] and requires

Manual quote stripping on line 31 doesn't handle escaped characters.

Use tomllib (Python 3.11+) or tomli (backport) for robust parsing.

📝 Proposed fix using tomllib
+try:
+    import tomllib
+except ImportError:
+    import tomli as tomllib
+
 def _parse_build_requires() -> list[str]:
     """Extract [build-system] requires from pyproject.toml."""
-    content = (REPO_ROOT / "pyproject.toml").read_text()
-    match = re.search(r"\[build-system\].*?requires\s*=\s*\[(.*?)\]", content, re.DOTALL)
-    if not match:
+    with open(REPO_ROOT / "pyproject.toml", "rb") as f:
+        data = tomllib.load(f)
+    requires = data.get("build-system", {}).get("requires")
+    if not requires:
         raise SystemExit("Could not find [build-system] requires in pyproject.toml")
-    return [dep.strip().strip("\"'") for dep in match.group(1).split(",") if dep.strip().strip("\"'")]
+    return requires
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _parse_build_requires() -> list[str]:
"""Extract [build-system] requires from pyproject.toml."""
content = (REPO_ROOT / "pyproject.toml").read_text()
match = re.search(r"\[build-system\].*?requires\s*=\s*\[(.*?)\]", content, re.DOTALL)
if not match:
raise SystemExit("Could not find [build-system] requires in pyproject.toml")
return [dep.strip().strip("\"'") for dep in match.group(1).split(",") if dep.strip().strip("\"'")]
try:
import tomllib
except ImportError:
import tomli as tomllib
def _parse_build_requires() -> list[str]:
"""Extract [build-system] requires from pyproject.toml."""
with open(REPO_ROOT / "pyproject.toml", "rb") as f:
data = tomllib.load(f)
requires = data.get("build-system", {}).get("requires")
if not requires:
raise SystemExit("Could not find [build-system] requires in pyproject.toml")
return requires
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/compile_requirements.py` around lines 25 - 31, The
_parse_build_requires function currently uses a fragile regex; replace it with
proper TOML parsing using tomllib (or tomli as a backport fallback): read the
pyproject.toml text, parse it with tomllib.loads (or tomli.loads if tomllib
import fails), then extract requires = parsed["build-system"]["requires"],
validate presence and type, and return that list (raising SystemExit with the
existing message if the keys are missing); ensure you do not perform manual
quote stripping and that the function returns a list[str] of the parsed entries.

Comment on lines +110 to +112
except Exception as e:
log(f" WARNING: could not fetch index for {name}: {e}")
return []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Broad exception handler hides all errors.

Catching Exception on line 110 silently suppresses:

  • Network failures (transient vs. persistent)
  • HTTP 404 (package not in index)
  • HTTP 403/401 (auth issues)
  • Decoding errors
  • Timeout errors

This makes debugging failures difficult and treats all errors as "package has no hashes."

Catch specific exceptions or at minimum log the full traceback for non-404 errors.

🔍 Proposed fix with granular error handling
+    from urllib.error import HTTPError, URLError
+
     try:
         with urllib.request.urlopen(url, timeout=30) as resp:
             html = resp.read().decode()
-    except Exception as e:
+    except HTTPError as e:
+        if e.code == 404:
+            log(f"  WARNING: package {name} not found in index")
+        else:
+            log(f"  ERROR: HTTP {e.code} fetching {name}: {e}")
+        return []
+    except (URLError, OSError) as e:
         log(f"  WARNING: could not fetch index for {name}: {e}")
         return []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/compile_requirements.py` around lines 110 - 112, The except Exception
block that swallows all errors around the index fetch (the block that logs
"WARNING: could not fetch index for {name}: {e}" and returns []) should be
replaced with granular handlers: catch HTTP 404 responses specifically (treat as
"no hashes" and return [] for the package name), catch other HTTP/transport
errors (requests.exceptions.RequestException / urllib.error.URLError /
TimeoutError) and JSON/decoding errors separately and log the full traceback
(use traceback.format_exc()) instead of a simple message, and for unexpected
critical errors either re-raise after logging or allow the caller to handle
them; update the handler(s) around the fetch logic that reference the variable
name and the log(...) call and add an import for traceback as needed.

Comment on lines +148 to +151
if not hashes:
log(f" WARNING: no matching wheels found for {name}=={version}")
lines.append(f"{name}=={version}{marker_part}")
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Writing requirements without hashes defeats integrity verification. (CWE-494: Download of Code Without Integrity Check)

When fetch_hashes_from_index returns no hashes (network error, missing wheels, etc.), line 150 writes the requirement without --hash=... entries. This allows pip install -r to accept any uploaded version of that package without checksum verification, enabling supply-chain attacks.

Either:

  1. Fail hard when hashes are missing for critical packages
  2. Generate a separate "missing-hashes" report that requires manual review
  3. Document that lockfiles without full hashes must not be used in production
🔒 Proposed fix: fail on missing hashes
         if not hashes:
             log(f"  WARNING: no matching wheels found for {name}=={version}")
-            lines.append(f"{name}=={version}{marker_part}")
-            continue
+            raise SystemExit(f"CRITICAL: Cannot generate secure lockfile without hashes for {name}=={version}")

Or add a --allow-missing-hashes flag if some packages legitimately have no matching wheels.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not hashes:
log(f" WARNING: no matching wheels found for {name}=={version}")
lines.append(f"{name}=={version}{marker_part}")
continue
if not hashes:
log(f" WARNING: no matching wheels found for {name}=={version}")
raise SystemExit(f"CRITICAL: Cannot generate secure lockfile without hashes for {name}=={version}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/compile_requirements.py` around lines 148 - 151, The current code
writes requirements without hashes when fetch_hashes_from_index returns empty
(the block that checks "if not hashes" and appends
f"{name}=={version}{marker_part}"), which breaks integrity checks; change this
to fail fast by raising an exception or exiting with a non-zero status unless an
explicit opt-in CLI flag (e.g., --allow-missing-hashes) is provided; if the flag
is set, still record the package into a separate "missing-hashes" report file or
log entry for manual review instead of silently writing an un-hashed
requirement, and update the CLI parsing and relevant function (e.g.,
compile_requirements / the caller that invokes fetch_hashes_from_index) to
support and document the new flag.

Copy link
Copy Markdown

@hbelmiro hbelmiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants