Skip to content

Fix ImageStream cache failure crashing requests instead of falling back to Quay#1357

Merged
yashvardhannanavati merged 1 commit into
mainfrom
fix_iib_cache
Jun 26, 2026
Merged

Fix ImageStream cache failure crashing requests instead of falling back to Quay#1357
yashvardhannanavati merged 1 commit into
mainfrom
fix_iib_cache

Conversation

@yashvardhannanavati

Copy link
Copy Markdown
Collaborator

When iib_use_imagestream_cache is enabled but the index-db-cache ImageStream is unavailable, the IIBError from oc now triggers a graceful fallback to pulling index.db directly from Quay instead of failing the entire request.

@qodo-for-releng

Copy link
Copy Markdown

PR Summary by Qodo

Gracefully fall back to Quay when ImageStream cache access fails
🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Grey Divider

Description

• Catch ImageStream cache access failures and fall back to pulling index.db from Quay.
• Preserve existing cache-sync/refresh behavior when ImageStream cache is healthy.
• Add a unit test covering the ImageStream failure → Quay fallback path.
Diagram

graph TD
  A["pull_index_db_artifact()"] --> B{Cache enabled?}
  B -->|No| Q["Get Quay pullspec"] --> P["oras pull (Quay)"] --> R["artifact_dir"]
  B -->|Yes| V["Verify cache sync"] --> D{Synced?}
  D -->|Yes| I["Get ImageStream pullspec"] --> O["oras pull (ImageStream)"] --> R
  D -->|No| F["Refresh cache"] --> Q --> P --> R
  V -. "IIBError" .-> Q
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Narrow the try/except scope to only the OpenShift/verify call
  • ➕ Avoids masking IIBError from unrelated failures (e.g., ORAS pull failures) that should probably still fail fast.
  • ➕ Makes it clearer which failure modes trigger the fallback.
  • ➖ May miss other ImageStream-related IIBErrors (e.g., imagestream pullspec computation or ORAS pull via ImageStream) where fallback is also desirable.
2. Catch specific OpenShift-not-found conditions (typed exception / error code)
  • ➕ Prevents fallback on unexpected IIBError conditions and preserves strict failure semantics for real bugs.
  • ➖ Requires stronger error typing/plumbing; more code and potentially fragile message matching if not typed.

Recommendation: The PR’s approach (catch IIBError and fall back to Quay) is pragmatic for availability: the primary goal is to keep requests working when the ImageStream cache is down. If reviewers are concerned about masking non-ImageStream failures, consider narrowing the try/except to the ImageStream-only path or logging the caught exception details for debugging while still falling back.

Files changed (2) +62 / -13

Bug fix (1) +23 / -13
containerized_utils.pyFall back to Quay when ImageStream cache access raises IIBError +23/-13

Fall back to Quay when ImageStream cache access raises IIBError

• Wraps the ImageStream cache verification/usage block in a try/except for IIBError. On failure, logs a warning and pulls index.db directly from Quay instead of failing the request.

iib/workers/tasks/containerized_utils.py

Tests (1) +39 / -0
test_containerized_utils.pyAdd unit test for ImageStream failure → Quay fallback +39/-0

Add unit test for ImageStream failure → Quay fallback

• Introduces a test that simulates verify_indexdb_cache_for_image raising IIBError when cache is enabled. Asserts the function logs a warning and pulls the artifact from the Quay pullspec via ORAS.

tests/test_workers/test_tasks/test_containerized_utils.py

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 26, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 4:29 AM UTC · Completed 4:40 AM UTC
Commit: 8207ad4 · View workflow run →

@qodo-for-releng

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📜 Skill insights (0)

Grey Divider


Remediation recommended

1. Overbroad fallback try/except 🐞 Bug ☼ Reliability
Description
In pull_index_db_artifact, the try/except IIBError wraps both the ImageStream path and the Quay pull
path, so a Quay pull IIBError is logged as an ImageStream cache failure and then retried once via
the fallback. This adds latency and makes it harder to tell whether the failure was in ImageStream
access or the Quay pull itself.
Code

iib/workers/tasks/containerized_utils.py[R245-271]

+        try:
+            # Verify index.db cache is synced. Refresh if not.
+            log.info('ImageStream cache is enabled. Checking cache sync status.')
+            if verify_indexdb_cache_for_image(from_index):
+                log.info('Index.db cache is synced. Pulling from ImageStream.')
+                # Pull from ImageStream when digests match
+                imagestream_ref = get_imagestream_artifact_pullspec(from_index)
+                artifact_dir = get_oras_artifact(
+                    imagestream_ref,
+                    temp_dir,
+                )
+            else:
+                log.info('Index.db cache is not synced. Refreshing and pulling from Quay.')
+                refresh_indexdb_cache_for_image(from_index)
+                # Pull directly from Quay after triggering refresh
+                artifact_ref = get_indexdb_artifact_pullspec(from_index)
+                artifact_dir = get_oras_artifact(
+                    artifact_ref,
+                    temp_dir,
+                )
+        except IIBError:
+            log.warning(
+                'ImageStream cache access failed. Falling back to pulling directly from Quay.'
            )
-        else:
-            log.info('Index.db cache is not synced. Refreshing and pulling from Quay.')
-            refresh_indexdb_cache_for_image(from_index)
-            # Pull directly from Quay after triggering refresh
            artifact_ref = get_indexdb_artifact_pullspec(from_index)
            artifact_dir = get_oras_artifact(
                artifact_ref,
Relevance

⭐⭐⭐ High

Team accepted prior change to avoid misleading error logs/fallthrough in exception handling (similar
reliability issue).

PR-#1061

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The new handler catches IIBError across both ImageStream and Quay branches; since get_oras_artifact
wraps any pull failure in IIBError, a Quay pull error inside the try will enter the fallback block
and cause an extra Quay pull attempt with a misleading cache-failure warning.

iib/workers/tasks/containerized_utils.py[243-273]
iib/workers/tasks/oras_utils.py[97-148]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`pull_index_db_artifact()` catches `IIBError` around both the ImageStream logic and the Quay pull logic. This means Quay pull failures can be misclassified as ImageStream cache failures and unnecessarily retried.

### Issue Context
`get_oras_artifact()` raises `IIBError` for any artifact pull failure, regardless of whether the registry is the internal ImageStream registry or Quay.

### Fix Focus Areas
- iib/workers/tasks/containerized_utils.py[244-273]

### Suggested fix approach
- Restructure control flow so the `try/except IIBError` only wraps the ImageStream-specific operations:
 - (a) the cache verification that requires `oc`/ImageStream access, and/or
 - (b) the `get_oras_artifact(imagestream_ref, ...)` call.
- Keep the Quay pull outside that `try`, so a Quay pull failure is not logged as an ImageStream cache failure and is not redundantly retried.
- Optionally, if you *do* want a retry for Quay pulls, implement it explicitly with clear logging and limited attempts (instead of as an accidental side effect of the ImageStream fallback handler).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Warning drops exception context 🐞 Bug ◔ Observability
Description
The fallback handler logs a generic warning without including the caught exception, losing the stack
trace/message that explains why ImageStream cache access failed. This reduces observability when
diagnosing production failures.
Code

iib/workers/tasks/containerized_utils.py[R265-268]

+        except IIBError:
+            log.warning(
+                'ImageStream cache access failed. Falling back to pulling directly from Quay.'
            )
-        else:
-            log.info('Index.db cache is not synced. Refreshing and pulling from Quay.')
-            refresh_indexdb_cache_for_image(from_index)
-            # Pull directly from Quay after triggering refresh
Relevance

⭐⭐ Medium

Mixed evidence: they use exc_info/log.exception sometimes, but exception-context suggestions also
rejected elsewhere.

PR-#1061
PR-#1181
PR-#1097

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The code explicitly logs only a static warning string in the exception handler, so the underlying
error (e.g., oc/imagestream not found) is not captured in logs.

iib/workers/tasks/containerized_utils.py[265-268]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The fallback log message does not include the triggering exception, which makes diagnosing cache access problems harder.

### Issue Context
The `except IIBError:` block logs a warning message but does not log the exception message/traceback.

### Fix Focus Areas
- iib/workers/tasks/containerized_utils.py[265-268]

### Suggested fix approach
- Change to `except IIBError as e:` and log with traceback/context, e.g.:
 - `log.warning('...fallback...', exc_info=True)`
 - or `log.warning('...fallback...: %s', e, exc_info=True)`
- Keep the rest of the fallback behavior unchanged.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review

Findings

Low

  • [Comment Style] iib/workers/tasks/containerized_utils.py:246 — The multi-line comment block (lines 246–249) explaining the try/except rationale spans four lines, which is uncommon in this codebase where inline comments are typically single-line and single-concern.
    Remediation: Consider condensing to a single line, e.g. # Try ImageStream cache first; fall back to Quay on any error.

  • [Comment Content] iib/workers/tasks/containerized_utils.py:260 — Two comments use em dash characters (—) at lines 260 and 268, which do not appear elsewhere in the codebase's Python files.
    Remediation: Use a regular hyphen or comma for clause separation to match existing comment style.

  • [scope-creep] iib/workers/tasks/containerized_utils.py — The refactor moves the Quay pull path outside the if/else block into a shared fallback, which is a structural change beyond the try/except addition. The user-visible behavior is preserved (stale-cache requests still pull from Quay after triggering a refresh), so this is a clean refactor rather than a behavioral change.

Previous run

Review

Findings

Medium

  • [error-handling] iib/workers/tasks/containerized_utils.py:264 — The except IIBError block catches errors from all calls inside the try block, not just the ImageStream-availability check. If verify_indexdb_cache_for_image succeeds (cache is synced) but get_oras_artifact fails while pulling from the ImageStream (e.g., network timeout, auth failure), the code silently falls back to pulling from Quay instead of surfacing the real pull error. Similarly, if refresh_indexdb_cache_for_image raises IIBError, it is swallowed. This broad catch masks genuine failures unrelated to ImageStream availability.
    Remediation: Narrow the try/except scope to wrap only the ImageStream-availability checks (verify_indexdb_cache_for_image, get_imagestream_artifact_pullspec), or add an inner try/except around the get_oras_artifact(imagestream_ref, ...) call that re-raises the IIBError.

Low

  • [test-inadequate] tests/test_workers/test_tasks/test_containerized_utils.py:171 — The new test only covers the scenario where verify_indexdb_cache_for_image raises IIBError. There are no tests for the fallback being triggered by failures in get_imagestream_artifact_pullspec, get_oras_artifact (when pulling from the ImageStream), or refresh_indexdb_cache_for_image.

  • [exception-handling-consistency] iib/workers/tasks/containerized_utils.py:265 — The bare except IIBError: pattern without capturing the exception as a variable loses the error context in the log message. The warning log says "ImageStream cache access failed" but does not include the actual error details, which would aid debugging.

  • [log-message-formatting] iib/workers/tasks/containerized_utils.py:266 — The multi-line log.warning() message does not follow the established formatting pattern of splitting long strings into separate concatenated parts on different lines.


Labels: PR fixes a bug in ImageStream cache fallback behavior

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

Comment thread iib/workers/tasks/containerized_utils.py Outdated
Comment thread tests/test_workers/test_tasks/test_containerized_utils.py
Comment thread iib/workers/tasks/containerized_utils.py Outdated
Comment thread iib/workers/tasks/containerized_utils.py Outdated
@fullsend-ai-review fullsend-ai-review Bot added the bug Something isn't working label Jun 26, 2026
…ck to Quay

When iib_use_imagestream_cache is enabled but the index-db-cache
ImageStream is unavailable, the IIBError from oc now triggers a
graceful fallback to pulling index.db directly from Quay instead
of failing the entire request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 26, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 4:55 AM UTC · Completed 5:08 AM UTC
Commit: 8207ad4 · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot added the ready-for-merge All reviewers approved — ready to merge label Jun 26, 2026
@yashvardhannanavati yashvardhannanavati merged commit 7bfc2eb into main Jun 26, 2026
27 checks passed
@yashvardhannanavati yashvardhannanavati deleted the fix_iib_cache branch June 26, 2026 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready-for-merge All reviewers approved — ready to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants