
Fixed analysis when gather-extra fails in post phase #118

Closed
qu4rkn3t wants to merge 2 commits into redhat-performance:main from qu4rkn3t:fix-gather-extra-analysis

Conversation

@qu4rkn3t

Contributor

Description

When gather-extra fails in the post phase, it doesn't create clusteroperators.json, so BugZooka would just return a generic error with no deeper analysis.

Now, we first check whether Orion performance data exists. If it does not, we extract the relevant errors from the build logs and then trigger LLM analysis of those errors.
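A minimal sketch of this two-tier fallback, assuming the Orion scan and the build-log error extraction are passed in as callables. scan_orion and extract_build_errors are hypothetical stand-ins for BugZooka's scan_orion_jsons and search_prow_errors (whose real signatures may differ), and AnalysisResult is a simplified, hypothetical stand-in for ProwAnalysisResult:

```python
import os
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AnalysisResult:
    """Hypothetical, simplified stand-in for BugZooka's ProwAnalysisResult."""
    errors: List[str] = field(default_factory=list)
    source: str = ""
    requires_llm: bool = False


def analyze_missing_operators(
    directory_path: str,
    scan_orion: Callable[[str], Optional[str]],
    extract_build_errors: Callable[[str], List[str]],
) -> Optional[AnalysisResult]:
    """Fallback used only when clusteroperators.json is absent."""
    if os.path.isfile(os.path.join(directory_path, "clusteroperators.json")):
        return None  # the normal analysis path handles this case
    orion_preview = scan_orion(directory_path)
    if orion_preview:
        # Tier 1: Orion performance (changepoint) data explains the run
        return AnalysisResult(errors=[orion_preview], source="orion")
    # Tier 2: no Orion data, so hand the build-log errors to the LLM
    return AnalysisResult(
        errors=extract_build_errors(directory_path),
        source="build-log",
        requires_llm=True,
    )
```

Passing the two scanners in as callables keeps the control flow testable without real Prow artifacts.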

Error link: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.22-nightly-x86-payload-control-plane-6nodes/2064660673337495552

@coderabbitai

coderabbitai Bot commented Jun 12, 2026



No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 180deb9 and 80050c8.

📒 Files selected for processing (1)
  • bugzooka/analysis/prow_analyzer.py

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Improved error analysis handling when certain diagnostic files are unavailable.
    • Enhanced error detection and categorization for end-of-pipeline stages with more accurate failure classification.

Walkthrough

When clusteroperators.json is unavailable, analyze_prow_artifacts now calls scan_orion_jsons to prioritize changepoint detection. If Orion previews are found, the returned errors combine matched log lines with Orion insights, and the categorization is overridden for post-phase steps. Without Orion results, the function falls back to LLM-triggered build-log analysis via search_prow_errors.
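The Orion-first branch described above can be sketched roughly as follows. This is not the PR's code: ERROR_PATTERN, POST_PHASE_STEPS, the "performance_regression" category and combine_with_orion are hypothetical names, and BugZooka's own matching helpers and category labels may differ.

```python
import re
from typing import List, Tuple

# Hypothetical error pattern; BugZooka's real patterns may differ
ERROR_PATTERN = re.compile(r"\b(error|failed|fatal)\b", re.IGNORECASE)
# Hypothetical set of steps treated as post-phase
POST_PHASE_STEPS = {"gather-extra"}


def combine_with_orion(
    log_lines: List[str],
    orion_preview: str,
    step_name: str,
    category: str,
) -> Tuple[List[str], str]:
    """Merge matched log lines with the Orion summary and pick a category."""
    matched = [line.rstrip("\n") for line in log_lines if ERROR_PATTERN.search(line)]
    errors = matched + [f"Orion changepoint summary:\n{orion_preview}"]
    # A failing post-phase step does not invalidate the performance data,
    # so the category is overridden instead of keeping the original one
    if step_name in POST_PHASE_STEPS:
        category = "performance_regression"
    return errors, category
```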

Changes

Error analysis fallback prioritization

Layer / File(s) | Summary
Missing clusteroperators.json fallback with Orion-first strategy (bugzooka/analysis/prow_analyzer.py) | Replaces build-log tail concatenation with a two-tier error analysis: attempts Orion changepoint scanning first, returns Orion-based results when previews exist and handles categorization overrides for post-phase steps, then falls back to LLM-triggered build-log analysis if no Orion results are found.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'Fixed analysis when gather-extra fails in post phase' directly matches the main change, which improves error analysis when gather-extra fails in the post phase.
Description check | ✅ Passed | The description clearly explains the problem (missing clusteroperators.json when gather-extra fails), the solution (checking Orion data first, then extracting build log errors for LLM analysis), and provides a concrete example link.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


with open(build_file_path, "r", errors="replace", encoding="utf-8") as f:
    build_log_content = list(deque(f, maxlen=BUILD_LOG_TAIL))
# check if there are orion results first, as the actual test may have performance issues
orion_preview, orion_full, cp_tests = scan_orion_jsons(directory_path)

Collaborator


The Orion scan here is not relevant. We just need to verify whether the failure phase is post and, if the failure is classified under the existing maintenance bucket, our error message should be:

gather-extra runs at the end of the pipeline. Appears to be general workload failure, but we have performance results to look further.

Instead of

This appears to be an installation or maintenance issue. Please re-trigger the run.
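The reviewer's suggestion amounts to a message swap rather than a new scan. A minimal sketch, assuming the failure phase and category are available as plain strings ("post" and "maintenance" are assumed values) and using choose_message as a hypothetical helper name; both messages are quoted verbatim from the comment above:

```python
from typing import Optional

# Messages quoted from the review discussion
MAINTENANCE_MSG = (
    "This appears to be an installation or maintenance issue. "
    "Please re-trigger the run."
)
POST_PHASE_MSG = (
    "gather-extra runs at the end of the pipeline. Appears to be general "
    "workload failure, but we have performance results to look further."
)


def choose_message(failure_phase: str, category: str) -> Optional[str]:
    """Pick the user-facing message following the reviewer's suggestion."""
    if category != "maintenance":
        return None  # other categories keep their existing handling
    if failure_phase == "post":
        # The workload already ran, so its performance results are still usable
        return POST_PHASE_MSG
    return MAINTENANCE_MSG
```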

Contributor Author


OK, pushed new changes.
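The first snippet in this thread reads only the tail of the build log with a bounded collections.deque. A standalone version of that idiom is sketched here; read_log_tail is a hypothetical helper, and the BUILD_LOG_TAIL value of 3 is illustrative only, since the real constant is defined elsewhere in BugZooka.

```python
from collections import deque
from typing import List

BUILD_LOG_TAIL = 3  # illustrative value; the real constant lives in BugZooka


def read_log_tail(path: str, max_lines: int = BUILD_LOG_TAIL) -> List[str]:
    """Return the last `max_lines` lines without holding the whole file in memory."""
    # errors="replace" keeps undecodable bytes from aborting the read
    with open(path, "r", errors="replace", encoding="utf-8") as f:
        # A bounded deque drops the oldest line once full, so memory
        # stays proportional to max_lines rather than to the log size
        return list(deque(f, maxlen=max_lines))
```

The file is still read end to end; the deque only bounds how many lines are kept.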

)
# No orion results, so fall back to build log analysis (which triggers LLM analysis)
return ProwAnalysisResult(
    errors=[

Collaborator


Why are we removing this?

Contributor Author


For better error reporting (that's why I set requires_llm=True)
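A flag like requires_llm only helps if the caller branches on it. The sketch below is hypothetical: ResultSketch and report are illustrative names, summarize stands in for whatever LLM call BugZooka actually makes, and the real ProwAnalysisResult fields beyond errors and requires_llm are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResultSketch:
    """Hypothetical stand-in carrying only the fields discussed here."""
    errors: List[str] = field(default_factory=list)
    requires_llm: bool = False


def report(result: ResultSketch, summarize: Callable[[List[str]], str]) -> str:
    """Route a result either to an LLM summarizer or to plain output."""
    if result.requires_llm:
        # Raw build-log errors are noisy, so let the model explain them
        return summarize(result.errors)
    # Already-curated messages are shown verbatim
    return "\n".join(result.errors)
```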

)
cluster_operators_file_path = os.path.join(directory_path, "clusteroperators.json")
if not os.path.isfile(cluster_operators_file_path):
    with open(build_file_path, "r", errors="replace", encoding="utf-8") as f:

Collaborator


Also, why is this being skipped?

Contributor Author


I removed it because we are using Orion.

qu4rkn3t closed this Jun 30, 2026
qu4rkn3t deleted the fix-gather-extra-analysis branch June 30, 2026 15:10
