rhdh: gzip junit results in SHARED_DIR and align overlays resource limits #81313
The junit-results.xml on the overlays main branch exceeds the Kubernetes Secret 1 MiB size limit, causing the SHARED_DIR Secret update to fail and leaving the send-data-router step with no junit file to report.

Fix by gzipping junit XML before writing to SHARED_DIR (XML compresses ~10-15x) and decompressing in the data-router steps to ARTIFACT_DIR before processing and sending to Data Router.

Changes:
- overlays ocp-helm: gzip junit + 800 KB size check before SHARED_DIR
- overlays send-data-router: decompress .gz to ARTIFACT_DIR, fall back to plain XML for backward compat
- rhdh send-data-router: same decompression + fallback (preventive, activated by a follow-up PR in redhat-developer/rhdh)

Assisted-by: OpenCode
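The export side of this change (gzip, then a size check before anything lands in SHARED_DIR) could look roughly like the sketch below. It is not the PR's code: `export_junit_gz` and `MAX_GZ_BYTES` are hypothetical names, and the threshold assumes "800 KB" means 800 * 1024 bytes, which the commit message does not specify.

```shell
#!/usr/bin/env bash
# Sketch only: export_junit_gz and MAX_GZ_BYTES are hypothetical names,
# not taken from the PR's scripts.
set -o nounset

# Assumed reading of the "800 KB" guard (the source does not say 1000 vs 1024).
MAX_GZ_BYTES=$((800 * 1024))

export_junit_gz() {
  local src="$1" shared_dir="$2"
  local dest
  dest="${shared_dir}/$(basename "$src").gz"

  # Nothing to export if the test run produced no report.
  if [[ ! -f "$src" ]]; then
    echo "WARNING: ${src} not found, skipping export"
    return 0
  fi

  # Compress to stdout so the source file is left untouched.
  if ! gzip -c "$src" > "$dest"; then
    rm -f "$dest"
    echo "WARNING: gzip failed for ${src}"
    return 0
  fi

  # Remove the archive if it is still too large after compression.
  local size
  size=$(( $(wc -c < "$dest") ))
  if (( size > MAX_GZ_BYTES )); then
    rm -f "$dest"
    echo "WARNING: $(basename "$dest") is ${size} bytes (limit ${MAX_GZ_BYTES}); not exported"
    return 0
  fi
  echo "Exported $(basename "$dest") (${size} bytes)"
}
```

A step script might call it as `export_junit_gz "/path/to/junit-results.xml" "${SHARED_DIR}"`, where the source path is a placeholder. Every branch returns 0 on the assumption that a missing or oversized report should not fail the test step itself.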
Walkthrough
The PR updates JUnit report handling for Data Router send steps to use artifact-local files, adds gzip support and size checks for exported reports, and changes CI runner CPU and memory values.

Changes
JUnit artifact handling and Data Router send flow

Estimated code review effort: 3 (Moderate) | ~20 minutes

Sequence Diagram(s)
sequenceDiagram
participant HelmExportScript
participant OverlaySendScript
participant RHDHSendScript
participant droute
HelmExportScript->>OverlaySendScript: produce junit-results.xml.gz
OverlaySendScript->>droute: send staged junit-results.xml
RHDHSendScript->>droute: send staged junit-results-*.xml
/pj-rehearse periodic-ci-redhat-developer-rhdh-plugin-export-overlays-main-e2e-ocp-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh`:
- Around line 96-105: Treat the missing JUnit case as a normal skip in
process_junit_file and the downstream send-step check, not as a failure. Update
the logic in
redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh so
that when junit-results.xml.gz and junit-results.xml are both absent, it exits
the helper cleanly and the later verification around the data-router junit
output does not abort the step. Keep the behavior aligned with
process_junit_file and the check near the send-step validation so the
overflow/capped-report path is tolerated.
📒 Files selected for processing (3)
- ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/ocp/helm/redhat-developer-rhdh-plugin-export-overlays-ocp-helm-commands.sh
- ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh
- ci-operator/step-registry/redhat-developer/rhdh/send/data-router/redhat-developer-rhdh-send-data-router-commands.sh
Match the RHDH ocp-helm step resource specs:
- CPU: 1 request / 10 limit (was 2 / 4)
- Memory: 1Gi request / 5Gi limit (was 6Gi / 8Gi)

Assisted-by: OpenCode
|
/pj-rehearse periodic-ci-redhat-developer-rhdh-plugin-export-overlays-main-e2e-ocp-helm-nightly |
|
@zdrapela: your |
|
/pj-rehearse periodic-ci-redhat-developer-rhdh-plugin-export-overlays-main-e2e-ocp-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse ack |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- Remove set +o nounset (all env vars have defaults in ref YAML)
- Add explicit exit 0 to guarantee the step never fails the CI job
- Change ERROR to WARNING for missing junit (expected in overflow case)
- Fix unbound GIT_PR_NUMBER and TAG_NAME references in rhdh data-router
- Add comments explaining the set +o errexit contract

Assisted-by: OpenCode
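The combination described in this commit (nounset on, errexit off, defaulted expansions, explicit `exit 0`) can be illustrated with the sketch below. The function names `send_report` and `run_reporting_step` are hypothetical and the body is a stand-in, not the data-router script itself.

```shell
#!/usr/bin/env bash
# Sketch only: send_report and run_reporting_step are hypothetical names.
set -o nounset      # referencing an unset variable is an error
set +o errexit      # a failing command must not abort the reporting step

send_report() {
  # ${VAR:-} expands to an empty string when VAR is unset or empty,
  # so optional metadata does not trip nounset.
  local pr="${GIT_PR_NUMBER:-}"
  local tag="${TAG_NAME:-}"
  echo "pr='${pr}' tag='${tag}'"

  if [[ ! -f "${1:-}" ]]; then
    echo "WARNING: no junit file to send (expected when the report overflowed)"
    return 1
  fi
  echo "would send $1"
}

# The parenthesised body runs in a subshell, so 'exit 0' ends only the step body.
run_reporting_step() (
  send_report "$@"
  echo "send_report returned $?"
  exit 0   # reporting problems never fail the CI job
)
```

Calling `run_reporting_step /path/to/junit.xml` always yields exit status 0, while the log still records the non-zero status of the inner call. In a real step script the `exit 0` would sit at the end of the file; the subshell here only keeps the example callable as a function.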
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ci-operator/step-registry/redhat-developer/rhdh/send/data-router/redhat-developer-rhdh-send-data-router-commands.sh (1)
103-117: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Stage decompressed XML atomically before suppressing fallback.
With `errexit` disabled, a failed `gunzip` can leave an empty/partial XML at the final path. Line 116 then skips copying a valid plain XML fallback because the file exists.
Proposed fix
     for junit_gz in "${SHARED_DIR}"/junit-results-*.xml.gz; do
       if [[ -f "$junit_gz" ]]; then
         local filename
         filename=$(basename "${junit_gz%.gz}")
    -    gunzip -c "$junit_gz" > "${ARTIFACT_DIR}/data-router/${filename}"
    -    echo "Decompressed $(basename "$junit_gz") -> ${ARTIFACT_DIR}/data-router/${filename}"
    +    local staged_file="${ARTIFACT_DIR}/data-router/${filename}"
    +    local tmp_file="${staged_file}.tmp"
    +    if gunzip -c "$junit_gz" > "$tmp_file"; then
    +      mv "$tmp_file" "$staged_file"
    +      echo "Decompressed $(basename "$junit_gz") -> ${staged_file}"
    +    else
    +      rm -f "$tmp_file"
    +      echo "WARNING: Failed to decompress $(basename "$junit_gz"); plain XML fallback will be used if present"
    +    fi
       fi
     done
    @@
    -    if [[ ! -f "${ARTIFACT_DIR}/data-router/${filename}" ]]; then
    +    if [[ ! -s "${ARTIFACT_DIR}/data-router/${filename}" ]]; then
           cp "$junit_shared" "${ARTIFACT_DIR}/data-router/${filename}"
           echo "Copied ${filename} from SHARED_DIR to ARTIFACT_DIR"
         fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/send/data-router/redhat-developer-rhdh-send-data-router-commands.sh` around lines 103 - 117, The fallback copy logic in redhat-developer-rhdh-send-data-router-commands.sh can be skipped after a failed gunzip because the decompressed file is written directly to the final artifact path. Update the junit handling loop around the existing gunzip/cp logic to stage decompressed output atomically (for example via a temporary file and rename only on success) or otherwise verify successful decompression before creating the final XML path, so the plain XML copy in the junit_shared loop still runs when decompression fails.
🧹 Nitpick comments (2)
ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh (2)
99-104: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win
Unchecked `gunzip`/`cp` exit status can silently propagate a corrupt/empty JUnit file.
Neither the `gunzip -c ... > "$junit_file"` nor the `cp "$junit_shared" "$junit_file"` branch checks command success. With `errexit` disabled, a corrupted `.gz` or a failed copy still leaves `$junit_file` present (possibly empty/truncated). The downstream existence check at Line 269-273 would pass, and a bad/empty report could be sent to Data Router without any error surfaced.
🩹 Proposed fix
     if [[ -f "$junit_gz" ]]; then
    -  gunzip -c "$junit_gz" > "$junit_file"
    -  echo "Decompressed junit-results.xml.gz -> ${junit_file}"
    +  if ! gunzip -c "$junit_gz" > "$junit_file"; then
    +    echo "ERROR: Failed to decompress ${junit_gz}"
    +    return 1
    +  fi
    +  echo "Decompressed junit-results.xml.gz -> ${junit_file}"
     elif [[ -f "$junit_shared" ]]; then
    -  cp "$junit_shared" "$junit_file"
    -  echo "Copied junit-results.xml from SHARED_DIR to ARTIFACT_DIR"
    +  if ! cp "$junit_shared" "$junit_file"; then
    +    echo "ERROR: Failed to copy ${junit_shared}"
    +    return 1
    +  fi
    +  echo "Copied junit-results.xml from SHARED_DIR to ARTIFACT_DIR"
     else
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh` around lines 99 - 104, The JUnit file preparation in the data-router commands script does not verify whether the `gunzip` or `cp` branch actually succeeded, so a bad archive or failed copy can still leave a bogus `$junit_file` behind. Update the logic around the `gunzip -c` and `cp` calls to explicitly check their exit status (or fail fast on error) before proceeding, and only log success when the file was actually created correctly. Use the `junit_gz`, `junit_shared`, and `junit_file` branches in this script to locate the fix.
261-261: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
`process_junit_file` return code is ignored at the call site.
Failures inside `process_junit_file` (e.g. the sed rewrites, or the fix above returning 1) aren't checked here; the script only detects failure indirectly via the file-existence probe at Line 269-273. Explicitly checking the return value would make the failure path clearer and avoid relying solely on a side-effect check.
♻️ Suggested fix
    -  process_junit_file
    +  process_junit_file || return 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh` at line 261, The call to process_junit_file is ignoring its return code, so failures from the function can be missed until the later file-existence check. Update the call site in the data-router commands script to capture and immediately check process_junit_file’s exit status, and handle a non-zero result explicitly before continuing; use the process_junit_file symbol to locate the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/send/data-router/redhat-developer-rhdh-send-data-router-commands.sh`:
- Line 231: The `redhat-developer-rhdh-send-data-router-commands.sh` PR metadata
argument is only reading `GIT_PR_NUMBER`, which leaves `pr` empty when the job
is identified by `PULL_NUMBER` instead. Update the `--arg pr` value in the
data-router command assembly to fall back to `PULL_NUMBER` when `GIT_PR_NUMBER`
is unset, while keeping the existing PR-job detection logic consistent. Use the
same PR metadata handling path in this script so the emitted `pr` attribute is
preserved for both environment variable sources.
📒 Files selected for processing (2)
- ci-operator/step-registry/redhat-developer/rhdh-plugin-export-overlays/send/data-router/redhat-developer-rhdh-plugin-export-overlays-send-data-router-commands.sh
- ci-operator/step-registry/redhat-developer/rhdh/send/data-router/redhat-developer-rhdh-send-data-router-commands.sh
         --arg description "[View job run details](${JOB_URL})" \
         --arg job_type "$JOB_TYPE" \
    -    --arg pr "$GIT_PR_NUMBER" \
    +    --arg pr "${GIT_PR_NUMBER:-}" \
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
Preserve PR metadata from PULL_NUMBER when GIT_PR_NUMBER is absent.
This file already uses PULL_NUMBER to detect PR jobs, but the new default emits an empty pr attribute when only PULL_NUMBER is set.
Proposed fix
    -    --arg pr "${GIT_PR_NUMBER:-}" \
    +    --arg pr "${GIT_PR_NUMBER:-${PULL_NUMBER:-}}" \
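To see how the nested default in the suggestion resolves, here is a small demonstration. `resolve_pr` is a hypothetical helper that exists only for this example; the script in the PR passes the expansion straight to `--arg pr`.

```shell
#!/usr/bin/env bash
set -o nounset

# resolve_pr is a hypothetical helper used only for this demonstration.
resolve_pr() {
  # Prefer GIT_PR_NUMBER, then PULL_NUMBER, then an empty string.
  # The ':-' form treats an empty value the same as an unset one.
  echo "${GIT_PR_NUMBER:-${PULL_NUMBER:-}}"
}
```

Because `:-` also covers empty values, the fallback still applies when a variable is defined as an empty string, for example through a `default: ""` entry in the ref YAML.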
Force-pushed from cd985dd to 460158c.
|
[REHEARSALNOTIFIER]
A total of 60 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.

Interacting with pj-rehearse
Comment:
Once you are satisfied with the results of the rehearsals, comment: |
|
@zdrapela: The following test failed, say
|
/pj-rehearse ack |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rm3l, zdrapela

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Gzip junit XML before writing to SHARED_DIR to stay under the Kubernetes Secret 1 MiB size limit. Raw junit XML can exceed this when coverage <property> tags and many test cases are present. XML compresses ~10-15x with gzip, keeping even large files well under the limit.

Also adds a safety check that removes the gzipped file if it still exceeds 800 KB after compression, and validates the source file exists before attempting the copy.

The data-router step in openshift/release already handles both .gz and plain .xml formats (openshift/release#81313).

Assisted-by: OpenCode
https://redhat.atlassian.net/browse/RHDHBUGS-3428
Problem
The `junit-results.xml` on the overlays `main` branch exceeds the Kubernetes Secret 1 MiB size limit. ci-operator's `entrypoint-wrapper` syncs `SHARED_DIR` contents into a Secret, but after base64 encoding the effective raw limit is ~768 KB. The junit file grew past this due to coverage `<property>` tags and increased test/workspace count.

Fix
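The ~768 KB figure that motivates the fixes below follows from base64's 4-for-3 expansion. The sketch takes the description's premise that the 1 MiB limit applies to the encoded form and simply works through the arithmetic; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Base64 emits 4 output bytes for every 3 input bytes (padding aside).
secret_limit=$((1024 * 1024))          # 1 MiB Secret size limit, in bytes
raw_budget=$((secret_limit * 3 / 4))   # largest raw payload that fits once encoded
echo "raw budget: ${raw_budget} bytes ($((raw_budget / 1024)) KiB)"
# prints: raw budget: 786432 bytes (768 KiB)

# Cross-check the ratio on a 300-byte sample: 300 / 3 * 4 = 400 encoded bytes.
encoded_len=$(head -c 300 /dev/zero | base64 | tr -d '\n' | wc -c)
echo "300 raw bytes -> $((encoded_len)) base64 bytes"
# prints: 300 raw bytes -> 400 base64 bytes
```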
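1. Gzip junit in SHARED_DIR
Gzip junit XML before writing to `SHARED_DIR` (XML compresses ~10-15x), decompress in the data-router steps to `ARTIFACT_DIR` before processing and sending to Data Router.
- overlays ocp-helm: gzip junit + 800 KB size check before `SHARED_DIR`
- overlays send-data-router: decompress `.gz` to `ARTIFACT_DIR/data-router/`, fall back to plain XML for backward compat
- rhdh send-data-router: same decompression + fallback (preventive, activated by a follow-up PR in `redhat-developer/rhdh`)

2. Align resource limits
Match the RHDH ocp-helm step resource specs:
- CPU: 1 request / 10 limit (was 2 / 4)
- Memory: 1Gi request / 5Gi limit (was 6Gi / 8Gi)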
3. Harden data-router error handling
Both scripts:
- Remove `set +o nounset` (all env vars have `default: ""` in ref YAML, so nounset is safe)
- Fix unbound `$GIT_PR_NUMBER` and `$TAG_NAME` references in rhdh data-router (`${VAR:-}`)

Overlays data-router - fail on genuine errors (no `send-alert` step exists for notification):
- …: `return 1` → step fails → visible in Prow
- `droute send` failure after max retries: `return 1`
- …: `return 1` (unchanged)

RHDH data-router - never fail the job (has `send-alert` as notification backstop):
- Add explicit `exit 0` at end of script
- Change `ERROR` to `WARNING` for missing junit

Follow-up
A separate PR in `redhat-developer/rhdh` will update `.ci/pipelines/lib/testing.sh` to gzip junit files before writing to `SHARED_DIR`. The RHDH data-router change here handles both compressed and uncompressed files, so there is no ordering dependency.

Summary by CodeRabbit
This PR updates the RHDH plugin export CI flow to avoid Secret size failures and to make Data Router processing work with the new compressed junit handoff.
- The `ocp-helm` artifact collection step now gzips `junit-results.xml` before placing it in `SHARED_DIR`, with a size guard to skip oversized archives that could still break Secret updates.
- The `send-data-router` step for `ocp-helm` now consumes the gzipped junit file from `SHARED_DIR`, decompresses it into `ARTIFACT_DIR`, and sends results from that local artifact path.
- The RHDH `send-data-router` step was updated to process junit files from `ARTIFACT_DIR/data-router`, support both gzipped and plain XML inputs, and keep retry-based error handling aligned with CI failure behavior.
- The `ocp-helm` step resource requests/limits were adjusted to match the RHDH step profile.
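The consumer side summarised above (prefer the gzipped report, fall back to plain XML, stage into an artifact directory) can be sketched as a single helper. This is an illustration under assumptions, not the merged script: `stage_junit` is a hypothetical name, and the temporary-file staging follows the review suggestion rather than anything confirmed to be in the final code.

```shell
#!/usr/bin/env bash
# Sketch only: stage_junit is a hypothetical helper, not the PR's function.
set -o nounset

stage_junit() {
  local shared_dir="$1" out_dir="$2" name="${3:-junit-results.xml}"
  local gz="${shared_dir}/${name}.gz"
  local plain="${shared_dir}/${name}"
  local dest="${out_dir}/${name}"
  mkdir -p "$out_dir"

  if [[ -f "$gz" ]]; then
    # Decompress to a temporary name so a failed gunzip never leaves
    # a partial report at the final path.
    if gunzip -c "$gz" > "${dest}.tmp"; then
      mv "${dest}.tmp" "$dest"
      echo "Decompressed ${name}.gz"
      return 0
    fi
    rm -f "${dest}.tmp"
    echo "WARNING: could not decompress ${name}.gz"
  fi

  # Backward-compatible path: an uncompressed report in the shared directory.
  if [[ -f "$plain" ]] && cp "$plain" "$dest"; then
    echo "Copied plain ${name}"
    return 0
  fi

  echo "WARNING: no junit report found in ${shared_dir}"
  return 1
}
```

A caller would then decide what a non-zero return means: per the description, the overlays step treats genuine errors as failures, while the RHDH step logs a warning and still exits 0.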