fix(ci): fix RHDH OCP Orchestrator Helm e2e job failures #3929

chadcrum · 2025-12-18T15:47:19Z

Summary

Fix multiple issues causing RHDH OCP Orchestrator Helm e2e jobs (e2e-ocp-helm) to fail in the showcase-rbac namespace.

Root cause: The Helm chart's create-sonataflow-database job does not set the PGSSLMODE environment variable. As a result, database creation fails when it connects to external PostgreSQL instances that require SSL (Crunchy Data PostgreSQL).

Fixes included:

- Add manual SSL-enabled database creation as a workaround
- Improve database creation reliability with proper error handling and timeouts
- Remove the readOnlyRootFilesystem restriction (psql needs write access to /tmp for SSL)
- Increase the dynamic-plugins-root volume from 2Gi to 5Gi
- Add --wait --timeout flags to helm install commands
- Increase the Keycloak login timeout to 30 seconds
- Fix E2E test selectors and helper methods

Jira: RHDHBUGS-2449

Test plan

- Verify the e2e-ocp-helm Prow job passes
- Verify sonataflow database creation succeeds with SSL

Test Results

✅ Tested 5 times: all Helm deployments completed without issues, and every run passed with zero failures.

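| Run | Status | Duration | Passed | Skipped |
|-----|--------|----------|--------|---------|
| 1 | ✅ Succeeded | 55m 15s | 32 | 37 |
| 2 | ✅ Succeeded | 48m 49s | 38 | 31 |
| 3 | ✅ Succeeded | 1h 4m 33s | 37 | 31 |
| 4 | ✅ Succeeded | 1h 11m 4s | 32 | 37 |
| 5 | ✅ Succeeded | 1h 1m 58s | 32 | 37 |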

Note: Variance in passed/skipped counts is due to conditional test skipping in rbac.spec.ts based on environment timing, not failures.

🤖 Generated with Claude Code

openshift-ci · 2025-12-18T15:47:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign albarbaro for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

.ibm/OWNERS
e2e-tests/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chadcrum · 2025-12-18T15:51:50Z

/ok-to-test

The orchestrator workflows table selector was looking for "WorkflowsNameCategoryLast", but the actual UI only displays the columns Name, Workflow Status, Last run, Last run status, Description, and Actions. The "Category" column does not exist in the release-1.8 UI, so the orchestrator RBAC tests failed with element-not-found errors. This fix updates the selector to match the actual table header text "Workflows", which is present in the UI.

Backported from commit f17d95b (PR redhat-developer#3406) in the main branch.

Fixes failing test:
- Test Orchestrator RBAC > Test global orchestrator workflow access is allowed

Related: FLPATH-2798

… install

Add --wait --timeout=5m flags to the greeting workflow helm install command to ensure workflow pods are ready before tests execute. Without --wait, the helm command returns immediately while pods are still initializing, which can cause:
- Tests to run before workflows are available
- Race conditions between workflow deployment and test execution
- Pods experiencing CreateContainerConfigError during startup

With --wait, helm monitors the release and only returns success when all pods are Running and pass readiness probes. The 5-minute timeout provides ample time for the pods to start (observed ready time: ~90 seconds). This ensures tests only run against fully initialized infrastructure and provides clearer failure messages if pods cannot start.

Related: FLPATH-2798
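A minimal sketch of that install step, assuming the chart reference is passed in as an argument and the release is named `greeting` (both hypothetical; the actual invocation in utils.sh is not shown in this thread). One caveat: Helm's `--wait` only tracks the resource kinds it knows about in the release (Pods, PVCs, Services, Deployments, StatefulSets, ReplicaSets). If the chart renders only custom resources, Deployments that an operator creates from them later are not covered, and an explicit `kubectl rollout status` may still be needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the greeting workflow chart and block until ready.
# Usage: install_greeting_workflow <namespace> <chart-ref>

install_greeting_workflow() {
  local namespace="$1"
  local chart="$2"

  # --wait: helm returns only after the release's Pods/Deployments report ready.
  # --timeout=5m: a hang becomes a non-zero exit after five minutes.
  if ! helm install greeting "$chart" \
    --namespace "$namespace" \
    --wait --timeout=5m; then
    echo "ERROR: greeting workflow was not ready within 5m in ${namespace}" >&2
    return 1
  fi
  echo "greeting workflow ready in ${namespace}"
}
```

Called as `install_greeting_workflow showcase-rbac "$chart"`, a pod that never becomes ready surfaces as an ERROR line and a non-zero exit at install time rather than as a later test failure.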

…se creation

Add a manual database creation workaround for the showcase-rbac deployment to handle SSL-required connections to external Crunchy Data PostgreSQL clusters. The helm chart's create-sonataflow-database job does not inject the PGSSLMODE environment variable, causing authentication failures when connecting to external PostgreSQL instances that require SSL (Crunchy Data operator).

This fix adds:
- create_sonataflow_database_with_ssl() helper function
- Temporary pod that runs psql with PGSSLMODE=require
- Proper SSL configuration from the postgres-cred secret

Without SSL configuration:
FATAL: no pg_hba.conf entry for host "X.X.X.X", user "janus-idp", database "postgres", no encryption

This resolves CrashLoopBackOff issues in the showcase-rbac namespace for:
- greeting workflow
- user-onboarding workflow
- sonataflow-platform-data-index-service
- sonataflow-platform-jobs-service

Related: FLPATH-2798
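A sketch of what a helper like create_sonataflow_database_with_ssl() might do. It is not the code from this PR. It relies on libpq reading PGSSLMODE, PGHOST, PGUSER, PGPASSWORD and PGDATABASE from the environment, so psql needs no connection flags and the password never appears on a command line. The following are assumptions, not taken from the PR: the psql image, the key names inside the postgres-cred secret (POSTGRES_HOST, POSTGRES_USER, POSTGRES_PASSWORD), the database name `sonataflow`, and the default pod name.

```shell
#!/usr/bin/env bash
# Sketch only (not the PR's code): run psql in a throwaway pod with SSL forced.
# Assumed: image, secret key names, database name "sonataflow", default pod name.

# Print a Pod manifest that creates the database if it does not exist yet.
render_sonataflow_db_pod() {
  local namespace="$1"
  local pod_name="$2"
  local image="${PSQL_IMAGE:-quay.io/sclorg/postgresql-15-c9s}"  # assumed image
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${pod_name}
  namespace: ${namespace}
spec:
  restartPolicy: Never
  containers:
    - name: psql
      image: ${image}
      # readOnlyRootFilesystem deliberately not set: psql writes temp files to /tmp.
      env:
        - name: PGSSLMODE   # libpq: refuse unencrypted connections
          value: require
        - name: PGHOST
          valueFrom: {secretKeyRef: {name: postgres-cred, key: POSTGRES_HOST}}
        - name: PGUSER
          valueFrom: {secretKeyRef: {name: postgres-cred, key: POSTGRES_USER}}
        - name: PGPASSWORD
          valueFrom: {secretKeyRef: {name: postgres-cred, key: POSTGRES_PASSWORD}}
        - name: PGDATABASE  # connect to the maintenance DB to issue CREATE DATABASE
          value: postgres
      command: ["/bin/sh", "-c"]
      args:
        - >-
          psql -tAc "SELECT 1 FROM pg_database WHERE datname = 'sonataflow'" | grep -q 1
          || psql -v ON_ERROR_STOP=1 -c "CREATE DATABASE sonataflow"
EOF
}

# Apply the manifest; waiting, log collection and cleanup are left to the caller.
create_sonataflow_database_with_ssl() {
  local namespace="$1"
  local pod_name="${2:-create-sonataflow-db}"  # hypothetical default name
  render_sonataflow_db_pod "$namespace" "$pod_name" | kubectl apply -f -
}
```

The existence check keeps the pod idempotent, so rerunning the CI step does not fail with "database already exists".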

- Increase the timeout from 2 minutes to 5 minutes to handle image pull delays and rate limiting
- Add a database verification step to confirm successful creation
- Improve status reporting during pod creation with status-change logging
- Add a wait for the jobs-service rollout before deploying workflows to prevent race conditions
- Improve error handling and logging throughout the process

This addresses issues where the manual database creation pod was timing out due to ImagePullBackOff delays (QPS exceeded) in the CI environment.
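The polling and rollout steps described above could be sketched as follows. The function names, the 5-second poll interval, and the deployment name `sonataflow-platform-jobs-service` (inferred from the pod names listed in the previous commit) are assumptions. The loop logs only when the pod phase changes, treats Succeeded and Failed as terminal, and gives up after the 5-minute budget. A pod stuck in ImagePullBackOff stays in phase Pending, so it eventually hits the timeout branch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for waiting on the database-creation pod and the jobs service.

# Poll a one-shot pod until it finishes, logging each phase change.
# Returns 0 on Succeeded, 1 on Failed or when timeout_s elapses.
wait_for_pod_completion() {
  local pod="$1" namespace="$2"
  local timeout_s="${3:-300}"   # 300 s = the 5-minute budget
  local interval_s="${4:-5}"
  local deadline phase last_phase=""
  deadline=$(( $(date +%s) + timeout_s ))

  while :; do
    phase=$(kubectl get pod "$pod" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" != "$last_phase" ]; then
      echo "pod/${pod}: ${last_phase:-<none>} -> ${phase:-<unknown>}"
      last_phase="$phase"
    fi
    case "$phase" in
      Succeeded) return 0 ;;
      Failed) echo "ERROR: pod/${pod} failed" >&2; return 1 ;;
    esac
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "ERROR: pod/${pod} still ${phase:-unknown} after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}

# Block until the jobs service Deployment has rolled out (or fail after 5m).
wait_for_jobs_service() {
  kubectl rollout status deployment/sonataflow-platform-jobs-service \
    -n "$1" --timeout=5m
}
```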

Separate variable declarations from assignments to avoid masking return values. This resolves ShellCheck warnings in:
- create_sonataflow_database_with_ssl() function (line 889)
- verify_sonataflow_database() function (lines 983, 992)
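The masking that SC2155 warns about is easy to reproduce in isolation. The function names are made up for the demonstration. When a command substitution is combined with `local`, `$?` reports the status of `local` itself, which is 0.

```shell
#!/usr/bin/env bash
# Illustration of ShellCheck SC2155: `local var=$(cmd)` reports the exit status
# of `local` (always 0 here), not of `cmd`.

masked_status() {
  local out=$(false)   # SC2155: the failure of `false` is lost
  echo $?
}

unmasked_status() {
  local out
  out=$(false)         # a bare assignment takes the substitution's status
  echo $?
}

masked_status     # prints 0
unmasked_status   # prints 1
```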

- Return error code 1 when the database creation pod fails
- Return error code 1 when database creation times out
- Clean up the pod and show its logs before returning on failure
- Change WARNING to ERROR for actual failure cases
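A minimal sketch of that failure path, using a hypothetical helper name. It prints an ERROR line and the pod's logs to stderr, deletes the pod so a rerun starts clean, and returns 1 so the calling function, and ultimately the CI step, fails instead of continuing.

```shell
#!/usr/bin/env bash
# Hypothetical failure handler for the database-creation pod.
# Prints an ERROR line and the pod logs to stderr, deletes the pod, returns 1.
fail_database_pod() {
  local pod="$1" namespace="$2" reason="$3"
  echo "ERROR: sonataflow database creation ${reason} (pod/${pod})" >&2
  echo "---- logs from pod/${pod} ----" >&2
  kubectl logs "$pod" -n "$namespace" >&2 || echo "(no logs available)" >&2
  # Remove the pod so a rerun can recreate it under the same name.
  kubectl delete pod "$pod" -n "$namespace" --ignore-not-found >/dev/null
  return 1
}
```

For example, the timeout branch of a polling loop could run `fail_database_pod "$pod" "$namespace" "timed out" || return 1`.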

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

The securityContext with readOnlyRootFilesystem: true was preventing psql from working properly, because psql needs to write temporary files to /tmp during SSL connections to the external PostgreSQL database.

The default 2Gi ephemeral volume for dynamic-plugins-root is insufficient when many plugins are enabled (orchestrator, kubernetes, tekton, techdocs, keycloak, etc.). The init container fails with "No space left on device" error during plugin extraction. Increase the volume size to 5Gi for both showcase and RBAC namespaces using the deployment.patch field in the Backstage CR.
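As a sketch, here is the shape of a deployment.patch that enlarges the volume, applied to an existing CR with `kubectl patch`. Using `kubectl patch` is an assumption about how one might apply it, not how the commit does it. The CR name and namespace are placeholders. The sketch also assumes that the operator applies `spec.deployment.patch` to its Deployment as a strategic merge keyed on volume name, so only the `dynamic-plugins-root` entry changes. Check this against the RHDH operator documentation.

```shell
#!/usr/bin/env bash
# Sketch: enlarge the dynamic-plugins-root ephemeral volume via spec.deployment.patch.
# Assumes the operator merges deployment.patch into its Deployment, matching volumes by name.

render_plugins_volume_patch() {
  local size="${1:-5Gi}"
  cat <<EOF
spec:
  deployment:
    patch:
      spec:
        template:
          spec:
            volumes:
              - name: dynamic-plugins-root
                ephemeral:
                  volumeClaimTemplate:
                    spec:
                      accessModes: ["ReadWriteOnce"]
                      resources:
                        requests:
                          storage: ${size}
EOF
}

# Apply to an existing Backstage CR (name and namespace are placeholders).
# This is a JSON merge patch on the CR; the operator then applies the nested patch.
patch_plugins_volume() {
  local cr_name="$1" namespace="$2"
  kubectl patch backstage "$cr_name" -n "$namespace" --type=merge \
    -p "$(render_plugins_volume_patch 5Gi)"
}
```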

The default 10-second actionTimeout was being exceeded when the Keycloak popup was slow to render, causing orchestrator RBAC tests to fail during authentication setup. Add explicit waitFor with 30-second timeout before interacting with the Keycloak login form to handle slow responses.

chadcrum · 2025-12-18T16:26:41Z

/test e2e-tests

openshift-ci · 2025-12-18T16:26:45Z

@chadcrum: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

- /test e2e-ocp-helm

The following commands are available to trigger optional jobs:

- /test e2e-aks-helm-nightly
- /test e2e-aks-operator-nightly
- /test e2e-eks-helm-nightly
- /test e2e-eks-operator-nightly
- /test e2e-gke-helm-nightly
- /test e2e-gke-operator-nightly
- /test e2e-ocp-helm-nightly
- /test e2e-ocp-helm-upgrade-nightly
- /test e2e-ocp-operator-auth-providers-nightly
- /test e2e-ocp-operator-nightly
- /test e2e-ocp-v4-17-helm-nightly
- /test e2e-ocp-v4-19-helm-nightly
- /test e2e-ocp-v4-20-helm-nightly
- /test e2e-osd-gcp-helm-nightly
- /test e2e-osd-gcp-operator-nightly

Use /test all to run the following jobs that were automatically triggered:

- pull-ci-redhat-developer-rhdh-release-1.8-e2e-ocp-helm

rhdh-qodo-merge · 2025-12-18T16:26:45Z

You are above your monthly Qodo Merge usage quota. If you are a paying user, please link your GitHub/GitLab/Bitbucket account with your Qodo account here to claim your seat. To allow usage organization-wide without linking, please reach out to Qodo.

chadcrum · 2025-12-18T16:30:24Z

/test e2e-ocp-helm

github-actions · 2025-12-18T17:04:31Z

The image is available at:

/test e2e-ocp-helm

chadcrum · 2025-12-18T17:53:27Z

/retest

chadcrum · 2025-12-18T18:57:38Z

/test e2e-ocp-helm

chadcrum · 2025-12-18T19:49:50Z

/test e2e-ocp-helm

chadcrum · 2025-12-18T21:20:55Z

/test e2e-ocp-helm

chadcrum · 2025-12-18T23:38:48Z

/test e2e-ocp-helm

chadcrum · 2025-12-19T15:19:58Z

@christoph-jerolimov @subhashkhileri I'm working with @gustavolira to help stabilize the OCP Helm/Operator RHDH jobs (orchestrator-related).

As @gustavolira is out until next year, can one of you take a look at this?

github-actions · 2025-12-19T17:19:50Z

The image is available at:

/test e2e-ocp-helm

openshift-ci bot requested review from josephca and subhashkhileri December 18, 2025 15:47

chadcrum had a problem deploying to external December 18, 2025 15:47 — with GitHub Actions Error

chadcrum changed the title ~~fix(ci): Fix RHDH OCP Orchestrator Helm e2e job failures~~ fix(ci): fix RHDH OCP Orchestrator Helm e2e job failures Dec 18, 2025

chadcrum marked this pull request as draft December 18, 2025 15:51

openshift-ci bot added the do-not-merge/work-in-progress label Dec 18, 2025

openshift-ci bot added the ok-to-test label Dec 18, 2025

chadcrum and others added 12 commits December 18, 2025 11:10

fix(e2e): add missing selectGreetingWorkflowItem method to Orchestrator class

609194d

style: fix prettier formatting in workflows.ts

ae1ea89

style: fix ShellCheck SC2155 warnings in utils.sh

200a1c2

fix(ci): add proper error returns for database creation failures

f73f1d1

style: fix prettier formatting in utils.sh

42071f2

chadcrum force-pushed the fix/orchestrator-helm-fixes-1.8 branch from 29a5eae to c6a9203 December 18, 2025 16:10

chadcrum had a problem deploying to external December 18, 2025 16:11 — with GitHub Actions Error

chadcrum marked this pull request as ready for review December 18, 2025 22:44

openshift-ci bot removed the do-not-merge/work-in-progress label Dec 18, 2025

openshift-ci bot requested a review from albarbaro December 18, 2025 22:44

chadcrum had a problem deploying to external December 18, 2025 22:44 — with GitHub Actions Error

chadcrum marked this pull request as draft December 19, 2025 16:06

openshift-ci bot added the do-not-merge/work-in-progress label Dec 19, 2025

chadcrum marked this pull request as ready for review December 19, 2025 16:24

openshift-ci bot removed the do-not-merge/work-in-progress label Dec 19, 2025

openshift-ci bot requested review from psrna and zdrapela December 19, 2025 16:25

chadcrum had a problem deploying to external December 19, 2025 16:25 — with GitHub Actions Error

Merge branch 'release-1.8' into fix/orchestrator-helm-fixes-1.8

9b0e0ae

chadcrum requested a deployment to external December 19, 2025 16:25 — with GitHub Actions Waiting
