@chadcrum
Contributor

@chadcrum chadcrum commented Dec 18, 2025

Summary

Fix multiple issues causing RHDH OCP Orchestrator Helm e2e jobs (e2e-ocp-helm) to fail in the showcase-rbac namespace.

Root Cause: The helm chart's create-sonataflow-database job does not include the PGSSLMODE environment variable, causing database creation to fail when connecting to external PostgreSQL instances that require SSL (Crunchy Data PostgreSQL).

Fixes included:

  • Add manual SSL-enabled database creation as a workaround
  • Improve database creation reliability with proper error handling and timeouts
  • Remove readOnlyRootFilesystem restriction (psql needs /tmp write access for SSL)
  • Increase dynamic-plugins-root volume from 2Gi to 5Gi
  • Add --wait --timeout flags to helm install commands
  • Increase Keycloak login timeout to 30 seconds
  • Fix E2E test selectors and helper methods

Jira: RHDHBUGS-2449

Test plan

  • Verify e2e-ocp-helm Prow job passes
  • Verify sonataflow database creation succeeds with SSL

Test Results

Tested 5 times: every Helm deployment succeeded and every run passed with zero failures.

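| Run | Status | Duration | Passed | Failed | Skipped |
|-----|--------|----------|--------|--------|---------|
| 1 | ✅ Succeeded | 55m 15s | 32 | 0 | 37 |
| 2 | ✅ Succeeded | 48m 49s | 38 | 0 | 31 |
| 3 | ✅ Succeeded | 1h 4m 33s | 37 | 0 | 31 |
| 4 | ✅ Succeeded | 1h 11m 4s | 32 | 0 | 37 |
| 5 | ✅ Succeeded | 1h 1m 58s | 32 | 0 | 37 |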

Note: Variance in passed/skipped counts is due to conditional test skipping in rbac.spec.ts based on environment timing, not failures.

🤖 Generated with Claude Code

@openshift-ci

openshift-ci bot commented Dec 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign albarbaro for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chadcrum chadcrum changed the title from "fix(ci): Fix RHDH OCP Orchestrator Helm e2e job failures" to "fix(ci): fix RHDH OCP Orchestrator Helm e2e job failures" Dec 18, 2025
@chadcrum chadcrum marked this pull request as draft December 18, 2025 15:51
@chadcrum
Contributor Author

/ok-to-test

chadcrum and others added 12 commits December 18, 2025 11:10

The orchestrator workflows table selector was looking for
"WorkflowsNameCategoryLast" but the actual UI only displays columns:
Name, Workflow Status, Last run, Last run status, Description, Actions.

The "Category" column does not exist in the release-1.8 UI, causing
the orchestrator RBAC tests to fail with element not found errors.

This fix updates the selector to match the actual table header text
"Workflows" which is present in the UI.

Backported from commit f17d95b (PR redhat-developer#3406) in main branch.

Fixes failing test:
- Test Orchestrator RBAC > Test global orchestrator workflow access is allowed

Related: FLPATH-2798

… install

Add --wait --timeout=5m flags to the greeting workflow helm install command
to ensure workflow pods are ready before tests execute.

Without --wait, the helm command returns immediately while pods are still
initializing, which can cause:
- Tests to run before workflows are available
- Race conditions between workflow deployment and test execution
- Pods experiencing CreateContainerConfigError during startup

With --wait, helm monitors the release and only returns success when all
pods are Running and pass readiness probes. The 5-minute timeout provides
ample time for the pods to start (observed ready time: ~90 seconds).

This ensures tests only run against fully-initialized infrastructure and
provides clearer failure messages if pods cannot start.

Related: FLPATH-2798
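
As a rough illustration of the behaviour described above, the install step could be wrapped as follows. This is a minimal sketch, not the script changed in this PR: the function name `install_workflow`, the `HELM` override variable, and the release, chart, and namespace arguments are assumptions made for the example. Only the `--wait` and `--timeout=5m` flags come from the commit message.

```shell
# Hypothetical helper: install a workflow chart and block until it is ready.
# HELM may be overridden (for example HELM=echo) to preview the command.
install_workflow() {
  local release="$1" chart="$2" namespace="$3"
  # --wait makes helm return only once the release's resources are ready;
  # --timeout bounds how long helm waits before reporting a failure.
  "${HELM:-helm}" install "$release" "$chart" \
    --namespace "$namespace" \
    --wait --timeout=5m
}
```

Running `HELM=echo install_workflow greeting ./greeting showcase-rbac` prints the command line instead of contacting a cluster, which is a cheap way to check the flags.
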
…se creation

Add manual database creation workaround for showcase-rbac deployment to handle
SSL-required connections to external Crunchy Data PostgreSQL clusters.

The helm chart's create-sonataflow-database job does not inject the PGSSLMODE
environment variable, causing authentication failures when connecting to
external PostgreSQL instances that require SSL (Crunchy Data operator).

This fix adds:
- create_sonataflow_database_with_ssl() helper function
- Temporary pod that runs psql with PGSSLMODE=require
- Proper SSL configuration from postgres-cred secret

Without SSL configuration:
  FATAL: no pg_hba.conf entry for host "X.X.X.X", user "janus-idp",
  database "postgres", no encryption

This resolves CrashLoopBackOff issues in showcase-rbac namespace for:
- greeting workflow
- user-onboarding workflow
- sonataflow-platform-data-index-service
- sonataflow-platform-jobs-service

Related: FLPATH-2798
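
The core of the workaround is running psql with PGSSLMODE=require. The sketch below shows only that invocation, under stated assumptions: the function name `create_database_with_ssl`, the `PSQL` override variable, and the host, user, and database arguments are hypothetical, and the temporary pod and the postgres-cred secret handling mentioned above are left out.

```shell
# Hypothetical sketch: create a database over an SSL-only connection.
# PSQL may be overridden with a stub for dry runs; all arguments are placeholders.
create_database_with_ssl() {
  local host="$1" user="$2" dbname="$3"
  # PGSSLMODE=require tells libpq to connect only over SSL, so the server
  # no longer rejects the session with a "no encryption" pg_hba.conf error.
  # Note: CREATE DATABASE fails if the database already exists.
  PGSSLMODE=require "${PSQL:-psql}" \
    --host "$host" --username "$user" --dbname postgres \
    --command "CREATE DATABASE \"$dbname\""
}
```

The variable is set only for the single psql process, so other commands in the same script keep their own SSL settings.
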
- Increase timeout from 2 minutes to 5 minutes to handle image pull delays and rate limiting
- Add database verification step to confirm successful creation
- Improve status reporting during pod creation with status change logging
- Add wait for jobs-service rollout before deploying workflows to prevent race conditions
- Improve error handling and logging throughout the process

This addresses issues where the manual database creation pod was timing out
due to ImagePullBackOff delays (QPS exceeded) in the CI environment.

Separate variable declarations from assignments to avoid masking return values.
This resolves ShellCheck warnings in:
- create_sonataflow_database_with_ssl() function (line 889)
- verify_sonataflow_database() function (lines 983, 992)
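
The warning in question matches ShellCheck's SC2155. A small sketch of why the split matters follows; `lookup_password`, `masked`, and `unmasked` are hypothetical names for illustration and do not appear in the PR's scripts.

```shell
# Stand-in for any command that can fail (hypothetical).
lookup_password() { return 3; }

masked() {
  # "local" itself succeeds, so the exit status of lookup_password is lost.
  local password="$(lookup_password)"
  echo "status=$?"
}

unmasked() {
  local password status=0
  # With the declaration separated, the assignment carries the status of
  # the command substitution, so the failure can be detected.
  password="$(lookup_password)" || status=$?
  echo "status=$status"
}
```

Calling `masked` prints `status=0` even though the lookup failed, while `unmasked` prints `status=3`.
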
- Return error code 1 when database creation pod fails
- Return error code 1 when database creation times out
- Clean up pod and show logs before returning on failure
- Change WARNING to ERROR for actual failure cases
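
The failure handling listed above could look roughly like the loop below. It is a sketch under assumptions: `wait_for_pod` and `get_status` are hypothetical names, the phase strings follow the usual Kubernetes pod phases, and the cleanup and log dumping steps are omitted.

```shell
# Placeholder: a real script might query the pod phase with oc or kubectl,
# for example: oc get pod <name> -o jsonpath='{.status.phase}'
get_status() { echo "Succeeded"; }

# Poll until the pod succeeds; return 1 on failure or timeout so that the
# caller sees an error instead of a logged warning.
wait_for_pod() {
  local timeout="$1" interval="$2" elapsed=0 phase
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(get_status)" || phase="Unknown"
    case "$phase" in
      Succeeded) return 0 ;;
      Failed) echo "ERROR: database creation pod failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "ERROR: timed out after ${timeout}s" >&2
  return 1
}
```

A caller can then write `wait_for_pod 300 5 || return 1` so that a failed or stuck pod stops the deployment early.
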
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The securityContext with readOnlyRootFilesystem: true was preventing
psql from working properly because it needs to write temporary files
to /tmp during SSL connections to the external PostgreSQL database.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The default 2Gi ephemeral volume for dynamic-plugins-root is
insufficient when many plugins are enabled (orchestrator, kubernetes,
tekton, techdocs, keycloak, etc.). The init container fails with a
"No space left on device" error during plugin extraction.

Increase the volume size to 5Gi for both showcase and RBAC namespaces
using the deployment.patch field in the Backstage CR.
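
For orientation, a patch of this kind might be rendered as shown below. This is a hedged sketch rather than the manifest from the PR: the helper name `render_backstage_patch`, the apiVersion, the CR name, and the access mode are assumptions, and only the volume name, the 5Gi size, and the use of the deployment.patch field come from the commit message.

```shell
# Hypothetical sketch: print a Backstage CR fragment that replaces the
# dynamic-plugins-root volume with a larger ephemeral volume.
# apiVersion, metadata.name, and accessModes are assumptions.
render_backstage_patch() {
  local size="${1:-5Gi}"
  cat <<EOF
apiVersion: rhdh.redhat.com/v1alpha3
kind: Backstage
metadata:
  name: example-rhdh
spec:
  deployment:
    patch:
      spec:
        template:
          spec:
            volumes:
              - \$patch: replace
                name: dynamic-plugins-root
                ephemeral:
                  volumeClaimTemplate:
                    spec:
                      accessModes:
                        - ReadWriteOnce
                      resources:
                        requests:
                          storage: ${size}
EOF
}
```

The size is a parameter so the same fragment can be reused if 5Gi later turns out to be too small.
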

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The default 10-second actionTimeout was being exceeded when the
Keycloak popup was slow to render, causing orchestrator RBAC tests
to fail during authentication setup.

Add an explicit waitFor with a 30-second timeout before interacting with
the Keycloak login form to handle slow responses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@chadcrum
Contributor Author

/test e2e-tests

@openshift-ci

openshift-ci bot commented Dec 18, 2025

@chadcrum: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-ocp-helm

The following commands are available to trigger optional jobs:

/test e2e-aks-helm-nightly
/test e2e-aks-operator-nightly
/test e2e-eks-helm-nightly
/test e2e-eks-operator-nightly
/test e2e-gke-helm-nightly
/test e2e-gke-operator-nightly
/test e2e-ocp-helm-nightly
/test e2e-ocp-helm-upgrade-nightly
/test e2e-ocp-operator-auth-providers-nightly
/test e2e-ocp-operator-nightly
/test e2e-ocp-v4-17-helm-nightly
/test e2e-ocp-v4-19-helm-nightly
/test e2e-ocp-v4-20-helm-nightly
/test e2e-osd-gcp-helm-nightly
/test e2e-osd-gcp-operator-nightly

Use /test all to run the following jobs that were automatically triggered:

pull-ci-redhat-developer-rhdh-release-1.8-e2e-ocp-helm
Details

In response to this:

/test e2e-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rhdh-qodo-merge

You are above your monthly Qodo Merge usage quota. If you are a paying user, please link your GitHub/GitLab/Bitbucket account with your qodo account here to claim your seat. To allow usage organization-wide without linking, please reach out to Qodo.

@chadcrum
Contributor Author

/test e2e-ocp-helm

@github-actions
Contributor

The image is available at:

/test e2e-ocp-helm

@chadcrum
Contributor Author

/retest

@chadcrum
Contributor Author

/test e2e-ocp-helm

@chadcrum
Contributor Author

/test e2e-ocp-helm

@chadcrum
Contributor Author

/test e2e-ocp-helm

@chadcrum chadcrum marked this pull request as ready for review December 18, 2025 22:44
@openshift-ci openshift-ci bot requested a review from albarbaro December 18, 2025 22:44
@chadcrum
Contributor Author

/test e2e-ocp-helm

@chadcrum
Contributor Author

@christoph-jerolimov @subhashkhileri I'm working with @gustavolira to help stabilize the ocp helm/operator rhdh jobs (related to orchestrator).

As @gustavolira is out until next year, can one of you take a look at this?

@chadcrum chadcrum marked this pull request as draft December 19, 2025 16:06
@chadcrum chadcrum marked this pull request as ready for review December 19, 2025 16:24
@openshift-ci openshift-ci bot requested review from psrna and zdrapela December 19, 2025 16:25
@github-actions
Contributor

The image is available at:

/test e2e-ocp-helm
