
fix(gorch): add HAProxy timeout annotations to prevent 504 errors #692

Open
sheltoncyril wants to merge 1 commit into trustyai-explainability:main from sheltoncyril:bugfix/RHOAIENG-33054-route-timeout


@sheltoncyril
Contributor

@sheltoncyril commented Mar 30, 2026

The Guardrails Orchestrator routes were returning 504 Gateway Timeout errors after 30 seconds whenever LLM-based guardrails detection took longer than that to complete. This occurred because OpenShift's HAProxy router enforces a default 30-second server timeout on routes that do not set the haproxy.router.openshift.io/timeout annotation.

This fix adds a 5-minute timeout annotation to three LLM-facing routes:

  • Orchestrator route (port 8032) - main API endpoint for chat/completions-detection
  • Gateway route - handles sidecar gateway LLM traffic
  • Built-in detector route - handles built-in detector LLM calls

Changes:

  • Added Annotations field to RouteConfig struct (controllers/utils/route.go)
  • Updated route template to conditionally render annotations (controllers/gorch/templates/route.tmpl.yaml)
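  • Set haproxy.router.openshift.io/timeout: 5m in the three route reconciliation functions (reconcileOrchestratorRoute, reconcileGatewayRoute, reconcileBuiltInDetectorRoute in controllers/gorch/route.go)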

The 5-minute timeout allows slow LLM inference to complete while preventing truly hung connections from lingering indefinitely. This is a backward-compatible change that takes effect when the operator reconciles existing GuardrailsOrchestrator resources.

Fixes RHOAIENG-33054

Summary by CodeRabbit

  • Chores
    • Added an Annotations field to RouteConfig so routes can carry custom metadata annotations.
    • Updated the route template to render metadata.annotations only when annotations are present.
    • Set a 5-minute HAProxy timeout on the orchestrator, gateway, and built-in detector routes so slow LLM requests no longer fail with 504 errors.

@sheltoncyril requested a review from RobGeada on March 30, 2026 11:35
@openshift-ci

openshift-ci Bot commented Mar 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai Bot commented Mar 30, 2026

Walkthrough

Added an Annotations field to the RouteConfig struct to enable custom route annotations. Three route reconciliation functions now set HAProxy router timeout annotations, and the route template renders these annotations only when they are present.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Route Configuration Structure (controllers/utils/route.go) | Added exported Annotations map[string]string field to RouteConfig struct to store custom route annotations. |
| Route Reconciliation Logic (controllers/gorch/route.go) | Populated Annotations field with HAProxy timeout settings "haproxy.router.openshift.io/timeout": "5m" in three route reconciliation functions: reconcileGatewayRoute, reconcileOrchestratorRoute, and reconcileBuiltInDetectorRoute. |
| Route Template Rendering (controllers/gorch/templates/route.tmpl.yaml) | Added conditional rendering of metadata.annotations in the Route template; when .Annotations is populated, the template emits annotation key-value pairs; otherwise, the field is omitted. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 The routes now wear fine annotations so grand,
Five minutes of patience, as timeouts are planned.
Through gateways and orchestrators, the timeout does flow,
HAProxy's requests now have time for their show!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding HAProxy timeout annotations to prevent 504 errors, which directly addresses the core problem and solution in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Contributor

@coderabbitai Bot left a comment


🧹 Nitpick comments (1)
controllers/gorch/route.go (1)

25-29: Deduplicate repeated timeout annotation literals.

The same annotation key/value and map literal is repeated in three places; extracting constants + helper reduces drift risk.

♻️ Proposed refactor
 const (
 	routeTemplatePath = "route.tmpl.yaml"
+	haproxyTimeoutAnnotationKey = "haproxy.router.openshift.io/timeout"
+	llmRouteTimeout             = "5m"
 )
+
+func llmRouteAnnotations() map[string]string {
+	return map[string]string{
+		haproxyTimeoutAnnotationKey: llmRouteTimeout,
+	}
+}
@@
-		Annotations: map[string]string{
-			// Fix for RHOAIENG-33054: Set HAProxy timeout to 5 minutes
-			// Gateway route handles LLM traffic that can exceed default 30s timeout
-			"haproxy.router.openshift.io/timeout": "5m",
-		},
+		Annotations: llmRouteAnnotations(),
@@
-		Annotations: map[string]string{
-			// Fix for RHOAIENG-33054: Set HAProxy timeout to 5 minutes
-			// LLM-based guardrails detection can take longer than the default 30s
-			"haproxy.router.openshift.io/timeout": "5m",
-		},
+		Annotations: llmRouteAnnotations(),
@@
-		Annotations: map[string]string{
-			// Fix for RHOAIENG-33054: Set HAProxy timeout to 5 minutes
-			// Built-in detector route handles LLM traffic that can exceed default 30s timeout
-			"haproxy.router.openshift.io/timeout": "5m",
-		},
+		Annotations: llmRouteAnnotations(),

Also applies to: 41-45, 61-65
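A self-contained version of the proposed helper, for reference. The names haproxyTimeoutAnnotationKey, llmRouteTimeout and llmRouteAnnotations come from the suggestion above and are not part of the merged code; the extra example.com/owner annotation is hypothetical and only shows why the helper builds a new map on each call.

```go
package main

import "fmt"

// Names follow the proposed refactor; they are a suggestion, not merged code.
const (
	haproxyTimeoutAnnotationKey = "haproxy.router.openshift.io/timeout"
	llmRouteTimeout             = "5m"
)

// llmRouteAnnotations returns a fresh map on every call, so a caller that
// adds route-specific annotations cannot leak them into other routes.
func llmRouteAnnotations() map[string]string {
	return map[string]string{
		haproxyTimeoutAnnotationKey: llmRouteTimeout,
	}
}

func main() {
	gateway := llmRouteAnnotations()
	orchestrator := llmRouteAnnotations()

	// Hypothetical extra annotation added to one route only.
	gateway["example.com/owner"] = "gateway"

	fmt.Println(len(gateway), len(orchestrator))           // 2 1
	fmt.Println(orchestrator[haproxyTimeoutAnnotationKey]) // 5m
}
```

A function is preferable to a package-level `var` map here: Go maps are reference types, so a single shared map mutated while building one route would change the annotations of every other route that uses it.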

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/gorch/route.go` around lines 25-29: the repeated annotation
literal ("haproxy.router.openshift.io/timeout": "5m") in the Annotations map is
duplicated in three places. Extract it into shared constants (e.g.,
HAProxyTimeoutAnnotationKey and HAProxyTimeoutAnnotationValue) and replace the
inline map entries with a reusable helper (either a small function like
buildHAProxyTimeoutAnnotations() that returns the map, or a utility function
addHAProxyTimeoutAnnotation(annotations map[string]string)). Update all
occurrences (the Annotations map literals where the routes are created) to use
the constant/helper to avoid drift.
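The second option above, addHAProxyTimeoutAnnotation(annotations map[string]string), has a Go-specific pitfall: assigning into a nil map panics, and a function with no return value cannot hand a newly allocated map back to the caller. A minimal sketch, assuming the helper is changed to return the map (this signature and the example.com/team annotation are hypothetical, not part of the PR):

```go
package main

import "fmt"

const (
	haproxyTimeoutAnnotationKey = "haproxy.router.openshift.io/timeout"
	llmRouteTimeout             = "5m"
)

// addHAProxyTimeoutAnnotation sets the LLM route timeout on annotations and
// returns the map. It allocates when annotations is nil (writing to a nil
// map would panic); otherwise it mutates and returns the caller's map.
func addHAProxyTimeoutAnnotation(annotations map[string]string) map[string]string {
	if annotations == nil {
		annotations = make(map[string]string, 1)
	}
	annotations[haproxyTimeoutAnnotationKey] = llmRouteTimeout
	return annotations
}

func main() {
	// Nil input: a new map is allocated instead of panicking.
	fmt.Println(addHAProxyTimeoutAnnotation(nil))
	// map[haproxy.router.openshift.io/timeout:5m]

	// Existing (illustrative) annotations are kept; the timeout is added.
	existing := map[string]string{"example.com/team": "guardrails"}
	fmt.Println(addHAProxyTimeoutAnnotation(existing))
	// map[example.com/team:guardrails haproxy.router.openshift.io/timeout:5m]
}
```

Returning the map keeps call sites uniform (`cfg.Annotations = addHAProxyTimeoutAnnotation(cfg.Annotations)`) whether or not a route already has annotations.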

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 338453e1-ebd2-4ffb-a0b2-9e2491d0e330

📥 Commits

Reviewing files that changed from the base of the PR and between 8c2cd26 and d82f035.

📒 Files selected for processing (3)
  • controllers/gorch/route.go
  • controllers/gorch/templates/route.tmpl.yaml
  • controllers/utils/route.go

@openshift-ci

openshift-ci Bot commented Mar 30, 2026

@sheltoncyril: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/trustyai-service-operator-e2e | d82f035 | link | true | /test trustyai-service-operator-e2e |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

None yet

Projects

None yet


1 participant