Skip to content

feat: failover route on cold-start time out #1280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yyewolf
Copy link

@yyewolf yyewolf commented Apr 7, 2025

This PR is an attempt at solving #874 by adding a fallback service.

One use case would be the following :

  • You have a service that takes a bit of time to scale up from zero
  • You want the user to have feedback on what's going on instead of seeing the website loading

It's currently implemented so that if there's an error or time out occuring while fetching endpoints, it would then choose to forward to the fallback service.

To make sure the request is properly forwarded there, you can use a startupProbe on the Pod to make sure it does not fill the service's endpoints before it's actually in a Ready state.

This change includes no breaking change CRD wise because it doesn't affect the behavior of the operator / interceptor when the field is not present.

I'm looking for your help on this matter, whether you think this is a good idea or not and discuss about the changes.

I haven't done much regarding testing and documenting in case this proposition is not accepted.

An example of this PR being in use is available here : https://juice.hackcorp.net/.

It is funneled through an ExternalName service, that points to another service that points to a caddy with a small static page.

Checklist

Fixes #874

@yyewolf yyewolf requested a review from a team as a code owner April 7, 2025 21:32
@wozniakjan wozniakjan self-assigned this Apr 8, 2025
@vadasambar
Copy link

@yyewolf make test is failing

--- FAIL: TestForwarderSuccess (0.10s)
    upstream_test.go:49: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:49
                Error:          Should be true
                Test:           TestForwarderSuccess
                Messages:       request was not received within 100ms
--- FAIL: TestForwarderHeaderTimeout (0.00s)
    upstream_test.go:96: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:96
                Error:          Not equal: 
                                expected: 502
                                actual  : 500
                Test:           TestForwarderHeaderTimeout
--- FAIL: TestForwarderWaitsForSlowOrigin (0.01s)
    upstream_test.go:145: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:145
                Error:          Not equal: 
                                expected: 200
                                actual  : 500
                Test:           TestForwarderWaitsForSlowOrigin
2025/04/09 15:59:15 forwardRequest took 8.834µs
--- FAIL: TestForwarderConnectionRetryAndTimeout (0.00s)
    upstream_test.go:177: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:177
                Error:          "8.834µs" is not greater than or equal to "150ms"
                Test:           TestForwarderConnectionRetryAndTimeout
                Messages:       proxy returned after 772.125µs, expected not to return until 150ms
--- FAIL: TestForwardRequestRedirectAndHeaders (0.00s)
    upstream_test.go:222: 
                Error Trace:    /Users/suraj.bankar/work/http-add-on/interceptor/handler/upstream_test.go:222
                Error:          Not equal: 
                                expected: 301
                                actual  : 500
                Test:           TestForwardRequestRedirectAndHeaders

@yyewolf yyewolf force-pushed the feat/fallback-service branch 2 times, most recently from a951b04 to 9a4a5e2 Compare April 9, 2025 15:15
@yyewolf
Copy link
Author

yyewolf commented Apr 9, 2025

@vadasambar Should be better now, I inadvertently put true as a default in the upstream tests while in reality these scenario don't include a fallback stream.

@yyewolf yyewolf force-pushed the feat/fallback-service branch from 9a4a5e2 to 67083b7 Compare April 9, 2025 15:18
@vadasambar
Copy link

@vadasambar Should be better now, I inadvertently put true as a default in the upstream tests while in reality these scenario don't include a fallback stream.

no problem :) thank you

@wozniakjan wozniakjan requested a review from Copilot April 29, 2025 09:09
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a fallback mechanism for HTTP requests so that if the primary service is unreachable or times out, the request is forwarded to a designated fallback service. Key changes include:

  • Adding FallbackTargetRef support in the CRD and associated getter methods.
  • Propagating and handling fallback URL within contexts by introducing RequestWithFallbackStream, ContextWithFallbackStream, and FallbackStreamFromContext.
  • Adjusting proxy, routing, and upstream handlers and tests to incorporate fallback logic.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/util/contexthttp.go Added RequestWithFallbackStream function to inject a fallback stream into the request context.
pkg/util/context.go Introduced a new context key and helper functions for fallback streams.
operator/apis/http/v1alpha1/httpscaledobject_types.go Added FallbackTargetRef struct and its getter methods, along with CRD validation for fallback.
interceptor/proxy_handlers_test.go Introduced a new test ensuring requests are forwarded to the fallback service.
interceptor/proxy_handlers.go Modified forwarding handler to use fallback when the primary wait function fails.
interceptor/middleware/routing.go Updated routing logic to inject fallback stream into context if FallbackTargetRef is provided.
interceptor/handler/upstream.go Updated upstream handler construction to account for fallback streaming.
config/crd/bases/http.keda.sh_httpscaledobjects.yaml Updated CRD to include fallbackTargetRef with validation rules.
Comments suppressed due to low confidence (1)

interceptor/proxy_handlers_test.go:217

  • [nitpick] Remove the debug print statement to avoid unwanted output during test runs.
fmt.Println(res)

@yyewolf yyewolf force-pushed the feat/fallback-service branch from 67083b7 to f73338b Compare May 1, 2025 17:00
@wozniakjan wozniakjan changed the title feat: Add FallbackTargetRef and fallback to it when the request would be timing out feat: failover route on cold-start time out May 2, 2025
@yyewolf yyewolf force-pushed the feat/fallback-service branch from f73338b to 27a7716 Compare May 3, 2025 14:17
@yyewolf yyewolf force-pushed the feat/fallback-service branch from 27a7716 to c617762 Compare May 3, 2025 14:19
@wozniakjan wozniakjan requested a review from Copilot May 7, 2025 14:38
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a failover mechanism to route HTTP requests when the primary backend is unavailable during a cold start. Key changes include:

  • Adding new context helpers and functions to manage a failover stream.
  • Extending API types with a ColdStartTimeoutFailoverRef and its getters.
  • Updating proxy and upstream handler logic to support conditional failover routing.
  • Adding new tests to validate that requests are correctly forwarded to the fallback service.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/util/contexthttp.go Adds RequestWithFailoverStream to inject a failover stream into the request context.
pkg/util/context.go Introduces context keys and helper functions for the failover stream.
operator/apis/http/v1alpha1/httpscaledobject_types.go Adds ColdStartTimeoutFailoverRef and its getter methods, along with validation rules.
interceptor/proxy_handlers_test.go Adds a new test to validate forwarding to the fallback service.
interceptor/proxy_handlers.go Updates wait error handling and upstream creation to take the failover configuration into account.
interceptor/middleware/routing.go Adjusts stream retrieval to allow passing in a generic reference for both primary and failover routes.
interceptor/handler/upstream_test.go Updates tests to pass the new parameter for upstream creation.
interceptor/handler/upstream.go Modifies upstream logic to switch to the failover stream based on a flag.
config/crd/bases/http.keda.sh_httpscaledobjects.yaml Updates the CRD schema to include the new coldStartTimeoutFailoverRef field.

@@ -73,7 +74,7 @@ func newForwardingHandler(
httpso.GetNamespace(),
httpso.Spec.ScaleTargetRef.Service,
)
if err != nil {
if err != nil && !hasFailover {
Copy link
Preview

Copilot AI May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider adding diagnostic logging for the failover case when waitFunc returns an error (i.e. when hasFailover is true) to ease troubleshooting in production scenarios.

Copilot uses AI. Check for mistakes.

@@ -29,6 +31,10 @@ func (uh *Upstream) ServeHTTP(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()

stream := util.StreamFromContext(ctx)
if uh.shouldFailover {
stream = util.FailoverStreamFromContext(ctx)
Copy link
Preview

Copilot AI May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] If the failover stream retrieved from the context is nil despite shouldFailover being true, consider logging a warning to help diagnose potential misconfigurations.

Suggested change
stream = util.FailoverStreamFromContext(ctx)
stream = util.FailoverStreamFromContext(ctx)
if stream == nil {
log.Printf("Warning: Failover stream is nil despite shouldFailover being true. Request: %v", r)
}

Copilot uses AI. Check for mistakes.

wozniakjan
wozniakjan previously approved these changes May 7, 2025
@wozniakjan wozniakjan dismissed their stale review May 7, 2025 14:41

premature approval

@wozniakjan
Copy link
Member

looks like the validation checks didn't succeed. Can you please address the errors @yyewolf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Please hold" landing pages for slow scale from zero scenarios
4 participants