
Conversation

@ahmet2mir

Hello, this is a first approach to hedging.

For the design I followed what Grafana does (they have huge experience with HTTP requests: dashboards/connectors, Mimir, Loki, k6, etc.), and I tried to keep the client customizable.

IMHO, hedging and retry must be mutually exclusive to avoid overloading a server. On the user side, they could init a client with hedging and, if the request fails, create a new Resty client with "standard" retry.

Regarding "write" operations on requests, I prefer not to enable hedging for them, only for reads; on write operations it could be dangerous (cristalhq's implementation accepts writes).

On the testing part I created a hedging server, but I could move the URI to createGenericServer if that's better.

I didn't include the hedging lib (no commit since last year, but almost stable I think) but tried to follow your styling and way of splitting the code; let me know if you prefer it that way or if I should use the external lib.

Simple test:

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync/atomic"
	"time"

	"resty.dev/v3"
)

func main() {
	// Start a test server that simulates slow responses
	requestCount := atomic.Int32{}

	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		count := requestCount.Add(1)
		fmt.Printf("[Server] Request #%d received at %s\n", count, time.Now().Format("15:04:05.000"))

		// First request is slow, subsequent ones are slower; the result should come from the first
		if count == 1 {
			time.Sleep(2 * time.Second)
		} else {
			time.Sleep(5 * time.Second)
		}

		fmt.Fprintf(w, "Response from request #%d", count)
	})

	go http.ListenAndServe(":9999", nil)
	time.Sleep(100 * time.Millisecond) // Give the server time to start

	// Create a client with hedging enabled
	client := resty.New()

	// Delay of 100ms means: if the first request doesn't complete in 100ms, send another one
	if err := client.EnableHedging(100*time.Millisecond, 5, 10.0); err != nil {
		fmt.Println("Error enabling hedging:", err)
		panic(err)
	}

	fmt.Println("[Client] Starting hedged request...")
	start := time.Now()

	res, err := client.R().
		SetQueryParams(map[string]string{
			"page_no": "1",
			"limit":   "20",
			"sort":    "name",
			"order":   "asc",
			"random":  strconv.FormatInt(time.Now().Unix(), 10),
		}).
		SetHeader("Accept", "application/json").
		Get("http://localhost:9999/slow")

	duration := time.Since(start)

	if err != nil {
		fmt.Println("[Client] Error:", err)
		panic(err)
	}

	fmt.Printf("\n[Client] Response received in %v\n", duration)
	fmt.Printf("[Client] Status: %d\n", res.StatusCode())
	fmt.Printf("[Client] Body: %s\n", res.String())
	fmt.Printf("[Client] Total requests made to server: %d\n", requestCount.Load())
}
```

I'm open to discussion if you have any concerns or enhancements.

Close #1050

@ahmet2mir ahmet2mir changed the title feat(hedging): support hedging in order to reduce latency when querin… feat(hedging): support hedging in order to reduce latency when quering endpoints Dec 17, 2025
Member

@jeevatkm left a comment


@ahmet2mir Thanks for the PR. I apologize for the delayed response. I’ll review your PR notes and implementation, and I’ll get back to you soon.

Member

@jeevatkm left a comment


@ahmet2mir, I’ve reviewed the PR code changes and your notes. Please find my perspective and suggestions for code improvements.

  • It’s great to understand your thought process and design insights. Thanks for sharing about Grafana, etc.
  • I wouldn’t recommend disabling retry by default. However, if we disable it by default and provide an option to enable it, then I’m okay with that. Having a fallback mechanism for retry is practical, rather than creating a new Resty client, which isn’t an appropriate approach. Once all the hedged requests fail, a full retry loop can take over.
  • I don’t include any libraries in the Resty except Go’s built-in libraries and packages from golang.org.
  • I agree with you to restrict support to read-only HTTP methods. However, we should also provide an option to enable other methods, similar to how Resty does in many places. This way, users have the flexibility to choose the methods they want to use. Maybe it's more appropriate to rename the method to isReadOnlyMethod instead of isSafeMethod.
  • I suggest removing the existing new fields from the client and instead adding them to the unexported hedging struct. This way, we’ll have only one new field for hedging. When enabling, we should create an instance of the hedging struct and assign it to the client.
  • I need to think about the approach, "how to keep hedging flow and keep existing flow as fallback flow.” I need some time to think about it. Of course, feel free to suggest one.
  • I reviewed the source code on https://github.com/cristalhq/hedgedhttp and the PR implementation. I noticed significant changes and improvements. Therefore, I suggest the following header (also, please add a single blank line before the "package resty" line):
    // Copyright (c) 2015-present Jeevanandam M ([email protected]), All rights reserved.
    // 2025 Ahmet Demir (https://github.com/ahmet2mir)
    // resty source code and usage is governed by a MIT style
    // license that can be found in the LICENSE file.
    // SPDX-License-Identifier: MIT
    
    package resty
    
    // This hedging implementation draws inspiration from the reference provided here: https://github.com/cristalhq/hedgedhttp.
    
    ...
    
  • In go.mod, keep only the go 1.23.0 line; please remove the go 1.24.0 line and the toolchain line.
  • It’s fine to keep the hedging server within the hedging_test.go file. However, please avoid duplicating the code within the test functions. Instead, I suggest creating a createHedgingServer method that wraps createTestServer, rather than calling httptest.NewServer(http.HandlerFunc(...)) directly.
  • Do we really need to use the rate limiter package here for the wait?

@ahmet2mir (Author) commented Dec 30, 2025

Hi, thanks for the detailed feedback. I tried to fix things based on your remarks:

1. I changed the way to enable hedging, following the chained mode:

```go
client.
	EnableHedging(50*time.Millisecond, 5, 10.0).
	SetRetryCount(1).
	SetRetryWaitTime(2 * time.Second).
	SetRetryMaxWaitTime(5 * time.Second)
```

Any SetHedgingXXXX function will return the current client and log an error saying hedging is not enabled. This avoids managing errors and declaring hedging outside the client declaration.

e.g.

```go
client.
	SetHedgingDelay(50*time.Millisecond, 5, 10.0).
	SetRetryCount(1).
	SetRetryWaitTime(2 * time.Second).
	SetRetryMaxWaitTime(5 * time.Second)
```

will show:

```
2025/12/30 12:51:54.544191 ERROR RESTY SetHedgingDelay: resty: hedging not enabled, ignoring this option, please enable with EnableHedging() first
```
2. Retry is restored, and it will retry the whole "hedging" round, e.g.:

```
[Client] Retry fallback... 1
[Client] Starting hedged request...

[Server] Request #1 received at 12:47:21.701
[Server] Request #2 received at 12:47:21.851
[Server] Request #3 received at 12:47:22.002
[Server] Request #4 received at 12:47:22.153
[Server] Request #5 received at 12:47:22.304

# retry
[Server] Request #6 received at 12:47:28.912
[Server] Request #7 received at 12:47:29.064
[Server] Request #8 received at 12:47:29.215
[Server] Request #9 received at 12:47:29.366
[Server] Request #10 received at 12:47:29.516

[Client] Response received in 12.215696403s
[Client] Status: 500
[Client] Body: Response from request #6
[Client] Total requests made to server: 10
```
3. Hedging fields are now moved to a dedicated struct.

4. Headers & mentions fixed.

5. go.mod fixed.

6. Tests enhanced.

7. Removed the ratelimiter package in favor of a simple vanilla timer.

8. Regarding the question "how to keep hedging flow and keep existing flow as fallback flow":

Hedging returns the first response to complete, not the first successful response.

For example, request #1 would have succeeded after 10 seconds, but it was cancelled when request #2 completed with an error at 5 seconds.

A good approach is hedging with up to 3 requests (if we consider that every service on earth runs in at least 3 locations with good load balancing), then retrying with custom conditions.

For example, retry on errors greater than or equal to 500 and ignore 4XX errors:

```go
client.
	EnableHedging(1*time.Second, 3, 3.0).
	SetRetryCount(3).
	SetRetryWaitTime(2 * time.Second).
	SetRetryMaxWaitTime(5 * time.Second).
	AddRetryCondition(func(r *resty.Response, err error) bool {
		return r.StatusCode() >= 500
	})
```

…ging func if not enabled, restore retry, allow unsafe funcs

Signed-off-by: Ahmet DEMIR <[email protected]>

Successfully merging this pull request may close these issues.

Retry with "hedge" to reduce tail latency at scale
