gRPC multi-roundtrip latency issue notes

These notes document an issue with gRPC on high-latency links where the call latency is a multiple of the network round-trip time (RTT) once the payload size exceeds a certain threshold.

Related: grpc/grpc-go#5599, which was closed without any conclusions.

The setup

  • Create two test VMs on GCP in regions us-central1 and asia-southeast1

    • Machine type: c2-standard-4
    • All other settings are defaults (Debian bookworm boot image, default network, ...)
  • Use gcloud compute ssh ... --tunnel-through-iap to SSH into both machines:

    • Get their internal network IPs:

      lacop@grpc-latency-us:~$ ip addr
      2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
          inet 10.128.0.57/32 metric 100 scope global dynamic ens4
      
      #####
      
      lacop@grpc-latency-sg:~$ ip addr
      2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
          inet 10.148.0.3/32 metric 100 scope global dynamic ens4
      
    • Run ping to get baseline latency numbers:

      lacop@grpc-latency-us:~$ ping -q -c 100 -i 0.1 10.148.0.3
      PING 10.148.0.3 (10.148.0.3) 56(84) bytes of data.
      
      --- 10.148.0.3 ping statistics ---
      100 packets transmitted, 100 received, 0% packet loss, time 9963ms
      rtt min/avg/max/mdev = 205.520/205.555/206.562/0.101 ms, pipe 3
      
      #####
      
      lacop@grpc-latency-sg:~$ ping -q -c 100 -i 0.1 10.128.0.57
      PING 10.128.0.57 (10.128.0.57) 56(84) bytes of data.
      
      --- 10.128.0.57 ping statistics ---
      100 packets transmitted, 100 received, 0% packet loss, time 9963ms
      rtt min/avg/max/mdev = 205.523/205.548/205.624/0.017 ms, pipe 3
      
    • Set up iperf3 (sudo apt-get install iperf3) and measure bandwidth:

      lacop@grpc-latency-us:~$ iperf3 -s # If you didn't start the daemon on install
      
      #####
      
      lacop@grpc-latency-sg:~$ iperf3 -c 10.128.0.57 -t 30 -i 5
      Connecting to host 10.128.0.57, port 5201
      [  5] local 10.148.0.3 port 50344 connected to 10.128.0.57 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-5.00   sec  53.2 MBytes  89.3 Mbits/sec    0   6.04 MBytes
      [  5]   5.00-10.00  sec  70.0 MBytes   117 Mbits/sec    0   6.04 MBytes
      [  5]  10.00-15.00  sec  72.5 MBytes   122 Mbits/sec    0   6.04 MBytes
      [  5]  15.00-20.00  sec  70.0 MBytes   117 Mbits/sec    0   6.04 MBytes
      [  5]  20.00-25.00  sec  70.0 MBytes   117 Mbits/sec    0   6.04 MBytes
      [  5]  25.00-30.00  sec  73.8 MBytes   124 Mbits/sec    0   6.04 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-30.00  sec   410 MBytes   115 Mbits/sec    0             sender
      [  5]   0.00-30.21  sec   409 MBytes   114 Mbits/sec                  receiver
      

      (The other direction yields similar results.)

    • Install Go and uv on both VMs:

      wget https://go.dev/dl/go1.24.4.linux-amd64.tar.gz
      sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.24.4.linux-amd64.tar.gz && rm go1.24.4.linux-amd64.tar.gz
      export PATH=$PATH:/usr/local/go/bin
      
      sudo apt-get install pipx && pipx install uv
      
    • Clone this repository on both VMs, for example:
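
      git clone https://github.com/lacop11/grpc-rtt-repro.git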

Reproduction

We will run a gRPC echo service with the following schema:

message EchoRequest {
  bytes payload = 1;
}

message EchoResponse {
  int64 payload_length = 1;
}

service EchoService {
  rpc Echo(EchoRequest) returns (EchoResponse);
}
Updating the schema

All the generated files are committed to the repo to make it simpler to run, so there is likely no need to run this. Included only for reference:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

uv run --with=grpcio==1.73.0,grpcio-tools==1.73.0 \
python3 -m grpc_tools.protoc \
-I. \
--python_out=python/ --pyi_out=python/ --grpc_python_out=python/ \
--go_out=golang/echo/ --go_opt=paths=source_relative --go-grpc_out=golang/echo/ --go-grpc_opt=paths=source_relative \
echo.proto

Start the server on one VM:

lacop@grpc-latency-us:~/grpc-repro/golang$ go run server/server.go
2025/07/08 09:44:29 server listening at [::]:50051
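
For context, the server is just a standard unary echo handler. A minimal sketch follows (the actual server/server.go may differ; the pb import path is an assumption):

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	// Assumed import path for the stubs generated into golang/echo/.
	pb "github.com/lacop11/grpc-rtt-repro/golang/echo"
)

type echoServer struct {
	pb.UnimplementedEchoServiceServer
}

func (s *echoServer) Echo(ctx context.Context, req *pb.EchoRequest) (*pb.EchoResponse, error) {
	// Reply with only the payload length so the response stays tiny and the
	// measured latency is dominated by the request direction.
	return &pb.EchoResponse{PayloadLength: int64(len(req.GetPayload()))}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterEchoServiceServer(s, &echoServer{})
	log.Printf("server listening at %v", lis.Addr())
	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}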

Run the client. Ignoring the initial connection setup, the expected latency is the ping RTT, but with a 16384-byte payload each call already takes roughly 2x the RTT:

lacop@grpc-latency-sg:~/grpc-repro/golang$ go run client/client.go --address 10.128.0.57 --size 16384 --repeats 5 --delay 1
Connecting to 10.128.0.57:50051...
Payload size: 16384 bytes, repeats: 5, delay: 1.000000 seconds
Sending payload of size 16384 bytes...
Response received: Payload length = 16384 bytes
        Round trip time: 822.929856ms
Response received: Payload length = 16384 bytes
        Round trip time: 410.677761ms
Response received: Payload length = 16384 bytes
        Round trip time: 410.665896ms
Response received: Payload length = 16384 bytes
        Round trip time: 410.685099ms
Response received: Payload length = 16384 bytes
        Round trip time: 410.678903ms
All requests completed successfully.
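
For context, the client's measurement loop amounts to roughly the following sketch (flag parsing and the background sender are elided; pb is the generated echo package):

// Connect once, then time repeated unary Echo calls with a fixed payload size
// and a fixed delay between requests.
conn, err := grpc.NewClient(fmt.Sprintf("%s:%d", *address, *port),
	grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	log.Fatalf("failed to connect: %v", err)
}
defer conn.Close()
client := pb.NewEchoServiceClient(conn)

payload := make([]byte, *size)
for i := 0; i < *repeats; i++ {
	start := time.Now()
	resp, err := client.Echo(context.Background(), &pb.EchoRequest{Payload: payload})
	if err != nil {
		log.Fatalf("echo failed: %v", err)
	}
	fmt.Printf("Response received: Payload length = %d bytes\n", resp.GetPayloadLength())
	fmt.Printf("\tRound trip time: %v\n", time.Since(start))
	time.Sleep(time.Duration(*delay * float64(time.Second)))
}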

Now vary the payload size; the latency doubles once the size crosses a threshold:

  • With --size 14000 we still get ~200 ms consistently.
  • With --size 15000 we get a mix of ~200 ms and ~400 ms.
  • With --size 16000 we get ~400 ms consistently.

Things I tried

Sending more frequently or sending dummy requests in the background

I noticed that sending requests more often reduces the latency to RTT:

lacop@grpc-latency-sg:~/grpc-repro/golang$ go run client/client.go --address 10.128.0.57 --size 32000 --repeats 5 --delay 1
Connecting to 10.128.0.57:50051...
Payload size: 32000 bytes, repeats: 5, delay: 1.000000 seconds
Sending payload of size 32000 bytes...
Response received: Payload length = 32000 bytes
        Round trip time: 822.89423ms
Response received: Payload length = 32000 bytes
        Round trip time: 410.679966ms
Response received: Payload length = 32000 bytes
        Round trip time: 411.580503ms
Response received: Payload length = 32000 bytes
        Round trip time: 411.614844ms
Response received: Payload length = 32000 bytes
        Round trip time: 411.134367ms
All requests completed successfully.

lacop@grpc-latency-sg:~/grpc-repro/golang$ go run client/client.go --address 10.128.0.57 --size 32000 --repeats 5 --delay 0.1
Connecting to 10.128.0.57:50051...
Payload size: 32000 bytes, repeats: 5, delay: 0.100000 seconds
Sending payload of size 32000 bytes...
Response received: Payload length = 32000 bytes
        Round trip time: 822.8642ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.584479ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.65431ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.610368ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.738869ms
All requests completed successfully.

So does sending dummy requests in the background. In this run the client sends a 32 kB payload every second, while a background goroutine sends a single-byte payload every 250 ms on the same channel:

lacop@grpc-latency-sg:~/grpc-repro/golang$ go run client/client.go --address 10.128.0.57 --size 32000 --repeats 5 --delay 1 --background-sender-delay 0.25
Connecting to 10.128.0.57:50051...
Payload size: 32000 bytes, repeats: 5, delay: 1.000000 seconds
Background sender started with a delay of 0.250000 seconds...
Sending payload of size 32000 bytes...
Response received: Payload length = 32000 bytes
        Round trip time: 409.942309ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.081078ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.058354ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.003944ms
Response received: Payload length = 32000 bytes
        Round trip time: 207.266047ms
All requests completed successfully.

Similar timing can be achieved even when sending 10x larger payloads, as long as the requests are frequent.
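
The background sender presumably amounts to something like this sketch (the function name is mine; pb is the generated echo package):

// Keep-warm goroutine: issue tiny Echo calls on the same *grpc.ClientConn so
// the transport never sits idle between the large requests.
func startBackgroundSender(ctx context.Context, conn *grpc.ClientConn, every time.Duration) {
	client := pb.NewEchoServiceClient(conn)
	go func() {
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				// Single-byte payload; only the side effect of keeping the
				// connection active matters.
				_, _ = client.Echo(ctx, &pb.EchoRequest{Payload: []byte{0}})
			}
		}
	}()
}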

All of this suggests the problem is gRPC / HTTP/2 flow control: the window size shrinks after some idle period, so each new request has to wait 2x RTT or longer to get the window resized and the response back.

Using static window sizes

I tried every permutation of the gRPC settings I could find.

On server side:

s := grpc.NewServer(
    // 32MiB static window sizes
    grpc.StaticStreamWindowSize(32*1024*1024),
    grpc.StaticConnWindowSize(32*1024*1024),
    // 32MiB initial window sizes
    grpc.InitialWindowSize(32*1024*1024),
    grpc.InitialConnWindowSize(32*1024*1024),
    // 32MiB max message and buffer sizes
    grpc.MaxRecvMsgSize(32*1024*1024),
    grpc.MaxSendMsgSize(32*1024*1024),
    grpc.ReadBufferSize(32*1024*1024),
    grpc.WriteBufferSize(32*1024*1024),
)

On client side:

conn, err := grpc.NewClient(fmt.Sprintf("%s:%d", *address, *port),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    // 32MiB static window sizes
    grpc.WithStaticConnWindowSize(32*1024*1024),
    grpc.WithStaticStreamWindowSize(32*1024*1024),
    // 32MiB initial window sizes
    grpc.WithInitialConnWindowSize(32*1024*1024),
    grpc.WithInitialWindowSize(32*1024*1024),
    // 32MiB max message and buffer sizes
    grpc.WithReadBufferSize(32*1024*1024),
    grpc.WithWriteBufferSize(32*1024*1024),
    grpc.WithDefaultCallOptions(
        grpc.MaxCallRecvMsgSize(32*1024*1024),
        grpc.MaxCallSendMsgSize(32*1024*1024),
    ),
)

This requires a go.mod directive, replace google.golang.org/grpc => google.golang.org/grpc v1.75.0-dev, to pick up grpc/grpc-go#8283, which adds the StaticStreamWindowSize and StaticConnWindowSize options.
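
For reference, the go.mod would then look something like this (module path and pinned versions are illustrative):

module github.com/lacop11/grpc-rtt-repro/golang

go 1.24

require (
	google.golang.org/grpc v1.73.0
	google.golang.org/protobuf v1.36.6
)

// Pull in the unreleased static window size options from grpc/grpc-go#8283.
replace google.golang.org/grpc => google.golang.org/grpc v1.75.0-dev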

None of these (or various combinations of them) had any effect.

Packet capture

Ran sudo tcpdump -i ens4 -w session1.pcap on the client side and captured the call of go run client/client.go --address 10.128.0.57 --size 16000 --repeats 5 --delay 1 --background-sender-delay 0. Filtered down to the one relevant TCP stream, results stored in filtered.pcap file.
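
(One way to produce filtered.pcap is something like tshark -r session1.pcap -Y "tcp.stream == 3" -w filtered.pcap, where the stream index is whichever one Wireshark assigns to the gRPC connection.)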

10.148.0.3 is the client, 10.128.0.57 is the server.

Initial window messages look fine; they configure the window to 32 MiB. (Screenshots: client to server initial window, server to client initial window.)

However, looking at a sample message (screenshot: sample packet):

We see the initial POST from client to server (packet #48), but the body is only sent at T+204 ms (#54), after the ACK (#53) arrives.

Python client

Using the Python client (with the grpcio package) against the same Go server behaves the same way:

lacop@grpc-latency-sg:~/grpc-repro/python$ ./client.py 10.128.0.57:50051 12000 5 1 0
Connecting to 10.128.0.57:50051...
Payload size: 12000 bytes, repeats: 5, delay: 1.0 seconds, background delay: 0.0 seconds
Response received: Payload length = 12000 bytes
        Round trip time: 631.498ms
Response received: Payload length = 12000 bytes
        Round trip time: 205.205ms
Response received: Payload length = 12000 bytes
        Round trip time: 205.154ms
Response received: Payload length = 12000 bytes
        Round trip time: 205.291ms
Response received: Payload length = 12000 bytes
        Round trip time: 205.247ms

lacop@grpc-latency-sg:~/grpc-repro/python$ ./client.py 10.128.0.57:50051 16000 5 1 0
Connecting to 10.128.0.57:50051...
Payload size: 16000 bytes, repeats: 5, delay: 1.0 seconds, background delay: 0.0 seconds
Response received: Payload length = 16000 bytes
        Round trip time: 819.888ms
Response received: Payload length = 16000 bytes
        Round trip time: 409.416ms
Response received: Payload length = 16000 bytes
        Round trip time: 409.456ms
Response received: Payload length = 16000 bytes
        Round trip time: 409.979ms
Response received: Payload length = 16000 bytes
        Round trip time: 409.448ms

lacop@grpc-latency-sg:~/grpc-repro/python$ ./client.py 10.128.0.57:50051 16000 5 1 0.25
Connecting to 10.128.0.57:50051...
Payload size: 16000 bytes, repeats: 5, delay: 1.0 seconds, background delay: 0.25 seconds
Response received: Payload length = 16000 bytes
        Round trip time: 831.116ms
Response received: Payload length = 16000 bytes
        Round trip time: 206.303ms
Response received: Payload length = 16000 bytes
        Round trip time: 268.517ms
Response received: Payload length = 16000 bytes
        Round trip time: 206.253ms
Response received: Payload length = 16000 bytes
        Round trip time: 206.335ms

Self-contained reproduction

The previous report in grpc/grpc-go#5599 went nowhere, mostly due to the lack of a self-contained reproduction.

I have tried to build one in local.go, using https://pkg.go.dev/google.golang.org/grpc/benchmark/latency to inject latency into a local connection.
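
Wrapping the listener with that package looks roughly like this (a sketch; the actual local.go may be structured differently, and echoServer is the handler from the server sketch above):

// Inject one-way latency into every accepted connection using the
// google.golang.org/grpc/benchmark/latency package, then serve gRPC on the
// wrapped listener.
lis, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
	log.Fatalf("failed to listen: %v", err)
}
network := &latency.Network{
	Kbps:    0,                      // 0 means unlimited bandwidth
	Latency: 100 * time.Millisecond, // one-way delay
	MTU:     0,                      // 0 means no packet splitting
}
slowLis := network.Listener(lis)

s := grpc.NewServer()
pb.RegisterEchoServiceServer(s, &echoServer{})
log.Printf("server listening at %v", slowLis.Addr())
if err := s.Serve(slowLis); err != nil {
	log.Fatalf("failed to serve: %v", err)
}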

This does not behave in exactly the same way: much larger payload sizes are needed to increase the latency, and it is not consistently 2x beyond a certain size:

$ go run local/local.go --latency=0.1 --payload-size=1500000 --delay=2
2025/07/08 14:18:09 Simulating latency of 0.10 seconds
2025/07/08 14:18:09 server listening at 127.0.0.1:56747
2025/07/08 14:18:10 Connecting to server at 127.0.0.1:56747...
2025/07/08 14:18:10 [client] Sending payload of size 1500000 bytes...
2025/07/08 14:18:11 [server] Received: 1500000
2025/07/08 14:18:11 [client] Response received: Payload length = 1500000 bytes, RTT: 1.832299625s
2025/07/08 14:18:13 [client] Sending payload of size 1500000 bytes...
2025/07/08 14:18:14 [server] Received: 1500000
2025/07/08 14:18:14 [client] Response received: Payload length = 1500000 bytes, RTT: 617.209541ms
2025/07/08 14:18:16 [client] Sending payload of size 1500000 bytes...
2025/07/08 14:18:16 [server] Received: 1500000
2025/07/08 14:18:16 [client] Response received: Payload length = 1500000 bytes, RTT: 508.438ms
2025/07/08 14:18:18 [client] Sending payload of size 1500000 bytes...
2025/07/08 14:18:19 [server] Received: 1500000
2025/07/08 14:18:19 [client] Response received: Payload length = 1500000 bytes, RTT: 406.6395ms
2025/07/08 14:18:21 [client] Sending payload of size 1500000 bytes...
2025/07/08 14:18:21 [server] Received: 1500000
2025/07/08 14:18:21 [client] Response received: Payload length = 1500000 bytes, RTT: 303.73075ms
2025/07/08 14:18:23 Server stopped gracefully.

$ go run local/local.go --latency=0.1 --payload-size=2000000 --delay=2
2025/07/08 14:18:54 Simulating latency of 0.10 seconds
2025/07/08 14:18:54 server listening at 127.0.0.1:56751
2025/07/08 14:18:55 Connecting to server at 127.0.0.1:56751...
2025/07/08 14:18:55 [client] Sending payload of size 2000000 bytes...
2025/07/08 14:18:57 [server] Received: 2000000
2025/07/08 14:18:57 [client] Response received: Payload length = 2000000 bytes, RTT: 2.03785975s
2025/07/08 14:18:59 [client] Sending payload of size 2000000 bytes...
2025/07/08 14:19:00 [server] Received: 2000000
2025/07/08 14:19:00 [client] Response received: Payload length = 2000000 bytes, RTT: 812.116083ms
2025/07/08 14:19:02 [client] Sending payload of size 2000000 bytes...
2025/07/08 14:19:02 [server] Received: 2000000
2025/07/08 14:19:02 [client] Response received: Payload length = 2000000 bytes, RTT: 412.359167ms
2025/07/08 14:19:04 [client] Sending payload of size 2000000 bytes...
2025/07/08 14:19:05 [server] Received: 2000000
2025/07/08 14:19:05 [client] Response received: Payload length = 2000000 bytes, RTT: 411.370292ms
2025/07/08 14:19:07 [client] Sending payload of size 2000000 bytes...
2025/07/08 14:19:07 [server] Received: 2000000
2025/07/08 14:19:07 [client] Response received: Payload length = 2000000 bytes, RTT: 413.100041ms
2025/07/08 14:19:09 Server stopped gracefully.

Sending the dummy background messages does not seem to help here; the latency slowly improves without them too, even with a longer delay.

It is not clear to me what is different about this setup compared to real network latency between two VMs. But at least it reproduces the unexpected result that, with 200 ms RTT (100 ms one way), the gRPC call latency is 2x that.
