Skip to content

net/tcp_timer: fix tcp RTO abnormally large after retransmission occurs #15118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

Meissi-jian
Copy link
Contributor

when tcp retransmission only double conn->timer in Karn(tcp_timer.c L602), after retransmission in Jacobson M is is now rtt test will become a negative value.

  signed char m;
  m = conn->rto - conn->timer; // M is now rtt test

  /* This is taken directly from VJs original code in his paper */

  m = m - (conn->sa >> 3);
  conn->sa += m;              //conn->sa is a negative value
  if (m < 0)
	{
	  m = -m;
	}

  m = m - (conn->sv >> 2);
  conn->sv += m;
  conn->rto = (conn->sa >> 3) + conn->sv; //rto

For example,we lost one ack packet, we will set conn->timer = 6 by backoff,conn->rto still 3. After retransmission we will Do RTT estimation, then will get

conn->sa = 253
conn->rto = 46

Then if any packets lost it will wait for a long time before triggering a retransmission.

Summary

  • The issue of abnormal increase in RTO (retransmission timeout) after TCP retransmission has been fixed.
  • In the Karn algorithm, when TCP retransmits, only conn->timer is doubled, which may cause the estimated RTT value to become negative.
  • The update logic of conn->rto and conn->timer has been corrected to ensure that they remain consistent after retransmission.
  • This issue was reproduced by discarding ACK packets in tcp_input and the fix was verified by debugging conn->rto after the Jacobson algorithm.

Impact

Impacts the implementation of tcp timer, fixing tcp timer abnoramlly large after do backoff.

Testing

Test method on reproduce.
We can reproduce this issue by adding drop ACK packets in tcp_input, and debug conn->rto after Jacobson.

diff --git a/net/tcp/tcp_input.c b/net/tcp/tcp_input.c
index 70b99f345b..b812ef3ff5 100644
--- a/net/tcp/tcp_input.c
+++ b/net/tcp/tcp_input.c
@@ -61,6 +61,7 @@
 #include "devif/devif.h"
 #include "utils/utils.h"
 #include "tcp/tcp.h"
@@ -742,6 +743,13 @@ static void tcp_input(FAR struct net_driver_s *dev, uint8_t domain,
   /* Demultiplex this segment. First check any active connections. */
 
   conn = tcp_active(dev, tcp);
+  if (((tcp->flags & TCP_CTL) == (TCP_SYN | TCP_ACK)) && conn->drop_count < 2)
+    {
+      conn->drop_count++;
+      printf("WARNING: ************* Drop first sync ack packet \n");
+      goto drop;
+    }
+
   if (conn)
     {
       /* We found an active connection.. Check for the subsequent SYN
@@ -1135,6 +1143,8 @@ found:
           m = m - (conn->sv >> 2);
           conn->sv += m;
           conn->rto = (conn->sa >> 3) + conn->sv;
+          printf("WARNING:### conn=%p conn->rto=%d m=%d conn->timer=%d conn->sv=%d conn->sa=%d ###\n",
+                 conn, (int)conn->rto, (int)m,(int)conn->timer,(int)conn->sv,(int)conn->sa);
         }

test before patch with debug version:

nsh> iperf -c 10.192.41.28 -t 5 -i 1
     IP: 192.168.31.92
 mode=tcp-client sip=192.168.31.92:5001,dip=10.192.41.28:5001, interval=1, time=5
*****tcp_timer 0
WARNING: ************* Drop first sync ack packet 
WARNING: ************* Drop first sync ack packet 
*****tcp_timer 3
WARNING:### conn=0x3fc8d8a0 conn->rto=46 m=-1 conn->timer=6 conn->sv=15 conn->sa=253 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=70 m=28 conn->timer=46 conn->sv=43 conn->sa=222 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=84 m=17 conn->timer=70 conn->sv=60 conn->sa=195 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=90 m=9 conn->timer=84 conn->sv=69 conn->sa=171 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=91 m=4 conn->timer=90 conn->sv=73 conn->sa=150 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=89 m=0 conn->timer=91 conn->sv=73 conn->sa=132 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=85 m=-2 conn->timer=89 conn->sv=71 conn->sa=116 ###
WARNING:### conn=0x3fc8d8a0 conn->rto=80 m=-3 conn->timer=85 conn->sv=68 conn->sa=102 ###

test by the patch:

nsh> iperf -c 10.192.41.28 -t 10 -i 1
     IP: 192.168.31.92
 mode=tcp-client sip=192.168.31.92:5001,dip=10.192.41.28:5001, interval=1, time=10
*****tcp_timer 0
WARNING: ************* Drop first sync ack packet 
WARNING: ************* Drop first sync ack packet 
*****tcp_timer 3
WARNING:### conn=0x3fc8e010 conn->rto=12 m=-4 conn->timer=6 conn->sv=12 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=9 m=-3 conn->timer=12 conn->sv=9 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=7 m=-2 conn->timer=9 conn->sv=7 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=6 m=-1 conn->timer=7 conn->sv=6 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=5 m=-1 conn->timer=6 conn->sv=5 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=4 m=-1 conn->timer=5 conn->sv=4 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=-1 conn->timer=4 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###
WARNING:### conn=0x3fc8e010 conn->rto=3 m=0 conn->timer=3 conn->sv=3 conn->sa=0 ###

when tcp retransmission only double conn->timer in Karn(tcp_timer.c L602), after retransmission in Jacobson
M is is now rtt test will become a negative value.
```
  signed char m;
  m = conn->rto - conn->timer; // M is now rtt test

  /* This is taken directly from VJs original code in his paper */

  m = m - (conn->sa >> 3);
  conn->sa += m;              //conn->sa is a negative value
  if (m < 0)
	{
	  m = -m;
	}

  m = m - (conn->sv >> 2);
  conn->sv += m;
  conn->rto = (conn->sa >> 3) + conn->sv; //rto
```
For example,we lost one ack packet, we will set conn->timer = 6 by backoff,conn->rto still 3.
After retransmission we will Do RTT estimation, then will get
```
conn->sa = 253
conn->rto = 46
```
Then if any packets lost it will wait for a long time before triggering a retransmission.

Test method.
We can reproduce this issue by adding drop ACK packets in tcp_input, and debug conn->rto after Jacobson.

Signed-off-by: meijian <[email protected]>
@github-actions github-actions bot added Area: Networking Effects networking subsystem Size: XS The size of the change in this PR is very small labels Dec 10, 2024
@xiaoxiang781216 xiaoxiang781216 merged commit d178650 into apache:master Dec 10, 2024
27 checks passed
@xiaoxiang781216 xiaoxiang781216 linked an issue Dec 10, 2024 that may be closed by this pull request
1 task
@Meissi-jian Meissi-jian deleted the tcp_rto branch December 11, 2024 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Networking Effects networking subsystem Size: XS The size of the change in this PR is very small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TCP retransmition is not correct
4 participants