Number of open connections resulting in network errors #384

Open
@rob-ellison-jet

Description

We've experienced snowbridge errors on the http target: "connection reset by peer" as well as 404 responses. The problem is due to the number of open connections on the pods.

Source = kinesis
Target = http

The http target in our case is Google Tag Manager Server-side.

To reproduce, generate a sufficient constant load on the kinesis stream, approximately 4K records per minute. After a few hours you should start to get either "connection reset by peer" from AWS or 404s from GTM.

Check the number of open connections on the box.

$ netstat -atn | grep TIME_WAIT | wc -l
45269
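
To track this count over the course of a load test rather than running the command by hand, the same filter can be applied in Go. This is only a rough sketch: `countState` is a hypothetical helper, not part of snowbridge, and the sample text is illustrative rather than taken from our pods. It assumes the TCP state is the last column of each line, as in `netstat -atn` output on Linux.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countState returns how many lines of netstat-style output report the given
// TCP state in their last column (for example "TIME_WAIT").
func countState(netstatOutput, state string) int {
	count := 0
	sc := bufio.NewScanner(strings.NewReader(netstatOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[len(fields)-1] == state {
			count++
		}
	}
	return count
}

func main() {
	// Illustrative sample only; addresses and ports are made up.
	sample := `Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:41234          172.20.1.9:443          TIME_WAIT
tcp        0      0 10.0.0.5:41236          172.20.1.9:443          ESTABLISHED
tcp        0      0 10.0.0.5:41238          172.20.1.9:443          TIME_WAIT`
	fmt.Println(countState(sample, "TIME_WAIT"))
}
```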

We're currently using version 2.3.0. The main culprit appears to be the `defer resp.Body.Close()` line inside the write loop of the http target (pkg/target/http.go).

Applying the following patch drastically reduces the number of open connections:

diff --git a/pkg/target/http.go b/pkg/target/http.go
index 30c293e..6c3cfca 100644
--- a/pkg/target/http.go
+++ b/pkg/target/http.go
@@ -11,6 +11,7 @@ import (
 	"bytes"
 	"encoding/json"
 	"fmt"
+	"io"
 	"net/http"
 	"net/url"
 	"time"
@@ -214,7 +215,8 @@ func (ht *HTTPTarget) Write(messages []*models.Message) (*models.TargetWriteResu
 			failed = append(failed, msg)
 			continue
 		}
-		defer resp.Body.Close()
+		io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
 		if resp.StatusCode == http.StatusOK {
 			sent = append(sent, msg)
 			if msg.AckFunc != nil { // Ack successful messages

With the patch applied:

$ netstat -ant | grep TIME_WAIT | grep 172.20 | wc -l
3799
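
The effect of the patch can be reproduced outside snowbridge with a small standalone program. The sketch below is not snowbridge code: `countConnections` is a hypothetical helper that sends sequential requests to a local `httptest` server and counts how many TCP connections the server accepts. With `drain` set, the body is read to EOF and closed immediately, as in the patch. Without it, `Close` is deferred until the function returns, as in the current loop. The net/http documentation notes that the transport may not be able to reuse a keep-alive connection unless the body is both read to EOF and closed.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConnections sends n sequential GET requests to a local test server and
// returns how many TCP connections the server accepted. When drain is true the
// response body is read to EOF and closed right away; otherwise Close is
// deferred until this function returns, mirroring the unpatched loop.
func countConnections(n int, drain bool) (int64, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// Count every newly accepted connection on the server side.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	// A dedicated transport so runs do not share pooled connections.
	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()

	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		if drain {
			// Patched behaviour: drain, then close, so the connection
			// can go back to the idle pool before the next request.
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		} else {
			// Unpatched behaviour: the body stays open until return.
			defer resp.Body.Close()
		}
	}
	return atomic.LoadInt64(&opened), nil
}

func main() {
	for _, drain := range []bool{false, true} {
		n, err := countConnections(20, drain)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("drain=%v connections=%d\n", drain, n)
	}
}
```

The drained run is expected to reuse a single connection, while the deferred-close run should need a new connection for each request because the previous one is still tied to an unread body. Exact counts may vary with Go version and transport settings.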

We have also tested the latest version, 3.0.0, in staging and see similar issues there.
