Description
We've been seeing Snowbridge errors on the http target: connection reset by peer, and also 404 responses.
The problem is due to the number of open connections on the pods.
Source = kinesis
Target = http
The http target in our case is Google Tag Manager Server-side.
Generate a sufficient constant load on the Kinesis stream, approximately 4K records per minute. After a few hours you should start to see either connection reset by peer from AWS or 404s from GTM.
Check the number of open connections on the box.
$ netstat -atn | grep TIME_WAIT | wc -l
45269
We're currently using version 2.3.0. The main culprit appears to be the following line in the http target.
Applying the following patch drastically reduces the number of open connections:
diff --git a/pkg/target/http.go b/pkg/target/http.go
index 30c293e..6c3cfca 100644
--- a/pkg/target/http.go
+++ b/pkg/target/http.go
@@ -11,6 +11,7 @@ import (
 	"bytes"
 	"encoding/json"
 	"fmt"
+	"io"
 	"net/http"
 	"net/url"
 	"time"
@@ -214,7 +215,8 @@ func (ht *HTTPTarget) Write(messages []*models.Message) (*models.TargetWriteResu
 			failed = append(failed, msg)
 			continue
 		}
-		defer resp.Body.Close()
+		io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
 		if resp.StatusCode == http.StatusOK {
 			sent = append(sent, msg)
 			if msg.AckFunc != nil { // Ack successful messages
With the patch applied:
$ netstat -ant | grep TIME_WAIT | grep 172.20 | wc -l
3799
In staging we have also tested the latest version, 3.0.0, and experienced similar issues there.