
[BUG] OOM (out of memory) recurring every 8-9 days #579

@interfan7


Describe the bug
When the service is killed by the OS due to OOM, systemd automatically starts it again.
Memory consumption on the machine then steadily increases for 8-9 days until the next OOM.

Logs
I haven't noticed anything particularly unusual in the logs. The OOM event does show up in the system logs (dmesg etc.).
I'll be happy to provide specific grep output/messages; otherwise the log is huge.
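If specific excerpts would help, something along these lines should pull just the OOM events out of the system logs (a sketch; the exact kernel message wording may differ on this machine):

$ sudo dmesg -T | grep -i -E 'out of memory|oom-killer'
$ sudo journalctl -k --since "10 days ago" | grep -i oom
$ sudo journalctl -u go-carbon.service | grep -i -E 'killed|oom'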

Go-carbon Configuration:

go-carbon.conf:

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "local"
max-cpu = 4
metric-interval = "1m0s"

[whisper]
data-dir = "/data/graphite/whisper/"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
quotas-file = ""
workers = 4
max-updates-per-second = 0
sparse-create = false
physical-size-factor = 0.75
flock = true
compressed = false
enabled = true
hash-filenames = true
remove-empty-file = false
online-migration = false
online-migration-rate = 5
online-migration-global-scope = ""

[cache]
max-size = 100000000
write-strategy = "max"

[udp]
listen = "0.0.0.0:2003"
enabled = true
buffer-size = 0

[tcp]
listen = "0.0.0.0:2003"
enabled = true
buffer-size = 0
compression = ""

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = true
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = "127.0.0.1:7003"
enabled = true

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/data/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = "0.0.0.0:8080"
enabled = true
query-cache-enabled = true
streaming-query-cache-enabled = false
query-cache-size-mb = 0
find-cache-enabled = true
buckets = 100
max-globs = 1000
fail-on-max-globs = false
empty-result-ok = true
do-not-log-404s = false
metrics-as-counters = false
trigram-index = true
internal-stats-dir = ""
cache-scan = false
max-metrics-globbed = 1000000000
max-metrics-rendered = 100000000
trie-index = false
concurrent-index = false
realtime-index = 0
file-list-cache = ""
file-list-cache-version = 1
max-creates-per-second = 0
no-service-when-index-is-not-ready = false
max-inflight-requests = 0
render-trace-logging-enabled = false
[carbonserver.grpc]
listen = ""
enabled = false
read-timeout = "1m0s"
idle-timeout = "1m0s"
write-timeout = "1m0s"
scan-frequency = "5m0s"
quota-usage-report-frequency = "1m0s"

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "127.0.0.1:7007"
enabled = false

[[logging]]
logger = ""
file = "/var/log/go-carbon/go-carbon.log"
level = "info"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
sample-tick = ""
sample-initial = 0
sample-thereafter = 0

[prometheus]
enabled = false
endpoint = "/metrics"
[prometheus.labels]

[tracing]
enabled = false
jaegerEndpoint = ""
stdout = false
send_timeout = "10s"

storage-schemas.conf:

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[redash-metrics]
pattern = (.*{something I prefer to not share}.*)
retentions = 1m:7y

[production]
pattern = (^production.*|^secTeam.*)
retentions = 1m:60d,15m:120d,1h:3y

[non-production]
pattern = (^non-production.*|^canary.*)
retentions = 1m:14d,30m:30d,1h:180d

[default]
pattern = .*
retentions = 1m:14d,5m:90d,30m:1y

storage-aggregation.conf:

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = max

[someTeam_aggregation]
pattern = ^someTeam.*
xFilesFactor = 0
aggregationMethod = average

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average

I wonder whether the max-size, max-metrics-globbed, or max-metrics-rendered fields have anything to do with the issue.
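One way to narrow this down could be a heap profile: the [pprof] section above is disabled at the moment, but if I enable it and restart, the listener should (as far as I understand) expose the standard Go net/http/pprof endpoints on the configured port, e.g.:

$ curl -s http://127.0.0.1:7007/debug/pprof/heap -o heap.out
$ go tool pprof -top heap.out    # top in-use allocations

That would at least show whether resident memory is dominated by the points cache, the index, or query/render allocations.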

Additional context
The carbonapi service also runs on the same server.
We have an identical dev server, but its carbonapi is almost never queried.
Interestingly, we don't have that issue on the dev server, which suggests the issue has to do with queries.
Here is the memory usage graph for prod (left) and dev (right), side by side, over a period of 22 days:
[image: memory usage graph, prod (left) vs. dev (right), 22 days]
In addition, the systemd status also shows a considerable difference, although the prod service has been active for only about 1.5 days.
Dev:

$ sudo systemctl status go-carbon.service | grep -E 'Memory|Active'
     Active: active (running) since Mon 2023-12-18 10:07:53 UTC; 2 weeks 6 days ago
     Memory: 26.5G

Prod:

$ sudo systemctl status go-carbon.service | grep -E 'Memory|Active'
     Active: active (running) since Sat 2024-01-06 05:18:57 UTC; 1 day 8h ago
     Memory: 42.0G

Although that makes sense, since there are almost zero queries on the dev server.
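If it helps, I can also plot go-carbon's own cache metrics against the RSS growth. Assuming carbonserver's /render endpoint accepts format=json and the internal counter is published as cache.size under my graph-prefix (carbon.agents.{host}), something like this should show whether the points cache itself is what grows between OOMs:

$ curl -s 'http://127.0.0.1:8080/render/?target=carbon.agents.*.cache.size&from=-2weeks&format=json'

If cache.size stays flat while RSS climbs, that would point at the query side (carbonserver/carbonapi) rather than the write cache.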
