
Unable to change ulimit when running on K8S environment #6709

Open
@22fortisetliber

Description


Describe the bug
When running Cortex in a Kubernetes environment, the open-file limit (ulimit -n) of the Cortex process cannot be changed.

To Reproduce

  1. On Node - Set fs.file-max = 10485760 in /etc/sysctl.conf
  2. On Node - Set in /etc/security/limits.conf:
     * soft nofile 10485760
     * hard nofile 10485760
  3. Run Cortex version 1.17.0 with Helm
  4. Configure the ulimit for the pod (I tried both an initContainer and the SecurityContext)
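For reference, the attempts in step 4 looked roughly like the sketch below (container names and the busybox image are placeholders, not the exact Helm output). Kubernetes has no first-class field for nofile limits, and an initContainer's ulimit call only affects its own shell process, so the main container still inherits its limits from the container runtime:

```yaml
# Sketch of the attempted workaround (names/images are placeholders).
# An initContainer's `ulimit -n` changes the limit only for that shell,
# not for sibling containers, which inherit limits from the runtime.
apiVersion: v1
kind: Pod
metadata:
  name: cortex-store-gateway
spec:
  initContainers:
    - name: set-ulimit
      image: busybox
      command: ["sh", "-c", "ulimit -n 10485760"]
      securityContext:
        privileged: true
  containers:
    - name: store-gateway
      image: quay.io/cortexproject/cortex:v1.17.0
      args: ["-target=store-gateway"]
```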

Inside Pod:

# ulimit -Ha
core file size (blocks)         (-c) unlimited
data seg size (kb)              (-d) unlimited
scheduling priority             (-e) 0
file size (blocks)              (-f) unlimited
pending signals                 (-i) 256726
max locked memory (kb)          (-l) unlimited
max memory size (kb)            (-m) unlimited
open files                      (-n) 1048576
POSIX message queues (bytes)    (-q) 819200
real-time priority              (-r) 0
stack size (kb)                 (-s) unlimited
cpu time (seconds)              (-t) unlimited
max user processes              (-u) unlimited
virtual memory (kb)             (-v) unlimited
file locks                      (-x) unlimited 
# ps | grep cortex
# cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            65535                65535                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256726               256726               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

When Cortex reaches approximately 65,000 open files, it crashes with "too many open files" errors. Here is the log from the Store Gateway:

caller=bucket_stores.go:161 level=warn msg="failed to synchronize TSDB blocks" err="445 errors: failed to synchronize TSDB blocks for user user1: read dir: open /data/tsdb-sync/user1: too many open files; failed to synchronize TSDB blocks for user user2: read dir: open /data/tsdb-sync/user2: too many open files; failed to synchronize TSDB blocks for user user3: read dir: open /data/tsdb-sync/user3: too many open files .....
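Inside an affected pod, the gap between actual fd usage and the effective limit can be confirmed directly from /proc. A minimal sketch (the awk pattern assumes the /proc/&lt;pid&gt;/limits layout shown above; substitute the Cortex PID for $$, which is used here only to keep the snippet self-contained):

```shell
# Count open fds and read the effective soft limit for a process.
pid=$$
open_fds=$(ls /proc/$pid/fd | wc -l)
soft_limit=$(awk '/^Max open files/ {print $4}' /proc/$pid/limits)
echo "open=$open_fds soft=$soft_limit"
```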

Expected behavior
The Cortex process running in Kubernetes pods should inherit and apply the increased file descriptor limits.

Environment:

  • Infrastructure: Kubernetes (v1.29.5) with Containerd (v1.7.16)
  • Deployment tool: Helm
  • Server OS: Ubuntu 22.04.3 LTS

Labels
type/production: Issues related to the production use of Cortex, inc. configuration, alerting and operating.
