Monitoring node runs out of RAM and CPU as the number of tables and the data in them grow #2429

Open
@vponomaryov

Installation details
Panel Name: any
Dashboard Name: any
Scylla-Monitoring Version: 4.8.0
Scylla-Version: 2024.2.0~rc3-20241004.89f8638e9e9b
Monitor node instance type: m6i.xlarge

When running a test that creates tables in batches of 125, we observe steady growth in memory and CPU utilization on the monitoring node:

Image

Disk utilization grows in the same way:
Image

Result of the top command:

Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.6 us,  0.2 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15717.2 total,   1244.2 free,  12641.1 used,   1831.9 buff/cache
MiB Swap:  20480.0 total,  16750.0 free,   3730.0 used.   2393.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   5527 ubuntu    20   0  113.7g  12.1g 570644 S 100.3  79.1   8266:01 prometheus
   9710 scylla    20   0   16.0t  76860  20480 S   1.0   0.5  92:07.13 scylla
    414 root      20   0 1949744  17860   8192 S   0.3   0.1   4:16.46 containerd
   2977 root      20   0 2134828  32928  14080 S   0.3   0.2   3:13.23 dockerd
   5508 root      20   0 1238716   6408   3456 S   0.3   0.0   1:22.68 containerd-shim
   9718 scylla-+  20   0 1266796  25560  11904 S   0.3   0.2   4:57.53 scylla-manager
  57000 root      20   0 1319948  24704  16768 S   0.3   0.2   0:00.04 snapd
      1 root      20   0  167584   6480   4048 S   0.0   0.0   0:23.68 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.06 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
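
In the output above, `prometheus` is the process holding most of the memory (12.1g resident, 79.1%). One possible way to check whether that growth follows the number of active time series is to poll Prometheus' TSDB status endpoint (`/api/v1/status/tsdb`) between table batches. The following is a minimal sketch, not something that was run as part of this test. The address `http://localhost:9090` and the helper names are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: Prometheus from the monitoring stack is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_tsdb_status(base_url=PROMETHEUS_URL, timeout=10):
    """Fetch TSDB cardinality statistics from the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/status/tsdb"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_tsdb_status(payload, top_n=5):
    """Return (head series count, top-N metric names by series count)."""
    data = payload["data"]
    num_series = data["headStats"]["numSeries"]
    by_metric = data.get("seriesCountByMetricName") or []
    # Sort explicitly so the result does not depend on the server's ordering.
    top = sorted(by_metric, key=lambda e: e["value"], reverse=True)[:top_n]
    return num_series, [(e["name"], e["value"]) for e in top]


if __name__ == "__main__":
    series, top = summarize_tsdb_status(fetch_tsdb_status())
    print(f"head series: {series}")
    for name, count in top:
        print(f"{count:>10}  {name}")
```

Running this after each batch of 125 tables and comparing the head series count would show whether the number of series grows with each batch, and which metric names contribute the most.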

DB nodes load:
Image

The batches are visible in the DB nodes load screenshot: each sawtooth corresponds to the population of one batch of 125 tables.

Argus: scylla-staging/valerii/vp-scale-5000-tables-test#3
CI job: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/valerii/job/vp-scale-5000-tables-test/3


Labels: bug
