Description
Hello,
We are using Incus + lxcfs in our setup and we've run into an issue with the memory consumption of the lxcfs process while reading aggressively from /sys/devices/system/cpu/online.
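For context, this file simply reports the kernel's online CPU mask; on a 56-core node with all cores online, a read typically returns a single short line:
cat /sys/devices/system/cpu/online
0-55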
Versions
We’ve tested with different versions of both lxcfs and libfuse3 and the issue seems to be present even with the latest stable versions:
- lxcfs 6.0.3
- libfuse3: both the latest (3.16.2) and the CentOS 9 default (3.10.2)
Setup
We are running an Incus container on a node with 56 CPU cores. The issue seems reproducible even with a single container. In our setup the container's CPU usage is restricted with limits.cpu.allowance: 1200ms/60ms (not very relevant, although the effect shows up much faster if the container is allowed more CPU).
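For reference, that allowance can be set with something like the following (the container name c1 is hypothetical):
incus config set c1 limits.cpu.allowance 1200ms/60ms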
Reproducer
To reproduce the issue, compile the following C code, which starts a number of threads inside the container, each repeatedly opening, reading from, and closing a file:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <signal.h>
#include <stdbool.h>

/* compile with:
 *   gcc -pthread -o fuse-stress-poc fuse-stress-poc.c
 */

#define DEFAULT_FILE_PATH "/sys/devices/system/cpu/online"

volatile sig_atomic_t run = 1;

void handle_sigint(int sig) {
    (void)sig;
    run = 0;
}

/* Each worker loops open/read/close on the target file as fast as
 * possible until SIGINT clears the run flag. */
void* stress_work(void* arg) {
    const char* file_path = (const char*)arg;
    int fd;
    char buffer[256];
    ssize_t bytes_read;

    while (run) {
        fd = open(file_path, O_RDONLY);
        if (fd != -1) {
            bytes_read = read(fd, buffer, sizeof(buffer) - 1);
            (void)bytes_read; /* the content is irrelevant for the stress test */
            close(fd);
        }
    }
    return NULL;
}

int main(int argc, char* argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <num_threads> <file_path>\n", argv[0]);
        return EXIT_FAILURE;
    }

    int num_threads = atoi(argv[1]);
    if (num_threads <= 0) {
        fprintf(stderr, "Number of threads must be positive.\n");
        return EXIT_FAILURE;
    }

    const char* file_path = argv[2];
    if (access(file_path, R_OK) != 0) {
        fprintf(stderr, "File '%s' does not exist or is not accessible.\n", file_path);
        return EXIT_FAILURE;
    }

    signal(SIGINT, handle_sigint);

    pthread_t* threads = malloc(num_threads * sizeof(pthread_t));
    if (threads == NULL) {
        perror("malloc");
        return EXIT_FAILURE;
    }

    for (int i = 0; i < num_threads; i++) {
        pthread_create(&threads[i], NULL, stress_work, (void*)file_path);
    }
    for (int i = 0; i < num_threads; i++) {
        pthread_join(threads[i], NULL);
    }

    free(threads);
    return 0;
}
Run it with the following command in a container:
./fuse-stress-poc 400 /sys/devices/system/cpu/online
Monitor the RSS memory usage of lxcfs; we see it go past 1GB in about a minute. If we then stop/kill the process inside the container, the RSS stays around the same value instead of dropping back to roughly 2MB.
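One simple way to watch the RSS from the host (assuming a single lxcfs process) is:
watch -n 1 'grep VmRSS /proc/$(pidof lxcfs)/status'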
So far we've tried the following:
- Reading /proc/uptime and /proc/cpuinfo to check whether these files leak as well, but we could not reproduce the issue with them: RSS usage stays low (around 2MB) while reading them.
- Attempting to find which commit introduced this behavior; as far as we can tell, oddly enough, it is the one enabling direct_io: c2b4b50 (a generic sketch of what that flag does follows this list).
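For reference, here is a minimal illustration of what enabling direct_io in a libfuse3 filesystem looks like. This is a generic, hypothetical sketch (the file name, content, and all identifiers are ours), not the actual lxcfs code from that commit. With fi->direct_io set, the kernel bypasses the page cache, so every read() issued by the reproducer is forwarded to the FUSE daemon as a fresh request:

#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

/* Single read-only file ("/online") served with direct_io enabled. */
static const char *content = "0-55\n";

static int sketch_getattr(const char *path, struct stat *st,
                          struct fuse_file_info *fi)
{
    (void)fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, "/online") == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t)strlen(content);
        return 0;
    }
    return -ENOENT;
}

static int sketch_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, "/online") != 0)
        return -ENOENT;
    fi->direct_io = 1; /* the flag the referenced commit enables */
    return 0;
}

static int sketch_read(const char *path, char *buf, size_t size,
                       off_t off, struct fuse_file_info *fi)
{
    (void)path;
    (void)fi;
    size_t len = strlen(content);
    if ((size_t)off >= len)
        return 0;
    if (off + size > len)
        size = len - (size_t)off;
    memcpy(buf, content + off, size);
    return (int)size;
}

static const struct fuse_operations sketch_ops = {
    .getattr = sketch_getattr,
    .open    = sketch_open,
    .read    = sketch_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &sketch_ops, NULL);
}

Compile with gcc sketch.c -o sketch `pkg-config --cflags --libs fuse3`, mount it somewhere, and the reproducer above can be pointed at <mountpoint>/online.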
We would appreciate your assistance in verifying whether this issue is reproducible on your end, so we can collaborate on identifying and implementing a solution.
We stumbled on this while investigating other issues related to hanging lxcfs file operations, and developed this stress test as a result. Although we were unable to reproduce the hang, we identified what appears to be a memory leak.
Apologies for any confusion caused by opening, resolving, and re-creating the issue; I accidentally clicked the wrong option while typing.
Regards,
Deyan