Request to address zombie/defunct processes by modifying container main process #517

@ghost

Description

Issue with sleep infinity in awesome-akash repo

Problem Description

In the awesome-akash repository, several containers are configured to run sleep infinity as their main process. This configuration can lead to the accumulation of zombie (defunct) processes. The cause is that sleep never calls wait(): when it runs as PID 1, any orphaned process that gets reparented to it is never reaped after it exits, so it remains in the defunct state.

The following files currently use sleep infinity:

awesome-akash$ git grep -i sleep |grep infin
Ethereum_2.0/main.sh:sleep infinity
Falcon-7B/Dockerfile:CMD python3 falcon7b.py && sleep infinity
Sentinel-dVPN-node/main.sh:        sleep infinity
bitcoin/main.sh:sleep infinity
cryptodredge-c11/entrypoint.sh:sleep infinity
semantra/deploy.yaml:        sleep infinity ;'
softether-vpn/launch:sleep infinity

Impact

Some deployments start subprocesses whose parent never calls wait() on them. This results in <defunct> processes, also known as “zombie” processes. A zombie is a subprocess that has exited but still has an entry in the system’s process table because its parent has not read its exit status. If nothing reaps them, zombies can accumulate over time until they occupy every available process slot, at which point no new process can be created.
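To make this concrete, here is a minimal sketch that creates one zombie on purpose and then counts it. It assumes a Linux system with /proc mounted; the helper name count_zombies and the sleep durations are illustrative and do not come from the repository.

```shell
#!/bin/sh
# count_zombies PARENT_PID (hypothetical helper): print how many children of
# PARENT_PID are currently in the Z (zombie/defunct) state, read from /proc.
count_zombies() {
    n=0
    for f in /proc/[0-9]*/status; do
        # State: and PPid: are fields of /proc/<pid>/status on Linux.
        if awk -v p="$1" '
            /^State:/ { z = ($2 == "Z") }
            /^PPid:/  { pp = $2 }
            END       { exit !(z && pp == p) }
        ' "$f" 2>/dev/null; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Start a parent that never calls wait(): the inner shell backgrounds a
# short-lived child, then exec replaces the shell with a longer sleep.
# The child keeps the same parent PID, but that parent is now plain sleep.
sh -c 'sleep 1 & exec sleep 5' &
parent=$!

# Give the child time to exit; it stays defunct because nobody reaps it.
sleep 2
zombies=$(count_zombies "$parent")
echo "zombie children of PID $parent: $zombies"

# Clean up the demo parent.
kill "$parent" 2>/dev/null
wait "$parent" 2>/dev/null || true
```

The same thing happens inside a container when sleep infinity is PID 1, except that the zombies persist for the lifetime of the container.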

These zombie processes are mostly harmless (they consume no CPU or memory and do not count against cgroup CPU/memory limits) unless they fill the process table, at which point no new processes can be spawned. The relevant limit is:

$ cat /proc/sys/kernel/pid_max
4194304
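As a rough way to see how much headroom remains, the sketch below compares the number of entries under /proc with pid_max. It assumes Linux with /proc mounted; count_pids is a hypothetical helper name. Threads also consume IDs from the same space and are not counted here, so treat the result as a lower bound on usage.

```shell
#!/bin/sh
# count_pids (hypothetical helper): number of processes visible in /proc.
# Threads are not included, so real PID usage may be higher.
count_pids() {
    set -- /proc/[0-9]*   # one positional parameter per process directory
    echo "$#"
}

used=$(count_pids)
max=$(cat /proc/sys/kernel/pid_max)
echo "processes: $used of pid_max $max ($((used * 100 / max))% used)"
```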

If sleep infinity is the main container process (PID 1), it never reaps the child processes reparented to it, so they accumulate as zombies. Containers configured this way may be terminated by the zombie killer cron job that some providers run to clean up defunct processes.

Proposed Solutions

  1. Use a Proper Primary Process: Containers should use their main application or a robust init system as the primary process. For example, running /usr/sbin/sshd -D is preferable to sleep infinity.
  2. Process Management in Containers: Consider using dedicated init systems like tini, dumb-init, or runit, which are designed to reap child processes correctly.
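The two suggestions above could be combined in an entrypoint script along the following lines. This is a sketch under assumptions, not the repository's current setup: it assumes tini or dumb-init may be installed in the image, and the function names init_prefix and run_main are hypothetical. Instead of ending with sleep infinity, the script ends by exec-ing the real application, wrapped in an init when one is available.

```shell
#!/bin/sh
# init_prefix (hypothetical helper): print the init wrapper to place in front
# of the application command, or nothing if no known init is installed.
init_prefix() {
    if command -v tini >/dev/null 2>&1; then
        echo "tini --"
    elif command -v dumb-init >/dev/null 2>&1; then
        echo "dumb-init --"
    fi
}

# run_main CMD [ARGS...]: replace this shell with the application so that it
# (or the init in front of it) becomes the container's main process.
run_main() {
    # init_prefix is left unquoted on purpose so "tini --" splits into words.
    exec $(init_prefix) "$@"
}

# One-time setup steps would go here, before handing over control.

if [ "$#" -gt 0 ]; then
    run_main "$@"
fi
```

With this in place, a script that currently ends in sleep infinity could end in something like run_main /usr/sbin/sshd -D. Without an init installed, the application itself becomes PID 1 and is then responsible for reaping its own children.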

Additional Resources

Request for Action

I suggest reviewing the current use of sleep infinity across the repository and discussing potential alternatives for better process management. This change could improve the stability and performance of deployments using this repository.
