Support hugepages allocation in containers #2589

Open
@achimnol

Description

Some HPC applications may want to use hugepages (2 MiB / 1 GiB page sizes) to reduce TLB pressure.
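To make the TLB argument concrete, here is a rough sketch that counts how many pages are needed to map a given working set at each page size. The 4 KiB base page size is an assumption (the usual x86-64 default), and `pages_needed` is a hypothetical helper, not part of Backend.AI.

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30


def pages_needed(working_set: int, page_size: int) -> int:
    """Return how many pages of page_size bytes are needed to map working_set bytes."""
    return -(-working_set // page_size)  # ceiling division


working_set = 16 * GIB  # same size as the mem=16G example in this issue
for label, size in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", 1 * GIB)]:
    print(f"{label:>6} pages: {pages_needed(working_set, size):>9,}")
```

For a 16 GiB working set this gives 4,194,304 pages at 4 KiB, 8,192 at 2 MiB, and 16 at 1 GiB, so far fewer translations compete for TLB entries when hugepages are used.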

In container runtimes, there are several examples to support hugepages:

Some references on hugepages:

We need to explicitly enable hugepages on part of our testing infrastructure and implement an option such as:

backend.ai create -r mem=16G --resource-opt shm=1G,huge-2Mi=512M,huge-1Gi=4G ...

or,

backend.ai create -r mem=16G -r mem.huge-2m=512M -r mem.huge-1g=1G --resource-opt shm=1G ...

The first option (resource-opt) does not prevent overlapping usage; it only allows containers to access hugepages, subject to limits.
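As a minimal sketch of how the resource-opt form could be parsed into per-page-size limits, assuming the `huge-2Mi` / `huge-1Gi` key syntax proposed above: the functions `parse_size` and `parse_resource_opts` are hypothetical, and treating `K`/`M`/`G` as binary multiples and rejecting limits that are not a multiple of the page size are both assumptions, not existing Backend.AI behavior.

```python
import re

# Assumption: size suffixes are binary multiples (1G == 2**30 bytes).
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
# Matches keys such as "huge-2Mi" or "huge-1Gi" from the proposed syntax.
_HUGE_KEY = re.compile(r"^huge-(\d+)([KMG])i$")


def parse_size(text: str) -> int:
    """Convert a string such as '512M' into a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMGT])", text.strip())
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_resource_opts(spec: str) -> dict[str, int]:
    """Parse 'key=size,key=size,...' into a mapping of option name to bytes."""
    opts: dict[str, int] = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        size = parse_size(value)
        m = _HUGE_KEY.match(key)
        if m is not None:
            # A hugepage limit only makes sense as a whole number of pages.
            page_size = int(m.group(1)) * _UNITS[m.group(2)]
            if size % page_size != 0:
                raise ValueError(f"{key}={value} is not a multiple of the page size")
        opts[key] = size
    return opts


limits = parse_resource_opts("shm=1G,huge-2Mi=512M,huge-1Gi=4G")
```

Under this reading the values are only per-container caps; nothing here tracks how much of the host's hugepage pool other containers have already been granted.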

The second option (resource-slot) treats hugepages as an accounted resource that cannot be shared between different containers. For consistency with MIG slots (cuda.mig-5g, ...), I've removed the trailing i (binary suffix) in the resource slot names.
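The accounting semantics of the resource-slot form can be sketched as an exclusive pool, where each grant is deducted from the host capacity. `SlotPool` and `InsufficientSlots` are hypothetical names for illustration only, and `mem.huge-2m` is an assumed slot name for the 2 MiB page size; this is not the Backend.AI agent's actual allocator.

```python
GIB = 2**30


class InsufficientSlots(Exception):
    """Raised when a request exceeds the remaining capacity of a slot."""


class SlotPool:
    def __init__(self, capacity: dict[str, int]) -> None:
        self._capacity = dict(capacity)
        # container id -> {slot name -> bytes granted}
        self._allocated: dict[str, dict[str, int]] = {}

    def available(self, slot: str) -> int:
        """Bytes of the slot not yet granted to any container."""
        used = sum(a.get(slot, 0) for a in self._allocated.values())
        return self._capacity.get(slot, 0) - used

    def allocate(self, container_id: str, request: dict[str, int]) -> None:
        """Grant the whole request or nothing; grants are never shared."""
        if container_id in self._allocated:
            raise ValueError(f"{container_id} already has an allocation")
        for slot, amount in request.items():
            if amount > self.available(slot):
                raise InsufficientSlots(f"{slot}: requested {amount}, "
                                        f"available {self.available(slot)}")
        self._allocated[container_id] = dict(request)

    def release(self, container_id: str) -> None:
        """Return a container's grant to the pool."""
        self._allocated.pop(container_id, None)


pool = SlotPool({"mem.huge-2m": 1 * GIB, "mem.huge-1g": 4 * GIB})
pool.allocate("container-a", {"mem.huge-1g": 3 * GIB})
remaining = pool.available("mem.huge-1g")  # 1 GiB left for other containers
```

A second container asking for 2 GiB of `mem.huge-1g` would be refused until `container-a` is released, which is the difference from the resource-opt form, where both requests would be accepted as independent limits.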

Metadata

Labels

comp:agent (Related to Agent component)
comp:manager (Related to Manager component)
urgency:3 (Must be finished within a certain time frame)