Support hugepages allocation in containers #2589

Some HPC applications may want to use hugepages (2 MiB or 1 GiB page sizes) to reduce TLB pressure.

Existing container platforms already support hugepages, for example:

- https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

Some references on hugepages:

- https://repost.aws/knowledge-center/configure-hugepages-ec2-linux-instance
- https://help.ubuntu.com/community/KVM%20-%20Using%20Hugepages
- https://access.redhat.com/solutions/36741
- https://docs.oracle.com/database/121/UNXAR/appi_vlm.htm#UNXAR391

We need to explicitly enable hugepages on part of our testing infrastructure and implement an option such as:

```shell
backend.ai create -r mem=16G --resource-opt shm=1G,huge-2Mi=512M,huge-1Gi=4G ...
```

or,

```shell
backend.ai create -r mem=16G -r mem.huge-2m=512M -r mem.huge-1g=1G --resource-opt shm=1G ...
```

The first option (`--resource-opt`) does not prevent overlapping usage between containers; it only lets each container access hugepages up to its own limit.
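
To make the "limits" part concrete, here is a rough sketch of how the hugepage entries of a `--resource-opt` string could be turned into per-container cgroup v2 limits. The `hugetlb.2MB.max` and `hugetlb.1GB.max` file names come from the kernel's cgroup v2 hugetlb controller; the helper names, the key-to-file mapping, and reading `M`/`G` as binary units are assumptions for illustration, not existing Backend.AI behavior.

```shell
#!/bin/sh
# Convert a size such as 512M or 4G to bytes (binary units assumed).
to_bytes() {
  case "$1" in
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Map a proposed option key to the cgroup v2 hugetlb limit file (hypothetical mapping).
hugetlb_limit_file() {
  case "$1" in
    huge-2Mi) echo "hugetlb.2MB.max" ;;
    huge-1Gi) echo "hugetlb.1GB.max" ;;
    *) return 1 ;;
  esac
}

# Print "<cgroup file> <limit in bytes>" for every hugepage entry,
# skipping unrelated keys such as shm.
plan_hugetlb_limits() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key val; do
    file=$(hugetlb_limit_file "$key") || continue
    printf '%s %s\n' "$file" "$(to_bytes "$val")"
  done
}

plan_hugetlb_limits "shm=1G,huge-2Mi=512M,huge-1Gi=4G"
```

The sketch only prints the planned values; an agent would still have to write them into the container's cgroup directory, and nothing here stops two containers from being granted limits that together exceed the host pool.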

The second option (resource-slot) treats hugepages as an accounted resource that cannot be shared between different containers. For consistency with MIG slots (`cuda.mig-5g`, ...), I've removed the trailing `i` (binary suffix) in the resource slot names.
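
Treating hugepages as an accounted slot means each request has to be converted into a whole number of pages and checked against what is still available. Below is a minimal sketch of that arithmetic, assuming sizes are given in MiB and using the standard sysfs counters under `/sys/kernel/mm/hugepages/`; the function names are hypothetical, and a real scheduler would track its own reservations instead of relying only on the host's free count.

```shell
#!/bin/sh
# pages_needed <request_mib> <page_mib>: pages required, rounded up.
pages_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# free_hugepages <page_kb>: free pages in the host pool for that page size,
# or 0 when the host has no pool of that size.
free_hugepages() {
  f="/sys/kernel/mm/hugepages/hugepages-${1}kB/free_hugepages"
  if [ -r "$f" ]; then cat "$f"; else echo 0; fi
}

# can_allocate <request_mib> <page_mib> <free_pages>: succeeds if the request fits.
can_allocate() {
  [ "$(pages_needed "$1" "$2")" -le "$3" ]
}

# Example: a 512 MiB request backed by 2 MiB pages (2048 kB pool).
if can_allocate 512 2 "$(free_hugepages 2048)"; then
  echo "2 MiB hugepage request fits"
else
  echo "not enough free 2 MiB hugepages"
fi
```

With 2 MiB pages a 512 MiB slot corresponds to 256 pages, and with 1 GiB pages a 1 GiB slot is a single page, so slot quantities could also be stored as page counts.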



