Background:
The /proc/diskstats information for a container is assembled in the proc_diskstats_read function by reading the blkio subsystem of the container's cgroup.
Problem:
However, the blkio cgroup subsystem can be inaccurate. For instance, if a container operates on the disk /dev/sdc, the host's /proc/diskstats may show many operations on /dev/sdc, while the blkio cgroup does not reflect them: blkio.io_serviced_recursive, blkio.io_service_time_recursive, and the other files related to these operations may be empty.
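To make the symptom concrete, here is a minimal sketch of how the per-device counters in a file such as blkio.io_serviced_recursive could be parsed. It assumes the cgroup v1 line format `MAJ:MIN Operation Value` followed by a final `Total N` line; the function name parse_blkio_recursive is hypothetical and is not part of lxcfs.

```python
def parse_blkio_recursive(text):
    """Parse 'MAJ:MIN Op Value' lines into {(major, minor): {op: value}}.

    Assumes the cgroup v1 blkio format; lines that do not have exactly
    three fields (blank lines, the trailing grand total) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        dev, op, value = fields
        major, minor = dev.split(":")
        stats.setdefault((int(major), int(minor)), {})[op] = int(value)
    return stats
```

In the failure case described above, the file carries no per-device lines, so the parser returns an empty mapping and there is nothing from which to build the container's /proc/diskstats, even though the host shows activity on the disk.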
Proposed Solution:
To address this issue, one approach is to read the host's /proc/diskstats to assemble the container's /proc/diskstats.
This can be accomplished by configuring independent disks for each container, such as using /dev/sdc as the data disk of container A and /dev/sdd as the data disk of container B. This practice is also widely adopted in the industry to isolate container resources.
To support this approach, lxcfs may need to isolate /proc/partitions (I don't know why the community doesn't support this so far) and modify the diskstats assembly logic so that it reads the data for the disks used by the container from the host's /proc/diskstats and reassembles it.
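The reassembly step could look roughly like the sketch below. It only illustrates the filtering idea and is not lxcfs code: the function name filter_diskstats and the container_disks argument are hypothetical, and how lxcfs would learn which disks belong to a container is left open here.

```python
def filter_diskstats(host_text, container_disks):
    """Keep only the host /proc/diskstats lines for the container's disks.

    host_text: full contents of the host's /proc/diskstats.
    container_disks: set of device names (e.g. {"sdc"}); hypothetical input.
    Each line has the form 'major minor name stat1 stat2 ...', so the
    device name is the third whitespace-separated field.
    """
    kept = []
    for line in host_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in container_disks:
            kept.append(line)  # preserve the original line unchanged
    return "\n".join(kept) + ("\n" if kept else "")
```

With container A mapped to {"sdc"} and container B to {"sdd"}, each container would see only the line for its own data disk. Partitions such as sdc1 are matched only if they are listed explicitly.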
A better approach may be to use an option (--enable-host-diskstats) to switch between the old and new solutions, ensuring compatibility.
Furthermore, given the limited accuracy of blkio cgroups in most cases, it may be redundant to use them to assemble /proc/diskstats.
Thank you for your attention!