@xypp3 xypp3 commented Dec 16, 2025

Description of the Pull Request (PR):

Adds a short subsection to the "Resource Usage / Limits" section that describes how to gather more detailed system metrics using cgroups and the Linux perf program.

Reason for addition

After struggling to figure out how to get performance counters from Apptainer containers, I am hoping to share my findings. I believe that being able to get software and hardware counter information from Apptainer containers is crucial to deeply understanding and optimizing the running programs.

Testing

I checked that the docs I wrote produced correct output using: make html SKIPCLI=1


P.S. If you have any changes you want me to make for clarity, I am open to feedback. Additionally, if you do not see the utility of this section in the docs, I understand your decision to close this PR.

Signed-off-by: Petr Konstantin Milev <petrkmilev@gmail.com>

@DrDaveD DrDaveD left a comment

Thanks for this, I think this is helpful.

I'm confused by what you mean by {path}. Please explain that in the text.

In addition, I see a couple of small wording problems. One is "those to facts", which should perhaps be "that fact". The other is "you are using gathering", which should probably drop the "using".

Finally, instances aren't always run in a cgroup, so you could add "when possible" to "Instances are run within a cgroup".

xypp3 commented Dec 18, 2025

Thank you for your feedback. I added a small explanation and example for {path} in the paragraph. I'll post it here in case you want to edit it further, so that I don't clog up the commit history. Once we agree on good wording, I'll push the finalized commit.


For in-depth container performance analysis it would be useful to collect data from Linux's performance counters. Normally this is done by supplying the target PID to perf. This, however, does not work for containers out of the box. Apart from PID, perf can also keep track of cgroups. When possible, instances are run within a cgroup, so we can use those that fact to track our container's performance counters. You can do this by using the -G or --cgroup perf option that collects all system events and filters by cgroup name. For cgroups v2, the cgroup name is everything after /sys/fs/cgroup/. When possible, instances create a cgroup in {path}/apptainer-{instance id number}, where {path} is the path between /sys/fs/cgroup and your cgroup directory. For example, for the cgroup in /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/apptainer-30443.scope the {path} is user.slice/user-1000.slice/user@1000.service/user.slice/. The final perf command would look something like:

.. code::

   perf stat -a -e cache-misses --cgroup "{path}/apptainer-{instance id number}" -- sleep 10

.. note::
   Because you are gathering system metrics, you will need to reduce your kernel.perf_event_paranoid to less than or equal to 0 or enable CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN.
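
As a rough illustration of the two rules in the draft above (the cgroup name is everything after /sys/fs/cgroup/, and kernel.perf_event_paranoid must be 0 or lower), here is a minimal shell sketch. The helper names cgroup_name and paranoid_ok are hypothetical and are not part of perf or Apptainer. The path is the example from the draft.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of perf
# or Apptainer.

# Turn a cgroup v2 directory into the name perf's --cgroup option expects,
# i.e. everything after /sys/fs/cgroup/ (a trailing slash is dropped).
cgroup_name() {
    dir="${1%/}"
    printf '%s\n' "${dir#/sys/fs/cgroup/}"
}

# Succeed when the given kernel.perf_event_paranoid value is <= 0, the
# threshold the note gives for gathering system-wide metrics without
# extra capabilities.
paranoid_ok() {
    [ "$1" -le 0 ]
}

name=$(cgroup_name "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/apptainer-30443.scope")
echo "$name"

# Report the current setting if the file is readable on this system.
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    level=$(cat /proc/sys/kernel/perf_event_paranoid)
    if paranoid_ok "$level"; then
        echo "perf_event_paranoid=$level: system-wide counting should be permitted"
    else
        echo "perf_event_paranoid=$level: lower it or use one of the capabilities"
    fi
fi
```

The printed name is what would be passed to --cgroup. Nothing here runs perf itself, so the sketch is safe to try without elevated privileges.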


Thank you for taking your time to review my change :)

DrDaveD commented Dec 18, 2025

What I usually do for PR modifications is force-push after comments, and I find that works pretty well. Alternatively, we can do a squash merge when the PR is merged to collapse it to one commit. Having said that, here are my comments on the draft you posted.

We use a substitution mechanism in the docs for some things, which works by surrounding a keyword with curly brackets. My first comment related to that is that we have a convention of using {command} in place of the string "apptainer", so please replace your "apptainer" strings with {command}. (Even though the reasons for doing that in the first place aren't so important any more, it is the convention, so let's stick with it.)

Second, please replace "instances" in the third sentence with "{command} instances" to make that clearer.

Finally, I'm a bit uncomfortable with using that same convention within the text as well. One potential issue is that we might someday decide to use the same keyword for a substitution. Also, I looked over the rest of the documentation and didn't see anything similar. My suggestion is to use shell variables instead, for example $CGROUPPATH and $INSTANCEID.
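
A minimal sketch of what the shell-variable form could look like, reusing the example cgroup from the draft earlier in this thread. The values assigned to CGROUPPATH and INSTANCEID are illustrative only. The `stat` subcommand and the `.scope` suffix (taken from the example path, not from the template) are assumptions.

```shell
#!/bin/sh
# Illustrative values copied from the example path in the draft; substitute
# your own. CGROUPPATH omits the leading /sys/fs/cgroup/ and any trailing
# slash.
CGROUPPATH="user.slice/user-1000.slice/user@1000.service/user.slice"
INSTANCEID=30443

# Assumption: the cgroup directory carries the ".scope" suffix seen in the
# example path; check the actual directory name on your system.
PERF_CGROUP="$CGROUPPATH/apptainer-$INSTANCEID.scope"

# Print the command instead of running it, since perf may need privileges.
# Assumption: "stat" is the perf subcommand used for counting events.
printf 'perf stat -a -e cache-misses --cgroup "%s" -- sleep 10\n' "$PERF_CGROUP"
```

Keeping the two pieces in separate variables avoids any clash with the brace-based substitution keywords used elsewhere in the docs.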

…conflict

Signed-off-by: Petr Konstantin Milev <petrkmilev@gmail.com>

xypp3 commented Dec 19, 2025

I should have fixed the phrasing and the variables as per your feedback.

If you have any more feedback feel free to tell me and I'll fix it :)

@DrDaveD DrDaveD left a comment

Getting close, a few more comments.

Please break up the lines into no more than 80 characters since that's the style of the rest of the file. That also makes for easier reviews.

Drop "those" from "use those that fact".

In "Usually the {command} instance", change "the" to "for". Just after that, drop one "in" from "in in".

Signed-off-by: Petr Konstantin Milev <petrkmilev@gmail.com>

xypp3 commented Dec 19, 2025

ok done :)

happy to keep editing if you have more feedback

Signed-off-by: Petr Konstantin Milev <petrkmilev@gmail.com>

xypp3 commented Dec 19, 2025

ok that should be the comments you have left for now.

I'll be afk for the rest of the evening so if you have more feedback, I'll get to you tomorrow :)

@DrDaveD DrDaveD left a comment

Thanks!

@DrDaveD DrDaveD merged commit 3b8d027 into apptainer:main Dec 22, 2025
2 checks passed