Permitted, Portable, Parsable Energy Meter for CPUs and GPUs.
- Native Intel, NVIDIA and AMD GPU support via xxx-smi. Native CPU support via perf or likwid
- Millisecond latency
- Fortran, C, C++, Python examples provided
- Encapsulated in a Unix shell script for complete freedom of interface and portability.
- Add any custom energy meter simply by executing it in the shell.
Open p3em.sh and uncomment the right energy meter section for your system (or add your own, see below) to get the accumulated energy consumption of your node.
The flexibility and low latency of p3em make it well suited to profiling the energy consumption of regions of interest in your code.
If you are at this point, you likely already have your regions of interest marked by your internal code timer. If not, add one right now
(for a compact one you may look into this one).
We won't give you yet another API.
Use the one you already have, by instrumenting your timer's now() function
to return energy instead of time.
Both quantities are monotonic and additive over intervals, and
your timer will zero either variable.
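
As a minimal sketch of this idea in Python (not the project's own example code): the `RegionTimer` class below is a hypothetical stand-in for whatever timer you already have, and `energy_now` assumes that running p3em.sh prints the accumulated energy as the first token on stdout. Check the script's actual output before relying on that assumption.

```python
import subprocess
import time


def energy_now(cmd=("./p3em.sh",)):
    """Return the running energy total reported by an external meter.

    ASSUMPTION: running `cmd` prints the accumulated energy as the first
    whitespace-separated token on stdout. Verify against p3em.sh itself.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return float(result.stdout.split()[0])


class RegionTimer:
    """Hypothetical stand-in for an existing code timer."""

    def __init__(self, now=time.perf_counter):
        self.now = now      # any monotonic, additive quantity works here
        self.totals = {}    # accumulated value per region
        self._open = {}     # start value of regions currently running

    def start(self, region):
        self._open[region] = self.now()

    def stop(self, region):
        # Taking a difference removes the arbitrary origin of the counter,
        # so the same logic serves for seconds and for energy.
        delta = self.now() - self._open.pop(region)
        self.totals[region] = self.totals.get(region, 0.0) + delta
        return delta
```

With this layout, switching from time to energy is a one-line change where the timer is created, for example `RegionTimer(now=energy_now)`; the start/stop calls around your regions of interest stay as they are.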
A likely workflow:
- add p3em as a submodule in your repository
- follow the provided examples (C/Fortran/C++/Python) for instrumenting your timer
- run your code without any change
- process the performance measures with your usual pipeline.
For further examples you may look at the timer linked above.
Not what you are looking for right now? Read below to understand and customize p3em's behaviour.
p3em.sh parses the sample energy meters to yield a running total of consumed energy.
- The measure is intended at node level (i.e. it yields the sum over all selected devices in a node); node fractions are no fun.
- The provided energy meters aim only at the device cores, as a proxy of device size for providing properly normalized performance values, directly comparable among devices. See paper (TBA).
Interested in other metrics with different meanings, e.g. global "at the plug" measures? Including memory/disk? Adding CPU and GPU together? See below!
Simply add your own executable or script, written in your favourite language, printing any running total
of consumed energy with the lowest possible latency, following the example.
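
For illustration only, a custom meter of this kind might look like the Python sketch below. It reads the package-level energy counters that the Linux powercap (RAPL) interface exposes under /sys/class/powercap, which is a choice made here and not one of the meters shipped with p3em. The output format (a single number in joules on stdout) is also an assumption; match whatever the provided example prints.

```python
import glob
import sys

# Package-level RAPL domains of the Linux powercap framework
# (single-digit package index; subdomains such as intel-rapl:0:0 are skipped).
RAPL_GLOB = "/sys/class/powercap/intel-rapl:[0-9]/energy_uj"


def read_energy_joules(paths):
    """Sum the microjoule counters stored in `paths` and return joules.

    The kernel counters wrap around at max_energy_range_uj; that case is
    not handled in this sketch.
    """
    total_uj = 0
    for path in paths:
        with open(path) as handle:
            total_uj += int(handle.read().strip())
    return total_uj / 1e6


if __name__ == "__main__":
    try:
        # ASSUMPTION: the caller expects one running total on stdout.
        print(f"{read_energy_joules(sorted(glob.glob(RAPL_GLOB))):.6f}")
    except OSError as err:
        # Reading energy_uj often requires elevated permissions.
        print(f"cannot read energy counter: {err}", file=sys.stderr)
```

Reading a file that the kernel already maintains keeps the latency low, since no sampling or integration happens in the script itself.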
Just run p3em.sh and play with it in your shell for visual clarity.