Energy profiling tools: Core infrastructure with timing tool and export capabilities #299

Open
ethan-puyaubreau wants to merge 9 commits into kokkos:develop from ethan-puyaubreau:feature/energy-profiler-infrastructure

Contributor

@ethan-puyaubreau ethan-puyaubreau commented Aug 15, 2025

This PR introduces the foundational infrastructure for energy profiling tools in Kokkos:

Features:

  • Timing Infrastructure: Timing system with TimingInfo structure and region tracking
  • State Management: Thread-safe EnergyProfilerState singleton for managing active/completed timing regions
  • Region Types: Support for ParallelFor, ParallelReduce, ParallelScan, DeepCopy, and UserRegion profiling
  • CSV Export: timing_export module for exporting timing data to CSV files with summary statistics
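
The timing and state-management features listed above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the member names, the `begin`/`end` methods, and the choice of `std::chrono::steady_clock` are assumptions made for the example (only `TimingInfo`, `EnergyProfilerState`, and the region type names come from the description).

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Region kinds named in the PR description.
enum class RegionType { ParallelFor, ParallelReduce, ParallelScan, DeepCopy, UserRegion };

// Hypothetical layout; the PR's actual TimingInfo may differ.
struct TimingInfo {
  using Clock = std::chrono::steady_clock;  // assumption: a monotonic clock
  std::string name;
  RegionType type = RegionType::UserRegion;
  std::uint64_t id = 0;
  Clock::time_point start_time{};
  Clock::time_point end_time{};

  std::chrono::nanoseconds duration() const {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end_time - start_time);
  }
};

// Singleton that guards its active and completed regions with one mutex.
class EnergyProfilerState {
 public:
  static EnergyProfilerState& instance() {
    static EnergyProfilerState state;  // initialization is thread-safe since C++11
    return state;
  }

  void begin(std::string name, RegionType type, std::uint64_t id = 0) {
    TimingInfo region;
    region.name = std::move(name);
    region.type = type;
    region.id = id;
    std::lock_guard<std::mutex> lock(mutex_);
    region.start_time = TimingInfo::Clock::now();
    active_.push_back(std::move(region));
  }

  // Closes the most recently started region; returns false if none is active.
  bool end() {
    const auto now = TimingInfo::Clock::now();
    std::lock_guard<std::mutex> lock(mutex_);
    if (active_.empty()) return false;
    TimingInfo region = std::move(active_.back());
    active_.pop_back();
    region.end_time = now;
    completed_.push_back(std::move(region));
    return true;
  }

  std::size_t active_count() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return active_.size();
  }

  // Returns a copy so callers never touch the guarded container directly.
  std::vector<TimingInfo> completed() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return completed_;
  }

 private:
  EnergyProfilerState() = default;
  mutable std::mutex mutex_;
  std::deque<TimingInfo> active_;
  std::vector<TimingInfo> completed_;
};
```

Returning copies from `completed()` keeps the lock inside the class; handing out the mutex and the containers separately would leave correct locking up to every caller.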

Implementation:

  • Main Library: kp_energy_profiler - Core profiling functionality
  • Utilities: timing_utils.hpp/cpp - State management and helper functions
  • Export: timing_export.hpp/cpp - CSV export and summary generation
  • No External Dependencies: Self-contained implementation using only standard C++ libraries

Architecture:
This infrastructure provides a clean separation between timing collection, state management, and data export, making it easy for future energy monitoring providers to integrate their functionality. The design supports both synchronous and asynchronous profiling patterns (if combined with #300).
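
To illustrate the separation between collection and export described above, here is a hedged sketch of an export step that only sees plain records and an output stream. `ExportRecord`, `csv_escape`, `write_csv`, `to_csv_string`, and `total_seconds` are hypothetical names for this example and are not taken from the `timing_export` module.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flat record handed from the collection side to the exporter.
struct ExportRecord {
  std::string name;
  std::string type;
  double seconds;
};

// Quote a field if it contains a comma, quote, or newline (kernel names
// with template arguments often contain commas).
std::string csv_escape(const std::string& field) {
  if (field.find_first_of(",\"\n") == std::string::npos) return field;
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') out += '"';  // double embedded quotes
    out += c;
  }
  out += '"';
  return out;
}

// The exporter depends only on std::ostream, so it can target a file,
// a string, or a test buffer.
void write_csv(std::ostream& os, const std::vector<ExportRecord>& records) {
  os << "name,type,seconds\n";
  for (const auto& r : records) {
    os << csv_escape(r.name) << ',' << csv_escape(r.type) << ',' << r.seconds << '\n';
  }
}

std::string to_csv_string(const std::vector<ExportRecord>& records) {
  std::ostringstream os;
  write_csv(os, records);
  return os.str();
}

// One possible summary statistic: total time over all records.
double total_seconds(const std::vector<ExportRecord>& records) {
  double total = 0.0;
  for (const auto& r : records) total += r.seconds;
  return total;
}
```

Because the exporter never touches the profiler state, a future energy provider could add its own columns by extending the record type without changing how regions are collected.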

Contributor

@JBludau JBludau left a comment

first pass

Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
Comment thread profiling/energy-profiler/common/filename_prefix.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/CMakeLists.txt Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
@ethan-puyaubreau
Contributor Author

@dalg24 @masterleinad Hello! I would appreciate a review of this PR, since it provides the baseline building blocks needed by the daemon system and the energy measurement tools (which also need review, see #300).

Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.hpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/unit-tests/csv_export_test.cpp Outdated
Contributor

@JBludau JBludau left a comment

I think we are getting there

Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
Comment thread profiling/energy-profiler/common/filename_prefix.cpp Outdated
Comment thread profiling/energy-profiler/common/filename_prefix.hpp Outdated
Comment thread profiling/energy-profiler/common/filename_prefix.hpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.hpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment on lines +92 to +94
// Stack-based timing for robust region/kernel tracking
void start_region(const std::string& name, RegionType type, uint64_t id = 0);
void end_region();
Member

Fine, but note that if you can't correlate the start and end of a region via some identifier, then it won't be thread-safe.
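
One way to read this concern, as a sketch: with a pure stack, `end_region()` always closes whichever region was pushed last, so two threads that interleave their calls can close each other's regions. Keying the active regions by an identifier avoids that. `RegionTable` and `Finished` are hypothetical names for this example; the snippet above only shows that `start_region` takes an `id` while `end_region` takes none.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Result of closing a region (hypothetical type for this sketch).
struct Finished {
  std::string name;
  std::chrono::steady_clock::duration elapsed{};
};

class RegionTable {
 public:
  using Clock = std::chrono::steady_clock;

  void start_region(std::uint64_t id, std::string name) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_[id] = Entry{std::move(name), Clock::now()};
  }

  // Closes exactly the region that was started with `id`, regardless of
  // the order in which other regions were started or ended.
  // Returns false if no region with that id is active.
  bool end_region(std::uint64_t id, Finished& out) {
    const auto now = Clock::now();
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = active_.find(id);
    if (it == active_.end()) return false;
    out.name = std::move(it->second.name);
    out.elapsed = now - it->second.start;
    active_.erase(it);
    return true;
  }

 private:
  struct Entry {
    std::string name;
    Clock::time_point start;
  };
  std::mutex mutex_;
  std::unordered_map<std::uint64_t, Entry> active_;
};
```

Regions can then end in any order, for example `end_region(1, f)` before `end_region(2, f)`, and each call still reports the matching name.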

Comment thread tests/energy-profiler/test_csv_export.cpp Outdated
Comment thread tests/energy-profiler/test_csv_export.cpp Outdated
Comment thread tests/energy-profiler/test_timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/error_handling.hpp Outdated
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from a9c54ce to 5d9082f Compare August 26, 2025 18:45
# - Tool interface definitions
# - Basic kernel timer tool

add_subdirectory(kokkos-tools) No newline at end of file
Member

This red circle means "no newline at end of file".
Please fix it by adding a newline character.
Same comment everywhere.

Comment on lines +20 to +23
namespace KokkosTools {
namespace EnergyProfiler {

std::string generate_prefix() {
Member

Suggested change
- namespace KokkosTools {
- namespace EnergyProfiler {
- std::string generate_prefix() {
+ std::string KokkosTools::EnergyProfiler::generate_prefix() {

Comment on lines +21 to +22
namespace KokkosTools {
namespace EnergyProfiler {
Member

Suggested change
- namespace KokkosTools {
- namespace EnergyProfiler {
+ namespace KokkosTools::EnergyProfiler {
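
For reference, the two suggestions combine as in the sketch below, assuming a C++17 compiler. The header declares the function inside a nested namespace definition, and the source file defines it through its qualified name, which only compiles if that declaration is already visible. The returned string is a placeholder, not the prefix the PR generates.

```cpp
#include <string>

// Header side: C++17 nested namespace definition.
namespace KokkosTools::EnergyProfiler {
std::string generate_prefix();
}  // namespace KokkosTools::EnergyProfiler

// Source side: definition via the qualified name. A typo in the name or
// signature is a compile error here instead of silently declaring a new function.
std::string KokkosTools::EnergyProfiler::generate_prefix() {
  return "kokkos-energy";  // placeholder value for illustration only
}
```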

Comment thread profiling/energy-profiler/kokkos-tools/kp_energy_kernel_timer.cpp Outdated
Comment thread tests/energy-profiler/test_csv_export.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
Comment thread profiling/energy-profiler/common/timer_system.cpp Outdated
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch 2 times, most recently from 6fb522b to f2f0606 Compare August 29, 2025 00:01
Comment thread profiling/energy-profiler/timing/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/timing/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/timing/kp_energy_kernel_timer.cpp Outdated
Comment thread profiling/energy-profiler/timing_utils.hpp Outdated
Comment thread profiling/energy-profiler/kp_energy_profiler.cpp
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from 5fbccdf to 2629fad Compare August 29, 2025 14:44
Comment thread profiling/energy-profiler/kp_energy_profiler.cpp Outdated
Comment thread profiling/energy-profiler/kp_energy_profiler.cpp Outdated
@ethan-puyaubreau ethan-puyaubreau changed the title from "Energy profiling tools: baseline infrastructure and timer system" to "Energy profiling tools: Core infrastructure with timing tool and export capabilities" Aug 29, 2025
std::lock_guard<std::mutex> lock(state.get_mutex());
state.get_active_regions().push_back(region);
} catch (const std::exception& e) {
std::cerr << "Error in start_region: " << e.what() << std::endl;
Contributor

Hmm ... AFAIK that is not how exceptions are supposed to be used, especially since you are not handling the exception, just printing it without rethrowing. So this would not lead to an abort.

If any of the functions in the try block throws, you are silencing that without restoring a valid state from which the program can continue. I would remove the try-catch.
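
A minimal sketch of what removing the try-catch could look like. `ProfilerState` is a stand-in for the PR's state object, and the empty-name check is a hypothetical failure added only to make the error path visible; neither is taken from the PR.

```cpp
#include <deque>
#include <mutex>
#include <stdexcept>
#include <string>

// Stand-in for the profiler state (hypothetical layout).
struct ProfilerState {
  std::mutex mutex;
  std::deque<std::string> active;
};

// No try/catch: if anything below throws, std::lock_guard releases the
// mutex during stack unwinding and the exception reaches the caller,
// who can handle it or let the program terminate.
void start_region(ProfilerState& state, const std::string& name) {
  std::lock_guard<std::mutex> lock(state.mutex);
  if (name.empty()) {
    // Hypothetical precondition, used here to demonstrate propagation.
    throw std::invalid_argument("region name must not be empty");
  }
  state.active.push_back(name);
}
```

If the exception is never caught, `std::terminate` is called, so the failure is visible instead of being reduced to a line on `std::cerr`.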

Comment thread profiling/energy-profiler/kp_energy_profiler.cpp Outdated
TimingInfo region;
region.name = name;
region.type = type;
region.start_time = std::chrono::high_resolution_clock::now();
Contributor

When starting, you should take the time at the end and then use it to update the last region in the deque. This way you don't measure the construction or the lock.
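
A sketch of that ordering, under the assumption that the active regions live in a `std::deque` guarded by a mutex. The types are simplified stand-ins, and `std::chrono::steady_clock` is used here instead of the `high_resolution_clock` from the snippet.

```cpp
#include <chrono>
#include <deque>
#include <mutex>
#include <string>

using Clock = std::chrono::steady_clock;

// Simplified stand-ins for the PR's types (hypothetical layout).
struct TimingInfo {
  std::string name;
  Clock::time_point start_time{};
};

struct ProfilerState {
  std::mutex mutex;
  std::deque<TimingInfo> active;
};

void start_region(ProfilerState& state, const std::string& name) {
  std::lock_guard<std::mutex> lock(state.mutex);
  state.active.emplace_back();               // construct the entry first
  TimingInfo& region = state.active.back();
  region.name = name;                        // string copy happens before the timestamp
  region.start_time = Clock::now();          // last step: excludes lock wait and construction
}
```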

Contributor

As I am writing this: It of course does not hold for nested regions ... but that is too much for now

@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from d1bae2d to 049a99f Compare August 29, 2025 17:46
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from 049a99f to 6fcf78e Compare August 29, 2025 18:25
@maartenarnst
Contributor

A recent reference with related work on rocm:

@vlkale
Contributor

vlkale commented Feb 17, 2026

A recent reference with related work on rocm:

Thanks!
