
Add cycle counter-based MPI sampler using ARMv9 cntvct_el0 register #100

Draft
Copilot wants to merge 3 commits into nsccsz-tool from copilot/add-cycle-counter-timer

Copilot AI commented Feb 11, 2026

Adds a third sampling mode that uses the ARM cycle counter register for high-precision timing on ARMv9 systems, with automatic fallback to the POSIX timer on non-ARM architectures.

Changes

  • New sampler implementation (src/sampler/mpi_sampler_cycle_counter.cpp)

    • Reads the ARM counter with mrs %0, cntvct_el0 and its tick frequency from cntfrq_el0
    • Compile-time arch detection (__aarch64__, __ARM_ARCH) + runtime validation
    • Hybrid approach: a POSIX timer fires at each check interval, and the cycle counter determines the actual sample timing
  • Configuration via PERFLOW_TIMER_METHOD env var

    • auto (default): Use cycle counter on ARM, POSIX elsewhere
    • cycle: Request cycle counter (falls back with warning if unavailable)
    • posix: Force POSIX timer only
  • Build system - Added perflow_mpi_sampler_cycle_counter target in CMakeLists.txt

  • Documentation - New user guide at docs/user-guide/CYCLE_COUNTER_SAMPLER.md
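
To make the hybrid approach above concrete, here is a minimal sketch of how the counter read, the tick-interval computation, and the "is a sample due?" check could fit together. It is not the code from this PR: `read_counter`, `read_counter_freq`, `ticks_per_sample`, and `SampleGate` are hypothetical names, and the non-ARM branch uses `std::chrono::steady_clock` only so the sketch compiles everywhere (the PR itself falls back to the POSIX timer).

```cpp
#include <chrono>
#include <cstdint>

#if defined(__aarch64__)
// Read the counter register, as in the snippet from the issue.
static inline std::uint64_t read_counter() {
    std::uint64_t v;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    return v;
}
// Read the counter's tick frequency in Hz.
static inline std::uint64_t read_counter_freq() {
    std::uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}
#else
// Assumption for this sketch only: a nanosecond steady clock stands in
// for the hardware counter on non-ARM builds.
static inline std::uint64_t read_counter() {
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
}
static inline std::uint64_t read_counter_freq() { return 1000000000ULL; }
#endif

// Number of counter ticks between two samples for a sampling rate in Hz.
// Returns 0 for a zero rate (sampling disabled) and at least 1 otherwise.
inline std::uint64_t ticks_per_sample(std::uint64_t counter_hz,
                                      std::uint64_t sampling_hz) {
    if (sampling_hz == 0) return 0;
    const std::uint64_t t = counter_hz / sampling_hz;
    return t == 0 ? 1 : t;
}

// Decides, at each periodic check, whether a sample is due.
struct SampleGate {
    std::uint64_t interval;  // ticks between samples
    std::uint64_t next_due;  // counter value of the next sample deadline

    bool due(std::uint64_t now) {
        if (interval == 0 || now < next_due) return false;
        // Skip deadlines that were missed entirely, so a late check
        // yields one sample instead of a burst.
        const std::uint64_t missed = (now - next_due) / interval;
        next_due += (missed + 1) * interval;
        return true;
    }
};
```

In a sampler built this way, the POSIX timer handler would call `gate.due(read_counter())` on every check and capture a call stack only when it returns true. Skipping missed deadlines is one possible policy; catching up with extra samples is another.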

Usage

LD_PRELOAD=lib/libperflow_mpi_sampler_cycle_counter.so \
    PERFLOW_TIMER_METHOD=auto \
    PERFLOW_SAMPLING_FREQ=1000 \
    mpirun -n 4 ./app

The output format is identical to that of the existing samplers, so the output stays compatible with all analysis tools.
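
For reviewers, the selection rules for PERFLOW_TIMER_METHOD described under Changes can be sketched as a small pure function. This is an illustration rather than the PR's implementation: `TimerMethod`, `resolve_timer_method`, and `timer_method_from_env` are hypothetical names, and treating an unrecognized value like `auto` is an assumption.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

enum class TimerMethod { Posix, CycleCounter };

// Map the requested method and counter availability to the method used.
// `warned` (optional) is set when "cycle" was requested but unavailable.
inline TimerMethod resolve_timer_method(const char* requested,
                                        bool counter_available,
                                        bool* warned = nullptr) {
    const std::string r = requested ? requested : "auto";
    if (warned) *warned = false;
    if (r == "posix") return TimerMethod::Posix;
    if (r == "cycle") {
        if (counter_available) return TimerMethod::CycleCounter;
        if (warned) *warned = true;  // fall back, caller should warn
        return TimerMethod::Posix;
    }
    // "auto", unset, or (by assumption) any unrecognized value.
    return counter_available ? TimerMethod::CycleCounter
                             : TimerMethod::Posix;
}

// Read PERFLOW_TIMER_METHOD and print a warning on fallback.
inline TimerMethod timer_method_from_env(bool counter_available) {
    bool warned = false;
    const TimerMethod m = resolve_timer_method(
        std::getenv("PERFLOW_TIMER_METHOD"), counter_available, &warned);
    if (warned) {
        std::fprintf(stderr,
                     "cycle counter unavailable, falling back to POSIX timer\n");
    }
    return m;
}
```

Keeping the decision in a function that takes the raw string makes the three modes easy to unit test without touching the process environment.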

Original prompt

This section details the original issue you should resolve

<issue_title>[Feat] Implement Cycle Counter-Based Timer Using ARMv9 Assembly Instruction</issue_title>
<issue_description>## Description
Currently, the timer-based sampler uses POSIX timers for interrupts. We want to explore an alternative approach using ARMv9's cycle counter register (cntvct_el0) to build a sampler that triggers based on fixed cycle intervals.

Background

  • Current implementation: POSIX timer-based interrupts
  • Proposed addition: Cycle counter-based timer using ARMv9 assembly instruction
  • Both timer approaches should be maintained for different use cases and architectures

Requirements

Task 1: Analyze Current Timer Implementation

  • Study the existing timer-based sampler (mpi_sampler_timer.cpp)
  • Understand how timer interrupts are configured and handled
  • Identify integration points with call stack capture and data collection

Task 2: Implement Cycle Counter-Based Timer

  • Create new sampler implementation (e.g., mpi_sampler_cycle_counter.cpp)
  • Use the provided assembly instruction to read the cycle counter:
    uint64_t read_cycle_counter() {
        uint64_t counter;
        __asm__ volatile("mrs %0, cntvct_el0" : "=r"(counter));
        return counter;
    }
  • Design a mechanism to trigger samples at fixed cycle intervals
  • Ensure proper integration with existing call stack capture and data collection

Task 3: Configuration and Architecture Support

  • Add a configuration option to choose between timer implementations
  • Implement architecture detection to ensure the cycle counter is only used on ARMv9
  • Provide fallback to POSIX timer on non-ARMv9 architectures
  • Allow runtime switching between timer methods

Task 4: Testing and Validation

  • Test both timer implementations on supported platforms
  • Verify that the cycle counter-based sampler produces valid performance data
  • Compare accuracy and overhead between the two approaches
  • Ensure existing tests pass with both timer methods

Task 5: Performance Analysis

  • Measure the overhead of the cycle counter reading
  • Compare precision between cycle-based and POSIX timer-based sampling
  • Analyze any performance differences in data collection
  • Document the advantages and limitations of each approach

Acceptance Criteria

  1. Cycle counter-based sampler successfully compiles and runs on ARMv9
  2. Both timer implementations produce comparable sampling results
  3. The configuration system allows easy switching between timer methods
  4. Architecture detection prevents using the cycle counter on non-ARMv9 systems
  5. Performance analysis documents trade-offs between the two approaches

Deliverables

  1. New mpi_sampler_cycle_counter.cpp implementation
  2. Updated configuration system with timer selection
  3. Architecture detection and fallback mechanism
  4. Test results comparing both timer approaches
  5. Performance analysis report

Dependencies

  • ARMv9 architecture for cycle counter implementation
  • Existing timer-based sampler code (src/sampler/mpi_sampler_timer.cpp)
  • Architecture detection libraries or macros

Notes

  • The cycle counter may not be available on all ARM architectures
  • Consider frequency scaling effects on cycle counter accuracy
  • Ensure proper synchronization in multi-core environments
  • Design for minimal overhead when reading the cycle counter

This feature will provide an alternative high-precision timer option for ARMv9 systems while maintaining compatibility with the existing POSIX timer approach for broader platform support.
</issue_description>

<agent_instructions>Please read the issue carefully. If you have any problems, ask me first. Then finish the tasks.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 11, 2026 03:36
Co-authored-by: yuyangJin <37532408+yuyangJin@users.noreply.github.com>
… constant

Co-authored-by: yuyangJin <37532408+yuyangJin@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add cycle counter-based timer using ARMv9 assembly" to "Add cycle counter-based MPI sampler using ARMv9 cntvct_el0 register" Feb 11, 2026
Copilot AI requested a review from yuyangJin February 11, 2026 03:42