Skip to content

[CI] issue: 4684071Terminate Valgrind gracefully#1141

Open
NirWolfer wants to merge 1 commit intoMellanox:masterfrom
NirWolfer:vg_mem_summary_bugfix
Open

[CI] issue: 4684071Terminate Valgrind gracefully#1141
NirWolfer wants to merge 1 commit intoMellanox:masterfrom
NirWolfer:vg_mem_summary_bugfix

Conversation

@NirWolfer
Copy link
Collaborator

Description

Valgrind today is being terminated with -9 which prevents it from printing all the summaries at the end which are needed for debuging.

What

Change -9 signal to -SIGINT af the first try of shutting down Valgrind

Why ?

HPCINFRA-3989

How ?

It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@NirWolfer NirWolfer marked this pull request as ready for review October 22, 2025 14:19
@NirWolfer NirWolfer closed this Oct 22, 2025
@NirWolfer NirWolfer reopened this Oct 22, 2025
@NirWolfer
Copy link
Collaborator Author

bot:retest

@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from fcc5bdf to 50d4440 Compare October 23, 2025 06:59
@NirWolfer NirWolfer changed the title [CI] issue: HPCINFRA-3989 Terminate Valgrind gracefully [CI] issue: 4684071Terminate Valgrind gracefully Oct 23, 2025
@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from 50d4440 to ec6f093 Compare October 23, 2025 07:47
@NirWolfer
Copy link
Collaborator Author

bot:retest

Valgrind today is being terminated with -9 which prevents it from
printing all the summaries at the end which are needed for debuging.

Change -9 signal to -SIGINT af the first try of shutting down Valgrind

Signed-off-by: NirWolfer <nwolfer@nvidia.com>
@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from ec6f093 to b2fd3c5 Compare December 31, 2025 16:01
@greptile-apps
Copy link

greptile-apps bot commented Dec 31, 2025

Greptile Summary

Changed Valgrind process termination signal from SIGKILL (-9) to SIGINT to allow graceful shutdown and enable Valgrind to print debugging summaries.

  • Replaced pkill -9 with pkill -SIGINT on line 98 of contrib/jenkins_tests/vg.sh
  • Existing fallback mechanism ensures reliability: escalates to SIGTERM after 10 seconds, then SIGKILL after 3 more seconds if needed
  • Resolves HPCINFRA-3989 by preserving Valgrind output for debugging

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a single-line fix that improves debugging capability without compromising functionality. The existing multi-level fallback mechanism (SIGINT → SIGTERM → SIGKILL) ensures processes are terminated even if graceful shutdown fails. The change aligns with best practices for process termination and has no impact on test logic or correctness.
  • No files require special attention

Important Files Changed

Filename Overview
contrib/jenkins_tests/vg.sh Changed Valgrind termination from SIGKILL to SIGINT to allow graceful shutdown with fallback mechanism already in place

Sequence Diagram

sequenceDiagram
    participant Script as CI Script
    participant Process as Valgrind/sockperf
    participant System as Operating System
    
    Script->>Process: Start Valgrind with sockperf server
    Script->>Process: Start Valgrind with sockperf client
    Script->>Process: Run tests (10 seconds)
    
    alt Process still running
        Script->>System: pkill -SIGINT (graceful)
        System->>Process: Send SIGINT signal
        Process->>Process: Cleanup & print summaries
        Process-->>System: Exit gracefully
        Script->>Script: Wait 10 seconds
        
        alt SIGINT didn't work
            Script->>System: pkill -SIGTERM
            System->>Process: Send SIGTERM signal
            Process-->>System: Force exit
            Script->>Script: Wait 3 seconds
            
            alt SIGTERM didn't work
                Script->>System: pkill -SIGKILL
                System->>Process: Send SIGKILL signal
                Process--xSystem: Immediate termination
            end
        end
    end
    
    Script->>Script: Analyze Valgrind logs
    Script->>Script: Report results
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant