Fix MPI-Checker warning: Add MPIRequest destructor for proper resource cleanup #37915
Conversation
Co-authored-by: blozano-tt <181790211+blozano-tt@users.noreply.github.com>
/codeowners ping Please check for validity. Maybe we can fix the resource leaks observed in CI?
CodeOwners Group Analysis
This PR requires approval from one member of each of the following groups.
Summary: 1 pending group, 0 approved groups.
Note: At least one approval from each group is sufficient.
Hi Allan Liu (@aliuTT), Joseph Chu (@cfjchu), this PR "Fix MPI-Checker warning: Add MPIRequest destructor for proper resource cleanup" by Copilot (@Copilot) needs your approval/review to merge.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
tt_metal/distributed/multihost/mpi_distributed_context.hpp:52
- Rule-of-5 violation: Adding a custom destructor without deleting or defining the copy/move constructors and assignment operators creates a risk of double-free bugs. Since `MPIRequest` manages an `MPI_Request` resource that should not be copied (cancelling/freeing the same MPI request twice is undefined behavior), you should either:
  - Delete the copy/move operations: `MPIRequest(const MPIRequest&) = delete; MPIRequest& operator=(const MPIRequest&) = delete;`
  - Or implement proper copy/move semantics (though this is complex with MPI requests).

  While the class is currently always used through `std::shared_ptr`, this protection is not enforced at the type level and could be accidentally violated in future code.
```cpp
class MPIRequest : public Request {
public:
    explicit MPIRequest(MPI_Request req) : req_(req) {}
    ~MPIRequest() override;

    Status wait() override;
    std::optional<Status> test() override;
    void cancel() override;
    bool active() const override;

private:
    mutable MPI_Request req_{};
    bool done_{};
};
```
```cpp
MPIRequest::~MPIRequest() {
    if (done_ || was_mpi_finalized()) {
        return;
    }
    // Cancel and free the incomplete request.
    // Not using MPI_CHECK here to avoid throwing in destructor.
    // Errors are ignored because we cannot throw from a destructor
    // and the request will be freed regardless.
    [[maybe_unused]] int cancel_err = MPI_Cancel(&req_);
    [[maybe_unused]] int free_err = MPI_Request_free(&req_);
}
```
Potential race condition: The `done_` flag is checked without synchronization, but in a multi-threaded MPI environment (this code uses MPI_THREAD_MULTIPLE), another thread could be concurrently calling `wait()`, `test()`, or `cancel()`, which also access `done_`. This could lead to:

- Double-free: the destructor runs while `cancel()` is executing.
- Use-after-free: the destructor frees `req_` while `wait()` is using it.

Consider making `done_` a `std::atomic<bool>`, or using a mutex to protect both `req_` and `done_` access across all methods. Note that `cancel()` also has this issue (lines 162-169), but the destructor makes the race more likely since it can be called from any thread when the shared_ptr refcount drops to zero.
Ticket
N/A
Problem description
Clang Static Analyzer's MPI-Checker flagged an `optin.mpi.MPI-Checker` warning at line 254: "Request 'req' has no matching wait." `MPIRequest` wraps an `MPI_Request` but lacked a destructor. Per the MPI standard, requests must be completed via `MPI_Wait()` or `MPI_Test()`, or cancelled/freed before program exit. Without a destructor, incomplete requests leak when `MPIRequest` objects are destroyed without explicit `wait()`/`cancel()` calls.

What's changed
Added a `~MPIRequest()` destructor that:

- skips cleanup when the `done_` flag is set
- skips cleanup after MPI shutdown, via the `was_mpi_finalized()` check
- otherwise calls `MPI_Cancel()` + `MPI_Request_free()`

Mirrors the existing `cancel()` method pattern. 13 LOC added across 2 files.

Checklist
Model tests
If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones; use your best judgement in deciding which is the most appropriate for your PR.

- `models-mandatory` preset (runs: Device perf regressions and Frequent model and ttnn tests); `models-extended` preset (runs: the mandatory tests, plus Demo and Model perf tests)
- `models-mandatory` preset (runs: Unit tests); `models-extended` preset (runs: the mandatory tests, plus Demo and Model perf tests)
- `models-mandatory` preset (runs: Quick tests); `models-extended` preset (runs: the mandatory tests, plus Demo and Model perf tests)

Original prompt