
Fix MPI-Checker warning: Add MPIRequest destructor for proper resource cleanup #37915

Open
Copilot wants to merge 4 commits into main from copilot/fix-mpi-checker-issue-again

Conversation


Copilot AI commented Feb 15, 2026

Ticket

N/A

Problem description

The Clang Static Analyzer MPI-Checker flagged an optin.mpi.MPI-Checker warning at line 254: "Request 'req' has no matching wait."

MPIRequest wraps MPI_Request but lacked a destructor. Per the MPI standard, requests must be completed via MPI_Wait()/MPI_Test() or cancelled and freed before program exit. Without a destructor, incomplete requests leaked whenever an MPIRequest object was destroyed without an explicit wait()/cancel() call.
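
For illustration, a minimal sketch of the completion pattern the checker expects (hypothetical helper, not code from this PR):

#include <mpi.h>

// Every nonblocking operation is paired with a completion call.
void recv_and_wait(void* buf, int count, int src, int tag, MPI_Comm comm) {
    MPI_Request req;
    MPI_Irecv(buf, count, MPI_BYTE, src, tag, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  // completes 'req'; no MPI-Checker warning
}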

What's changed

Added ~MPIRequest() destructor that:

  • Guards against double-cleanup via done_ flag
  • Guards against post-finalization calls via was_mpi_finalized() check
  • Cancels and frees incomplete requests using MPI_Cancel() + MPI_Request_free()
  • Captures error codes without throwing (destructor safety)

Mirrors the existing cancel() method pattern; 13 LOC added across 2 files.

MPIRequest::~MPIRequest() {
    if (done_ || was_mpi_finalized()) {
        return;
    }
    [[maybe_unused]] int cancel_err = MPI_Cancel(&req_);
    [[maybe_unused]] int free_err = MPI_Request_free(&req_);
}
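
For context, a hypothetical usage sketch of the new behaviour (raw_request and the surrounding scope are placeholders, not code from this PR):

{
    // raw_request: a hypothetical MPI_Request returned by a nonblocking call
    auto req = std::make_shared<MPIRequest>(raw_request);
    // ... req is never wait()ed, test()ed, or cancel()led ...
}   // ~MPIRequest() now cancels and frees the incomplete request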

Checklist

  • All post-commit tests
  • Blackhole Post commit
  • cpp-unit-tests
  • New/Existing tests provide coverage for changes

Model tests

If your changes cover model-related code, you should run tests corresponding to the affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers models-mandatory and models-extended presets.
The former includes a minimal set of tests that should always be run; the latter extends it with additional tests. Use your best judgement to decide which is most appropriate for your PR.

Original prompt

Fix this Clang Static Analyzer issue:

  • optin.mpi.MPI-Checker in tt_metal/distributed/multihost/mpi_distributed_context.cpp:254 — Request 'req' has no matching wait.

Verify it's a real issue, apply minimal fix, follow coding standards.



@blozano-tt added the copilot-autofix label (PRs automatically created by Copilot to fix ClangSA issues) Feb 15, 2026
@blozano-tt marked this pull request as ready for review February 15, 2026 08:13
Copilot AI review requested due to automatic review settings February 15, 2026 08:13

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.

Copilot AI and others added 2 commits February 15, 2026 08:16
Co-authored-by: blozano-tt <181790211+blozano-tt@users.noreply.github.com>
Co-authored-by: blozano-tt <181790211+blozano-tt@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix Clang Static Analyzer issue with MPI request" to "Fix MPI-Checker warning: Add MPIRequest destructor for proper resource cleanup" Feb 15, 2026
Copilot AI requested a review from blozano-tt February 15, 2026 08:18
@blozano-tt requested a review from Copilot February 15, 2026 18:32
@blozano-tt

/codeowners ping Please check for validity. Maybe we can fix the resource leaks observed in CI?

@tenstorrent-github-bot

CodeOwners Group Analysis

This PR requires approval from one member of each of the following groups:

Summary: 1 pending groups, 0 approved groups

Group Information:

Note: At least one approval from each group is sufficient.

@tenstorrent-github-bot

Hi Allan Liu (@aliuTT), Joseph Chu (@cfjchu), this PR "Fix MPI-Checker warning: Add MPIRequest destructor for proper resource cleanup" by Copilot (@Copilot) needs your approval/review to merge.


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tt_metal/distributed/multihost/mpi_distributed_context.hpp:52

  • Rule-of-5 violation: Adding a custom destructor without deleting or defining copy/move constructors and assignment operators creates a risk of double-free bugs. Since MPIRequest manages an MPI_Request resource that should not be copied (cancelling/freeing the same MPI request twice is undefined behavior), you should either:
  1. Delete copy/move operations: MPIRequest(const MPIRequest&) = delete; MPIRequest& operator=(const MPIRequest&) = delete;
  2. Or implement proper copy/move semantics (though this is complex with MPI requests)

While the class is currently always used through std::shared_ptr, this protection is not enforced at the type level and could be accidentally violated in future code; a sketch of option 1 is shown after the class declaration below.

class MPIRequest : public Request {
public:
    explicit MPIRequest(MPI_Request req) : req_(req) {}
    ~MPIRequest() override;

    Status wait() override;
    std::optional<Status> test() override;
    void cancel() override;
    bool active() const override;

private:
    mutable MPI_Request req_{};
    bool done_{};
};
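
A minimal sketch of option 1 applied to the class above (hypothetical, not part of this PR's diff):

class MPIRequest : public Request {
public:
    explicit MPIRequest(MPI_Request req) : req_(req) {}
    ~MPIRequest() override;

    // Non-copyable and non-movable: the wrapped MPI_Request must be
    // cancelled/freed exactly once.
    MPIRequest(const MPIRequest&) = delete;
    MPIRequest& operator=(const MPIRequest&) = delete;
    MPIRequest(MPIRequest&&) = delete;
    MPIRequest& operator=(MPIRequest&&) = delete;

    // wait()/test()/cancel()/active() and the data members as above.
};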

Comment on lines +126 to +136
MPIRequest::~MPIRequest() {
    if (done_ || was_mpi_finalized()) {
        return;
    }
    // Cancel and free the incomplete request
    // Not using MPI_CHECK here to avoid throwing in destructor
    // Errors are ignored because we cannot throw from a destructor
    // and the request will be freed regardless
    [[maybe_unused]] int cancel_err = MPI_Cancel(&req_);
    [[maybe_unused]] int free_err = MPI_Request_free(&req_);
}

Copilot AI Feb 15, 2026


Potential race condition: The done_ flag is checked without synchronization, but in a multi-threaded MPI environment (this code uses MPI_THREAD_MULTIPLE), another thread could be concurrently calling wait(), test(), or cancel() which also access done_. This could lead to:

  1. Double-free: destructor runs while cancel() is executing
  2. Use-after-free: destructor frees req_ while wait() is using it

Consider adding std::atomic<bool> for done_ or using a mutex to protect both req_ and done_ access across all methods. Note that cancel() also has this issue (line 162-169), but the destructor makes the race more likely since it can be called from any thread when the shared_ptr refcount drops to zero.
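
As one possible shape for the mutex variant suggested above, a hedged sketch that assumes a new std::mutex member (here called req_mutex_, hypothetical) which wait(), test(), cancel(), and the destructor all lock before touching req_ or done_:

// Hedged sketch, not the PR's code: requires <mutex> and a
// `mutable std::mutex req_mutex_;` member added to MPIRequest.
MPIRequest::~MPIRequest() {
    std::lock_guard<std::mutex> lock(req_mutex_);
    if (done_ || was_mpi_finalized()) {
        return;
    }
    [[maybe_unused]] int cancel_err = MPI_Cancel(&req_);
    [[maybe_unused]] int free_err = MPI_Request_free(&req_);
    done_ = true;  // mark handled so no other code path frees req_ again
}

Note that locking inside the destructor only narrows the window: if another thread can still call wait() while the last shared_ptr is being released, that is a lifetime bug in the owning code that no member mutex can fully fix.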


Labels

copilot-autofix: PRs automatically created by Copilot to fix ClangSA issues


3 participants