
BUG: Parallel deadlock in multigrid #4293

Open
@pbrubeck

Description

Describe the bug
Multigrid transfers use the wrong communicators. A script that repeatedly solves a problem with multigrid may hang in parallel.

Steps to Reproduce
Steps to reproduce the behavior:

from firedrake import *
from firedrake.petsc import PETSc
print = PETSc.Sys.Print  # print once, on rank 0 only

base = UnitSquareMesh(4, 4)
mh = MeshHierarchy(base, 2)

solver_parameters = {
    "ksp_converged_reason": None,
    "ksp_type": "cg",
    "pc_type": "mg",
    "mg_levels_ksp_type": "chebyshev",
    "mg_levels_pc_type": "jacobi",
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
}

def mg_solve():
    # Solve a Poisson problem with the multigrid parameters above on each refined level.
    start = 1
    for i, msh in enumerate(mh[start:], start=start):
        print(f"Level {i}")
        V = FunctionSpace(msh, "CG", 1)
        bcs = DirichletBC(V, 0, "on_boundary")
        v = TestFunction(V)
        u = TrialFunction(V)
        a = inner(grad(u), grad(v))*dx
        L = Cofunction(V.dual()).assign(1)
        uh = Function(V)
        solve(a == L, uh, bcs=bcs, solver_parameters=solver_parameters)

# Repeatedly call the solve routine; in parallel the script eventually hangs.
for k in range(6):
    print(f"Run {k}")
    mg_solve()

Then run with mpiexec -n 4 python script.py

Expected behavior
The script above should not hang in parallel.

Error message
No error message; the script just hangs.

Additional Info
A workaround is to disable garbage collection by commenting out this line.
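
If the hang is indeed triggered by garbage collection of PETSc objects running at different times on different ranks, a user-side mitigation along the following lines may also help. This is a sketch under that assumption, not something from the original report; it reuses mg_solve from the reproducer above.

import gc
from firedrake.petsc import PETSc

gc.disable()                     # stop Python's cyclic collector from firing mid-solve
try:
    for k in range(6):
        print(f"Run {k}")
        mg_solve()               # the reproducer's solve routine
        PETSc.garbage_cleanup()  # collective flush of PETSc's delayed destructions
finally:
    gc.enable()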

The relevant communicators are those used for prolongation and injection: in both transfer kernels the matrix is created with

mat = PETSc.Mat().create(comm=dmc.comm)
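
For reference, the following diagnostic sketch (an illustration, not code from Firedrake itself) shows how one could check which communicator a matrix created this way actually lives on, relative to the mesh's user communicator. It assumes msh.comm is an mpi4py communicator and msh.topology_dm is the underlying DMPlex.

from firedrake import UnitSquareMesh
from firedrake.petsc import PETSc
from mpi4py import MPI

msh = UnitSquareMesh(4, 4)
dmc = msh.topology_dm                       # plays the role of "dmc" in the lines above

mat = PETSc.Mat().create(comm=dmc.comm)     # same creation pattern as prolongation/injection

mat_comm = mat.getComm().tompi4py()         # communicator the Mat was created on
res = MPI.Comm.Compare(mat_comm, msh.comm)  # compare with the mesh's user communicator
PETSc.Sys.Print("mat comm vs mesh comm:",
                "identical" if res == MPI.IDENT
                else "congruent (a duplicate)" if res == MPI.CONGRUENT
                else "different")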
