Describe the bug
Multigrid transfers use the wrong communicators. A script that repeatedly solves a problem with multigrid may hang in parallel.
Steps to Reproduce
Steps to reproduce the behavior:
from firedrake import *
from firedrake.petsc import PETSc

print = PETSc.Sys.Print

base = UnitSquareMesh(4, 4)
mh = MeshHierarchy(base, 2)

solver_parameters = {
    "ksp_converged_reason": None,
    "ksp_type": "cg",
    "pc_type": "mg",
    "mg_levels_ksp_type": "chebyshev",
    "mg_levels_pc_type": "jacobi",
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
}

def mg_solve():
    start = 1
    for i, msh in enumerate(mh[start:], start=start):
        print(f"Level {i}")
        V = FunctionSpace(msh, "CG", 1)
        bcs = DirichletBC(V, 0, "on_boundary")
        v = TestFunction(V)
        u = TrialFunction(V)
        a = inner(grad(u), grad(v))*dx
        L = Cofunction(V.dual()).assign(1)
        uh = Function(V)
        solve(a == L, uh, bcs=bcs, solver_parameters=solver_parameters)

for k in range(6):
    print(f"Run {k}")
    mg_solve()
Then run with mpiexec -n 4 python script.py
Expected behavior
The script above should not hang in parallel.
Error message
No error message; the script just hangs.
Additional Info
A workaround is to disable garbage collection by commenting out this line.
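For illustration, a minimal user-side sketch of the same idea (an assumption, not part of the report; it only uses the standard gc module): keep Python's garbage collector disabled around the repeated solves.

import gc

# Sketch (assumption): disabling automatic garbage collection for the
# duration of the repeated parallel solves mirrors the workaround above.
gc.disable()
try:
    for k in range(6):
        print(f"Run {k}")
        mg_solve()
finally:
    gc.enable()  # restore normal collection afterwards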
The relevant communicators are those used for prolongation and injection:
firedrake/firedrake/mg/ufl_utils.py, line 384 (at commit 7008cd7)
firedrake/firedrake/mg/ufl_utils.py, line 408 (at commit 7008cd7)
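As a side note, here is a minimal diagnostic sketch (not from the report; mesh.comm and mpi4py's Comm.Compare are assumed to be available) to check which communicator each level of the hierarchy uses relative to COMM_WORLD:

# Diagnostic sketch (assumption): compare each level's communicator with
# COMM_WORLD to see whether the transfer operators could be picking up a
# different (e.g. duplicated or stale) communicator.
from firedrake import UnitSquareMesh, MeshHierarchy
from firedrake.petsc import PETSc
from mpi4py import MPI

base = UnitSquareMesh(4, 4)
mh = MeshHierarchy(base, 2)
for i, msh in enumerate(mh):
    result = MPI.Comm.Compare(msh.comm, MPI.COMM_WORLD)
    # MPI.IDENT means the same communicator, MPI.CONGRUENT a duplicate with
    # the same group; anything else would be suspicious for this setup.
    PETSc.Sys.Print(f"level {i}: compare with COMM_WORLD -> {result}")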