Handle deadlocks

The following test deadlocks for all schedulers:

```Python
import pytest_parallel

from time import sleep

@pytest_parallel.mark.parallel(2)
def test_raise_then_coll(comm):
  if comm.rank==0:
    sleep(1)
    raise RuntimeError('my exception message')
  comm.allreduce(42)  # the second positional argument is the reduction op, not the communicator
```

What happens:

**Rank 0**
- during test execution, raises the exception and never calls `allreduce`
- the exception is caught by pytest and added to the report
- pytest then waits for rank 1 to send its report

**Rank 1**
- raises no exception
- the test executes `allreduce` and waits for rank 0, which never joins the collective
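
The sequence above can be reproduced without MPI. The following is only a minimal sketch: two threads stand in for the two ranks, and a `threading.Barrier` stands in for `allreduce` (an assumption made for illustration, not how pytest_parallel works). A timeout is added to the barrier so that the example terminates instead of hanging; `run_rank` and `simulate` are hypothetical names.

```python
import threading

def run_rank(rank, barrier, outcome):
    """Mimic the test body: rank 0 raises before reaching the collective."""
    try:
        if rank == 0:
            raise RuntimeError("my exception message")
        # Stand-in for comm.allreduce: it only returns once every rank arrives.
        # Without the timeout, this call would block forever.
        barrier.wait(timeout=0.5)
        outcome[rank] = "completed"
    except threading.BrokenBarrierError:
        # Must be caught first: BrokenBarrierError subclasses RuntimeError.
        outcome[rank] = "stuck in collective (timed out)"
    except RuntimeError as exc:
        outcome[rank] = f"raised: {exc}"

def simulate():
    """Run both fake ranks and return what happened to each of them."""
    barrier = threading.Barrier(2)
    outcome = {}
    threads = [
        threading.Thread(target=run_rank, args=(rank, barrier, outcome))
        for rank in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcome

if __name__ == "__main__":
    for rank, status in sorted(simulate().items()):
        print(f"rank {rank}: {status}")
```

Rank 0 finishes with its exception recorded, while rank 1 only gets out of the collective because of the timeout, which is exactly what the real `allreduce` lacks.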

How can we solve, or at least mitigate, the problem?
- We can crash pytest as soon as a test encounters an exception (through `pytest_exception_interact` maybe?). Not ideal, because the remaining tests will not be run.
- We can use a timeout parameter when we wait for test reports (suggested by @cbenazet).
    - But then we need a mechanism to signal the other ranks that they should cancel their current test
    - We still need to handle the case where the process that gathers the reports is the one stuck in the `allreduce`
    - Note that the rank which raised the exception is not stuck in the test and will send its report (but there is no guarantee it will be received)
- Maybe we can hook mpi4py blocking functions so that they return an error if they time out
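
As a rough sketch of the last idea, a blocking call could be replaced by its non-blocking counterpart and polled against a deadline. The helper below is hypothetical (`wait_with_timeout` and `CollectiveTimeout` are not part of mpi4py or pytest_parallel); it only assumes a request object exposing `Test()`, as mpi4py's `Request` does. It does not address how to cancel the test on the other ranks, and since MPI does not allow cancelling a pending non-blocking collective, the communicator is probably unusable after a timeout.

```python
import time

class CollectiveTimeout(RuntimeError):
    """Raised when a non-blocking operation does not complete in time."""

def wait_with_timeout(request, timeout, poll_interval=0.01):
    """Poll `request.Test()` until it returns True or `timeout` seconds elapse.

    `request` is anything with a `Test()` method returning a bool,
    e.g. an mpi4py `Request` (assumption: duck typing is enough here).
    """
    deadline = time.monotonic() + timeout
    while not request.Test():
        if time.monotonic() >= deadline:
            raise CollectiveTimeout(
                f"operation still pending after {timeout} s"
            )
        time.sleep(poll_interval)

# Hypothetical usage with mpi4py (not run here):
#   req = comm.Iallreduce(sendbuf, recvbuf, op=MPI.SUM)
#   wait_with_timeout(req, timeout=5.0)
```

Polling keeps the rank responsive, so a stuck rank can at least report a failure instead of blocking forever; what to do next (abort the run, skip the remaining tests) is left open.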
