Optimizing covalent start/stop time #1933

@kessler-frost

Average time taken to start: ~8 seconds
Average time taken to stop the server when there is at least 1 dispatch done: ~30 seconds

The start time is mostly taken up by verifying that the server is ready to accept dispatches, so it is more or less acceptable. The stop time, however, is far too long and we should try to reduce it. Most of the time spent stopping the server goes to the _terminate_child_processes function (can be found here).

Currently we send SIGINT to the leader process and then shut down its children. We know this works, albeit slowly. But when I tried other ways of terminating the processes, such as the terminate and kill methods provided by psutil, none of them worked and the command got stuck waiting forever.
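To make the "stuck waiting forever" failure mode concrete, here is a minimal sketch of a bounded shutdown: ask the process to exit, wait for a fixed grace period, and only then force-kill it. It uses only the standard library's subprocess module rather than psutil, and the function name `stop_with_timeout` and the `grace_seconds` parameter are hypothetical, not part of covalent. It is not what `_terminate_child_processes` does today, just an illustration of putting an upper bound on the wait.

```python
import subprocess
import sys
import time


def stop_with_timeout(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a process to exit, then force-kill it if it ignores the request.

    Hypothetical helper, not part of covalent. Returns "terminated" if the
    process exited within the grace period, otherwise "killed".
    """
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        proc.wait(timeout=grace_seconds)  # bounded wait, never blocks forever
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Stand-in for a worker process: sleeps far longer than the grace period.
    worker = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    started = time.monotonic()
    outcome = stop_with_timeout(worker, grace_seconds=2.0)
    elapsed = time.monotonic() - started
    print(f"{outcome} after {elapsed:.2f}s, exit code {worker.returncode}")
```

The same terminate, bounded wait, kill sequence could presumably be applied to every child of the leader process, so the total stop time is capped by the grace period instead of depending on how long each child takes to react.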

We should look into fixing this by using the Dask APIs to stop the cluster of workers, instead of shutting down their processes directly.

It would also be better to tell the user exactly which stage is in progress when starting/stopping the server, and to be more verbose in general.
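One possible shape for the per-stage reporting, as a rough sketch only: a small context manager that announces a stage when it begins and reports how long it took. The name `stage`, the `emit` parameter, and the stage labels in the usage example are all hypothetical and not taken from the covalent codebase.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name: str, emit=print):
    """Announce a start/stop stage and report how long it took.

    Hypothetical helper: `emit` is any callable that takes one string,
    so the messages can go to stdout, a logger, or a list in tests.
    """
    emit(f"{name} ...")
    started = time.perf_counter()
    try:
        yield
    except Exception:
        emit(f"{name} failed after {time.perf_counter() - started:.1f}s")
        raise
    else:
        emit(f"{name} done in {time.perf_counter() - started:.1f}s")


if __name__ == "__main__":
    # Illustrative stage names only; the sleeps stand in for real work.
    with stage("Stopping workers"):
        time.sleep(0.2)
    with stage("Stopping server"):
        time.sleep(0.1)
```

Wrapping each step this way would also show which stage accounts for most of the ~30 seconds when stopping.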
