I noticed strange behavior in my program after it prints its final output: it stalls for tens of milliseconds after the final output is displayed.
To investigate, I wrote the following program, which measures the time from the end of `main` to the end of the `atexit` chain:
```python
import atexit
import sys
import time

_shutdown_t0: float | None = None


def _report_shutdown() -> None:
    # Runs at interpreter exit and reports the time elapsed since main() finished.
    if _shutdown_t0 is None:  # main() never reached its end (e.g. it raised)
        return
    dt_ms = (time.perf_counter() - _shutdown_t0) * 1e3
    print(f"[shutdown] end of atexit chain: {dt_ms:.3f} ms", file=sys.stderr)


# Registered before importing cupy/pykokkos on purpose (import order matters).
atexit.register(_report_shutdown)

import cupy as cp
import pykokkos as pk


@pk.workunit
def fill_one(idx, v) -> None:
    v[idx] = 1.0


def main() -> None:
    global _shutdown_t0
    pk.set_default_space(pk.ExecutionSpace.Cuda)
    N = 4096
    v = cp.zeros(N, dtype=cp.float64)
    p = pk.RangePolicy(0, N)
    pk.parallel_for("fill", p, fill_one, v=v)
    print("main: done printing", flush=True)
    # Start the clock as the very last action of main().
    _shutdown_t0 = time.perf_counter()


if __name__ == "__main__":
    main()
```
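The import order matters here: `atexit` calls exit handlers in last-in, first-out order, so registering `_report_shutdown` before importing `cupy` and `pykokkos` makes it run after any exit handlers those libraries register.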
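To illustrate the ordering, here is a minimal, stdlib-only sketch (it does not use cupy or pykokkos). It runs a child interpreter that registers two labelled handlers and returns the order in which they execute at exit; the labels are made up for the demonstration.

```python
import subprocess
import sys
import textwrap

# Child script: two exit handlers registered in a known order.
CHILD = textwrap.dedent("""
    import atexit
    atexit.register(print, "registered first")   # plays the role of _report_shutdown
    atexit.register(print, "registered second")  # plays the role of a library handler
""")


def exit_handler_order() -> list[str]:
    """Run CHILD in a fresh interpreter and return the lines its handlers print."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # atexit calls handlers in reverse registration order (LIFO).
    print(exit_handler_order())  # ['registered second', 'registered first']
```

Because the handler registered first runs last, its timestamp includes the time spent in every handler registered after it.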
- With the program as written, the output is (exact timings depend on the system):

```
$ python measure_shutdown_timing.py
main: done printing
[shutdown] end of atexit chain: 65.331 ms
```
- If we comment out the `pk.parallel_for("fill", p, fill_one, v=v)` line, we get:

```
$ python measure_shutdown_timing.py
main: done printing
[shutdown] end of atexit chain: 11.056 ms
```
- If we comment out all of the pykokkos code, we get:

```
main: done printing
[shutdown] end of atexit chain: 0.025 ms
```
These deltas make no sense to me. Why is there any delay at all after `main` returns, and why does it depend on whether a pykokkos parallel dispatch such as `parallel_for` was executed?
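One way to narrow the delay down further, sketched here under assumptions, is a second probe registered at the very end of `main`. Being the last handler registered, it runs first in the `atexit` chain, so together with the early probe it separates the time before the chain starts from the time spent inside the exit handlers. The child script below is hypothetical: `early_probe` stands in for `_report_shutdown`, and `time.sleep(0.05)` stands in for a slow library exit handler.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script; the sleep is an assumed stand-in for a library handler.
CHILD = textwrap.dedent("""
    import atexit, sys, time

    t_main_end = None
    t_chain_start = None

    def late_probe():                 # registered last -> runs first
        global t_chain_start
        t_chain_start = time.perf_counter()

    def early_probe():                # registered first -> runs last
        t_end = time.perf_counter()
        before = (t_chain_start - t_main_end) * 1e3
        inside = (t_end - t_chain_start) * 1e3
        print(f"{before:.3f} {inside:.3f}", file=sys.stderr)

    atexit.register(early_probe)                 # real program: before the imports
    atexit.register(lambda: time.sleep(0.05))    # stand-in for a slow exit handler
    atexit.register(late_probe)                  # real program: at the end of main()
    t_main_end = time.perf_counter()             # real program: last line of main()
""")


def measure_phases_ms() -> tuple[float, float]:
    """Return (ms before the atexit chain starts, ms spent inside the chain)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    before, inside = (float(x) for x in result.stderr.split())
    return before, inside


if __name__ == "__main__":
    before, inside = measure_phases_ms()
    print(f"before atexit chain: {before:.3f} ms, inside chain: {inside:.3f} ms")
```

Registering the late probe at the end of `main`, rather than right after the imports, matters if a library registers its exit handlers lazily (for example on first use) instead of at import time.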