
WIP, ENH: support parallel runtests #60

Open
tylerjereddy wants to merge 1 commit into kokkos:main from tylerjereddy:treddy_runtests_parallel


@tylerjereddy
Contributor

Unfortunately, a decent number of tests fail when run in parallel (e.g., with `python runtests.py -n 10`), so this is still a WIP.

A sample traceback is below; I'll try to look into it a bit later today.

____________________________________________________________________________________________________________________________________ test_sign_1d_special_cases[in_arr0-float-float32] _____________________________________________________________________________________________________________________________________
[gw3] linux -- Python 3.10.4 /home/tyler/python_310_pykokkos_work/bin/python

in_arr = array([-5. ,  4.5,  nan], dtype=float32), pk_dtype = <class 'pykokkos.interface.data_types.float'>, numpy_dtype = <class 'numpy.float32'>

    @pytest.mark.parametrize("pk_dtype, numpy_dtype", [
            (pk.double, np.float64),
            (pk.float, np.float32),
    ])
    @pytest.mark.parametrize("in_arr", [
        np.array([-5, 4.5, np.nan]),
        np.array([np.nan, np.nan, np.nan]),
    ])
    def test_sign_1d_special_cases(in_arr, pk_dtype, numpy_dtype):
        in_arr = in_arr.astype(numpy_dtype)
        view: pk.View1D = pk.View([in_arr.size], pk_dtype)
        view[:] = in_arr
        expected = np.sign(in_arr)
>       actual = pk.sign(view=view)

tests/test_ufuncs.py:390: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pykokkos/lib/ufuncs.py:257: in sign
    pk.parallel_for(view.shape[0], sign_impl_1d_float, view=view)
pykokkos/interface/parallel_dispatch.py:167: in parallel_for
    func, args = runtime_singleton.runtime.run_workunit(
pykokkos/core/runtime.py:85: in run_workunit
    return self.execute(workunit, module_setup, members, policy=policy, name=name, **kwargs)
pykokkos/core/runtime.py:119: in execute
    module = self.import_module(module_setup.name, module_setup.path)
pykokkos/core/runtime.py:148: in import_module
    module = importlib.util.module_from_spec(spec)
<frozen importlib._bootstrap>:571: in module_from_spec
    ???
<frozen importlib._bootstrap_external>:1176: in create_module
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <built-in function create_dynamic>, args = (ModuleSpec(name='pk_cpp_pk_console_ufuncs_sign_impl_1d_float_OpenMP_kernel_cpython_310_x86_64_linux_gnu_so', loader=<.../github_projects/pykokkos/pk_cpp/pk_console/ufuncs_sign_impl_1d_float/OpenMP/kernel.cpython-310-x86_64-linux-gnu.so'),), kwds = {}

>   ???
E   ImportError: /home/tyler/github_projects/pykokkos/pk_cpp/pk_console/ufuncs_sign_impl_1d_float/OpenMP/kernel.cpython-310-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory

<frozen importlib._bootstrap>:241: ImportError

* add a `runtests.py` parallel option
following kokkosgh-59
@tylerjereddy
Contributor Author

I'm still studying this a bit. Here is a plot suggesting a potential race condition on compilation vs. opening compiled objects when using pytest-xdist to run tests in parallel (for the sleeping diff below):

[Figure: race_cond_maybe]

--- a/pykokkos/core/runtime.py
+++ b/pykokkos/core/runtime.py
@@ -1,3 +1,4 @@
+import time
 import importlib.util
 import sys
 from typing import Any, Callable, Dict, Optional, Tuple, Type, Union
@@ -145,6 +146,7 @@ class Runtime:
             return sys.modules[module_name]
 
         spec = importlib.util.spec_from_file_location(module_name, module_path)
+        time.sleep(15)
         module = importlib.util.module_from_spec(spec)
         sys.modules[module_name] = module
         spec.loader.exec_module(module)
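
If the failures really do come from one worker importing a kernel while another is still compiling it, one way to test that hypothesis would be to serialize the compile-then-import step across processes. The following is only a minimal sketch under that assumption: `exclusive_lock` is a hypothetical helper that is not part of pykokkos, and the lock-file location is left to the caller.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, poll_interval=0.05, timeout=60.0):
    """Hypothetical helper: hold an inter-process lock via a lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process at a time can create (and thus own) the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        # The caller would compile and/or import the module here.
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A caller would wrap both the compilation and the `module_from_spec` / `exec_module` calls in `with exclusive_lock(...)`, using one lock file per kernel. Note that a worker that crashes while holding the lock leaves a stale lock file behind, so this is a diagnostic aid rather than a robust fix.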

@tylerjereddy
Contributor Author

A hack like this reduces the error rate in parallel runs, but does not eliminate it:

--- a/pykokkos/core/runtime.py
+++ b/pykokkos/core/runtime.py
@@ -1,3 +1,5 @@
+import time
+from pathlib import Path
 import importlib.util
 import sys
 from typing import Any, Callable, Dict, Optional, Tuple, Type, Union
@@ -145,6 +147,17 @@ class Runtime:
             return sys.modules[module_name]
 
         spec = importlib.util.spec_from_file_location(module_name, module_path)
+        # poll for the compiled file
+        while not Path(spec.origin).exists():
+            # NOTE: what is a reasonable delay time
+            # here? unfortunately, short times appear
+            # to be able to cause a segfault, perhaps
+            # because the shared object is only partially
+            # written when a load attempt is made?
+            time.sleep(0.3)
+        # even if the file exists, it may not have
+        # been fully flushed?
+        time.sleep(1.0)
         module = importlib.util.module_from_spec(spec)
         sys.modules[module_name] = module
         spec.loader.exec_module(module)
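
The fixed one-second sleep in the diff above could be replaced by a bounded wait that also checks whether the file is still growing. This is a sketch only: `wait_for_stable_file` is a hypothetical helper, and an unchanged size between two polls is a heuristic that does not prove the writer has finished.

```python
import os
import time


def wait_for_stable_file(path, poll_interval=0.3, timeout=30.0):
    """Hypothetical helper: wait until `path` exists, is non-empty, and
    reports the same size on two consecutive polls.

    Returns True on success and False if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # The file does not exist yet (or is not readable).
            size = -1
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(poll_interval)
    return False
```

Unlike the `while not Path(spec.origin).exists()` loop, this version cannot spin forever if the compilation failed and the shared object never appears; the caller can raise a clear error when it returns `False`.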

tylerjereddy added a commit to tylerjereddy/pykokkos that referenced this pull request Aug 22, 2022
* the current `develop` branch appears to not be parallel
safe as described at kokkos#60 (comment)

* this branch allows pykokkos to compile/run code both
in serial and in parallel by providing genuinely unique
identifiers (file paths) to each "compilation unit"; careful though,
this will slow down the serial execution time for the testsuite
substantially, probably because it removes reuse in favor
of safety from a compilation standpoint--there's probably an approach
that is both fast and safe, and I'm certainly open to that, but
I'd also argue that safe and slow > (parallel) unsafe and fast

* combined with kokkosgh-60, this allows:
  - `OMP_NUM_THREADS=1 python runtests.py -n 10`
  - `123 passed, 9 skipped, 9 xfailed, 16 warnings in 92.10s (0:01:32)`
  - that's more than twice as fast as the serial test run on `develop`
  - `python runtests.py`
  - `123 passed, 9 skipped, 9 xfailed, 16 warnings in 212.40s (0:03:32)`
  - however, this branch slows down the serial test run a lot, to 11.5
    minutes!

* of course, you'd probably have to thoroughly test `OMP_NUM_THREADS`
  values and so on to benchmark the hierarchical parallelism situation
  to determine scenarios where you'd even want to use "parallel pykokkos,"
  but I certainly think we should try to be "safe" for concurrent usage
  so that we don't impose a certain model of concurrency on our
  consumers
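
One possible direction for the "fast and safe" approach mentioned in the commit message above is to give each writer a unique temporary path but publish to a shared final path with an atomic rename, so readers never see a partially written file. This is a sketch under that assumption and is not what this branch or gh-61 implements; `publish_atomically` is a hypothetical helper.

```python
import os
import tempfile


def publish_atomically(data: bytes, final_path: str) -> None:
    """Hypothetical helper: write `data` to a unique temporary file in the
    destination directory, then atomically rename it to `final_path`."""
    directory = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(directory, exist_ok=True)
    # mkstemp gives every writer its own path, so concurrent writers
    # never clobber each other's partial output.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic when source and destination are on the
        # same filesystem, which is why the temp file lives in `directory`.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, several workers may redundantly compile the same kernel, but whichever finishes last simply replaces an identical file, and cached results can still be reused by later serial runs.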
@tylerjereddy
Contributor Author

I think gh-61 has a more solid explanation, and it finally does allow pytest-xdist concurrency, with one notable drawback.

NaderAlAwar changed the base branch from develop to main on May 24, 2023 20:31
@kennykos
Collaborator

Is this PR fixed by #93?
