
On some machines, tests take too long to complete when using oversubscribe #902

@sanvila

Description


Hello. I reported this to Debian here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101363

On AWS instances of type c7a.large, m7a.large, and r7a.large (which, incidentally, all have 2 vCPUs), the Debian package for dbcsr used to build in less than 4 minutes.

After I added PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe so that the package also builds on systems with a single CPU, the build on systems with 2 CPUs now fails with a timeout, like this:

11:  **********************************************************************
11:   -- TESTING dbcsr_multiply (N, C,            5 , A, N, N) ............... PASSED !
11:  **********************************************************************
11:  test_name multiply_LIMITS_MIX_3
11:  The solution is CORRECT !
11:  **********************************************************************
11:   -- TESTING dbcsr_multiply (T, N,            5 , A, N, N) ............... PASSED !
11:  **********************************************************************
11/19 Test #11: dbcsr_unittest1 .......................................***Timeout 1500.01 sec
[...]
The following tests FAILED:
	 11 - dbcsr_unittest1 (Timeout)

I tried increasing the timeout, like this:

--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -140,6 +140,7 @@ foreach (dbcsr_test ${DBCSR_TESTS_FTN})
   endif ()
   set_tests_properties(
     ${dbcsr_test} PROPERTIES ENVIRONMENT OMP_NUM_THREADS=${NUM_THREADS}
+                             TIMEOUT 3600
                              PROCESSORS ${test_processors})
 endforeach ()

but 3600 seconds was not enough, and neither was 7200 (the test still timed out), which makes me think the proper fix lies somewhere else.
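As a possible stopgap (a sketch only, not a fix for the underlying slowdown): since the 2-CPU build finished in under 4 minutes before the variable was added, the build script could request oversubscription only on single-CPU machines. The threshold of 2 CPUs is an assumption matching the single-CPU case described here, and `nproc` is the coreutils command; whether single-CPU builds then finish in reasonable time is not established.

```shell
#!/bin/sh
# Hypothetical workaround: enable PRRTE oversubscription only when the machine
# has fewer than 2 CPUs (assumed threshold), so multi-CPU builds keep the default mapping.
ncpus=$(nproc)
if [ "$ncpus" -lt 2 ]; then
  export PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe
else
  # Drop any inherited setting so 2+ CPU builds behave as they did before.
  unset PRTE_MCA_rmaps_default_mapping_policy
fi
echo "CPUs: $ncpus, mapping policy: ${PRTE_MCA_rmaps_default_mapping_policy:-<default>}"
```

In a Debian package this logic would live in debian/rules, so the equivalent Makefile syntax would be needed there.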

Thanks.
