Description
In file ompi/oshmem/mca/spml/ucx/spml_ucx.c
around line 1140,
/* Check if we have an idle context to reuse */
SHMEM_MUTEX_LOCK(mca_spml_ucx.internal_mutex);
for (i = 0; i < idle_array->ctxs_count; i++) {
if (idle_array->ctxs[i]->options & options) {
ucx_ctx = idle_array->ctxs[i];
_ctx_remove(idle_array, ucx_ctx, i);
break;
}
}
We have the above code that attempts to reuse idle contexts from an array of idle contexts. However, the condition (idle_array->ctxs[i]->options & options
at line 1140 suggests that an idle context with option 0 will never get reused.
If you modify the OSHMEM code to print out ctxs_count
which is the length of the array idle_array
, and run the following simple program,
#include <shmem.h>
int main() {
shmem_init();
for (int i = 0; i < 10; i++) {
shmem_ctx_t ctx;
shmem_ctx_create(0, &ctx);
shmem_ctx_destroy(ctx);
}
shmem_finalize();
}
you will observe that ctxs_count
grows from 0 to 9, indicating the idle contexts are not getting reused and the idle array explodes. I believe this is not the expected behavior and the fix would be as simple as changing the condition from idle_array->ctxs[i]->options & options
to idle_array->ctxs[i]->options == options
.
Moreover, the current code can lead to correctness issue as well because it can potentially assign a more restrictive context as a context with less restrictive option. Consider the following program.
#include <shmem.h>
int main() {
shmem_init();
shmem_ctx_t ctx1, ctx2;
shmem_ctx_create(SHMEM_CTX_NOSTORE | SHMEM_CTX_PRIVATE, &ctx1);
shmem_ctx_destroy(ctx1);
shmem_ctx_create(SHMEM_CTX_NOSTORE, &ctx2);
shmem_ctx_destroy(ctx2);
shmem_finalize();
}
With the current code, a context configured with both SHMEM_CTX_NOSTORE
and SHMEM_CTX_PRIVATE
will be assigned as a context configured with only SHMEM_CTX_NOSTORE
. This will lead to correctness issue as ctx2 may be shared. In fact, the current code crushes when the above program runs. Changing &
to ==
fixes this issue as well.
It is clearly stated in section 9.4.1 of the OpenSHMEM 1.4 specification that multiple options can be combined with a bitwise OR operation.