As noted here: #146 (comment)
If you write a PyKokkos kernel that uses a team barrier synchronization (and probably other related hierarchical parallelism features), you can get a hard segfault when OMP_NUM_THREADS=1 is set in your environment.
Kokkos core arguably has an excuse for not behaving well here, since the code is already compiled by the time it runs. Still, if we have ahead-of-compile-time knowledge of the number of threads that will be available, I wonder if we should do something more useful than segfaulting by default.
I checked that deleting the barrier syncs isn't sufficient to make the segfault go away, so something broader about the hierarchical kernel is likely to blame.
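One possible mitigation, as a minimal sketch: read OMP_NUM_THREADS before dispatching a hierarchical kernel and fail with a readable error instead of letting the compiled code segfault. `check_omp_threads` below is a hypothetical helper, not part of the PyKokkos API.

```python
import os

def check_omp_threads(required_team_size: int) -> None:
    # Hypothetical pre-launch guard (not PyKokkos API): if OpenMP is
    # restricted to fewer threads than the kernel's team size, raise a
    # clear error instead of letting the compiled kernel segfault.
    raw = os.environ.get("OMP_NUM_THREADS")
    if raw is not None and int(raw) < required_team_size:
        raise RuntimeError(
            f"OMP_NUM_THREADS={raw} is smaller than the team size "
            f"({required_team_size}) this hierarchical kernel needs; "
            "refusing to launch."
        )
```

Something like `check_omp_threads(4)` could run right before the `pk.parallel_for` on the team policy.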
Copy of the crashing workunit below the fold, in case it gets mutated a lot in the matching PR:
Details
@pk.workunit
def dgemm_impl_tiled_no_view_c(team_member: pk.TeamMember,
                               k_a: int,
                               alpha: float,
                               view_a: pk.View2D[pk.double],
                               view_b: pk.View2D[pk.double],
                               out: pk.View2D[pk.double]):
    printf("tiled workunit checkpoint 1\n")
    # early attempt at tiled matrix multiplication in PyKokkos
    # for now, let's assume a 2x2 tiling arrangement and
    # that `view_a`, `view_b`, and `out` views are all 4 x 4 matrices
    tile_size: int = 4  # this is really just the team size...
    width: int = 4
    # start off by getting a global thread id
    global_tid: int = team_member.league_rank() * team_member.team_size() + team_member.team_rank()
    printf("tiled workunit checkpoint 2 for thread id: %d\n", global_tid)
    # TODO: I have no idea how to get 2D scratch memory views?
    scratch_mem_a: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
    scratch_mem_b: pk.ScratchView1D[float] = pk.ScratchView1D(team_member.team_scratch(0), tile_size)
    printf("tiled workunit checkpoint 3 for thread id: %d\n", global_tid)
    # in a 4 x 4 matrix with 2 x 2 tiling the leagues
    # and teams have matching row/col assignment approaches
    bx: int = team_member.league_rank() // 2
    by: int = 0
    if team_member.league_rank() % 2 != 0:
        by = 1
    tx: int = team_member.team_rank() // 2
    ty: int = 0
    if team_member.team_rank() % 2 != 0:
        ty = 1
    tmp: float = 0
    col: int = by * 2 + ty
    row: int = bx * 2 + tx
    printf("tiled workunit checkpoint 4 for thread id: %d\n", global_tid)
    # these variables are a bit silly--can we not get
    # 2D scratch memory indexing?
    a_index: int = 0
    b_index: int = 0
    for i in range(out.extent(1) // 2):
        scratch_mem_a[team_member.team_rank()] = view_a[row][i * 2 + ty]
        scratch_mem_b[team_member.team_rank()] = view_b[i * 2 + tx][col]
        printf("tiled workunit checkpoint 5 for thread id: %d\n", global_tid)
        team_member.team_barrier()
        printf("tiled workunit checkpoint 6 for thread id: %d\n", global_tid)
        for k in range(2):
            a_index = k + ((team_member.team_rank() // 2) * 2)
            b_index = ty + (k * 2)
            tmp += scratch_mem_a[a_index] * scratch_mem_b[b_index]
        team_member.team_barrier()
        printf("tiled workunit checkpoint 7 for thread id: %d\n", global_tid)
    printf("tiled workunit checkpoint 8 for thread id: %d\n", global_tid)
    out[row][col] = tmp
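For sanity-checking the index arithmetic off-device, here is a sequential plain-Python emulation of the same 2x2 tiling scheme, where barriers become loop phases. `tiled_dgemm_emulation` is an illustrative helper I wrote for this issue, not PyKokkos code; it assumes the 4 x 4 / team-size-4 setup hard-coded in the workunit.

```python
def tiled_dgemm_emulation(A, B):
    # Sequential emulation of the 2x2-tiled workunit above: 4 leagues x
    # 4 team members, one output element per (league, member) pair.
    # Each team barrier is modeled as a separate phase over all ranks.
    n = 4
    out = [[0.0] * n for _ in range(n)]
    for league in range(4):
        bx, by = league // 2, league % 2
        scratch_a = [0.0] * 4
        scratch_b = [0.0] * 4
        tmp = [0.0] * 4  # per-member accumulator
        for i in range(n // 2):
            # phase 1: every member fills its scratch slot (pre-barrier)
            for rank in range(4):
                tx, ty = rank // 2, rank % 2
                row, col = bx * 2 + tx, by * 2 + ty
                scratch_a[rank] = A[row][i * 2 + ty]
                scratch_b[rank] = B[i * 2 + tx][col]
            # phase 2: every member accumulates from scratch (post-barrier)
            for rank in range(4):
                tx, ty = rank // 2, rank % 2
                for k in range(2):
                    tmp[rank] += scratch_a[tx * 2 + k] * scratch_b[ty + k * 2]
        for rank in range(4):
            tx, ty = rank // 2, rank % 2
            out[bx * 2 + tx][by * 2 + ty] = tmp[rank]
    return out
```

Working through the scratch indices this way suggests the arithmetic itself does reproduce `A @ B` for 4 x 4 inputs, which points the finger at the runtime/thread configuration rather than the tiling logic.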