bug: Poor GPU thread blocking for at-points operators with all `CEED_EVAL_NONE` inputs and outputs

We currently have poor GPU thread blocking for at-points operators whose inputs and outputs all use `CEED_EVAL_NONE`.

The issue is in `backends/cuda-gen/ceed-cuda-gen-operator-build.cpp:CeedOperatorBuildKernel_Cuda_gen`:

```c
  // Lines 1228-1234
  if (Q_1d == 0) {
    if (is_at_points) Q_1d = max_num_points;
    else CeedCallBackend(CeedOperatorGetNumQuadraturePoints(op, &Q_1d));
  }
  if (Q == 0) Q = Q_1d;
  data->Q    = Q;
  data->Q_1d = Q_1d;
```

Rather than blocking by cells or slices of cells, we use a 1D blocking strategy with `Q_1d = max_num_points`. When `max_num_points` is large, this can exceed the maximum threads-per-block limit or the shared memory limit, causing a launch-time failure.
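
To make the failure condition concrete, here is a minimal standalone sketch, not the libCEED implementation. It assumes a limit of 1024 threads per block, which holds on current NVIDIA GPUs but should really be queried from the device. The helpers `BlockFits` and `ClampedBlockWidth` are hypothetical names introduced for illustration only. Clamping the width is just one possible guard and is not necessarily the fix this issue calls for, which would be blocking by cells.

```c
#include <stddef.h>

// Assumption: 1024 threads per block; real code should query the device limit
#define MAX_THREADS_PER_BLOCK 1024

// Hypothetical helper: returns 1 if a 1D block of q_1d threads can be launched
static int BlockFits(int q_1d) { return q_1d > 0 && q_1d <= MAX_THREADS_PER_BLOCK; }

// Hypothetical helper: clamp the 1D block width to the limit and report how
// many points each thread would have to process in a strided loop
static int ClampedBlockWidth(int max_num_points, int *points_per_thread) {
  if (max_num_points <= 0) {
    if (points_per_thread != NULL) *points_per_thread = 0;
    return 0;
  }
  int width = max_num_points < MAX_THREADS_PER_BLOCK ? max_num_points : MAX_THREADS_PER_BLOCK;

  // Ceiling division so every point is covered by some thread
  if (points_per_thread != NULL) *points_per_thread = (max_num_points + width - 1) / width;
  return width;
}
```

Under these assumptions, `BlockFits(4096)` returns 0, which corresponds to the launch failure described above, while `ClampedBlockWidth(4096, &n)` returns a width of 1024 with `n` set to 4 points per thread.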

bug: Poor GPU thread blocking for at-points operators with all `CEED_EVAL_NONE` inputs and outputs #1901
