bug: Poor GPU thread blocking for at-points operators with all CEED_EVAL_NONE inputs and outputs #1901

@zatkins-dev

We currently have poor GPU thread blocking for at-points operators with all CEED_EVAL_NONE inputs and outputs.

The issue is in backends/cuda-gen/ceed-cuda-gen-operator-build.cpp:CeedOperatorBuildKernel_Cuda_gen:

  // Lines 1228-1234
  if (Q_1d == 0) {
    if (is_at_points) Q_1d = max_num_points;
    else CeedCallBackend(CeedOperatorGetNumQuadraturePoints(op, &Q_1d));
  }
  if (Q == 0) Q = Q_1d;
  data->Q    = Q;
  data->Q_1d = Q_1d;

Rather than blocking by cell or by slices of cells, we fall back to a 1D blocking strategy with Q_1d = max_num_points. When max_num_points is large, this can exceed the maximum thread count per block or the shared memory limit, causing a launch-time failure.
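To make the failure mode concrete, here is a minimal sketch of the arithmetic involved. It is not libCEED code. The names `LaunchShape`, `OneThreadPerPoint`, `FitsOnDevice`, `CappedBlock`, and `PointsPerThread` are hypothetical, the limits of 1024 threads per block and 48 KiB of shared memory are assumed typical values for current NVIDIA GPUs, and the shared memory model (one `double` per component per thread) is an illustrative assumption rather than what the generated kernel actually allocates.

```cpp
#include <cstddef>

// Assumed limits, typical of current NVIDIA GPUs. Real code should query the
// device (for example with cudaGetDeviceProperties) instead of hard-coding them.
constexpr int         kMaxThreadsPerBlock = 1024;
constexpr std::size_t kMaxSharedBytes     = 48 * 1024;

// Hypothetical description of a 1D launch.
struct LaunchShape {
  int         threads_per_block;
  std::size_t shared_bytes;
};

// Assumed shared memory model: num_comp doubles staged per thread.
constexpr std::size_t SharedBytes(int threads, int num_comp) {
  return static_cast<std::size_t>(threads) * static_cast<std::size_t>(num_comp) * sizeof(double);
}

// Mirrors the fallback described above: one thread per point, Q_1d = max_num_points.
constexpr LaunchShape OneThreadPerPoint(int max_num_points, int num_comp) {
  return {max_num_points, SharedBytes(max_num_points, num_comp)};
}

// True when the launch stays within both assumed limits.
constexpr bool FitsOnDevice(LaunchShape s) {
  return s.threads_per_block > 0 && s.threads_per_block <= kMaxThreadsPerBlock &&
         s.shared_bytes <= kMaxSharedBytes;
}

// One possible mitigation (an assumption, not necessarily the upstream fix):
// cap the block size and let each thread loop over several points.
constexpr LaunchShape CappedBlock(int max_num_points, int num_comp, int cap) {
  return {max_num_points < cap ? max_num_points : cap,
          SharedBytes(max_num_points < cap ? max_num_points : cap, num_comp)};
}

// Number of points each thread must process under a capped block (ceiling division).
constexpr int PointsPerThread(int max_num_points, int threads) {
  return (max_num_points + threads - 1) / threads;
}

static_assert(FitsOnDevice(OneThreadPerPoint(512, 5)), "small cells fit");
static_assert(!FitsOnDevice(OneThreadPerPoint(4096, 5)), "thread limit exceeded");
static_assert(!FitsOnDevice(OneThreadPerPoint(1024, 8)), "shared memory limit exceeded");
static_assert(FitsOnDevice(CappedBlock(4096, 5, 256)), "capped block fits");
```

Under these assumptions, 4096 points per block exceeds the thread limit outright, while 1024 points with 8 components stays within the thread limit but needs 64 KiB of shared memory. Capping the block at 256 threads keeps both within bounds, at the cost of each thread looping over `PointsPerThread(4096, 256)` points.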
