Excessive loads for single float values #343

Open
@alexreinking

This proc

@proc
def sgemv(
  alpha: f32,
  beta: f32,
  m: size,
  n: size,
  a: f32[m, n],
  x: f32[n],
  y: f32[m],
):
  for i in seq(0, m):
    y[i] = beta * y[i]
    for j in seq(0, n):
      y[i] += alpha * x[j] * a[i, j]

compiles to the following code:

void sgemv(void *ctxt, const float *alpha, const float *beta, int_fast32_t m,
           int_fast32_t n, const float *a, const float *x, float *y) {
  for (int i = 0; i < m; i++) {
    y[(i) * (1)] = *beta * y[(i) * (1)];
    for (int j = 0; j < n; j++) {
      y[(i) * (1)] += *alpha * x[(j) * (1)] * a[(i) * (n) + (j) * (1)];
    }
  }
}

However, because alpha is dereferenced inside the inner loop, it is reloaded from memory on every iteration. A C compiler generally cannot hoist that load on its own: a const float * only promises that this pointer is not used for writing, so the stores through y might alias *alpha and change its value between iterations. The code we should generate looks more like this:

void sgemv(void *ctxt, const float *alpha, const float *beta, int_fast32_t m,
           int_fast32_t n, const float *a, const float *x, float *y) {
  const float alpha_ = *alpha;
  const float beta_ = *beta;
  for (int i = 0; i < m; i++) {
    y[(i) * (1)] = beta_ * y[(i) * (1)];
    for (int j = 0; j < n; j++) {
      y[(i) * (1)] += alpha_ * x[(j) * (1)] * a[(i) * (n) + (j) * (1)];
    }
  }
}

Here, alpha and beta are each loaded only once, at the start of the function.

See this Godbolt link for the assembly diff: https://gcc.godbolt.org/z/WExnhEd68
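
A related way to see why the compiler is conservative: nothing in the signature prevents alpha from pointing into y, and in that case the two versions compute different results. The sketch below is only an illustration, not Exo output. The names sgemv_reload and sgemv_hoisted and the two check helpers are hypothetical, the ctxt parameter is dropped, the index arithmetic is simplified, and plain int is used for the sizes.

```c
#include <assert.h>

/* Hypothetical stand-in for the currently generated code:
   *alpha and *beta are read inside the loops. */
static void sgemv_reload(const float *alpha, const float *beta, int m, int n,
                         const float *a, const float *x, float *y) {
  for (int i = 0; i < m; i++) {
    y[i] = *beta * y[i];
    for (int j = 0; j < n; j++) {
      y[i] += *alpha * x[j] * a[i * n + j];
    }
  }
}

/* Hypothetical stand-in for the proposed code:
   the scalars are copied into locals once, before the loops. */
static void sgemv_hoisted(const float *alpha, const float *beta, int m, int n,
                          const float *a, const float *x, float *y) {
  const float alpha_ = *alpha;
  const float beta_ = *beta;
  for (int i = 0; i < m; i++) {
    y[i] = beta_ * y[i];
    for (int j = 0; j < n; j++) {
      y[i] += alpha_ * x[j] * a[i * n + j];
    }
  }
}

/* Returns 1 if both versions agree when no buffers overlap. */
static int agree_without_aliasing(void) {
  const float alpha = 2.0f, beta = 0.5f;
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f}; /* 2x2, row-major */
  const float x[2] = {1.0f, 1.0f};
  float y1[2] = {4.0f, 8.0f};
  float y2[2] = {4.0f, 8.0f};
  sgemv_reload(&alpha, &beta, 2, 2, a, x, y1);
  sgemv_hoisted(&alpha, &beta, 2, 2, a, x, y2);
  return y1[0] == y2[0] && y1[1] == y2[1];
}

/* Returns 1 if the versions disagree when alpha points at y[0]. */
static int differ_with_aliasing(void) {
  const float beta = 1.0f;
  const float a[2] = {1.0f, 1.0f}; /* 1x2 */
  const float x[2] = {1.0f, 1.0f};
  float y1[1] = {2.0f};
  float y2[1] = {2.0f};
  sgemv_reload(&y1[0], &beta, 1, 2, a, x, y1);  /* alpha sees updates to y */
  sgemv_hoisted(&y2[0], &beta, 1, 2, a, x, y2); /* alpha fixed at entry */
  return y1[0] != y2[0];
}

/* Runs both checks; aborts via assert if either expectation fails. */
static void check_variants(void) {
  assert(agree_without_aliasing());
  assert(differ_with_aliasing());
}
```

Calling check_variants() from a main function exercises both cases. Assuming Exo procs never receive overlapping scalar and output buffers, hoisting in the code generator preserves the intended semantics while removing the repeated loads.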

Labels: C: Codegen (the final C code generation), S: Available (available to be worked upon)
