Excessive loads for single float values #343

This proc

```
@proc
def sgemv(
  alpha: f32,
  beta: f32,
  m: size,
  n: size,
  a: f32[m, n],
  x: f32[n],
  y: f32[m],
):
  for i in seq(0, m):
    y[i] = beta * y[i]
    for j in seq(0, n):
      y[i] += alpha * x[j] * a[i, j]
```

compiles to the following code:

```
void sgemv(void *ctxt, const float *alpha, const float *beta, int_fast32_t m,
           int_fast32_t n, const float *a, const float *x, float *y) {
  for (int i = 0; i < m; i++) {
    y[(i) * (1)] = *beta * y[(i) * (1)];
    for (int j = 0; j < n; j++) {
      y[(i) * (1)] += *alpha * x[(j) * (1)] * a[(i) * (n) + (j) * (1)];
    }
  }
}
```

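However, `*alpha` is dereferenced inside the inner loop, so it is _reloaded_ on every iteration (and `*beta` is reloaded on every iteration of the outer loop). The C compiler cannot hoist these loads on its own: `alpha` and `beta` are plain `const float *` pointers, so as far as the compiler can prove, they might alias `y`, and every store to `y[i]` could change the values they point to. The code we should generate looks more like this: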

```
void sgemv(void *ctxt, const float *alpha, const float *beta, int_fast32_t m,
           int_fast32_t n, const float *a, const float *x, float *y) {
  const float alpha_ = *alpha;
  const float beta_ = *beta;
  for (int i = 0; i < m; i++) {
    y[(i) * (1)] = beta_ * y[(i) * (1)];
    for (int j = 0; j < n; j++) {
      y[(i) * (1)] += alpha_ * x[(j) * (1)] * a[(i) * (n) + (j) * (1)];
    }
  }
}
```

Here, the values are loaded only once, at the start of the function.
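To illustrate why a C compiler is not free to perform this hoisting itself, here is a minimal sketch. It is not Exo output: `scale_reload`, `scale_hoisted`, `check_no_alias`, and `aliasing_changes_result` are hypothetical names for a reduced kernel that only scales `y` by `*alpha`. The two variants agree when `alpha` points outside `y`, but disagree when `alpha` points into `y`, which is a case the compiler has to assume is possible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reduced kernel: reloads *alpha on every iteration,
   like the code that is currently generated. */
void scale_reload(const float *alpha, int n, float *y) {
  for (int i = 0; i < n; i++) {
    y[i] = *alpha * y[i];
  }
}

/* Hypothetical reduced kernel: reads *alpha once up front,
   like the proposed generated code. */
void scale_hoisted(const float *alpha, int n, float *y) {
  const float alpha_ = *alpha;
  for (int i = 0; i < n; i++) {
    y[i] = alpha_ * y[i];
  }
}

/* Without aliasing, both variants compute the same result. */
void check_no_alias(void) {
  float alpha = 2.0f;
  float a[3] = {2.0f, 1.0f, 1.0f};
  float b[3] = {2.0f, 1.0f, 1.0f};
  scale_reload(&alpha, 3, a);
  scale_hoisted(&alpha, 3, b);
  for (int i = 0; i < 3; i++) {
    assert(a[i] == b[i]);
  }
}

/* Returns 1 if the two variants disagree when alpha points at y[0]. */
int aliasing_changes_result(void) {
  float a[3] = {2.0f, 1.0f, 1.0f};
  float b[3] = {2.0f, 1.0f, 1.0f};
  scale_reload(&a[0], 3, a);  /* the store to a[0] changes *alpha */
  scale_hoisted(&b[0], 3, b); /* alpha_ keeps its initial value */
  printf("reload:  %g %g %g\n", a[0], a[1], a[2]);
  printf("hoisted: %g %g %g\n", b[0], b[1], b[2]);
  return a[1] != b[1];
}
```

Calling `aliasing_changes_result()` from a `main` shows the two variants diverging after the first store. Hoisting in the code generator is therefore only equivalent under the assumption that scalar arguments never alias the output buffers, which seems reasonable for procs like `sgemv` but is an assumption nonetheless.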

The following Godbolt link shows the assembly diff: https://gcc.godbolt.org/z/WExnhEd68
