Binds one or more numeric or data-value type expressions into a new intermediate, scalar-sized buffer. It attempts to perform common subexpression elimination while binding. The primitive will stop upon encountering a read of any buffer that the expression depends on. The precision of the new allocation matches that of the bound expression.
args:
expr_cursors - a list of cursors to multiple instances of the same expression
new_name - a string to name the new buffer
rewrite:
bind_expr(..., '32.0 * x[i]', 'b')
a = 32.0 * x[i] + 4.0
-->
b : R
b = 32.0 * x[i]
a = b + 4.0
Resizes the dim_idx
-th dimension of the buffer buf_cursor
to size
. The offset
argument specifies how to adjust the indices relative to the old buffer. If fold
is set to True
, the primitive will attempt to perform a circular buffer optimization, which ignores the offset
argument. The primitive will fail if there are any accesses to the dim_idx
-th dimension outside of the range (offset, offset + size)
.
args:
buf_cursor - cursor pointing to the Alloc
dim_idx - which dimension to shrink
size - new size as a positive expression
offset - offset for adjusting the buffer access
rewrite:
x : T[n, ...] ; s
-->
x : T[size, ...] ; s[ x[idx, ...] -> x[idx - offset, ...] ]
rewrite (if fold == True):
x : T ; s
-->
x : T ; s[ x[i] -> x[i % size] ]
checks:
The provided dimension size is checked for positivity and the
provided indexing expression is checked to make sure it is in-bounds
Expands the number of dimensions of a buffer variable (buf_cursor
). After the expansion, the existing code will initially use only specific entries of the new dimension, which are selected by the provided indexing_expr
.
args:
buf_cursor - cursor pointing to the Alloc to expand
alloc_dim - (string) an expression for the size of the new buffer dimension.
indexing_expr - (string) an expression to index the newly created dimension with.
rewrite:
x : T[...] ; s
-->
x : T[alloc_dim, ...] ; s[ x[...] -> x[indexing_expr, ...] ]
checks:
The provided dimension size is checked for positivity and the
provided indexing expression is checked to make sure it is in-bounds
Rearranges the dimensions of the specified buffer allocation according to the provided permutation (permute_vector
).
args:
buf_cursor - cursor pointing to an Alloc statement for an N-dimensional array
permute_vector - a permutation of the integers (0,1,...,N-1)
rewrite:
(with permute_vector = [2,0,1])
x : T[N,M,K]
-->
x : T[K,N,M]
Divides the dim_idx
-th buffer dimension into higher-order and lower-order dimensions, where the constant integer quotient
specifies the size of the lower-order dimension.
args:
alloc_cursor - cursor to the allocation to divide a dimension of
dim_idx - the index of the dimension to divide
quotient - (positive int) the factor to divide by
rewrite:
divide_dim(..., 1, 4)
x : R[n, 12, m]
x[i, j, k] = ...
-->
x : R[n, 3, 4, m]
x[i, j / 4, j % 4, k] = ...
Multiplies the hi_dim_idx
-th buffer dimension by the low_dim_idx
-th buffer dimension to create a single buffer dimension. This operation is only allowed when the lo_dim_idx
-th dimension is a constant integer value.
args:
alloc_cursor - cursor to the allocation to divide a dimension of
hi_dim_idx - the index of the higher order dimension to multiply
lo_dim_idx - the index of the lower order dimension to multiply
rewrite:
mult_dim(..., 0, 2)
x : R[n, m, 4]
x[i, j, k] = ...
-->
x : R[4*n, m]
x[4*i + k, j] = ...
Unrolls the buffer allocation with constant dimension.
args:
alloc_cursor - cursor to the buffer with constant dimension
dimension - dimension to unroll
rewrite:
buf : T[2] <- alloc_cursor
...
-->
buf_0 : T
buf_1 : T
...
Lifts a buffer allocation up and out of various loops and conditions.
args:
alloc_cursor - cursor to the allocation to lift up
n_lifts - number of times to try to move the allocation up
rewrite:
for i in _:
buf : T <- alloc_cursor
...
-->
buf : T
for i in _:
...
Sinks a buffer allocation into a scope (for loop/if statement). This scope must come immediately after the alloc statement.
args:
alloc_cursor - cursor to the allocation to sink up
rewrite:
buf : T <- alloc_cursor
for i in _:
...
-->
for i in _:
buf : T
...
Reuses existing buffer (buf_cursor
) instead of allocating a new buffer (replace_cursor
).
args:
buf_cursor - cursor pointing to the Alloc to reuse
replace_cursor - cursor pointing to the Alloc to eliminate
rewrite:
x : T ; ... ; y : T ; s
-->
x : T ; ... ; s[ y -> x ]
checks:
Can only be performed if the variable x is dead at the statement y : T.
Stages the window of memory specified by win_expr
into a new buffer
before the indicated code block and move the memory back after the
indicated code block. If code analysis allows one to omit either
the load or store between the original buffer and staging buffer, then
the load/store loops/statements will be omitted.
If code analysis determines determines that win_expr
accesses
out-of-bounds locations of the buffer, it will generate loop nests
for the load/store stages corresponding to that window, but will add
guards within the inner loop to ensure that all accesses to the buffer
are within the buffer's bounds.
In the event that the indicated block of code strictly reduces into
the specified window, then the optional argument accum
can be set
to initialize the staging memory to zero, accumulate into it, and
then accumulate that result back to the original buffer, rather than
loading and storing. This is especially valuable when one's target
platform can more easily zero out memory and thereby either
reduce memory traffic outright, or at least improve locality of access.
args:
block_cursor - the block of statements to stage around
win_expr - (string) of the form `name[pt_or_slice*]` e.g. 'x[32, i:i+4]'
In this case `x` should be accessed in the block, but only at locations (32, i), (32, i+1), (32, i+2), or (32, i+3)
new_buf_name - the name of the newly created staging buffer
accum - (optional, bool) see above
rewrite:
stage_mem(..., 'x[0:n,j-1:j]', 'xtmp')
for i in seq(0,n-1):
x[i,j] = 2 * x[i+1,j-1]
-->
for k0 in seq(0,n):
for k1 in seq(0,2):
xtmp[k0,k1] = x[k0,j-1+k1]
for i in seq(0,n-1):
xtmp[i,j-(j-1)] = 2 * xtmp[i+1,(j-1)-(j-1)]
for k0 in seq(0,n):
for k1 in seq(0,2):
x[k0,j-1+k1] = xtmp[k0,k1]
Deletes buf_cursor
if it is unused.