Description
Precision
Precisions in Exo are currently hard-coded types (e.g. f32, f64, i8, i32), and they mostly just signify C types and nothing more. However, precisions should have a better representation within Exo, because they have a significant impact on our scheduling decisions and are more complex than we previously thought. Some ways they might impact scheduling decisions:
- the amount of memory accessed depends on precision (e.g. loading two f32s vs. loading one f64).
- different precisions use different hardware instructions, which should result in different schedules (Kill R, change set_precision, make replace precision aware #440).
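To make the memory point concrete, here is a minimal sketch of how precision changes the bytes moved and the number of elements that fit in a vector register. The helper names (`bytes_loaded`, `lanes`) and the 256-bit register width are assumptions chosen for illustration; none of this is Exo code.

```python
# Bit widths of the precisions named above.
BITS = {"f32": 32, "f64": 64, "i8": 8, "i32": 32}


def bytes_loaded(precision, count):
    """Bytes moved when loading `count` elements of `precision`."""
    return BITS[precision] // 8 * count


def lanes(precision, register_bits=256):
    """Elements of `precision` that fit in one vector register.

    The 256-bit default is an assumed register width, not a fixed target.
    """
    return register_bits // BITS[precision]


if __name__ == "__main__":
    # Two f32 loads move the same number of bytes as one f64 load.
    print(bytes_loaded("f32", 2), bytes_loaded("f64", 1))
    # The same register holds a different number of elements per precision.
    for p in ("i8", "f32", "f64"):
        print(p, lanes(p))
```

A schedule that vectorizes a loop would pick a different split factor for each precision, which is one reason scheduling operations need to know the precision.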
There are many precisions users might want to support (#296), and making the compiler developers responsible for implementing each of them is not a good model. We should think about how to make precisions more extensible.
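One possible shape for extensibility, sketched under heavy assumptions: precisions become plain data that users can register, rather than a fixed set built into the compiler. The `Precision` class, `register_precision`, and `lookup_precision` are hypothetical names invented for this sketch and are not part of Exo. The `uint16_t` storage type for bf16 is likewise only an assumed choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precision:
    """Hypothetical user-facing description of a precision."""
    name: str    # name used in the source language, e.g. "f32"
    ctype: str   # C type emitted by code generation
    bits: int    # storage width in bits


_REGISTRY = {}


def register_precision(p):
    """Add a precision to the registry; reject duplicate names."""
    if p.name in _REGISTRY:
        raise ValueError(f"precision {p.name!r} is already registered")
    _REGISTRY[p.name] = p
    return p


def lookup_precision(name):
    """Return the registered precision with this name (KeyError if absent)."""
    return _REGISTRY[name]


# Precisions that would ship with the compiler.
register_precision(Precision("f32", "float", 32))
register_precision(Precision("i8", "int8_t", 8))

# A precision added by a user, without touching compiler internals.
register_precision(Precision("bf16", "uint16_t", 16))
```

With something like this, scheduling operations could query `lookup_precision(name).bits` instead of matching on a hard-coded list of types.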
Quantization and Mixed-Precision
Another issue related to precision is dealing with mixed precision within an algorithm. Some hardware architectures (Gemmini, AMX) have intrinsics with different input and output precisions. Some algorithms also involve quantization (especially in machine learning). These conversions between precisions are part of the actual algorithm/schedule. For instance, in matmul there's a scale parameter used for clamping. It isn't possible to hard-code the scale parameter into types because it is often a learned parameter.
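As an illustration of why the scale cannot live in the type, here is a minimal pure-Python sketch of a quantized matmul: i8 inputs, a wider accumulator, then a runtime scale followed by a clamp back to the i8 range. The function names and the use of Python's built-in `round` are assumptions of this sketch; it does not reproduce the exact semantics of Gemmini or AMX.

```python
def clamp(x, lo, hi):
    """Saturate x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))


def quantized_matmul(a, b, scale):
    """Multiply an M x K matrix by a K x N matrix of i8 values.

    `scale` is an ordinary runtime argument (for example a learned
    parameter), so it cannot be baked into the element types.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # stands in for an i32 accumulator
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Precision conversion: scale, round, saturate to the i8 range.
            out[i][j] = clamp(round(acc * scale), -128, 127)
    return out


if __name__ == "__main__":
    a = [[100, 100]]
    b = [[100], [100]]
    # Same inputs, different runtime scales, different results.
    print(quantized_matmul(a, b, 0.001))
    print(quantized_matmul(a, b, 0.01))
```

The input precision, the accumulator precision, and the conversion step are all visible in the loop body, which is why they are part of the algorithm and schedule rather than a property of a single type.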
For more info: some slides Yuka made here