Quantization and Precision #485

# Precision
Precisions in Exo are currently hard-coded types (e.g. `f32`, `f64`, `i8`, `i32`) that mostly just map to C types and nothing more. However, precisions deserve a richer representation within Exo, because they have a significant impact on our scheduling decisions and are more complex than we have previously considered. Some ways they might impact scheduling decisions:
- The amount of memory accessed depends on precision (e.g. loading two `f32`s vs. loading one `f64`).
- Different precisions use different hardware instructions, which should result in different schedules (#440).
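
To make the memory point concrete, here is a minimal Python sketch (plain Python, not Exo code). The helper names `bytes_accessed` and `lanes_per_vector` are hypothetical, and the 256-bit vector width is only an assumed example; the element sizes are the usual C sizes for these types.

```python
# Bytes per element for the precisions named above (usual C sizes).
BYTES_PER_ELEMENT = {"f32": 4, "f64": 8, "i8": 1, "i32": 4}


def bytes_accessed(precision: str, num_elements: int) -> int:
    """Bytes moved when loading `num_elements` values of `precision`."""
    return BYTES_PER_ELEMENT[precision] * num_elements


def lanes_per_vector(precision: str, vector_bits: int = 256) -> int:
    """Elements that fit in one vector register (width is an assumption)."""
    return vector_bits // (8 * BYTES_PER_ELEMENT[precision])


# Two f32 loads and one f64 load move the same number of bytes...
print(bytes_accessed("f32", 2), bytes_accessed("f64", 1))
# ...but the number of lanes per vector differs, which changes how a loop
# would be tiled and vectorized.
print(lanes_per_vector("f32"), lanes_per_vector("f64"), lanes_per_vector("i8"))
```

Even in this toy model, the same loop over `f32` and `f64` data would want different vector tile sizes, which is one reason a schedule may need to know more about a precision than its C type name.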

Users might want to support many precisions (#296), and making each one the compiler developers' responsibility to implement is not a good model. We should think about how to make precisions more extensible.

# Quantization and Mixed-Precision
Another precision-related issue is handling mixed precision within an algorithm. Some hardware architectures (Gemmini, AMX) have intrinsics whose input and output precisions differ. Some algorithms also involve **quantization** (especially in machine learning). These conversions between precisions are part of the algorithm/schedule itself. For instance, in matmul there's a `scale` parameter used for clamping. The `scale` parameter cannot be hard-coded into types because it is often a learned parameter.
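
As a rough illustration of why `scale` has to be a runtime value, here is a NumPy sketch of a quantized matmul with `i8` inputs, `i32` accumulation, and an `i8` output. This is not Exo or Gemmini code: the function name `quantized_matmul` is hypothetical, and the rounding mode (`np.rint`, round-half-to-even) and the clamp range are assumptions chosen for the example.

```python
import numpy as np


def quantized_matmul(a_i8: np.ndarray, b_i8: np.ndarray, scale: float) -> np.ndarray:
    """i8 x i8 -> i8 matmul: accumulate in i32, scale in f32, clamp to i8."""
    # Widen before multiplying so the products do not overflow i8.
    acc = a_i8.astype(np.int32) @ b_i8.astype(np.int32)
    # `scale` is an ordinary runtime argument, e.g. a learned parameter.
    scaled = np.rint(acc.astype(np.float32) * np.float32(scale))
    # Clamp to the representable i8 range, then narrow.
    return np.clip(scaled, -128, 127).astype(np.int8)


a = np.array([[100, 50], [-100, 20]], dtype=np.int8)
b = np.array([[2, 0], [1, 3]], dtype=np.int8)

# The i32 accumulator is [[250, 150], [-180, 60]].
print(quantized_matmul(a, b, scale=0.5))  # scaled values fit in i8
print(quantized_matmul(a, b, scale=1.0))  # out-of-range values are clamped
```

Three precisions appear in one kernel (`i8`, `i32`, `f32`), and the conversions between them are explicit steps of the computation rather than properties of any single type.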

For more info, see [these slides Yuka made](https://docs.google.com/presentation/d/1MCmJy5zo90l-Qlv-KF9aJ2OSDBsxrXDvBkehQRZhiGs/edit?usp=sharing).

