Quantization and Precision  #485

Open
@skeqiqevian

Precision

Precisions in Exo are currently hard-coded types (e.g. f32, f64, i8, i32) that mostly just signify C types and nothing more. However, precisions should have a better representation within Exo, because they have a significant impact on our scheduling decisions and are more complex than we have previously considered. Some ways they might impact scheduling decisions:

There are many precisions users might want to support (#296), and making compiler developers responsible for implementing each of them is not a good model. We should think about how to make precisions more extensible.

Quantization and Mixed-Precision

Another issue related to precision is dealing with mixed precision within an algorithm. Some hardware architectures (Gemmini, AMX) have intrinsics whose input and output precisions differ. Some algorithms also involve quantization (especially in machine learning). These conversions between precisions are part of the algorithm/schedule itself. For instance, in matmul, there's a scale parameter used for clamping. It isn't possible to hard-code the scale parameter into types because it is often a learned parameter.
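To make the matmul case concrete, here is a minimal sketch in plain Python (not Exo code, and not how Gemmini or AMX actually implement it). It assumes i8 inputs, a wider integer accumulator, and an i8 output; the names `clamp_i8` and `quantized_matmul` are hypothetical. The point is that `scale` arrives as an ordinary runtime argument, so it cannot be folded into the output type.

```python
def clamp_i8(x):
    """Saturate an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def quantized_matmul(a, b, scale):
    """Multiply a (N x K) by b (K x M), both holding i8-range integers.

    Products are summed in a wider accumulator (standing in for i32),
    then scaled by the runtime value `scale` and clamped back to i8.
    `scale` is assumed to be a learned parameter, hence a plain argument.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # wider accumulator: input precision != output precision
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # The precision conversion is an explicit step of the algorithm.
            out[i][j] = clamp_i8(round(acc * scale))
    return out
```

The scale-and-clamp step sits inside the loop nest like any other statement, which is why it seems to belong to the algorithm/schedule rather than to the type of `out`.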

For more info: some slides Yuka made here

Labels: C: Language (The semantics of the language), Tracker (Track a large project)
