
Cranelift: make CLIF behavior platform-independent w.r.t. endianness #3369

Open
@cfallin


Currently, CLIF has three kinds of endianness for memory loads and stores: big, little, and native. The meaning of a native-endian operation depends on the platform on which the CLIF executes.

The purpose of this three-option design, as we discussed in #2124, was convenience on the CLIF producer side: loads and stores that are meant to access platform-native values (such as pointers in a stack frame, or data passed to and from code produced by other compilers) can simply use the "native" option, and the CLIF becomes parametric in endianness, working correctly on platforms of either endianness.
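To make the late-binding behavior concrete, here is a minimal sketch of how a three-option flag only acquires a meaning once the executing platform is known. The names `FlagEndianness`, `ByteOrder`, and `resolve` are hypothetical and are not Cranelift's actual types or API.

```rust
/// Hypothetical model of the three-option endianness flag on a memory op.
/// These names are illustrative only, not Cranelift's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FlagEndianness {
    Big,
    Little,
    Native,
}

/// Concrete byte order of the platform that ends up executing the code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ByteOrder {
    Big,
    Little,
}

/// Late binding: `Native` has no fixed meaning until the executing
/// platform is known, so the same op can resolve to two different orders.
fn resolve(flag: FlagEndianness, platform: ByteOrder) -> ByteOrder {
    match flag {
        FlagEndianness::Big => ByteOrder::Big,
        FlagEndianness::Little => ByteOrder::Little,
        FlagEndianness::Native => platform,
    }
}
```

With this model, `resolve(FlagEndianness::Native, ...)` returns whichever byte order the platform has, while the explicit variants ignore the platform entirely.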

It appears that, in the discussion in #2124, we were initially leaning (in two earlier comments) toward a strict two-option (big/little), always-explicit endianness flag on memory ops, but it then became apparent that this would require more plumbing so that the endianness is known up front.

The new forcing function that we have, however, is the CLIF interpreter. Because we now have an interpreter that is platform-independent, it becomes important to define what result a given CLIF execution should provide. It seems very important that this should be the same result regardless of the platform we happen to be running on. Otherwise, if a CLIF program can have multiple results depending on platform, then many other endianness issues could occur at higher levels of the system.

In essence, we're late-binding endianness, after the CLIF is produced. In contrast, other compilers, such as LLVM, use a form of early-binding: e.g., the data layout that is a part of a program in LLVM IR specifies the endianness assumed by the IR.

In this issue I'm suggesting that we consider doing the same, i.e., binding endianness early. This would give CLIF well-defined semantics and shouldn't hurt the ergonomics of most CLIF producers: they would need to provide a bit more information (the target platform) when creating a builder, and the builder would then use the target's native endianness wherever "native" would have been used before.
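As a rough illustration of the early-binding idea, the sketch below shows a builder that is told the target's byte order at creation time and resolves "native" immediately, so every emitted op carries an explicit endianness. `SketchBuilder`, its methods, and the emitted text are all hypothetical; they are not Cranelift's builder API and the strings are only CLIF-like, not guaranteed to be valid CLIF syntax.

```rust
/// Explicit byte order; there is deliberately no `Native` variant here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endianness {
    Big,
    Little,
}

/// Hypothetical builder that knows the target's byte order up front.
struct SketchBuilder {
    target: Endianness,
    ops: Vec<String>,
}

impl SketchBuilder {
    /// The extra piece of information the producer must now supply.
    fn new(target: Endianness) -> Self {
        SketchBuilder { target, ops: Vec::new() }
    }

    /// What used to be "native": resolved at build time, not at execution time.
    fn native(&self) -> Endianness {
        self.target
    }

    /// Record a store with an always-explicit endianness (illustrative text only).
    fn store_i32(&mut self, endianness: Endianness) {
        let flag = match endianness {
            Endianness::Big => "big",
            Endianness::Little => "little",
        };
        self.ops.push(format!("store.i32 {} v0, v1", flag));
    }
}
```

A producer would write something like `let e = builder.native(); builder.store_i32(e);` wherever it previously asked for a native-endian access, and the resulting IR would no longer depend on where it is later executed or interpreted.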

One alternative is to disallow (i.e., declare to be undefined behavior) any CLIF in which a native-endian load/store interacts with another access in a way that exposes endian-dependent behavior. That seems much more problematic, because many real programs do this (e.g., Rust compiled via cg_clif can perfectly legally store a u32 to memory and load its first byte). Another alternative is to bias the interpreter toward one endianness (e.g., the interpreter always behaves like a little-endian machine), but then the results of interpretation and native execution differ on machines of the opposite endianness (e.g., big-endian ones), which is also undesirable.
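The u32 example can be sketched directly. The code below stores a `u32` with an explicitly chosen byte order and then reads back the byte at the lowest address; the helper names (`store_u32`, `load_first_byte`, `first_byte_after_store`) are made up for illustration, and the byte order is passed explicitly so the sketch behaves the same on any host.

```rust
/// Byte order used for the store, chosen explicitly by the caller.
#[derive(Clone, Copy)]
enum Endianness {
    Big,
    Little,
}

/// Store a u32 at offset 0 of `mem` using the given byte order.
fn store_u32(mem: &mut [u8], value: u32, endianness: Endianness) {
    let bytes = match endianness {
        Endianness::Big => value.to_be_bytes(),
        Endianness::Little => value.to_le_bytes(),
    };
    mem[..4].copy_from_slice(&bytes);
}

/// Load the byte at the lowest address; a single byte has no byte order.
fn load_first_byte(mem: &[u8]) -> u8 {
    mem[0]
}

/// Store-then-load sequence whose result exposes the byte order of the store.
fn first_byte_after_store(value: u32, endianness: Endianness) -> u8 {
    let mut mem = [0u8; 4];
    store_u32(&mut mem, value, endianness);
    load_first_byte(&mem)
}
```

For the value `0x1122_3344`, `first_byte_after_store` yields `0x44` with `Endianness::Little` and `0x11` with `Endianness::Big`. If the store were native-endian, which of the two a program observes would depend on the platform, or on whatever byte order an interpreter chose to emulate.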

This is a continuation of the discussion in #3329; cc @uweigand @afonso360 @fitzgen and others. Thoughts?
