diff --git a/proposals/0043-hlsl-intrinsic-tablegen.md b/proposals/0043-hlsl-intrinsic-tablegen.md
new file mode 100644
index 0000000..c0db376
--- /dev/null
+++ b/proposals/0043-hlsl-intrinsic-tablegen.md
@@ -0,0 +1,602 @@
+---
+title: "[NNNN] - HLSL Intrinsic TableGen"
+params:
+  authors:
+    - icohedron: Deric Cheung
+  status: Under Consideration
+---
+
+## Introduction
+
+This proposal introduces a TableGen-based system for generating HLSL
+intrinsic function overload declarations and definitions. HLSL intrinsics
+require many explicit overloads for each combination of element type (half,
+float, int, etc.) and shape (scalar, vector, matrix). A single function
+like `clamp` [needs 36 hand-written overloads for scalars and vectors
+alone](https://github.com/llvm/llvm-project/blob/dd76cf68d392a6bcbdfefc2970a391486aa48825/clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h#L614-L707).
+The TableGen approach replaces these with compact declarative
+definitions that are expanded by a backend into the required HLSL
+declarations, significantly reducing the amount of hand-written code
+in the HLSL intrinsic headers.
+
+## Motivation
+
+The HLSL intrinsic headers contain thousands of lines of repetitive
+overload declarations.
+For example, the `and` function requires 40
+lines of hand-written code to cover its scalar, vector, and matrix
+overloads, all following an identical pattern of
+`_HLSL_BUILTIN_ALIAS` followed by a function signature:
+
+```hlsl
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool and(bool x, bool y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool2 and(bool2 x, bool2 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool3 and(bool3 x, bool3 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool4 and(bool4 x, bool4 y);
+
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool1x2 and(bool1x2 x, bool1x2 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool1x3 and(bool1x3 x, bool1x3 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool1x4 and(bool1x4 x, bool1x4 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool2x1 and(bool2x1 x, bool2x1 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool2x2 and(bool2x2 x, bool2x2 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool2x3 and(bool2x3 x, bool2x3 y);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool2x4 and(bool2x4 x, bool2x4 y);
+// ... 7 more matrix overloads ...
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_and)
+bool4x4 and(bool4x4 x, bool4x4 y);
+```
+
+This pattern is repeated for each of the ~60 alias intrinsic functions.
+A similar pattern applies to ~14 detail-wrapper intrinsics (inline
+functions that forward to `__detail::*_impl` helpers) and a handful
+of inline-body intrinsics (e.g., unsigned `abs` as a constexpr
+identity). In all cases, every type × shape combination must be
+written by hand. The repetition creates several problems:
+
+1. **Maintenance burden.** Adding a new element type or shape to an
+   intrinsic requires adding overloads by hand to every affected
+   function. As matrix support is extended to more intrinsics, each
+   one will need up to 15 additional overloads per element type;
+   for a function like `clamp` that supports 9 element types, that
+   means 135 new hand-written overloads.
+
+2.
+   **Inconsistency risk.** With thousands of similar declarations,
+   it is easy to introduce subtle errors (wrong type, missing
+   availability attribute, wrong builtin alias) that are hard to
+   spot in review.
+
+3. **16-bit availability complexity.** Half and 16-bit integer types
+   require conditional availability attributes
+   (`_HLSL_16BIT_AVAILABILITY` vs `_HLSL_AVAILABILITY`) and
+   `#ifdef __HLSL_ENABLE_16_BIT` guards. Getting this right for
+   every overload is tedious and error-prone.
+
+4. **Template instantiation differs from overload resolution.** Some
+   existing intrinsics use C++ templates to reduce repetition, but
+   this changes call-site semantics, preventing implicit conversions,
+   scalar-to-aggregate splats, and truncations that work with
+   explicit overloads (see [C++ templates](#c-templates) in
+   Alternatives considered).
+
+A TableGen-based approach addresses all of these by capturing each
+intrinsic's type and shape requirements declaratively, and generating
+the correct explicit overloads (whether alias declarations, detail
+function wrappers, or inline bodies) with proper availability
+attributes and `#ifdef` guards automatically.
+
+## Proposed solution and design
+
+Define HLSL intrinsics declaratively in a TableGen file
+(`HLSLIntrinsics.td`) and use a custom TableGen backend
+(`HLSLEmitter`) to generate the overload declarations.
+For instance, the `and` example in the motivation above becomes:
+
+```tablegen
+def hlsl_and : HLSLTwoArgBuiltin<"and", "__builtin_hlsl_and"> {
+  let VaryingTypes = [BoolTy];
+}
+```
+
+This 3-line definition generates all 19 overloads (1 scalar +
+3 vector + 15 matrix) that previously required 40 lines of
+hand-written code.
+
+### The `HLSLBuiltin` class
+
+Each intrinsic is defined as an `HLSLBuiltin` record that describes
+**what** types it supports and **how** it maps to the underlying
+implementation. The TableGen emitter reads these records and
+generates the full set of explicit overloads.
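The expansion step can be pictured as a cross product of one element type with the requested shapes. The following is a minimal C++ sketch of that idea, not the actual `HLSLEmitter`: the function names `expandAliasOverloads` and `allMatDims` and their parameters are hypothetical, and the sketch assumes alias mode with the return type and every argument sharing the same type.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the emitter's expansion step; not the real
// HLSLEmitter. Assumes the return type and every argument share one type.
std::vector<std::string>
expandAliasOverloads(const std::string &Name, const std::string &Builtin,
                     const std::string &ElemTy, unsigned NumArgs,
                     bool Scalar, const std::vector<int> &VecSizes,
                     const std::vector<std::pair<int, int>> &MatDims) {
  // Spell out the concrete type for every requested shape.
  std::vector<std::string> Shapes;
  if (Scalar)
    Shapes.push_back(ElemTy);
  for (int N : VecSizes)
    Shapes.push_back(ElemTy + std::to_string(N));
  for (const std::pair<int, int> &D : MatDims)
    Shapes.push_back(ElemTy + std::to_string(D.first) + "x" +
                     std::to_string(D.second));

  // Emit one alias declaration per shape.
  std::vector<std::string> Decls;
  for (const std::string &Ty : Shapes) {
    std::string Decl =
        "_HLSL_BUILTIN_ALIAS(" + Builtin + ")\n" + Ty + " " + Name + "(";
    for (unsigned I = 0; I < NumArgs; ++I) {
      if (I != 0)
        Decl += ", ";
      Decl += Ty;
    }
    Decl += ");";
    Decls.push_back(Decl);
  }
  return Decls;
}

// Matrix dimensions 1x2 through 4x4, excluding 1x1 (15 entries).
std::vector<std::pair<int, int>> allMatDims() {
  std::vector<std::pair<int, int>> Dims;
  for (int R = 1; R <= 4; ++R)
    for (int C = 1; C <= 4; ++C)
      if (R != 1 || C != 1)
        Dims.push_back(std::make_pair(R, C));
  return Dims;
}
```

Called with `"and"`, `"__builtin_hlsl_and"`, `"bool"`, two arguments, vector sizes 2 through 4, and `allMatDims()`, the sketch returns 19 declarations, which matches the 1 scalar + 3 vector + 15 matrix count for `and`.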
+
+The `HLSLBuiltin` class takes two positional parameters: `name` (the
+HLSL function name) and `builtin` (the Clang builtin to alias, which
+defaults to `""`). These populate the `Name` and `Builtin` fields
+respectively. The remaining fields are set via `let` overrides:
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `Name` | `string` | *(positional, required)* | The HLSL function name (e.g., `"clamp"`). Populated by the first parameter. |
+| `Builtin` | `string` | `""` *(positional)* | The Clang builtin to alias (e.g., `"__builtin_hlsl_elementwise_clamp"`). Populated by the second parameter. When set to a non-empty string, overloads are emitted with `_HLSL_BUILTIN_ALIAS`. Mutually exclusive with `DetailFunc` and `Body` if set to a non-empty string. |
+| `Doc` | `string` | `""` | Doxygen comment emitted before the overloads. |
+| `ReturnType` | `HLSLReturnType` | `Void` | How the return type is derived for each overload (see [Argument and return type descriptors](#argument-and-return-type-descriptors)). |
+| `Args` | `list` | `[]` | Argument list. Each entry is a type descriptor. The length determines the argument count. |
+| `VaryingTypes` | `list<HLSLType>` | `[]` | Element types to expand over. One overload set (scalar + vectors + matrices) is generated per type. |
+| `VaryingScalar` | `bit` | `0` | Whether to generate scalar overloads for Varying-typed arguments. |
+| `VaryingVecSizes` | `list<int>` | `[]` | Vector sizes to generate (e.g., `[2, 3, 4]`) for Varying-typed arguments. |
+| `VaryingMatDims` | `list<MatDim>` | `[]` | Matrix dimensions to generate (e.g., `AllMatDims`, `[Mat4x4]`) for Varying-typed arguments. |
+| `DetailFunc` | `string` | `""` | When set, generates an inline function that forwards to `__detail::DetailFunc(args...)`. Mutually exclusive with `Builtin` and `Body`. |
+| `Body` | `string` | `""` | When set, generates an inline function with this literal body text. Mutually exclusive with `Builtin` and `DetailFunc`. |
+| `ParamNames` | `list<string>` | `[]` | Custom parameter names for the arguments. When empty, inline functions use `p0`, `p1`, ... |
+| `IsConstexpr` | `bit` | `0` | Emits `constexpr` instead of `inline` for inline functions. |
+| `IsConvergent` | `bit` | `0` | Marks the function as convergent. |
+| `Availability` | `ShaderModel` | `NoSM` | Minimum shader model version. When set, overloads are annotated with `_HLSL_AVAILABILITY`. |
+
+The `Availability` field uses `ShaderModel` records:
+
+```tablegen
+class ShaderModel<int major, int minor> {
+  int Major = major;
+  int Minor = minor;
+}
+
+def NoSM : ShaderModel<0, 0>; // no availability annotation
+def SM6_0 : ShaderModel<6, 0>;
+def SM6_2 : ShaderModel<6, 2>;
+def SM6_4 : ShaderModel<6, 4>;
+```
+
+A matrix dimension class and named records for each valid dimension
+are also provided for filling out the `VaryingMatDims` field:
+
+```tablegen
+class MatDim<int rows, int cols> {
+  int Rows = rows;
+  int Cols = cols;
+}
+
+def Mat1x2 : MatDim<1, 2>;
+def Mat1x3 : MatDim<1, 3>;
+// ...
+// Mat1x4, Mat2x1, ..., Mat4x4
+```
+
+| Group | Dimensions |
+|-------|------------|
+| `AllMatDims` | `Mat1x2` through `Mat4x4` (15 records, excluding 1×1) |
+
+
+### Element types and type groups
+
+Each HLSL scalar type is defined as an `HLSLType` record with flags
+controlling 16-bit availability behavior:
+
+| Record | HLSL type | `Is16Bit` | `IsConditionally16Bit` |
+|--------|-----------|-----------|------------------------|
+| `BoolTy` | `bool` | | |
+| `HalfTy` | `half` | | ✓ |
+| `FloatTy` | `float` | | |
+| `DoubleTy` | `double` | | |
+| `Int16Ty` | `int16_t` | ✓ | |
+| `UInt16Ty` | `uint16_t` | ✓ | |
+| `IntTy` | `int` | | |
+| `UIntTy` | `uint` | | |
+| `Int64Ty` | `int64_t` | | |
+| `UInt64Ty` | `uint64_t` | | |
+| `UInt32Ty` | `uint32_t` | | |
+
+Commonly-used groups of types are provided as lists:
+
+| Group | Types |
+|-------|-------|
+| `AllFloatTypes` | `half`, `float`, `double` |
+| `SignedIntTypes` | `int16_t`, `int`, `int64_t` |
+| `UnsignedIntTypes` | `uint16_t`, `uint`, `uint64_t` |
+| `AllIntTypes` | `int16_t`, `uint16_t`, `int`, `uint`, `int64_t`, `uint64_t` |
+| `SignedTypes` | `int16_t`, `half`, `int`, `float`, `int64_t`, `double` |
+| `AllNumericTypes` | all integer and float types |
+| `AllTypesWithBool` | `bool` + all numeric types |
+| `NumericTypesNoDbl` | all numeric types except `double` |
+
+### Overload expansion
+
+An intrinsic like `clamp` supports many
+types (`int`, `float`, `half`, ...) and many shapes (scalar, `vec2`,
+`vec3`, `vec4`, and matrices). Rather than listing every combination
+by hand, the definition uses `Varying` as a placeholder for the
+return type and arguments. `VaryingTypes` specifies which element
+types to expand over, and `VaryingScalar`, `VaryingVecSizes`,
+and `VaryingMatDims` specify which shapes to generate.
+The emitter then substitutes `Varying` with each type × shape combination to
+produce the full set of overloads:
+
+```tablegen
+def hlsl_clamp : HLSLBuiltin<"clamp",
+                             "__builtin_hlsl_elementwise_clamp"> {
+  let ReturnType = Varying;
+  let Args = [Varying, Varying, Varying];
+  let VaryingTypes = AllNumericTypes;
+  let VaryingScalar = 1;
+  let VaryingVecSizes = [2, 3, 4];
+  let VaryingMatDims = AllMatDims;
+}
+```
+
+This generates overloads across all numeric types, each
+with scalar, vec2/3/4, and all 15 matrix shapes (1×2 through 4×4).
+
+```hlsl
+// clamp overloads
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+half clamp(half, half, half);
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+half2 clamp(half2, half2, half2);
+// ... half3, half4 ...
+
+#ifdef __HLSL_ENABLE_16_BIT
+_HLSL_AVAILABILITY(shadermodel, 6.2)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+int16_t clamp(int16_t, int16_t, int16_t);
+// ... int16_t2–4, uint16_t–uint16_t4 ...
+#endif
+
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+int clamp(int, int, int);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+int2 clamp(int2, int2, int2);
+// ... int3, int4, uint–uint4, float–float4, int64_t–int64_t4, ...
+// ... uint64_t–uint64_t4, double–double4 ...
+
+// matrix overloads (for each type above)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+int1x2 clamp(int1x2, int1x2, int1x2);
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_elementwise_clamp)
+int1x3 clamp(int1x3, int1x3, int1x3);
+// ... int1x4, int2x1, int2x2, ..., int4x4 ...
+// ... same 15 matrix shapes for uint, float, double, int64_t, uint64_t ...
+```
+
+#### Common helper subclasses
+
+Because the pattern of "all arguments and the return type share the
+same type, with scalar + vector + matrix shapes" is so common, helper
+subclasses are provided.
+For example, `HLSLThreeArgBuiltin` is defined as:
+
+```tablegen
+class HLSLThreeArgBuiltin<string name, string builtin>
+    : HLSLBuiltin<name, builtin> {
+  let Args = [Varying, Varying, Varying];
+  let ReturnType = Varying;
+  let VaryingScalar = 1;
+  let VaryingVecSizes = [2, 3, 4];
+  let VaryingMatDims = AllMatDims;
+}
+```
+
+`HLSLOneArgBuiltin` and `HLSLTwoArgBuiltin` follow the same pattern
+with one and two arguments respectively. Similar helpers exist for
+detail function and inline body modes (see
+[Three generation modes](#three-generation-modes)):
+
+```tablegen
+class HLSLTwoArgDetail<string name, string detail> : HLSLBuiltin<name> {
+  let DetailFunc = detail;
+  let Args = [Varying, Varying];
+  let ReturnType = Varying;
+  let VaryingScalar = 1;
+  let VaryingVecSizes = [2, 3, 4];
+  let VaryingMatDims = AllMatDims;
+}
+
+class HLSLOneArgInlineBuiltin<string name> : HLSLBuiltin<name> {
+  let Args = [Varying];
+  let ReturnType = Varying;
+  let VaryingScalar = 1;
+  let VaryingVecSizes = [2, 3, 4];
+  let VaryingMatDims = AllMatDims;
+}
+```
+
+Using these helpers, the `clamp` definition above can be shortened to:
+
+```tablegen
+def hlsl_clamp : HLSLThreeArgBuiltin<"clamp",
+                                     "__builtin_hlsl_elementwise_clamp"> {
+  let VaryingTypes = AllNumericTypes;
+}
+```
+
+### Argument and return type descriptors
+
+Most intrinsics have
+arguments and return types that directly follow the varying type,
+e.g., `float3 clamp(float3, float3, float3)`. But some intrinsics
+need arguments or return types that differ from the varying type in a
+structured way. A set of type descriptor classes express these
+relationships:
+
+- `Varying`: directly uses the varying type.
+- `VaryingElemType`: always the scalar element type regardless of
+  shape. For example, `refract` takes a scalar `eta` parameter even
+  when operating on vectors: `float3 refract(float3, float3, float)`.
+- `VaryingShape<T>`: same shape as the varying type but with a
+  fixed element type `T`. For example, `countbits` returns `uint3`
+  for an `int3` input: `uint3 countbits(int3)`.
+- `T` (an `HLSLType` record, e.g. `FloatTy`), `VectorType`:
+  fully fixed types that do not change across overloads.
+
+For example, `refract` is defined as:
+
+```tablegen
+def hlsl_refract : HLSLBuiltin<"refract"> {
+  let DetailFunc = "refract_impl";
+  let VaryingTypes = [HalfTy, FloatTy];
+  let Args = [Varying, Varying, VaryingElemType];
+  let ReturnType = Varying;
+  let VaryingScalar = 1;
+  let VaryingVecSizes = [2, 3, 4];
+}
+```
+
+The first two arguments and the return type use `Varying`, so they
+follow the current type and shape (e.g., `float3`). The third
+argument uses `VaryingElemType`, so it is always the scalar element
+type (e.g., `float`) regardless of the vector size. This produces
+overloads:
+
+```c++
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+inline half refract(half p0, half p1, half p2) {
+  return __detail::refract_impl(p0, p1, p2);
+}
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+inline half2 refract(half2 p0, half2 p1, half p2) {
+  return __detail::refract_impl(p0, p1, p2);
+}
+// ... half3, half4 ...
+inline float refract(float p0, float p1, float p2) {
+  return __detail::refract_impl(p0, p1, p2);
+}
+inline float2 refract(float2 p0, float2 p1, float p2) {
+  return __detail::refract_impl(p0, p1, p2);
+}
+// ... float3, float4 ...
+```
+
+### Three generation modes
+
+An `HLSLBuiltin` generates code in one of three modes:
+
+1. **Alias mode** (`Builtin` is set): emits `_HLSL_BUILTIN_ALIAS(builtin)`
+   before each declaration. Used for intrinsics that map directly to
+   a Clang builtin.
+
+   ```tablegen
+   def hlsl_ceil : HLSLOneArgBuiltin<"ceil", "__builtin_elementwise_ceil"> {
+     let VaryingTypes = [HalfTy, FloatTy];
+     let VaryingMatDims = [];
+   }
+   ```
+
+   Generates:
+   ```hlsl
+   _HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+   _HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+   half ceil(half);
+   _HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+   _HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+   half2 ceil(half2);
+   // ... half3, half4 ...
+   _HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+   float ceil(float);
+   // ... float2, float3, float4 ...
+   ```
+
+2. **Detail function mode** (`DetailFunc` is set): emits an inline
+   function that forwards to a `__detail::*_impl` helper defined in
+   `hlsl_intrinsic_helpers.h`.
+
+   ```tablegen
+   def hlsl_fmod : HLSLTwoArgDetail<"fmod", "fmod_impl"> {
+     let ParamNames = ["X", "Y"];
+     let VaryingTypes = [HalfTy, FloatTy];
+     let VaryingMatDims = [];
+   }
+   ```
+
+   Generates:
+   ```hlsl
+   _HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+   inline half fmod(half X, half Y) {
+     return __detail::fmod_impl(X, Y);
+   }
+   _HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+   inline half2 fmod(half2 X, half2 Y) {
+     return __detail::fmod_impl(X, Y);
+   }
+   // ... half3, half4 ...
+   inline float fmod(float X, float Y) {
+     return __detail::fmod_impl(X, Y);
+   }
+   // ... float2, float3, float4 ...
+   ```
+
+3. **Inline body mode** (`Body` is set): emits an inline function
+   with literal body text. Used for simple inline implementations
+   like the unsigned `abs` identity.
+
+   ```tablegen
+   def hlsl_abs_unsigned : HLSLOneArgInlineBuiltin<"abs"> {
+     let ParamNames = ["V"];
+     let Body = "return V;";
+     let IsConstexpr = 1;
+     let VaryingTypes = UnsignedIntTypes;
+     let VaryingMatDims = [];
+   }
+   ```
+
+   Generates:
+   ```hlsl
+   constexpr uint16_t abs(uint16_t V) { return V; }
+   constexpr uint abs(uint V) { return V; }
+   constexpr uint64_t abs(uint64_t V) { return V; }
+   // ... plus vector overloads
+   ```
+
+### Availability
+
+The `Availability` field specifies a minimum shader model version for
+an intrinsic. When set, every overload is annotated with
+`_HLSL_AVAILABILITY(shadermodel, <major>.<minor>)`.
+
+For example, `dot4add_i8packed` requires shader model 6.4:
+
+```tablegen
+def hlsl_dot4add_i8packed :
+    HLSLBuiltin<"dot4add_i8packed", "__builtin_hlsl_dot4add_i8packed"> {
+  let Args = [UIntTy, UIntTy, IntTy];
+  let ReturnType = IntTy;
+  let Availability = SM6_4;
+}
+```
+
+Generates:
+
+```hlsl
+_HLSL_AVAILABILITY(shadermodel, 6.4)
+_HLSL_BUILTIN_ALIAS(__builtin_hlsl_dot4add_i8packed)
+int dot4add_i8packed(uint, uint, int);
+```
+
+#### 16-bit availability
+
+Separately from per-intrinsic availability, when the emitter generates
+overloads for a 16-bit type it automatically adds the appropriate
+availability annotations based on flags on the `HLSLType` record:
+
+- `Is16Bit`: the type is a true 16-bit type (e.g., `int16_t`,
+  `uint16_t`). Overloads are wrapped in `#ifdef __HLSL_ENABLE_16_BIT`
+  / `#endif` guards and emitted with
+  `_HLSL_AVAILABILITY(shadermodel, 6.2)`.
+- `IsConditionally16Bit`: the type has a 16-bit variant but is not always
+  16-bit (e.g., `half`, which is a true 16-bit float only when
+  `__HLSL_ENABLE_16_BIT` is defined, otherwise an alias for `float`).
+  Overloads are emitted with
+  `_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)`, which expands to an
+  availability attribute only when `__HLSL_ENABLE_16_BIT` is defined
+  and otherwise expands to nothing.
+
+For either flag, if the intrinsic's own `Availability` is SM 6.2 or
+later, `_HLSL_AVAILABILITY` is used instead since 16-bit support is
+already implied.
+
+This ensures that 16-bit overloads are only visible when the
+target supports them, without the intrinsic author needing to
+handle it manually. For example, `ceil` uses `[HalfTy, FloatTy]`
+as its `VaryingTypes`.
+Since `HalfTy` has `IsConditionally16Bit`
+set, the emitter automatically annotates the `half` overloads:
+
+```hlsl
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+_HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+half ceil(half);
+_HLSL_16BIT_AVAILABILITY(shadermodel, 6.2)
+_HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+half2 ceil(half2);
+// ... half3, half4 ...
+
+// float overloads have no availability annotation
+_HLSL_BUILTIN_ALIAS(__builtin_elementwise_ceil)
+float ceil(float);
+// ... float2, float3, float4 ...
+```
+
+### Generated file structure
+
+The emitter produces two `.inc` files:
+
+- `hlsl_intrinsics_gen.inc`: alias intrinsics
+  (`_HLSL_BUILTIN_ALIAS` declarations)
+- `hlsl_detail_intrinsics_gen.inc`: detail and inline-body
+  intrinsics
+
+These are included from `hlsl_intrinsics.h` with
+`hlsl_intrinsic_helpers.h` (containing `__detail::*_impl` helper
+functions) included between them to satisfy the dependency chain:
+alias declarations → helpers (which reference alias-declared
+functions) → detail intrinsics (which call helpers).
+
+### Intrinsics that remain hand-written
+
+Some intrinsics remain hand-written in `hlsl_intrinsics.h` because
+they don't fit the "generate all overloads for a list of types"
+pattern. These are:
+
+- `asfloat`/`asint`/`asuint`: reinterpret the bits of a scalar or
+  vector as a different element type, returning the same shape.
+  These rely on a `sizeof` check in the template to reject types
+  whose size doesn't match the target type (e.g., `asfloat(half)`
+  must fail when native half is enabled because `sizeof(half) !=
+  sizeof(float)`). With explicit overloads, the `half` argument
+  would instead be implicitly promoted to `float` before the
+  bit-cast, silently changing semantics.
+- `asuint` (splitdouble variant): uses `out` parameters
+- `firstbithigh`: calls a helper templated on a `BitWidth`
+  constant that differs per type group (16 for `int16_t`/`uint16_t`,
+  32 for `int`/`uint`, 64 for `int64_t`/`uint64_t`). The detail
+  function mechanism can only forward arguments to a helper, not
+  pass type-dependent template arguments.
+- `mul`: 9 cases mixing scalar, vector, and matrix operands where
+  the return type depends on both argument kinds and matrices
+  require compatible inner dimensions (e.g., `M×K * K×N → M×N`)
+- `select`: the condition argument is always `bool`/`boolN` while
+  the value arguments are templated over any type `T`
+
+## Alternatives considered
+
+### C++ templates
+
+The initial approach attempted to replace the explicit overloads with
+C++ function templates. While this worked for simple one-argument
+functions, it failed in general because template argument
+deduction and function overload resolution behave fundamentally
+differently in ways that change observable semantics:
+
+**No implicit conversions across arguments.** With explicit
+overloads, a call like `clamp(int_val, float_val, float_val)` is
+rejected as ambiguous: the compiler lists all candidate overloads,
+helping the user identify the mismatch. With a template
+`T clamp(T, T, T)`, the same call is rejected with "deduced
+conflicting types for parameter 'T'", a less actionable diagnostic
+that doesn't show which overloads were available. Both approaches
+reject the call, but overloads produce more helpful errors.
+
+**No implicit vector/matrix truncation.** With explicit overloads,
+passing a `float4` to `cross(float3, float3)` truncates the
+`float4` to a `float3` (with a warning). With a template
+`vector<T, 3> cross(vector<T, 3>, vector<T, 3>)`, the `float4` argument
+cannot match `vector<T, 3>` and the call is rejected.
+
+**No implicit scalar-to-vector splat.** With explicit overloads,
+`lerp(float3_val, float3_val, 0.5)` splats the scalar `0.5` into a
+`float3`.
+With a template `T lerp(T, T, T)`, the compiler deduces
+conflicting types (`float3` vs `float`) and rejects the call.
+
+### Preprocessor macros
+
+Overloads could be generated using X-macros or similar preprocessor
+patterns. This would reduce line count but at the cost of
+readability, debuggability, and IDE support. TableGen provides a
+structured, type-safe approach with clear error messages and
+integration with the existing LLVM build infrastructure.
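For comparison, the X-macro alternative might look roughly like the sketch below. It is plain C++ rather than HLSL, and the macro names `CLAMP_ELEM_TYPES` and `DECLARE_CLAMP` and the function `clamp_sketch` are hypothetical, chosen only for illustration. It covers scalar overloads for four element types and nothing else.

```cpp
// Hypothetical X-macro list of element types (names are illustrative only).
#define CLAMP_ELEM_TYPES(X) \
  X(int) \
  X(unsigned) \
  X(float) \
  X(double)

// Expands to one scalar overload for the given element type.
#define DECLARE_CLAMP(T) \
  inline T clamp_sketch(T V, T Lo, T Hi) { \
    return V < Lo ? Lo : (V > Hi ? Hi : V); \
  }

// One expansion per entry in the type list: four overloads in total.
CLAMP_ELEM_TYPES(DECLARE_CLAMP)

#undef DECLARE_CLAMP
#undef CLAMP_ELEM_TYPES
```

Vector and matrix shapes, availability attributes, and `#ifdef` guards would each need further macro layers, and diagnostics would point into macro expansions rather than readable declarations, which is the readability and debuggability cost described above.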