This document describes the Polang type system, including type inference and polymorphic types.
- Overview
- Primitive Types
- Type Conversions
- Type Inference
- Type Variables
- Unification Algorithm
- Implementation Details
- Examples
- Monomorphization
- Error Handling
- Known Limitations
Polang uses a Hindley-Milner style type system with:
- Static typing: All types are determined at compile time
- Type inference: Types can be inferred from context, reducing annotation burden
- Polymorphic functions: Functions can work with multiple types via type variables
- Explicit conversions only: No implicit type coercions; use
asfor numeric conversions
The type system is implemented in two phases:
- Parser-level type checking: Validates types, detects errors early, performs Hindley-Milner type inference, and performs capture analysis for closures
- MLIR-level passes: Resolves type parameters in generic functions via monomorphization
Polang supports a variety of numeric types with explicit width and signedness, plus a boolean type:
| Type | Description | Size | MLIR Type | LLVM Type |
|---|---|---|---|---|
i8 |
Signed 8-bit integer | 8-bit | !polang.integer<8, signed> |
i8 |
i16 |
Signed 16-bit integer | 16-bit | !polang.integer<16, signed> |
i16 |
i32 |
Signed 32-bit integer | 32-bit | !polang.integer<32, signed> |
i32 |
i64 |
Signed 64-bit integer | 64-bit | !polang.integer<64, signed> |
i64 |
u8 |
Unsigned 8-bit integer | 8-bit | !polang.integer<8, unsigned> |
i8 |
u16 |
Unsigned 16-bit integer | 16-bit | !polang.integer<16, unsigned> |
i16 |
u32 |
Unsigned 32-bit integer | 32-bit | !polang.integer<32, unsigned> |
i32 |
u64 |
Unsigned 64-bit integer | 64-bit | !polang.integer<64, unsigned> |
i64 |
| Type | Description | Size | MLIR Type | LLVM Type |
|---|---|---|---|---|
f32 |
Single-precision float | 32-bit | !polang.float<32> |
f32 |
f64 |
Double-precision float | 64-bit | !polang.float<64> |
f64 |
| Type | Description | Size | MLIR Type | LLVM Type |
|---|---|---|---|---|
isize |
Signed index (pointer width) | Platform-dependent | !polang.index<signed> |
i64 (platform-dependent) |
usize |
Unsigned index (pointer width) | Platform-dependent | !polang.index<unsigned> |
i64 (platform-dependent) |
Index types map to the platform-native pointer width. usize is intended for array indexing (Phase 2). Both types support explicit casting with as to/from integer and float types.
| Type | Description | Size | MLIR Type | LLVM Type |
|---|---|---|---|---|
bool |
Boolean | 1-bit | !polang.bool |
i1 |
| Type | Description | Size | MLIR Type | LLVM Type |
|---|---|---|---|---|
() |
Unit type (single value) | 0-bit | !polang.unit |
n/a |
The unit type () has exactly one value, also written (). It is used in function type signatures:
- Zero-parameter functions:
f : () -> i64declares a functionfthat takes no arguments and returnsi64 - Side-effect functions:
f : () -> ()declares a function that takes no arguments and returns no meaningful value
In type signatures, () appears as the parameter type for zero-parameter functions. Previously, zero-parameter functions used bare return type syntax (e.g., f : i64); the () -> T form makes the function nature explicit.
(* Zero-parameter function *)
get_value : () -> i64
get_value() = 42
(* Side-effect function (future use) *)
do_something : () -> ()
Type names are defined as compile-time constants in parser/include/parser/polang_types.hpp:
struct TypeNames {
// Signed integers
static constexpr const char* I8 = "i8";
static constexpr const char* I16 = "i16";
static constexpr const char* I32 = "i32";
static constexpr const char* I64 = "i64";
// Unsigned integers
static constexpr const char* U8 = "u8";
static constexpr const char* U16 = "u16";
static constexpr const char* U32 = "u32";
static constexpr const char* U64 = "u64";
// Floating-point
static constexpr const char* F32 = "f32";
static constexpr const char* F64 = "f64";
// Index types
static constexpr const char* ISIZE = "isize";
static constexpr const char* USIZE = "usize";
// Other types
static constexpr const char* BOOL = "bool";
static constexpr const char* FUNCTION = "function";
static constexpr const char* TYPEVAR = "typevar";
static constexpr const char* UNKNOWN = "unknown";
static constexpr const char* UNIT = "()";
};When type inference encounters a numeric literal without explicit type context, it defaults to:
- Integer literals →
i64(signed 64-bit) - Float literals →
f64(64-bit double precision)
This means 42 has type i64 and 3.14 has type f64 by default.
Polang supports explicit type conversions between numeric types using the as keyword. All conversions must be explicit; there are no implicit type coercions.
The as keyword converts a value from one numeric type to another:
expression as TargetType
Examples:
a : i32
a = 1000
b : i64
b = a as i64 (* widen i32 to i64 *)
c : i8
c = a as i8 (* narrow i32 to i8 (truncates) *)
d : f64
d = a as f64 (* convert integer to float *)
e : i32
e = 3.14 as i32 (* convert float to integer (truncates toward zero) *)
The as operator has higher precedence than arithmetic operators but lower precedence than unary operators:
| Precedence | Operators | Associativity |
|---|---|---|
| 9 | . (member access) |
Left |
| 8 | Unary -, ! |
Right (prefix) |
| 7 | as (type conversion) |
Left |
| 6 | *, / |
Left |
| 5 | +, - |
Left |
| 4 | ==, !=, <, <=, >, >= |
Non-associative |
| 3 | && |
Left |
| 2 | ` | |
| 1 | if/then/else, let/in |
Right |
Precedence examples:
a + b as i32 (* means: a + (b as i32) *)
-x as i32 (* means: (-x) as i32 *)
a * b as f64 + c (* means: (a * (b as f64)) + c *)
(a + b) as i32 (* parentheses override precedence *)
Only numeric-to-numeric conversions are permitted:
| From Type | To Type | Allowed |
|---|---|---|
Any integer (i8-i64, u8-u64) |
Any integer | ✓ |
| Any integer | Any float (f32, f64) |
✓ |
| Any integer | Any index (isize, usize) |
✓ |
| Any float | Any integer | ✓ |
| Any float | Any float | ✓ |
| Any float | Any index | ✓ |
| Any index | Any integer | ✓ |
| Any index | Any float | ✓ |
| Any index | Any index | ✓ |
bool |
Any type | ✗ |
| Any type | bool |
✗ |
Type error examples:
x : i32
x = true as i32 (* Error: cannot convert bool to i32 *)
b : bool
b = 1 as bool (* Error: cannot convert i32 to bool *)
Converting to a larger integer type preserves the value exactly:
| Conversion | Operation | Example |
|---|---|---|
| Signed → Larger signed | Sign-extend | (-1i8) as i64 → -1i64 |
| Unsigned → Larger unsigned | Zero-extend | 255u8 as u64 → 255u64 |
| Unsigned → Larger signed | Zero-extend | 255u8 as i64 → 255i64 |
| Signed → Larger unsigned | Sign-extend then reinterpret | (-1i8) as u64 → 18446744073709551615u64 |
Converting to a smaller integer type truncates the value, keeping only the low-order bits (wrap-around behavior):
a : i32
a = 256
b : i8
b = a as i8 (* b = 0 (256 mod 256) *)
c : i32
c = 257
d : i8
d = c as i8 (* d = 1 (257 mod 256) *)
e : i32
e = -1
f : u8
f = e as u8 (* f = 255 (bit pattern preserved) *)
Converting between signed and unsigned types of the same width reinterprets the bit pattern:
a : i32
a = -1
b : u32
b = a as u32 (* b = 4294967295 (same bits, different interpretation) *)
c : u32
c = 4294967295
d : i32
d = c as i32 (* d = -1 *)
Converting f32 to f64 is always exact with no precision loss:
a : f32
a = 3.14
b : f64
b = a as f64 (* exact conversion *)
Converting f64 to f32 rounds to the nearest representable value (IEEE 754 round-to-nearest, ties-to-even):
a : f64
a = 3.141592653589793
b : f32
b = a as f32 (* rounded to f32 precision *)
Very large f64 values may overflow to ±infinity when converted to f32.
Integer values are converted to the nearest representable floating-point value:
a : i64
a = 42
b : f64
b = a as f64 (* b = 42.0 *)
c : i64
c = 9007199254740993 (* larger than f64 can represent exactly *)
d : f64
d = c as f64 (* may lose precision *)
Note: f64 has 53 bits of mantissa precision, so i64 values larger than 2^53 may not be exactly representable.
Float-to-integer conversion uses saturating truncation toward zero:
- Truncation: The fractional part is discarded, rounding toward zero
- Saturation: Values outside the target range are clamped to the minimum or maximum
- NaN handling:
NaNconverts to0
(* Truncation toward zero *)
a : i32
a = 3.7 as i32 (* a = 3 *)
b : i32
b = -3.7 as i32 (* b = -3 *)
(* Saturation at bounds *)
c : i8
c = 1000.0 as i8 (* c = 127 (i8 max) *)
d : i8
d = -1000.0 as i8 (* d = -128 (i8 min) *)
e : u8
e = -1.0 as u8 (* e = 0 (u8 min, since negative) *)
f : u8
f = 1000.0 as u8 (* f = 255 (u8 max) *)
(* Infinity handling *)
g : i32
g = (1.0 / 0.0) as i32 (* g = 2147483647 (i32 max) *)
h : i32
h = (-1.0 / 0.0) as i32 (* h = -2147483648 (i32 min) *)
(* NaN handling *)
i : i32
i = (0.0 / 0.0) as i32 (* i = 0 *)
Converting an integer to isize or usize uses arith.index_cast (signed) or arith.index_castui (unsigned):
a : i64
a = 42
b : isize
b = a as isize (* signed index cast *)
c : usize
c = a as usize (* unsigned index cast *)
Converting isize or usize to an integer type uses the corresponding signed or unsigned index cast:
a : isize
a = 42 as isize
b : i64
b = a as i64 (* signed index cast to i64 *)
Float-to-index and index-to-float conversions use a two-step process via i64:
a : f64
a = 3.14
b : isize
b = a as isize (* float → i64 (saturating truncation) → index *)
c : isize
c = 42 as isize
d : f64
d = c as f64 (* index → i64 → float *)
Numeric literals adapt to their declared type context when the value fits:
a : i8
a = 42 (* OK: 42 fits in i8, literal is i8 *)
b : i16
b = 1000 (* OK: 1000 fits in i16, literal is i16 *)
c : f32
c = 3.14 (* OK: literal becomes f32 *)
d : u8
d = 200 (* OK: 200 fits in u8 *)
Compile-time error if the literal value doesn't fit in the target type (see Literal Range Checking):
x : i8
x = 1000 (* Error: 1000 doesn't fit in i8 (-128 to 127) *)
y : u8
y = 256 (* Error: 256 doesn't fit in u8 (0 to 255) *)
z : u8
z = -1 (* Error: -1 doesn't fit in u8 (unsigned) *)
Integer literals also adapt to index type contexts:
x : isize
x = 42 (* OK: literal becomes isize *)
y : usize
y = 100 (* OK: literal becomes usize *)
When no type context is available, literals default to i64 (integers) or f64 (floats):
a = 42 (* a: i64 *)
b = 3.14 (* b: f64 *)
Boolean values cannot be converted to or from numeric types using as. Use explicit expressions instead:
(* Instead of: x as bool (not allowed) *)
is_nonzero : bool
is_nonzero = x != 0
(* Instead of: b as i32 (not allowed) *)
int_value : i32
int_value = if b then 1 else 0
This design ensures that boolean semantics are always explicit in the code.
When an integer literal is assigned to a typed context (variable with type annotation, function argument, or let binding), the compiler verifies at compile time that the literal value fits within the target type's range. This is specified in §5.1 and §12.4.
Error format:
Type error: literal <value> does not fit in type <type> (range <min> to <max>) at line L, column C
Contexts that trigger range checking:
- Variable declarations with type annotations:
x : i8 = 1000 - Let bindings with type annotations:
let x : i8 = 1000 in x - Function call arguments:
f(1000)wheref : i8 -> i8 - Polymorphic function arguments (after type resolution)
- Negated integer literals:
x : u8 = -1orx : i8 = -200
Contexts that do NOT trigger range checking:
ascasts:300 as i8is an explicit narrowing conversion that truncates- Default type resolution: bare
42defaults toi64and always fits - Generic/unresolved types:
{int}hasn't been resolved yet isize/usize: platform-dependent width, no fixed range at compile time
Integer type ranges:
| Type | Min | Max |
|---|---|---|
i8 |
-128 | 127 |
i16 |
-32768 | 32767 |
i32 |
-2147483648 | 2147483647 |
i64 |
-9223372036854775808 | 9223372036854775807 |
u8 |
0 | 255 |
u16 |
0 | 65535 |
u32 |
0 | 4294967295 |
u64 |
0 | 18446744073709551615 |
Implementation: isLiteralInRange() and getIntegerRangeString() in polang_types.hpp, checkLiteralRange() in type_checker.cpp.
The parser's type checker (parser/src/type_checker.cpp) handles error detection, type inference via Hindley-Milner unification, and capture analysis. Untyped polymorphic parameters are marked as typevar and represented as !polang.type_param in MLIR for monomorphization.
Type Checker Responsibilities:
| Responsibility | Description |
|---|---|
| Error detection | Undefined variables, arity mismatch |
| Type validation | Validates explicit type annotations match usage |
| Capture analysis | Identifies free variables for closures via FreeVariableCollector |
| Type variable setup | Marks untyped polymorphic parameters as typevar for MLIR monomorphization |
Example:
double_it(x) = x * 2 (* x marked as typevar, MLIR infers int *)
half(x) = x / 2.0 (* x marked as typevar, MLIR infers double *)
identity(x) = x (* x marked as typevar, MLIR infers from call site *)
Implementation:
The type checker uses the FreeVariableCollector class for capture analysis:
class FreeVariableCollector : public Visitor {
std::set<std::string> localNames;
std::set<std::string> referencedNonLocals;
void visit(const NIdentifier& node) override {
// If not locally defined, it's a free variable (capture)
if (localNames.find(node.name) == localNames.end()) {
referencedNonLocals.insert(node.name);
}
}
};If-expressions in Polang are typed based on their branches:
- Condition: Must be
bool - Branches: Both
thenandelsebranches must have the same type - Result: The if-expression's type is the type of both branches
Example:
a = if true then 1 else 2 (* a is int (branches are int) *)
b = if false then 1.0 else 2.0 (* b is double (branches are double) *)
c = if true then true else false (* c is bool (branches are bool) *)
Type Error:
x = if true then 1 else 2.0 (* Error: branches have different types *)
The type checker validates branch type consistency in TypeChecker::visit(const NIfExpression&).
All untyped parameters are marked as type variables by the parser. Type inference happens primarily at the AST level in the TypeChecker. For polymorphic functions, the MLIR representation uses !polang.type_param<"name"> type parameters in polang.generic_func operations, which are resolved during monomorphization.
Example:
identity(x) = x (* x marked as typevar *)
identity(42) (* call site constrains x to int *)
Pipeline:
Source: identity(x) = x
↓
Parser: x marked as TYPEVAR (type checker)
↓
MLIRGen: generic_func @identity<a>(%arg0: !polang.type_param<"a">) -> !polang.type_param<"a">
↓
Monomorphization: Creates identity$i64 specialized version from instantiate call sites
↓
Result: fn identity$i64(%arg0: !polang.integer<64, signed>) -> !polang.integer<64, signed>
Type variables represent unknown types that will be resolved during type inference.
In the parser, type variables are represented as the string "typevar":
// When local inference fails, assign a type variable
mutable_arg.type = new NIdentifier(TypeNames::TYPEVAR);In MLIR, type parameters are named types used in generic function templates:
!polang.type_param<"a"> ; Named type parameter "a"
!polang.type_param<"b"> ; Named type parameter "b"Type parameters use ML-style naming convention ('a, 'b, etc.) and appear in polang.generic_func signatures. They are bound to concrete types by polang.instantiate operations at call sites.
Type parameters are defined in mlir/include/polang/Dialect/PolangTypes.td:
def Polang_TypeParamType : Polang_Type<"TypeParam", "type_param"> {
let summary = "Named type parameter for generic functions";
let description = [{
Represents a named type parameter in a generic function template.
Uses ML-style naming convention ('a, 'b, etc.).
Example: !polang.type_param<"a">
}];
let parameters = (ins StringRefParameter<"">:$name);
let assemblyFormat = "`<` $name `>`";
}
The MLIRGenVisitor generates type parameters for untyped function parameters in generic functions. Type parameters are named based on parameter position (e.g., "a", "b", "c").
Type inference uses the standard Hindley-Milner unification algorithm at the AST level in the TypeChecker (parser/src/type_checker.cpp). The algorithm:
-
Collects constraints from expressions:
- Arithmetic operations: operands and result must have same type
- Function calls: argument types must match parameter types
- Return statements: return value must match function return type
-
Unifies types using the standard algorithm:
- Type variables can be bound to concrete types or other type variables
- Occurs check prevents infinite types
- Constraints are solved to produce a substitution mapping
-
Applies substitution to resolve all type variables to concrete types
At the MLIR level, type parameters in polang.generic_func operations are resolved during monomorphization, where specialized function copies are created for each unique set of concrete type arguments from polang.instantiate call sites.
parser/
├── include/parser/
│ ├── polang_types.hpp # Type constants (INT, DOUBLE, BOOL, TYPEVAR)
│ ├── type_checker.hpp # Type checker interface
│ └── type_inference.hpp # Unification, substitution, trait constraints,
│ # and function signature types (MonoSignature,
│ # PolymorphicSignature, FunctionSignature variant)
└── src/
└── type_checker.cpp # Type inference, error detection, capture analysis
# Contains: TypeChecker, FreeVariableCollector
mlir/
├── include/polang/
│ ├── Dialect/
│ │ ├── PolangTypes.td # TypeParamType definition
│ │ └── PolangTypes.h # Generated type classes
│ └── Transforms/
│ └── Passes.h # Monomorphization pass declarations
└── lib/
├── Dialect/
│ ├── PolangTypes.cpp # Type implementations
│ └── PolangTypeInference.cpp # MLIR-level type inference pass
└── Transforms/
└── Monomorphization.cpp # Function specialization for call sites
The type checker tracks function signatures using a std::variant-based type defined in type_inference.hpp:
MonoSignature: Concrete parameter types and return type (e.g.,(i64, i64) -> i64)PolymorphicSignature: Type-parameterised signature with trait bounds (e.g.,forall 'a:Numeric. ('a, 'a) -> 'a)FunctionSignature:std::variant<MonoSignature, PolymorphicSignature>— the type checker stores one entry per function infunctionSignatures
This variant design makes the monomorphic/polymorphic distinction explicit at the type level, preventing bugs from accidentally treating one as the other.
1. Parser builds AST
└── Type annotations stored in NVariableDeclaration::type
2. Type checker runs (type_checker.cpp)
├── Validates explicit type annotations
├── Detects errors (undefined vars, arity)
├── Performs Hindley-Milner type inference (unification)
├── Stores function signatures in functionSignatures map
├── Performs capture analysis (FreeVariableCollector)
└── Marks untyped polymorphic parameters as TYPEVAR
3. MLIRGen generates MLIR (MLIRGen.cpp)
├── Converts types to MLIR types (!polang.integer, etc.)
├── Generates polang.generic_func with !polang.type_param<"name"> for polymorphic functions
└── Generates polang.instantiate at call sites with concrete type bindings
4. Type inference pass runs (PolangTypeInference.cpp)
└── Resolves residual type parameters in generic functions
5. Monomorphization pass runs (Monomorphization.cpp)
├── Creates specialized function copies for each unique type combination
└── Replaces polang.instantiate with polang.call to specialized functions
6. Lowering proceeds with fully-typed MLIR
Operation verifiers allow type parameters during intermediate stages (before monomorphization resolves them):
// In PolangOps.cpp
namespace {
bool typesAreCompatible(Type t1, Type t2) {
if (t1 == t2) return true;
// Allow type parameters - they will be resolved during monomorphization
if (isa<TypeParamType>(t1) || isa<TypeParamType>(t2)) return true;
return false;
}
}
LogicalResult ReturnOp::verify() {
// ...
if (!typesAreCompatible(getValue().getType(), expectedType))
return emitOpError("type mismatch");
return success();
}add(x: i64, y) = x + y
add(1, 2)
Inference:
xis explicitlyi64yis marked astypevarby the parser, inferred asi64fromx + y- Return type inferred as
i64(result of+) - Monomorphization creates
add$i64_i64
identity(x) = x
identity(42)
Inference:
xhas no local constraints → assigned as type parameter"a"- Return type depends on
x→ same type parameter"a" - Call
identity(42)provides concrete typei64viapolang.instantiate - Monomorphization creates specialized
identity$i64: (i64) -> i64
first(x, y) = x
first(1, 2.0)
Inference:
xandyboth get type variables- Call site:
xconstrained toi64(integer literal),yconstrained tof64(float literal) - Final type:
first$i64_f64: (i64, f64) -> i64
is_positive(x) = x > 0
is_positive(5)
Inference:
xmarked astypevarby parser, inferred as integer fromx > 0- At call site,
5is an integer literal → resolves toi64 - Return type is
bool(result of comparison) - Final type:
is_positive$i64: (i64) -> bool
add(x, y) = x + y
add(1.5, 2.0)
Inference:
xandyget type variables withFloatkind (from float literals at call site)- Float-kind type variables default to
f64 - Final type:
add$f64_f64: (f64, f64) -> f64
Polymorphic functions are specialized (monomorphized) for each unique set of argument types at call sites.
- Identification: Functions with type variables in their signature are marked as polymorphic
- Call Site Analysis: Each call to a polymorphic function records the concrete argument types
- Specialization: A new function is created for each unique type signature
- Name Mangling: Specialized functions use a mangled name (e.g.,
identity$int) - Call Update: Calls are updated to reference the specialized version
identity(x) = x
identity(42) (* Creates identity$i64 *)
identity(true) (* Creates identity$bool *)
Generated MLIR:
polang.generic_func @identity<a>(%arg0: !polang.type_param<"a">) -> !polang.type_param<"a"> { ... }
polang.func @identity$i64(%arg0: !polang.integer<64, signed>) -> !polang.integer<64, signed> { ... }
polang.func @identity$bool(%arg0: !polang.bool) -> !polang.bool { ... }Generic functions that are never instantiated remain as polang.generic_func in MLIR. They are erased during lowering to standard dialects (since there's no concrete type to lower to).
unused(x) = x (* Kept as polang.generic_func, erased during lowering *)
42 (* No call to unused, so no specialization created *)
Error messages include source location information (line and column) to help identify where problems occur.
Type checker errors use the format:
Type error: <message> at line <line>, column <column>
MLIR-level errors use MLIR's location format:
loc("<source>":line:column): error: <message>
When types cannot be unified, the compiler reports an error:
f : i64 -> i64
f(x) = x
f(3.14) (* Type error: argument 1 expects i64, got f64 *)
References to undefined variables report the location of the reference:
x + 1 (* Type error: Undeclared variable: x *)
Generic functions without call sites remain as polang.generic_func with their type parameters unresolved. They are erased during lowering but can be instantiated in subsequent REPL inputs.