Add statement-level IR with codegen, interpreter, and test suite by pgoodman · Pull Request #581 · trailofbits/multiplier

pgoodman · 2026-04-12T19:13:43Z

Summary

- IR codegen (bin/Index/IRGen.cpp): Statement-level IR generation at index time. Produces per-function CFGs with nested expression trees, structural scoping (ENTER_SCOPE/EXIT_SCOPE), and explicit type/width information in every opcode.
- Opcodes: Type-safe opcode set: signed integer (ADD/SUB/MUL/DIV/REM), unsigned (UDIV/UREM/USHR/UCMP_*), float (FADD_32/64, FMUL_32/64, FCMP_EQ_32/64, etc.), memory (width-specific LOAD/STORE, MEMCPY, MEMSET, bitfield access), control flow (COND_BRANCH, SWITCH with case values, GOTO with compensation blocks), and calling convention (EXPRESSION_SCOPE, PARAM_PTR, RETURN_PTR, STRING_PTR).
- Interpreter (bin/InterpretIR/): Concrete interpreter for testing. It walks the CFG, evaluates instructions, and tracks memory. 16/24 test functions pass (all intraprocedural); the remaining 8 need interprocedural call support.
- Printer (bin/Examples/PrintIR.cpp): Human-readable IR printer with recursive sub-expression display, switch case values, and named instruction references.
- Test suite (tests/InterpretIR/): 24 C test files covering arithmetic, casts, bitfields, control flow, goto/labels, switch (including nested), pointers, scopes, dynamic alloca, string literals, structs, unsigned ops, C23 features, and more.
- Other: SQLite updated to 3.53.0. IR documentation in docs/IR.md.

Test plan

- 16/24 interpreter tests pass (bash tests/InterpretIR/run_tests.sh /path/to/db)
- All 24 test files have IR annotations matching mx-print-ir output
- VerifyBlocks checks: is_root flag, no post-terminator instructions, symmetric edges
- No phantom predecessor edges across all functions
- Index a large codebase (e.g., with nested switches) without crashes
- Build on 32-bit target to verify target-aware size constants

🤖 Generated with Claude Code

Generates a per-function intermediate representation at index time by walking the PASTA AST. The IR is serialized as flat lists (functions, blocks, instructions, objects) inside each fragment's Cap'n Proto message.
Key design decisions:
- Statement-level CFG: expressions stay as nested instruction trees; short-circuit operators (&&, ||) and ternary (?:) are instructions, not control-flow splits
- Single unified OpCode enum (67 opcodes) covering constants, memory, arithmetic, casts, calls, terminators, variadic args, and unknown
- Entity ID provenance: every instruction carries the source entity ID of the originating AST node; calls reference callee FunctionDecl, GEP fields reference FieldDecl, objects reference VarDecl
- Flat layout with hierarchical entity IDs: IRBlockId embeds BlockKind, IRInstructionId embeds OpCode, enabling kind queries without loading entity data
- Children-before-parents instruction ordering with parentOffset for efficient bottom-up and top-down traversal
- Address-taken classification: local/localValue, parameter/parameterValue
- Dominator and post-dominator trees computed at index time
- break/continue/goto/label properly handled with loop stack
- Explicit vs implicit goto and fallthrough opcodes
- isConditionallyExecuted flag on short-circuit RHS and ternary branches
- va_start/va_arg/va_end/va_copy/va_pack opcodes for variadic handling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds an 8th `ir_` parameter slot to MX_FOR_EACH_ENTITY_CATEGORY and registers IRFunction, IRBlock, IRInstruction, IRObject as entity categories (values 15-18). Updates all 60 call sites across 25 files. Provider implementations (EntityProvider, SQLiteEntityProvider, CachingEntityProvider, InvalidEntityProvider) use MX_IGNORE for the IR slot since IR entities are stored inside fragments, not in separate tables. Generic infrastructure (forward declarations, VariantEntity, VariantId, EntityCategory enum, type mappings) includes the IR types. Adds dummy IR entity classes with stub id() methods and minimal Impl classes to satisfy the macro expansions. These will be fleshed out when the read-side API is implemented. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regenerated via PythonBindings.py to pick up the VariantEntity change from adding IR entity categories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add IRFunction, IRBlock, IRInstruction, IRObject to ENTITY_KINDS
- Include IR headers in Bootstrap/Python.cpp
- Regenerate all Python bindings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Enable IR entity getter/dispatch macros in Index.cpp, Reference.cpp, and all EntityProvider implementations. The provider methods return nullptr (stub) since IR data lives in fragments, not separate tables. Fix entity lister macros to IGNORE for IR (no IRFunctionKind enum). Add IR includes to all provider files and EntityProvider.h. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- PredefinedExpr (__func__ etc.) unwraps to its StringLiteral child
- Add manual SQLite provider stubs for IR entity getters/listers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

InitListExpr ({a, b, c}) is now a distinct instruction with all initializer values as operands, rather than silently returning the last value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Reorder IR generation before AST serialization in PersistFragment
- Split GenerateAndSerializeIR into GenerateIR + SerializeIR
- Add ir_for_entity reverse map to EntityMapper (AST entity ID → IR instruction entity ID), populated during IR generation
- Add IRInstructionId() lookup method to EntityMapper
- Modify PASTA.cpp bootstrap to add ir_instruction() field to Decl and Stmt protos, serialized via es.IRInstructionId(e)
Requires running mx-bootstrap-pasta to regenerate AST.capnp, Serialize.cpp/h, PASTA.cpp/h, and the API code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- CXXThisExpr emits a load (stub -- needs proper this parameter object)
- PredefinedExpr unwraps to its StringLiteral child
TODO: Model 'this' as an explicit parameter object in C++ methods, and pass the base object as 'this' at CXXMemberCallExpr call sites.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Explicit 'this' parameter object (THIS_PARAMETER ObjectKind) for C++ instance methods
- CXXThisExpr references the this object via ADDRESS_OF
- CXXMemberCallExpr emits METHOD_CALL or VIRTUAL_METHOD_CALL with the implicit object argument as op[0]
- CXXNewExpr emits NEW or NEW_ARRAY with allocated type and placement args
- CXXDeleteExpr emits DELETE or DELETE_ARRAY
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Placement new/new[] get their own opcodes (PLACEMENT_NEW, PLACEMENT_NEW_ARRAY) separate from regular new/new[]. Placement address and extra placement arguments are operands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Template-dependent and unresolved expressions (ParenListExpr, CXXDependentScopeMemberExpr, DependentScopeDeclRefExpr, etc.) emit UNKNOWN without DCHECK -- these are "known unknowns" that we can't lower meaningfully. The DCHECK fires only for expressions we should handle but haven't implemented yet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Block arguments (BLOCK_ARG_DEF opcode, numArguments on blocks, args on branch targets) were never implemented -- no instructions were ever emitted, numArguments was always 0, args was always empty. Remove all traces and renumber opcodes. The IR is alloca-based. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Real IRFunction, IRBlock, IRInstruction, IRObject classes backed by fragment capnp readers. IRInstruction provides: opcode, operands (top-down), parent_instruction (bottom-up via parentOffset), parent_block, source_statement, is_terminator, is_conditionally_executed, and raw field accessors for derived class use. IRBlock provides: kind, all_instructions (post-order), instructions (top-level roots only), successors/predecessors, dominator tree queries. IRFunction provides: declaration, entry_block, blocks (RPO), objects. IRObject provides: kind, size, alignment, needs_memory. All entity resolution goes through entity IDs -- no names stored. Fragment-local navigation (operands, blocks, etc.) creates new Impl objects from the same FragmentImplPtr. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- SQLiteEntityProvider IR getters extract fragment_id from entity ID, load the fragment, and create Impl with the offset
- IRFunction::from(FunctionDecl) searches the containing fragment's IR function list for a matching funcDeclEntityId
- IRFunction::declaration() resolves the funcDeclEntityId back to a FunctionDecl
- Install IR headers via cmake install rules
- All IR entities are now end-to-end accessible: create index, find FunctionDecl, get its IR, iterate blocks/instructions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove GEP_INDEX opcode; PTR_ADD handles all pointer+index ops
- Binary ptr+int and int+ptr now emit PTR_ADD instead of ADD
- Binary ptr-ptr emits PTR_DIFF
- Binary ptr-int emits PTR_ADD(ptr, NEG(int))
- ArraySubscriptExpr uses PTR_ADD (element size derived from type)
- Strip IR.capnp to bare field definitions (no comments)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PTR_ADD now stores typeEntityId (element type) and sizeBytes (element size/scale) so the consumer has full stride information without chasing through the type chain. For ptr-int subtraction, the index is NEG'd and the scale stays positive.
- ptr + int → PTR_ADD(ptr, int, type=elem, size=sizeof(elem))
- ptr - int → PTR_ADD(ptr, NEG(int), type=elem, size=sizeof(elem))
- ptr - ptr → PTR_DIFF(ptr, ptr)
- arr[i] → PTR_ADD(arr, i, type=elem, size=sizeof(elem))
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
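The address arithmetic implied by the mapping above can be sketched as follows. This is a minimal illustration, not the interpreter's code: `PtrAdd` and `EvalPtrAdd` are hypothetical names, and addresses are modeled as plain 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a PTR_ADD payload: the element size (stride)
// travels with the instruction, so no type lookup is needed to scale.
struct PtrAdd {
  uint64_t size_bytes;  // sizeof(element type); always positive
};

// Computes base + index * size_bytes. The index is signed so that
// `ptr - n` can be expressed as PTR_ADD(ptr, NEG(n)) with a positive scale.
// Unsigned wrap-around makes the negative case come out right.
inline uint64_t EvalPtrAdd(const PtrAdd &inst, uint64_t base, int64_t index) {
  assert(inst.size_bytes != 0);
  return base + static_cast<uint64_t>(index) * inst.size_bytes;
}
```

With a 4-byte element, `EvalPtrAdd(PtrAdd{4}, 0x1000, 3)` yields `0x100C`, and passing the negated index `-2` yields `0x0FF8`.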

target_entity_id, type_entity_id, object_entity_id, int_value, uint_value, float_value, width, size_bytes, flags, compound_op are all opcode-specific and belong on derived instruction classes, not the base. The base class keeps only: opcode, id, operands, parent, source_statement, parent_block, is_terminator, is_conditionally_executed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Users get IR entities via optional-returning APIs, not by constructing them directly. The implicit bool conversion was misleading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

One-line descriptions for each field. These won't change frequently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fragment-level entityPool and intPool replace per-instruction lists.
- Instruction: 7 fields (entityOffset, constOffset, parentOffset, numOperands, opcode, constWidth, flags) vs previous 17 fields.
- Block: 7 fields (entityOffset, numInstructions, numSuccessors, numPredecessors, numDominators, numPostDominators, kind).
- Function: 5 fields (funcDeclEntityId, entryBlockId, numBlocks, numObjects, entityOffset).
- Entity pool layout per instruction: [parentBlockId, sourceEntityId, operands..., extras...]
- Entity pool layout per block: [instructions..., successors..., predecessors..., idom+dominators..., ipdom+postDominators...]
- Entity pool layout per function: [blocks_rpo..., objects...]
Int pool stores constants, switch values, byte offsets, element sizes, and compound opcodes. All pools are fragment-level List(UInt64)/List(Int64) shared across all IR entities in the fragment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
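To make the per-instruction pool layout concrete, here is a rough sketch of how a reader might slice the shared entity pool. It assumes the layout stated in this commit message only (`[parentBlockId, sourceEntityId, operands..., extras...]`); `InstructionRecord` and the accessor names are hypothetical and the pool is modeled as a `std::vector<uint64_t>` rather than a Cap'n Proto list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of two of the fixed-size instruction fields.
struct InstructionRecord {
  uint32_t entity_offset;  // start of this instruction's slice in the pool
  uint8_t num_operands;    // number of operand IDs in the slice
};

// Slot positions within a slice: [parentBlockId, sourceEntityId, operands...]
constexpr size_t kParentBlockSlot = 0;
constexpr size_t kSourceEntitySlot = 1;
constexpr size_t kFirstOperandSlot = 2;

inline uint64_t ParentBlockId(const std::vector<uint64_t> &pool,
                              const InstructionRecord &inst) {
  assert(inst.entity_offset + kFirstOperandSlot <= pool.size());
  return pool[inst.entity_offset + kParentBlockSlot];
}

inline uint64_t SourceEntityId(const std::vector<uint64_t> &pool,
                               const InstructionRecord &inst) {
  assert(inst.entity_offset + kFirstOperandSlot <= pool.size());
  return pool[inst.entity_offset + kSourceEntitySlot];
}

// Copies out the operand IDs that follow the two header slots.
inline std::vector<uint64_t> Operands(const std::vector<uint64_t> &pool,
                                      const InstructionRecord &inst) {
  const size_t first = inst.entity_offset + kFirstOperandSlot;
  assert(first + inst.num_operands <= pool.size());
  return std::vector<uint64_t>(pool.begin() + first,
                               pool.begin() + first + inst.num_operands);
}
```

Because every instruction only stores an offset and a count, all variable-length data lives in one flat list per fragment, which is the point of the pooled design described above.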

Remove METHOD_CALL, VIRTUAL_METHOD_CALL, NEW, NEW_ARRAY, PLACEMENT_NEW, PLACEMENT_NEW_ARRAY, DELETE, DELETE_ARRAY opcodes. Remove THIS_PARAMETER ObjectKind. Remove CXXThisExpr, CXXMemberCallExpr, CXXNewExpr, CXXDeleteExpr handlers from IRGen. These were half-implemented and broken. C++ constructs now emit UNKNOWN with source entity ID for inspection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Every value-producing instruction has its result type entity ID at pool position 2. Effect instructions (store, terminators, va_*, unknown) skip the type slot. The opcode determines the layout.
- Pool position 0 is parentBlockOrInstruction: IRBlockId for roots, IRInstructionId for sub-expressions. Dropped parentOffset field from Instruction capnp (now 6 fields, 12 bytes).
- Dropped parent_block_index from InstructionIR; parent block is derived from the parent chain or from which BlockIR contains the instruction.
- Added VerifyBlocks() post-generation pass that checks:
  - Entry block exists
  - Every block has at least one instruction
  - Last top-level instruction in each block is a terminator
  - Successor/predecessor edges are symmetric
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
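The edge-symmetry check listed for VerifyBlocks() could look roughly like the sketch below. It is an illustration under assumptions: `Block` and `EdgesAreSymmetric` are hypothetical, blocks are identified by their index in a vector, and all indices are assumed to be in range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block model: edges are indices into the block vector.
struct Block {
  std::vector<size_t> successors;
  std::vector<size_t> predecessors;
};

inline bool Contains(const std::vector<size_t> &list, size_t value) {
  return std::find(list.begin(), list.end(), value) != list.end();
}

// Returns true when every successor edge b -> s is mirrored by b appearing
// in s's predecessor list, and every predecessor edge is mirrored likewise.
// A predecessor entry with no matching successor is a "phantom" edge.
inline bool EdgesAreSymmetric(const std::vector<Block> &blocks) {
  for (size_t b = 0; b < blocks.size(); ++b) {
    for (size_t succ : blocks[b].successors) {
      assert(succ < blocks.size());
      if (!Contains(blocks[succ].predecessors, b)) return false;
    }
    for (size_t pred : blocks[b].predecessors) {
      assert(pred < blocks.size());
      if (!Contains(blocks[pred].successors, b)) return false;
    }
  }
  return true;
}
```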

All ALLOCA instructions are now emitted in the entry block before any control flow, matching LLVM's convention. EmitEntryBlockAllocas walks the function body pre-scan to find all VarDecls and emits their allocas upfront. EmitDeclStmt now only emits the initialization store at the original location. This ensures an interpreter/analysis doesn't re-allocate stack slots on every loop iteration or branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each instruction now has a `users` list (List(UInt64) in capnp) of entity IDs for instructions that use this instruction's value as a data-flow operand. Computed during serialization by reversing the operand map. Read-side API: IRInstruction::users() generator and num_users(). This enables taint propagation (follow users forward), backward slicing (follow operands backward), and other data-flow analyses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
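Reversing the operand map is a small transformation; the sketch below shows one way to do it. It assumes instructions are identified by dense indices instead of entity IDs, and `ComputeUsers` is a hypothetical name, not the serializer's function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// operands[i] lists the instructions whose values instruction i reads.
// The result users[i] lists the instructions that read instruction i's value.
inline std::vector<std::vector<size_t>> ComputeUsers(
    const std::vector<std::vector<size_t>> &operands) {
  std::vector<std::vector<size_t>> users(operands.size());
  for (size_t user = 0; user < operands.size(); ++user) {
    for (size_t def : operands[user]) {
      assert(def < operands.size());
      users[def].push_back(user);  // `user` consumes the value of `def`
    }
  }
  return users;
}
```

Following `users` repeatedly from a source instruction gives a forward (taint-style) walk, while following `operands` gives the backward slice.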

New IndexVersion struct with index_id (random uint64 generated at index creation) and version (monotonically increasing). Replaces the separate VersionNumber return with a combined type. Supports ==, !=, and same_index() for comparing index identity vs state.
- index_id stored in new index_id SQLite table
- Index::version() returns IndexVersion
- EntityProvider::GetIndexVersion() virtual method
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
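The distinction between index identity and index state can be illustrated with a small stand-alone struct. This is a sketch that reuses the names from the commit message (`index_id`, `version`, `same_index`); the actual class layout is not shown in the PR text, and `IsNewer` is a hypothetical helper added only for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct IndexVersion {
  uint64_t index_id;  // identity: chosen once when the index is created
  uint64_t version;   // state: grows as the index is updated

  // True when both values describe the same index, whatever its state.
  bool same_index(const IndexVersion &that) const {
    return index_id == that.index_id;
  }
};

// Equality requires the same index in the same state.
inline bool operator==(const IndexVersion &a, const IndexVersion &b) {
  return a.index_id == b.index_id && a.version == b.version;
}

inline bool operator!=(const IndexVersion &a, const IndexVersion &b) {
  return !(a == b);
}

// Hypothetical helper: ordering versions only makes sense within one index.
inline bool IsNewer(const IndexVersion &a, const IndexVersion &b) {
  assert(a.same_index(b));
  return a.version > b.version;
}
```

A cache keyed on such a value can tell "same database, new data" (`same_index` is true, `!=` is true) apart from "different database entirely" (`same_index` is false).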

Derived instruction classes with typed accessors:
- Constants: ConstIntInst (signed/unsigned value, width, type), ConstFloatInst (value, width, type), ConstNullInst (type)
- Memory: AllocaInst (allocated_type, object), LoadInst (address, loaded_type), StoreInst (address, stored_value), AddressOfInst (type, object)
- Field/Pointer: GEPFieldInst (base, result_type, field, byte_offset), PtrAddInst (base, index, result_type, element_type, element_size)
- Arithmetic: BinaryInst (lhs, rhs, result_type), ComparisonInst (lhs, rhs, result_type), UnaryInst (operand, result_type)
- Cast: CastInst (operand, result_type)
- Call: CallInst (result_type, target, is_indirect, arguments)
- Compound: IncDecInst, CompoundAssignInst, SelectInst, CopyInst
- Aggregate: InitListInst (elements, result_type)
- Terminators: RetInst (return_value), BranchInst (target_block), CondBranchInst (condition, true_block, false_block), SwitchInst, UnreachableInst, UnknownInst
All classes use static from(IRInstruction) for downcasting. Pool positions are computed from opcode to read the right data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace switch_values vector with switch_cases (low/high pairs)
- Support GNU range cases (case 1...5:) via CaseStatementIsGNURange
- Store case count in int pool for reliable deserialization
- Entity pool extras: [caseType, case0_block, ..., default_block]
- Int pool: [num_cases, case0_low, case0_high, ...]
- SwitchCaseValue struct with low, high, block, is_range()
- SwitchInst::cases() returns SwitchCaseValue generator
- SwitchInst::case_type(), num_cases(), default_block()
- Non-default cases sorted before default in branch targets
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
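A consumer such as an interpreter might dispatch over low/high case pairs roughly as shown below. The `SwitchCaseValue` fields follow the names in the list above, but the struct is re-declared here for illustration; `is_range()` is assumed to mean `low != high`, and `SelectTarget` is a hypothetical helper that takes a plain vector instead of the `cases()` generator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative re-declaration; the real struct lives in the read-side API.
struct SwitchCaseValue {
  int64_t low;
  int64_t high;    // equal to `low` for an ordinary `case N:`
  uint64_t block;  // target block ID

  bool is_range() const { return low != high; }  // assumed definition
};

// Returns the target of the first case whose inclusive [low, high] interval
// contains `value`, or `default_block` when no case matches.
inline uint64_t SelectTarget(const std::vector<SwitchCaseValue> &cases,
                             uint64_t default_block, int64_t value) {
  for (const SwitchCaseValue &c : cases) {
    assert(c.low <= c.high);
    if (c.low <= value && value <= c.high) return c.block;
  }
  return default_block;
}
```

For `case 1 ... 5:` targeting block 10 and `case 7:` targeting block 11, the value 3 selects block 10, 7 selects block 11, and 6 falls to the default block.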

Switch cases are now their own entity with AST provenance. Each case stores low/high values (supporting GNU range cases), target block ID, source CaseStmt/DefaultStmt entity ID, value type, and is_default flag.
- New IRSwitchCaseId in Types.h with Pack/Unpack support
- New SwitchCase capnp struct in IR.capnp
- irSwitchCases flat list in Fragment
- IRSwitchCase public API class with typed accessors
- Provider stubs (SQLiteEntityProvider, Invalid, Caching)
- Added to MX_FOR_EACH_ENTITY_CATEGORY ir_ slot
WIP: Switch instruction operands not yet converted to case entity IDs. Serializer not yet updated to emit SwitchCase entities.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

String literal ALLOCA size used sl->Tokens().Data().size() which returns the source text length (includes quotes, wrong count). Now uses sl->ByteLength() which returns the actual byte length including null terminator. MEMCPY size for string init also uses ByteLength() to avoid copying past the literal's storage when Clang widens the type to match the destination array. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clang's getByteLength() returns strlen without the null terminator. The null terminator size depends on the character width: 1 for char, 2 for char16_t/wchar_t(16), 4 for char32_t/wchar_t(32). Using CharacterByteWidth() instead of hardcoded +1, matching Clang's CodeGen approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New checks in test_string_literals:
- L"hi" (wchar_t, 4 bytes per char on this platform)
- u"abc" (char16_t, 2 bytes per char, C11)
- U"ab" (char32_t, 4 bytes per char, C11)
- Oversized destination: char x[20] = "hi" (3 bytes copied, rest zero)
Tests CharacterByteWidth() for null terminator sizing across different string literal kinds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test_c23.c: tests C23 features with -std=c23:
- bool/true/false keywords
- typeof
- binary literals (0b...)
- digit separators (1'000'000)
- static_assert without message
- nullptr
- u8 character literals
- auto type inference
Added #embed as an IR gap (C23 feature not handled). Updated compile_commands.json and run_tests.sh.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…value
Added helper functions that always set both int_value and uint_value when creating CONST instructions, preventing the recurring bug where only one value slot was set. Fixed character literal codegen (u8'A', L'x', etc.) to set uint_value alongside int_value.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Character literals now use llvm::APInt(width, value) to properly compute getSExtValue() and getZExtValue() based on the character width (8/16/32 bits). This ensures int_value and uint_value are correctly sign/zero extended from the source width, matching how IntegerLiteral already handles it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
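The effect of extending from the source width can be shown without LLVM. The sketch below is a plain C++ stand-in for what `getSExtValue()` and `getZExtValue()` compute on a value of a given bit width; `SExt` and `ZExt` are hypothetical helper names, not functions from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extends the low `width` bits of `raw` to 64 bits.
inline uint64_t ZExt(uint64_t raw, unsigned width) {
  assert(width >= 1 && width <= 64);
  return width == 64 ? raw : raw & ((uint64_t{1} << width) - 1);
}

// Sign-extends the low `width` bits of `raw` to 64 bits.
// Flipping and then subtracting the sign bit propagates it upward.
inline int64_t SExt(uint64_t raw, unsigned width) {
  const uint64_t value = ZExt(raw, width);
  const uint64_t sign = uint64_t{1} << (width - 1);
  return static_cast<int64_t>((value ^ sign) - sign);
}
```

For an 8-bit character with bit pattern `0xE9`, the unsigned slot holds 233 and the signed slot holds -23; filling both from the same width is what keeps the two slots consistent.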

Int pool values are host byte order. Index is architecture-specific (type sizes, alignment, ABI are target-dependent). Cross-endian DB portability is not a goal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

String literal content is available from the AST at interpreter time via StringLiteral::Bytes() (raw bytes in target byte order). The IR doesn't need to emit per-character STOREs — the interpreter resolves the STRING_LITERAL object's source_entity_id to the StringLiteral expression and copies Bytes() into the allocated storage. Note: Bytes() does NOT include the null terminator. The object size is ByteLength() + CharacterByteWidth(). Consumers must zero-fill the last CharacterByteWidth() bytes (or zero-fill the whole object first, then copy Bytes()). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

StringLiteral::Bytes() returns raw bytes in target byte order WITHOUT the trailing null terminator. Object size_bytes() INCLUDES the null (ByteLength + CharacterByteWidth). Consumers must zero-fill then copy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
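The "zero-fill then copy" rule can be sketched as below. The function takes the raw bytes and character width as plain arguments standing in for `StringLiteral::Bytes()` and `CharacterByteWidth()`; `MaterializeStringObject` is a hypothetical name and nothing here calls the real API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the storage for a string literal object.
//   bytes:      literal content without the terminator
//   char_width: bytes per character (1, 2, or 4)
// The object is bytes.size() + char_width long, zero-filled first so the
// trailing terminator is already in place after the copy.
inline std::vector<uint8_t> MaterializeStringObject(
    const std::vector<uint8_t> &bytes, size_t char_width) {
  assert(char_width == 1 || char_width == 2 || char_width == 4);
  assert(bytes.size() % char_width == 0);
  std::vector<uint8_t> object(bytes.size() + char_width, 0);
  std::copy(bytes.begin(), bytes.end(), object.begin());
  return object;
}
```

For `"hi"` this produces a 3-byte object, and for `u"abc"` (6 content bytes, 2-byte characters) an 8-byte object whose last two bytes are zero.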

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

String literal initialization always uses MEMCPY regardless of whether the size is scalar (1/2/4/8). EmitRValue for a string literal returns a pointer (the ALLOCA), not a scalar value. Previously, u"abc" (8 bytes) was treated as a scalar STORE_LE_64, writing the ALLOCA pointer into the destination instead of copying the string content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

All string literal initializations now correctly use MEMCPY regardless of whether the size is a scalar power-of-2. Includes wide string (L"hi"), u16 (u"abc"), u32 (U"ab"), and oversized destination fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Struct and array initialization now always uses MEMCPY regardless of size. Previously, 8-byte structs (e.g., struct { int x; int y; }) used LOAD_LE_64 + STORE_LE_64, which works for data but loses pointer identity through the interpreter's shadow map for struct members that are pointers. MEMCPY preserves the byte-level copy semantics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ling
Codegen fixes:
- Add UCMP_LT/LE/GT/GE for unsigned integer and pointer comparisons
- Add FCMP_EQ/NE/LT/LE/GT/GE_32/_64 for width-specific float comparisons
- Add FADD/FSUB/FMUL/FDIV/FREM/FNEG_32/_64 for width-specific float arithmetic
- Add STRING_PTR opcode replacing ALLOCA/STRING_LITERAL for string literals
- Fix alignment double-conversion (pasta returns bytes, was dividing by 8 again)
- Fix ImplicitValueInitExpr to use correct type width for zero constants
- Fix ++/-- delta constants to use operand width (not hardcoded INT64)
- Replace all hardcoded UINT64 size constants with target-aware EmitSizeConst
- Derive sizeof/offsetof/__builtin_object_size width from expression type
- Cache size_t and ptrdiff_t entity IDs from ASTContext at construction time
- Pointer comparisons use unsigned semantics (IsAnyPointerType check)
- Float comparisons and arithmetic never use integer opcodes
Structural integrity:
- ALLOCAs are roots in FRAME block (is_root flag on InstructionIR)
- Lazy IF_MERGE block creation (only when a branch falls through)
- Dead-code skipping in EmitBody (continue past terminated blocks)
- SwitchToDeadBlock() immediately terminates with IMPLICIT_UNREACHABLE
- Delegate CompoundStmt from EmitStmt to EmitBody (single code path)
- Fix is_function_body check to use entity ID match, not structure kind
- Remove spurious switch dispatch entries from pending_gotos_
- VerifyBlocks checks: is_root, parent_instruction_index, no post-terminator instructions
Interpreter:
- STRING_PTR handler populates memory from StringLiteral::bytes()
- UCMP/FCMP with proper uint64_t/double comparison
- Pointer comparison using (object_id, offset) pairs
- FADD/FSUB/FMUL/FDIV/FREM/FNEG with float semantics
- Integer ADD/SUB/MUL/DIV/REM no longer handle floats
Printer:
- SWITCH shows case values with target blocks
- Sub-expressions printed recursively (full depth, deduplicated)
Other:
- Update vendored SQLite to 3.53.0
- LOG(ERROR) in catch blocks instead of silent DCHECK(false)
- macOS grep -P fix in run_tests.sh
- New tests: pointer comparisons, float comparisons, scope/goto edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d into inner switches
collect_cases was recursively walking all children including nested SwitchStmt bodies, adding inner switch case labels to the outer switch's case list. This caused structure_index to be UINT32_MAX for the inner cases (since emit_case_bodies only processed the outer switch's direct cases), leading to a crash in serialization. Similarly, emit_case_bodies' CompoundStmt handler now dispatches nested SwitchStmts to EmitStmt instead of recursing into them.
Added test cases for nested switches (test 5: basic nesting, test 6: nested switch with outer fallthrough).
Also fixed is_function_body detection to use entity ID match instead of structure kind check (prevents nested CompoundStmts from being mistakenly identified as the function body). Also delegated CompoundStmt handling from EmitStmt to EmitBody (single code path for dead-code skipping and scope management).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
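The shape of the fix, stopping the case walk at nested switches, can be illustrated on a toy statement tree. Everything below is hypothetical: `Stmt` is a simplified stand-in for the PASTA AST, and `CollectCases` only mirrors the idea behind collect_cases, not its actual signature.

```cpp
#include <cassert>
#include <vector>

// Toy statement node; only the pieces needed for the walk.
struct Stmt {
  enum Kind { kCompound, kSwitch, kCase, kOther };
  Kind kind;
  int case_value;  // meaningful only for kCase
  std::vector<Stmt> children;
};

// Gathers case labels under `stmt`, but does not descend into a nested
// switch: its labels belong to that switch, not to the enclosing one.
inline void CollectFrom(const Stmt &stmt, std::vector<int> &out) {
  for (const Stmt &child : stmt.children) {
    if (child.kind == Stmt::kSwitch) continue;
    if (child.kind == Stmt::kCase) out.push_back(child.case_value);
    CollectFrom(child, out);  // cases can nest inside cases or compounds
  }
}

inline std::vector<int> CollectCases(const Stmt &sw) {
  assert(sw.kind == Stmt::kSwitch);
  std::vector<int> out;
  CollectFrom(sw, out);
  return out;
}
```

With an outer switch whose `case 1:` body contains an inner switch over 10 and 20, the outer walk returns only 1 and 2, so no inner label is left without a structure index in the outer switch.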

Remove .claude/, .mcp.json, *_PROMPT.md, bin/Bootstrap/mx-workspace/, and tests/InterpretIR/mx-workspace/ from git tracking. Update .gitignore to prevent them from being re-added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The try/catch(...) blocks were hiding real failures during shutdown. Wrap ExitRecords teardown in an exclusive transaction and reset the metadata check statement before writing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Every integer arithmetic, comparison, bitwise, shift, and unary opcode is now width-specific (_8/_16/_32/_64). Pointer-producing ops have _32/_64 variants. Atomic and overflow-checked ops are sized. No unsized integer or pointer opcodes remain in the OpCode enum. The interpreter is width-correct: every sized operation casts operands to the declared width before operating. Float ops use float precision for _32 and double for _64. Casts (SEXT, ZEXT, TRUNC, int↔float, float↔float) all operate at their declared source/destination widths. Fixes: float compound assign (was using integer ADD), unsigned compound assign signedness (>>=, /=, %=), RMW float memory read, TRUNC sign extension, SEXT no-op assumption, BITCAST pass-through, PTR_TO_I32 truncation, unsigned-to-float sign extension. Removes EXPECT/ASSUME from BitwiseOp (compiler hints, not operations). Adds 10 interpreter test files covering width overflow, float precision, cast precision, and all RMW variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
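What "casts operands to the declared width before operating" means in practice is sketched below, assuming an interpreter that carries all integer values in 64-bit slots and all floats as `double`. The helper names are hypothetical and each function corresponds only loosely to a sized opcode family.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Sized unsigned add: both operands are truncated to T, added, and the
// result truncated again, so ADD_8 wraps at 256 and ADD_32 at 2^32.
template <typename T>
inline uint64_t AddAtWidth(uint64_t lhs, uint64_t rhs) {
  static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
  return static_cast<T>(static_cast<T>(lhs) + static_cast<T>(rhs));
}

// Sized logical shift right: truncating first keeps stale high bits of the
// 64-bit slot from shifting into the result.
template <typename T>
inline uint64_t UShrAtWidth(uint64_t value, unsigned amount) {
  static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
  assert(amount < sizeof(T) * 8);
  return static_cast<T>(static_cast<T>(value) >> amount);
}

// 32-bit float add: operands and result are rounded to float precision.
inline double FAdd32(double lhs, double rhs) {
  const float result = static_cast<float>(lhs) + static_cast<float>(rhs);
  return result;
}

// 64-bit float add: plain double arithmetic.
inline double FAdd64(double lhs, double rhs) { return lhs + rhs; }
```

The float pair shows why the split matters: 16777216 + 1 is not representable in `float`, so the 32-bit add rounds back to 16777216 while the 64-bit add gives 16777217.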

BITWISE is now width-specific (BITWISE_8/16/32/64) so the interpreter can dispatch CLZ/CTZ/POPCOUNT/FFS/PARITY/ROTL/ROTR at the correct width. ABS is a sized unary opcode (ABS_8/16/32/64), not a BitwiseOp sub-opcode. EXPECT and ASSUME removed entirely (compiler hints with no runtime semantics). Also fixes SEXT handler (was a no-op, now casts to source width), TRUNC handler (was masking without sign-extend), BITCAST (was pass-through, now reinterprets bits), and PTR_TO_I32 (was not truncating). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Every FloatOp sub-opcode now has _32 (float) and _64 (double) variants. The interpreter uses float-precision functions (sinf, fabsf, etc.) for _32 and double-precision for _64. IRGen selects the variant based on the argument type width. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

numOperands is UInt8 in the capnp schema (max 255). Most instructions have 0-3 operands; CALL has one per argument. Assert in debug builds that we never silently truncate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

OpCode max value is currently 250 (ATOMIC_EXCHANGE_64). The capnp schema stores it as UInt8. Assert in debug builds that we never silently truncate if the enum grows past 255. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
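A guard of this kind, for both the operand count and the opcode value, can be written as a small narrowing helper. This sketch uses the standard `assert` macro as a stand-in for whatever debug-assert facility the project uses; `CheckedNarrow` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Narrows `value` to `To`, asserting in debug builds that converting back
// yields the original value, i.e. that no bits were silently dropped.
// In release builds (NDEBUG) this compiles down to a plain cast.
template <typename To, typename From>
inline To CheckedNarrow(From value) {
  const To narrowed = static_cast<To>(value);
  assert(static_cast<From>(narrowed) == value && "value does not fit");
  return narrowed;
}
```

A call such as `CheckedNarrow<uint8_t>(250u)` passes, whereas a value of 256 would trip the assertion in a debug build instead of being stored as 0.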

pgoodman and others added 30 commits March 27, 2026 14:59

Regenerate Python bindings for IR entity types

60bc017


Update PythonBindings.py for IR entities and regenerate bindings

ce47696


Handle PredefinedExpr and fix SQLite IR entity stubs

6c29184


Add INIT_LIST opcode for aggregate initialization

540261e


Distinguish PLACEMENT_NEW from NEW opcodes

c364b37


Handle CXXNullPtrLiteralExpr, CXXBoolLiteralExpr, ParenListExpr

623a1b7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove operator bool from IR entity classes

a2468c5


Add field-level comments to IR.capnp

46ca1fc


pgoodman and others added 16 commits April 10, 2026 07:46

Document byte order assumption in IR serialization

c15bd74


Regenerate IR annotations with string literal size fixes

852c680

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pgoodman force-pushed the feature/ir-codegen branch from c4c6cea to c6265d7 Compare April 12, 2026 19:44

pgoodman and others added 6 commits April 12, 2026 16:20

Rename BSWAP16/32/64 to BSWAP_16/_32/_64 for consistent naming

bec076a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update IR documentation for sized opcodes, bitwise, and float changes

a843bc2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pgoodman requested a review from kumarak April 13, 2026 16:39

kumarak reviewed Apr 14, 2026


Comment thread bin/Index/SerializeIR.cpp

pgoodman and others added 2 commits April 13, 2026 20:57

kumarak approved these changes Apr 14, 2026


kumarak merged commit c620dcc into main Apr 14, 2026
2 checks passed

kumarak deleted the feature/ir-codegen branch April 14, 2026 11:49

kumarak merged 168 commits into main from feature/ir-codegen
