Enable GPR regalloc for F32/F64 IR values on 64-bit targets#89

Draft
TholeG wants to merge 2 commits into anthropics:main from TholeG:pr-float-regalloc

Conversation

@TholeG TholeG commented Feb 6, 2026

PR Draft: Enable Register Allocation For F32/F64 On 64-bit Targets

Authorship note: This PR draft (and the proposed code change) was prepared by Codex (OpenAI) in this workspace.

Title

Enable GPR regalloc for F32/F64 IR values on 64-bit targets

Problem / Motivation

CCC's linear-scan register allocator currently treats all floating-point IR
values as "non-GPR" and excludes them from register allocation. In practice,
the 64-bit backends (x86-64, AArch64, RISC-V 64) represent F32/F64 values
as raw bit patterns in a single general-purpose register (accumulator paths:
rax/x0/t0) and only move them into FP regs (xmm*/d*/ft*) at the
actual FP instruction boundary.

As a result, float-heavy code ends up with excessive stack spilling and can be
an order of magnitude slower than clang/gcc for hot FP loops (e.g. n-body).

What This PR Changes

  • Treat IrType::F32 and IrType::F64 values as GPR-eligible on 64-bit targets.
  • Keep excluding IrType::F128 (long double) and I128/U128 from the GPR
    allocator (these still require special codegen paths).
  • Keep the current conservative behavior on 32-bit targets (i686): floats remain
    excluded because they do not fit cleanly into a single GPR without additional
    special handling.
  • Also includes a small doc/comment fix so cargo test --release passes:
    the if_convert module docs contained indented pseudo-code that Rust treats
    as doctests; the PR wraps those snippets in ```text fences.
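
As a concrete illustration of that doc fix (the doc text and select_i32 helper below are invented for illustration; the real change is in src/passes/if_convert.rs): rustdoc treats unannotated code blocks in doc comments as Rust doctests, so indented pseudo-code fails cargo test, while a fenced block tagged text is rendered as plain prose and skipped.

````rust
/// If-conversion rewrites a branchy diamond into a select:
///
/// ```text
/// if (c) x = a; else x = b;   // before
/// x = select c, a, b          // after
/// ```
///
/// The `text` tag above opts this snippet out of doctest compilation;
/// without it, rustdoc would try to build the pseudo-code as Rust.
fn select_i32(c: bool, a: i32, b: i32) -> i32 {
    if c { a } else { b }
}
````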

Implementation Details

Files changed:

  • src/backend/regalloc.rs
  • src/passes/if_convert.rs (doc-only change; fixes failing doctests)

Key logic changes:

  • is_non_gpr_type now:
    • Excludes only F128 + I128/U128 on 64-bit targets.
    • Excludes floats + I64/U64 on 32-bit targets.
  • collect_non_gpr_values and Copy-chain propagation were updated accordingly
    (float constants are only treated as non-GPR on 32-bit targets).
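
The predicate change can be sketched as follows. The enum variants and the is_64bit_target flag are simplified stand-ins for the real definitions in src/backend/regalloc.rs, not the actual code:

```rust
// Simplified stand-in for the compiler's IR type enum.
#[derive(Clone, Copy)]
enum IrType {
    I32, I64, U64, F32, F64, F128, I128, U128,
}

/// Returns true when a value of this type must stay out of the
/// GPR linear-scan allocator.
fn is_non_gpr_type(ty: IrType, is_64bit_target: bool) -> bool {
    if is_64bit_target {
        // 64-bit targets: F32/F64 fit in one GPR as raw bit patterns,
        // so only the 128-bit types remain excluded.
        matches!(ty, IrType::F128 | IrType::I128 | IrType::U128)
    } else {
        // 32-bit targets (i686): keep the conservative behavior;
        // floats and 64-bit integers don't fit in a single GPR.
        matches!(
            ty,
            IrType::F32 | IrType::F64 | IrType::F128
                | IrType::I64 | IrType::U64
                | IrType::I128 | IrType::U128
        )
    }
}
```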

No backend-specific codegen changes are required because the 64-bit backends
already load/store F32/F64 values via the normal accumulator + bitpattern
representation (with fmov/movd/fmv bridges at FP op boundaries).
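
The reason this is safe: an F64 is exactly 64 bits, so parking it in a GPR and bridging back at an FP instruction is bit-exact. In Rust terms, the invariant looks like this (a sketch of the property, not compiler code):

```rust
// FP <-> GPR bridging is a pure bit move, so no value is altered.
fn fp_to_gpr(x: f64) -> u64 {
    x.to_bits() // analogous to `fmov x0, d0` / `movq rax, xmm0`
}

fn gpr_to_fp(bits: u64) -> f64 {
    f64::from_bits(bits) // analogous to `fmov d0, x0` at the FP op boundary
}
```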

Benchmarks (Runtime)

Environment:

  • Linux (Debian bookworm) in a Podman container on an Apple Silicon host
  • Target: AArch64 (ccc-arm)
  • Build flags: -O3 -DNDEBUG
  • Benchmark: scalar/non-SIMD nbody (benchmarksgame style), steps=20_000_000

Results:

  • Earlier run (avg of 3): CCC before ~6.95s, CCC after ~5.37s (about 23% faster)
  • One re-run (avg of 3, fresh builds in container): CCC before 7.166667s, CCC after 7.453333s (about 4.00% slower)
  • Latest re-run (avg of 10, interleaved base/patched): CCC before 6.840000s (sd=0.017321), CCC after 5.296000s (sd=0.041037) (about 22.57% faster)

Note: There was a single contradictory 3-run measurement. With more samples
the speedup is stable and matches the earlier ~23% improvement. I'd still
recommend running benchmarks on an otherwise-idle machine (or with more runs)
when presenting results.
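
For reference, the avg/sd figures above follow the usual mean and sample standard deviation; a generic sketch (not the actual benchmark harness):

```rust
// Mean of a run-time sample.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Sample standard deviation (Bessel-corrected, n - 1 denominator).
fn sample_sd(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let var = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>()
        / (xs.len() - 1) as f64;
    var.sqrt()
}
```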

Spill proxy (assembly text stats for CCC output of nbody.c):

  • before: ldr=219, str=203, lines=1742
  • after: ldr=189, str=166, lines=1804

Repro command (example):

  1. Build and test CCC on Linux:
    • cargo test --release
  2. Build nbody and time it (AArch64 host or VM):
    • ./target/release/ccc-arm -O3 -DNDEBUG -o nbody nbody.c -lm
    • /usr/bin/time -p ./nbody 20000000

Testing

Ran cargo test --release in docker.io/library/rust:1.91-bookworm:

  • 493 passed; 0 failed; 6 ignored

Notes / Follow-ups

  • This is an incremental improvement; CCC is still significantly slower than
    clang/gcc on nbody. Further work likely needs:
    • Better float value handling (e.g., real FP-reg allocation or smarter
      interval splitting / reducing reg pressure), and/or
    • Additional backend peephole optimizations around fmov/bitpattern moves.

@ChaseWNorton

Review: APPROVE (draft — experimental)

No linked issue — standalone optimization improvement
Reviewed: Commit dff07e7a

What it does

Allows F32/F64 values to participate in GPR register allocation on 64-bit targets. Previously all float types were excluded from regalloc. Since 64-bit backends already represent F32/F64 as raw bits in a single GPR (rax/x0/t0), this is safe and should improve code quality by reducing unnecessary spills.

Changes

  • is_non_gpr_type: On 64-bit, only excludes F128/I128/U128 (not F32/F64)
  • collect_non_gpr_values: Updated consistently — F32/F64 constants only non-GPR on 32-bit
  • Comments and doc strings updated

Risk Assessment

This is a correctness-sensitive change — incorrect regalloc can produce wrong code silently. The fact that it's still a draft suggests it needs more testing. Would benefit from end-to-end codegen tests that verify floating-point values survive through regalloc correctly.
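
One possible shape for such a test (everything here is hypothetical; a real end-to-end test would compile an equivalent C kernel with ccc and compare its printed output against a reference like this): a loop that keeps many f64 values live simultaneously, so the allocator must either hold them in registers or spill and reload them correctly.

```rust
// Register-pressure kernel: eight f64 values all live across the loop
// body, exercising the new F64-in-GPR allocation paths.
fn pressure_kernel(seed: f64) -> f64 {
    let (mut a, mut b, mut c, mut d) =
        (seed, seed + 1.0, seed + 2.0, seed + 3.0);
    let (mut e, mut f, mut g, mut h) =
        (seed + 4.0, seed + 5.0, seed + 6.0, seed + 7.0);
    for _ in 0..100 {
        a = a * 1.0001 + b;
        b = b * 0.9999 + c;
        c = c * 1.0002 + d;
        d = d * 0.9998 + e;
        e = e * 1.0003 + f;
        f = f * 0.9997 + g;
        g = g * 1.0004 + h;
        h = h * 0.9996 + a;
    }
    a + b + c + d + e + f + g + h
}
```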
