Replace inline type IDs with global constants in LLVM IR #15485

Open: wants to merge 2 commits into `master`

HertzDevil (Contributor) commented Feb 18, 2025

Consider the following snippet:

```crystal
N = 3000

{% if flag?(:bar) %}
  class Bar
  end

  Bar.new
{% end %}

{% for i in 0...N %}
  class Foo{{ i }}
  end
{% end %}

{% for i in 0...N %}
  Foo{{ i }}.new
{% end %}
```

We are going to compile this twice, first with a cold cache, then with -Dbar:

```console
$ crystal clear_cache

$ time crystal run --prelude=empty test.cr
real    0m2.023s
user    0m5.610s
sys     0m2.894s

$ time crystal run --prelude=empty -Dbar test.cr
real    0m1.962s
user    0m5.414s
sys     0m3.072s
```

These times suggest that adding `Bar` completely invalidates the object cache; indeed, if you also pass `--stats` to the second compilation, it reports that no previous `.o` files were reused. How can this be the case when none of the `Foo`s depend on `Bar`?

This counterintuitive behavior arises from the way Crystal splits LLVM IR into LLVM modules. Each non-generic type or generic instance gets its own LLVM module containing all of that type's or instance's class and instance methods, and everything else goes into a special LLVM module called `_main`. The bitcode for `Foo0` can be disassembled back into LLVM IR using `llvm-dis`:

```llvm
%Foo0 = type { i32 }

@":symbol_table" = external global [0 x ptr]

; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
alloca:
  %x = alloca ptr, align 8
  br label %entry

entry:                                            ; preds = %alloca
  %0 = call ptr @malloc(i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64))
  call void @llvm.memset.p0.i64(ptr align 4 %0, i8 0, i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64), i1 false)
  %1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
  store i32 7, ptr %1, align 4
  store ptr %0, ptr %x, align 8
  %2 = load ptr, ptr %x, align 8
  ret ptr %2
}
```

The line `store i32 7, ptr %1, align 4` is where `Foo0.allocate` stores `Foo0.crystal_type_id` into the newly allocated memory; this 7 is the compile-time value of `Foo0.crystal_type_id`. If we drop `-Dbar` again, the same value becomes 6.

The compiler component responsible for generating these type IDs is the `Crystal::LLVMId` class. It assigns numerical IDs sequentially, so a type defined later, whether in the source code or inside the compiler, receives a larger ID than its earlier sibling types. In particular, every struct has a larger type ID than every class. (You can see these IDs by setting the environment variable `CRYSTAL_DUMP_TYPE_ID` to `1` during compilation.) Hence, by defining `Bar` at the beginning of the file, we have incremented the type ID of every single `Foo` by 1, and since each ID is inlined into its type's LLVM module, every one of those modules changes and none of their cached object files can be reused.

If we move `Bar` to the bottom of the file, recompilations can reuse the `Foo` object files, because their type IDs remain untouched. In practice, however, this workaround is nearly impossible to pull off once the source code is split across multiple files, not to mention that `Reference.new` is not the only construct that inlines type IDs: `typeof` and `is_a?` do too. In short, if your code tries to remove the `Nil` from an `Int32?`, its cache will be invalidated any time you add or remove a class.
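To make this failure mode concrete, here is a toy Crystal model. It is a sketch under simplifying assumptions, not the compiler's actual code: the hypothetical `assign_ids` hands out IDs in declaration order (loosely mirroring what is described above for `Crystal::LLVMId`), the hypothetical `compile_inlined` reduces each type's module to its `store` line with the ID inlined, and the hypothetical `reused` treats an object file as reusable only when its module text is unchanged.

```crystal
# Toy model only: none of these methods exist in the Crystal compiler.

# Hypothetical stand-in for sequential type ID assignment: IDs follow
# declaration order, starting at an arbitrary offset of 1.
def assign_ids(classes : Array(String)) : Hash(String, Int32)
  ids = {} of String => Int32
  classes.each_with_index { |name, index| ids[name] = index + 1 }
  ids
end

# One "module" per type, with the type ID inlined as a literal, like the
# `store i32 7, ptr %1, align 4` line in Foo0's IR.
def compile_inlined(classes : Array(String)) : Hash(String, String)
  ids = assign_ids(classes)
  modules = {} of String => String
  classes.each { |name| modules[name] = "store i32 #{ids[name]}, ptr %1, align 4" }
  modules
end

# Counts modules whose text is identical to the previous build, i.e. the
# object files a content-based cache could reuse.
def reused(previous : Hash(String, String), current : Hash(String, String)) : Int32
  current.count { |name, ir| previous[name]? == ir }
end

foos = (0...3000).map { |i| "Foo#{i}" }
baseline = compile_inlined(foos)

bar_first = compile_inlined(["Bar"] + foos)
bar_last = compile_inlined(foos + ["Bar"])

puts "Bar first: #{reused(baseline, bar_first)}/#{bar_first.size} reused" # => Bar first: 0/3001 reused
puts "Bar last: #{reused(baseline, bar_last)}/#{bar_last.size} reused"    # => Bar last: 3000/3001 reused
```

Placing `Bar` first shifts every `Foo`'s literal, so nothing is reused; placing it last leaves every `Foo` module byte-for-byte identical.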


This PR does not fight against the type ID assignment. It merely stops the inlining:

```llvm
@"Foo0:type_id" = external constant i32

; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
; ...
  %1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
  %2 = load i32, ptr @"Foo0:type_id", align 4
  store i32 %2, ptr %1, align 4
; ...
}
```

Global variables must be addressable in LLVM, so this emits an actual constant in the read-only section, hence the extra load. The compile-time value itself is now defined in `_main`:

```llvm
@"Foo0:type_id" = constant i32 6
```
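The same kind of toy model can be adapted to this approach. Again, this is a hypothetical sketch rather than compiler code: `assign_ids`, `compile_external` and `reused` are illustrative names, each type's module only names its `:type_id` global, and a single `_main` module holds every concrete value. Adding `Bar` at the top of the file therefore changes only `Bar`'s own module and `_main`.

```crystal
# Toy model only: none of these methods exist in the Crystal compiler.

# Hypothetical sequential ID assignment in declaration order.
def assign_ids(classes : Array(String)) : Hash(String, Int32)
  ids = {} of String => Int32
  classes.each_with_index { |name, index| ids[name] = index + 1 }
  ids
end

# Each type's module refers to its ID only through the global's name;
# the concrete values are all emitted into `_main`.
def compile_external(classes : Array(String)) : Hash(String, String)
  ids = assign_ids(classes)
  modules = {} of String => String
  classes.each do |name|
    modules[name] = "%2 = load i32, ptr @\"#{name}:type_id\", align 4"
  end
  modules["_main"] = ids.map { |name, id| "@\"#{name}:type_id\" = constant i32 #{id}" }.join("\n")
  modules
end

# Modules whose text is unchanged since the previous build.
def reused(previous : Hash(String, String), current : Hash(String, String)) : Int32
  current.count { |name, ir| previous[name]? == ir }
end

foos = (0...3000).map { |i| "Foo#{i}" }
baseline = compile_external(foos)
bar_first = compile_external(["Bar"] + foos)

puts "Bar first: #{reused(baseline, bar_first)}/#{bar_first.size} reused" # => Bar first: 3000/3002 reused
```

The trade-off in this design, as described above, is one extra load from read-only memory wherever a type ID is used, in exchange for keeping each per-type module independent of the global numbering.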

With this simple trick, the object file cache is now working as intended:

```console
$ bin/crystal clear_cache

$ time bin/crystal run --prelude=empty test.cr
real    0m2.102s
user    0m5.386s
sys     0m2.442s

$ time bin/crystal run --prelude=empty -Dbar test.cr
real    0m1.482s
user    0m1.218s
sys     0m0.799s
```

As another example, we compile an empty file with the standard prelude, then add `class Foo; end; Foo.new` to it and recompile. These are the times:

| | Cold cache | Before | After |
|---|---|---|---|
| Codegen (crystal) | 00:00:00.355781638 | 00:00:00.356948294 | 00:00:00.317487232 |
| Codegen (bc+obj) | 00:00:00.317895037 | 00:00:00.336335396 | 00:00:00.112650966 |
| Codegen (linking) | 00:00:00.198366216 | 00:00:00.181540888 | 00:00:00.169724852 |
| `.o` files reused | (none) | 165/312 | 315/318 |

For an even larger codebase, we try this modification in src/compiler/crystal/codegen/codegen.cr of the Crystal compiler itself:

```crystal
module Crystal
  class Foo
  end

  class Program
    def run(code, filename : String? = nil, debug = Debug::Default)
      Foo.new
      # ...
    end
  end
end
```

The times are:

| | Cold cache | Before | After |
|---|---|---|---|
| Codegen (crystal) | 00:00:07.874727066 | 00:00:07.681402042 | 00:00:06.774886131 |
| Codegen (bc+obj) | 00:00:05.924711212 | 00:00:05.867124109 | 00:00:00.959626107 |
| Codegen (linking) | 00:00:04.859327472 | 00:00:04.942958413 | 00:00:04.937458099 |
| `.o` files reused | (none) | 850/2124 | 2102/2124 |

This will hopefully improve build times in scenarios such as rapid prototyping and IDE integrations that run the whole compiler.

bcardiff (Member) left a comment:

Great finding 🏆

crysbot (Collaborator) commented Feb 18, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/incremental-compilation-exploration/5224/119

straight-shoota added this to the 1.16.0 milestone Feb 20, 2025