Skip to content

Replace inline type IDs with global constants in LLVM IR #15485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented Feb 18, 2025

Consider the following snippet:

N = 3000

{% if flag?(:bar) %}
  class Bar
  end

  Bar.new
{% end %}

{% for i in 0...N %}
  class Foo{{ i }}
  end
{% end %}

{% for i in 0...N %}
  Foo{{ i }}.new
{% end %}

We are going to compile this twice, first with a cold cache, then with -Dbar:

$ crystal clear_cache

$ time crystal run --prelude=empty test.cr
real    0m2.023s
user    0m5.610s
sys     0m2.894s

$ time crystal run --prelude=empty -Dbar test.cr 
real    0m1.962s
user    0m5.414s
sys     0m3.072s

These times suggest that the addition of Bar completely invalidates the object cache, and indeed, if you pass also --stats to the second compilation, it would say no previous .o files were reused. How could this be the case when none of the Foos depend on Bar?

This counterintuitive behavior arises from the way Crystal separates LLVM IR into LLVM modules. Each non-generic type or generic instance has its own LLVM module containing all of that type or instance's class and instance methods, and then the rest goes to a special LLVM module called _main. The bytecode for Foo0 can be disassembled back to LLVM IR using llvm-dis:

%Foo0 = type { i32 }

@":symbol_table" = external global [0 x ptr]

; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
alloca:
  %x = alloca ptr, align 8
  br label %entry

entry:                                            ; preds = %alloca
  %0 = call ptr @malloc(i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64))
  call void @llvm.memset.p0.i64(ptr align 4 %0, i8 0, i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64), i1 false)
  %1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
  store i32 7, ptr %1, align 4
  store ptr %0, ptr %x, align 8
  %2 = load ptr, ptr %x, align 8
  ret ptr %2
}

The line store i32 7, ptr %1, align 4 is where Foo0.allocate stores Foo0.crystal_type_id to the newly allocated memory area; this 7 corresponds to the compile-time value of Foo0.crystal_type_id. If we drop -Dbar again, the same value now becomes 6.

The compiler component responsible for generating these type IDs is the Crystal::LLVMId class; it assigns numerical IDs in sequential order, with types defined later in the source code or the compiler receiving larger IDs than their sibling types. In particular, all structs have larger type IDs than every class. (You can see this information by setting the environment variable CRYSTAL_DUMP_TYPE_ID to 1 during compilation.) Hence, by defining Bar at the beginning of the file, we have incremented the type ID of every single Foo by 1, and the inlining breaks the cache.

If we move Bar to the bottom of the file, then recompilations will be able to reuse the Foo object files, because their type IDs remain untouched. In practice, however, the splitting of source code into separate files renders this specific workaround nearly impossible to pull off, not to mention that other constructs like typeof and is_a? also inline type IDs, apart from Reference.new. In short, if your code tries to remove the Nil from an Int32?, its cache will get invalidated any time you add or remove a class.


This PR does not fight against the type ID assignment. It merely stops the inlining:

@"Foo0:type_id" = external constant i32

; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
; ...
  %1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
  %2 = load i32, ptr @"Foo0:type_id", align 4
  store i32 %2, ptr %1, align 4
; ...
}

Global variables are required to be addressable in LLVM, so this creates an actual constant in the read-only section, hence the extra load. The actual compile-time value is now defined in _main:

@"Foo0:type_id" = constant i32 6

With this simple trick, the object file cache is now working as intended:

$ bin/crystal clear_cache

$ time bin/crystal run --prelude=empty test.cr
real    0m2.102s
user    0m5.386s
sys     0m2.442s

$ time bin/crystal run --prelude=empty -Dbar test.cr 
real    0m1.482s
user    0m1.218s
sys     0m0.799s

As another example, we compile an empty file with the standard prelude, then add class Foo; end; Foo.new to it and recompile. These are the times:

Cold cache Before After
Codegen (crystal) 00:00:00.355781638 00:00:00.356948294 00:00:00.317487232
Codegen (bc+obj) 00:00:00.317895037 00:00:00.336335396 00:00:00.112650966
Codegen (linking) 00:00:00.198366216 00:00:00.181540888 00:00:00.169724852
.o files reused (none) 165/312 315/318

For an even larger codebase, we try this modification in src/compiler/crystal/codegen/codegen.cr of the Crystal compiler itself:

module Crystal
  class Foo
  end

  class Program
    def run(code, filename : String? = nil, debug = Debug::Default)
      Foo.new
      # ...
    end
  end
end

The times are:

Cold cache Before After
Codegen (crystal) 00:00:07.874727066 00:00:07.681402042 00:00:06.774886131
Codegen (bc+obj) 00:00:05.924711212 00:00:05.867124109 00:00:00.959626107
Codegen (linking) 00:00:04.859327472 00:00:04.942958413 00:00:04.937458099
.o files reused (none) 850/2124 2102/2124

This will hopefully improve build times in certain scenarios, such as rapid prototyping, and IDE integrations that run the whole compiler.

Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great finding 🏆

@crysbot
Copy link
Collaborator

crysbot commented Feb 18, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/incremental-compilation-exploration/5224/119

@straight-shoota straight-shoota added this to the 1.16.0 milestone Feb 20, 2025
@straight-shoota straight-shoota merged commit adaeee4 into crystal-lang:master Feb 21, 2025
32 checks passed
@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Feb 22, 2025

This change seems to have caused some segfaults in Athena spes: https://github.com/athena-framework/athena/actions/runs/13468376240/job/37638710966.

I'm working on reducing it, but for now it can be reproduced by updating https://github.com/athena-framework/athena/blob/34ae7deb253c7f9e6ea29d7b06a1656429cdbd00/src/components/framework/spec/listeners/error_spec.cr#L57-L65 to be like:

it "logs HTTPExceptions as warning", focus: true do
  p ATH::Events::Exception.new ATH::Request.new("GET", "/"), MockException.new "Something went wrong"
end

Then execute it via:

/path/to/locally/built/compiler src/components/framework/spec/listeners/error_spec.cr

Results in:

#<Athena::Framework::Events::Exception:0x7ffff7100d20 @propagation_stopped=false, @request=#<Athena::Framework::Request:0x7ffff70d4d80 @action=nil, @attributes=Athena::Framework::ParameterBag(@parameters={}), @request_data=nil, @request_format=nil, @request=#<HTTP::Request:0x7ffff70d3ee0 @method="GET", @headers=HTTP::Headers{}, @body=nil, @version="HTTP/1.1", @cookies=nil, @query_params=nil, @form_params=nil, @uri=nil, @remote_address=nil, @local_address=nil, @resource="/">, @is_host_valid=true, @is_forwarded_valid=true, @scheme="http", @trusted_values_cache={}>, @response=nil, @exception=#<Invalid memory access (signal 11) at address 0x0
[0x5555559e797a] *Exception::CallStack::print_backtrace:Nil +122 in ./coverage/bin/framework
[0x555555962743] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +275 in ./c                                                                overage/bin/framework
[0x7ffff715d1d0] ?? +140737338790352 in /usr/lib/libc.so.6
[0x555555a6abd3] *Slice(T)::new:read_only<Pointer(UInt8), Int32, Bool>:Slice(UInt8) +35 in ./coverage/bin/framework
[0x5555559e86a5] *String#to_slice:Slice(UInt8) +37 in ./coverage/bin/framework
[0x5555559ea9f8] *String#to_s<String::Builder>:Nil +24 in ./coverage/bin/framework
[0x555555a59efb] *String::Builder +27 in ./coverage/bin/framework
[0x555555b615e2] *Exception+ +34 in ./coverage/bin/framework
[0x555555b636da] *Exception+ +74 in ./coverage/bin/framework
[0x555556333adc] ?? +93825006779100 in ./coverage/bin/framework

I think the issue has to do with me defining private class MockException twice in two files with different parents? Haven't been able to reduce it outside of this specific context, but seems fairly likely as renaming the exception type makes it no longer error.

@HertzDevil HertzDevil deleted the perf/llvm-type-id-inline branch February 23, 2025 05:10
kojix2 pushed a commit to kojix2/crystal that referenced this pull request Feb 23, 2025
…g#15485)

Introduction and removal of types shifts the assignment of type IDs. They are numerical IDs in sequential order, with types defined later in the source code or the compiler receiving larger IDs than their sibling types.
Type IDs used to be inlined into the LLVM module, which can invalidate large portions of the object cache due to the change of type IDs.

This patch does not fight against the type ID assignment. It merely stops the inlining:

```llvm
@"Foo0:type_id" = external constant i32

; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
; ...
  %1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
  %2 = load i32, ptr @"Foo0:type_id", align 4
  store i32 %2, ptr %1, align 4
; ...
}
```

Global variables are required to be addressable in LLVM, so this creates an actual constant in the read-only section, hence the extra `load`. The actual compile-time value is now defined in `_main`:

```llvm
@"Foo0:type_id" = constant i32 6
```

With this simple trick, the object file cache can be reused more often.

This will hopefully improve build times in certain scenarios, such as rapid prototyping, and IDE integrations that run the whole compiler.
straight-shoota pushed a commit that referenced this pull request Mar 13, 2025
The compiler places the string contents of all symbols defined in the source code into a special `:symbol_table` LLVM global variable in the main LLVM module:

```llvm
@":symbol_table" = global [24 x ptr] [
  ptr @"'general'", ptr @"'no_error'", ptr @"'gc'", ptr @"'sequentially_consis...'",
  ptr @"'monotonic'", ptr @"'acquire'", ptr @"'evloop'", ptr @"'xchg'", ptr @"'release'",
  ptr @"'acquire_release'", ptr @"'io_write'", ptr @"'sched'", ptr @"'io_read'",
  ptr @"'skip'", ptr @"'none'", ptr @"'active'", ptr @"'done'", ptr @"'unchecked'",
  ptr @"'sleep'", ptr @"'relaxed'", ptr @"'add'", ptr @"'sub'", ptr @"'file'", ptr @"'target'"
]
```

Every LLVM module, without exception, also declares it:

```llvm
@":symbol_table" = external global [24 x ptr]
```

Since the total number of symbols is part of the variable type, adding or removing any symbol invalidates all object files in the Crystal cache, even though only `Symbol#to_s`, or rather `@[Primitive(:symbol_to_s)]`, has access to this variable.

This PR removes this declaration except where the primitive accesses it. Changes to the symbol table will only affect LLVM modules that call `Symbol#to_s` (the primitive body itself is inlined). Building an empty file, replacing it with `x = :abcde`, then rebuilding the same file will now give:

```
Codegen (bc+obj):
 - 315/317 .o files were reused

These modules were not reused:
 - _main (_main.o0.bc)
 - Crystal (C-rystal.o0.bc)
```

(`Crystal` is also invalidated because the return type of `Crystal.main` is now `Symbol` instead of `Nil`; appending `nil` to the file makes this module reusable.)

Note that it is still possible to invalidate large portions of the cache from the addition or removal of symbols, because their indices are allocated sequentially, and inlined on every use. Something like #15485 but for `Symbol` values would solve that.
Blacksmoke16 pushed a commit to Blacksmoke16/crystal that referenced this pull request Mar 18, 2025
…ng#15486)

The compiler places the string contents of all symbols defined in the source code into a special `:symbol_table` LLVM global variable in the main LLVM module:

```llvm
@":symbol_table" = global [24 x ptr] [
  ptr @"'general'", ptr @"'no_error'", ptr @"'gc'", ptr @"'sequentially_consis...'",
  ptr @"'monotonic'", ptr @"'acquire'", ptr @"'evloop'", ptr @"'xchg'", ptr @"'release'",
  ptr @"'acquire_release'", ptr @"'io_write'", ptr @"'sched'", ptr @"'io_read'",
  ptr @"'skip'", ptr @"'none'", ptr @"'active'", ptr @"'done'", ptr @"'unchecked'",
  ptr @"'sleep'", ptr @"'relaxed'", ptr @"'add'", ptr @"'sub'", ptr @"'file'", ptr @"'target'"
]
```

Every LLVM module, without exception, also declares it:

```llvm
@":symbol_table" = external global [24 x ptr]
```

Since the total number of symbols is part of the variable type, adding or removing any symbol invalidates all object files in the Crystal cache, even though only `Symbol#to_s`, or rather `@[Primitive(:symbol_to_s)]`, has access to this variable.

This PR removes this declaration except where the primitive accesses it. Changes to the symbol table will only affect LLVM modules that call `Symbol#to_s` (the primitive body itself is inlined). Building an empty file, replacing it with `x = :abcde`, then rebuilding the same file will now give:

```
Codegen (bc+obj):
 - 315/317 .o files were reused

These modules were not reused:
 - _main (_main.o0.bc)
 - Crystal (C-rystal.o0.bc)
```

(`Crystal` is also invalidated because the return type of `Crystal.main` is now `Symbol` instead of `Nil`; appending `nil` to the file makes this module reusable.)

Note that it is still possible to invalidate large portions of the cache from the addition or removal of symbols, because their indices are allocated sequentially, and inlined on every use. Something like crystal-lang#15485 but for `Symbol` values would solve that.
straight-shoota pushed a commit that referenced this pull request May 8, 2025
The compile time optimization to extract type IDs into global constants (#15485) means that types with the same name are now merged into one. This revealed a bug where two distinct types could share the same name and now would have the same ID and be considered a single type.
@crysbot
Copy link
Collaborator

crysbot commented May 19, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/crystal-1-16-2-is-released/8013/8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants