Shrink `Crystal::System.print_error`'s output size
#15490
Open
`Crystal::System` is by far the single largest LLVM module when compiling a blank source file, even though all the module does by itself is define `.print_error` and friends. On my machine, with debug information stripped, `C-rystal5858S-ystem.o0.bc` is 213.4 KiB, compared to `_main.o0.bc`'s 101.7 KiB. Disassembling the bytecode back to LLVM IR produces a monstrosity with 33k lines. This PR brings the numbers down to 48.0 KiB and 6.1k lines, while slightly improving performance, using the following tricks:

- `.as?(T)` is always `T?` and does not perform intersection, so even simple types like `Int32` are upcast into the whole `Int::Primitive?`, leading to a lot of redundant downcasts later. A simple `is_a?` will suffice as a type filter in `read_arg`. (I believe this is mentioned somewhere but couldn't find it)
- In `.to_int_slice`, the `num` variable is cast into an `Int32 | UInt32 | Int64 | UInt64`, and each subsequent line dispatches over that union. The fix here is to split the rest of the body into a separate method, and call it with each variant of the union. This form of dispatching is akin to rewriting `.to_int_slice` as an instance method on the integers.
- `.to_int_slice` is now non-yielding, as the inlining added too much bloat. The caller is responsible for preparing a suitably sized buffer.

Additionally, this reduces the time for the bytecode generation phase from an average of 0.35s down to 0.26s.
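
To make the `.as?` versus `is_a?` point concrete, here is a minimal standalone sketch. It is not the code from `read_arg`; the variable names are made up for illustration. It prints the compile-time types so the two approaches can be compared.

```crystal
# Illustration only: `arg` and `widened` are hypothetical names,
# not identifiers from Crystal::System.
arg = 42 # an Int32 known at compile time

# `.as?(T)` is typed as `T?`. Per the description above, the result here
# is expected to be the nilable union of all primitive integer types
# rather than just `Int32?`.
widened = arg.as?(Int::Primitive)
puts typeof(widened)

# `is_a?` acts as a type filter: inside the branch, `arg` keeps its
# original concrete type, so no later downcast is needed.
if arg.is_a?(Int::Primitive)
  puts typeof(arg)
end
```

Every later use of `widened` has to dispatch over the whole union again, which is where the redundant downcasts come from.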
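
The union-dispatch trick from the second item can be sketched as follows. The method names `count_digits` and `count_digits_impl` are hypothetical and the body is a stand-in for the real formatting logic; only the shape of the dispatch is meant to match the description.

```crystal
# Hypothetical example: dispatch over the union exactly once, then run
# the real work in a method that only ever sees one concrete type.
def count_digits(num : Int32 | UInt32 | Int64 | UInt64) : Int32
  case num
  in Int32  then count_digits_impl(num)
  in UInt32 then count_digits_impl(num)
  in Int64  then count_digits_impl(num)
  in UInt64 then count_digits_impl(num)
  end
end

# Instantiated separately for each integer type it is called with, so
# none of the operations below has to dispatch over a union.
def count_digits_impl(num : Int) : Int32
  count = 1
  while num >= 10 || num <= -10
    num = num.tdiv(10) # truncated division keeps negative values correct
    count += 1
  end
  count
end

puts count_digits(12345)
puts count_digits(-99_i64)
```

Without the split, each comparison and division in the loop would be compiled as its own dispatch over the four-member union.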
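
For the third item, a non-yielding helper that writes into a caller-provided buffer might look like the sketch below. `to_decimal_slice` is a hypothetical name and handles only `UInt32` in base 10; it is not the signature of `.to_int_slice`. The relevant background is that Crystal inlines methods that take a block at every call site, so a yielding helper duplicates its body wherever it is used.

```crystal
# Hypothetical helper: fills `buf` from the end with the decimal digits
# of `num` and returns the sub-slice that was actually written.
# The caller must pass a buffer of at least 10 bytes for a UInt32.
def to_decimal_slice(buf : Bytes, num : UInt32) : Bytes
  pos = buf.size
  loop do
    pos -= 1
    buf[pos] = 0x30_u8 + (num % 10).to_u8 # 0x30 is ASCII '0'
    num //= 10
    break if num == 0
  end
  buf[pos..]
end

# The caller owns the storage; 10 bytes cover UInt32::MAX (4294967295).
storage = StaticArray(UInt8, 10).new(0_u8)
digits = to_decimal_slice(storage.to_slice, 4096_u32)
puts String.new(digits) # prints 4096
```

Because the method returns a slice instead of yielding one, it is compiled once and called normally, at the cost of making each caller size the buffer correctly.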