Description
As discussed on Slack (#gpu), the LLVM code shown after pressing L in a Cthulhu interactive session is run through the wrong pipeline, resulting in host-side optimisations/… being applied to it.
Real-world example
For instance, consider this snippet:
using Metal
@metal (() -> (rand(Float32); nothing))()
which fails, as scalar rand() is not implemented on Metal yet. The error message is:
ERROR: InvalidIRError: compiling MethodInstance for (::var"#3#4")() resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
[…]
Running the suggested code
julia> try
           @metal (() -> (rand(Float32); nothing))()
       catch err
           code_typed(err; interactive=true)
       end
initially gives the correct code_typed() output:
(::var"#3#4")() @ Main REPL[10]:2
∘ ── %0 = invoke #13()::Core.Const(nothing)
2 1 ── %1 = $(Expr(:foreigncall, :(:jl_get_current_task), Ref{Task}, svec(), 0, :(:ccall)))::Task │╻╷╷╷╷╷╷╷ rand
2 ── nothing::Nothing
[…]
(Here, the RNG tries to access TLS state, hence the reference to jl_get_current_task, which gets lowered to the offending julia.get_pgcstack.)
However, pressing L to look at the actual LLVM IR gives the following (debug info turned off):
define swiftcc void @"julia_#1_2572"({}*** nonnull swiftself %0) #0 {
top:
%ptls_field4 = getelementptr inbounds {}**, {}*** %0, i64 2
%1 = bitcast {}*** %ptls_field4 to i64***
%ptls_load56 = load i64**, i64*** %1, align 8
%2 = getelementptr inbounds i64*, i64** %ptls_load56, i64 2
%safepoint = load i64*, i64** %2, align 8
fence syncscope("singlethread") seq_cst
%3 = load volatile i64, i64* %safepoint, align 8
fence syncscope("singlethread") seq_cst
%4 = getelementptr inbounds {}**, {}*** %0, i64 -7
%5 = bitcast {}*** %4 to i64*
%6 = load i64, i64* %5, align 8
%7 = getelementptr inbounds {}**, {}*** %0, i64 -6
%8 = bitcast {}*** %7 to i64*
%9 = load i64, i64* %8, align 8
%10 = getelementptr inbounds {}**, {}*** %0, i64 -5
%11 = bitcast {}*** %10 to i64*
%12 = load i64, i64* %11, align 8
%13 = getelementptr inbounds {}**, {}*** %0, i64 -4
%14 = bitcast {}*** %13 to i64*
%15 = load i64, i64* %14, align 8
%16 = shl i64 %9, 17
%17 = xor i64 %12, %6
%18 = xor i64 %15, %9
%19 = xor i64 %17, %9
%20 = xor i64 %18, %6
%21 = xor i64 %17, %16
%22 = call i64 @llvm.fshl.i64(i64 %18, i64 %18, i64 45)
store i64 %20, i64* %5, align 8
store i64 %19, i64* %8, align 8
store i64 %21, i64* %11, align 8
store i64 %22, i64* %14, align 8
ret void
}
This seems to have been run through the host-side pipeline, as the julia.get_pgcstack call has been lowered into an extra argument (the swiftself %0 above), etc. This is not desirable in any case, but here it is particularly confusing, because the host-side pipeline has removed the offending code altogether.
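The host-pipeline behaviour can be reproduced without a GPU by asking InteractiveUtils.code_llvm for the IR of an equivalent host function, once with optimize=false and once with the default optimisation. This is only a sketch: host_rand is a hypothetical stand-in for the kernel body, and the exact form the lowered pgcstack access takes (a swiftself argument as above, inline asm, etc.) depends on the Julia version and platform.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical host-side stand-in for the kernel body from the example above.
host_rand() = (rand(Float32); nothing)

# Unoptimised IR: the TLS access typically still shows up as a call to julia.get_pgcstack.
raw_ir = sprint(io -> code_llvm(io, host_rand, Tuple{}; optimize=false, debuginfo=:none))

# Optimised IR: the host pipeline has lowered the pgcstack access in a platform-specific way.
opt_ir = sprint(io -> code_llvm(io, host_rand, Tuple{}; optimize=true, debuginfo=:none))

# Check whether the intrinsic survives in each version of the IR.
println("unoptimised contains julia.get_pgcstack: ", occursin("julia.get_pgcstack", raw_ir))
println("optimised contains julia.get_pgcstack:   ", occursin("julia.get_pgcstack", opt_ir))
```

Comparing the two strings shows how much of what Cthulhu currently displays is an artefact of host-only passes rather than of the device compiler.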
@vchuravy has identified the root cause: the LLVM view calls code_llvm for the host pipeline here: https://github.com/JuliaDebug/Cthulhu.jl/blob/3cd0baf586ab07f18ecf9c6f040e6764e8155322/src/codeview.jl#L26
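One possible direction for a fix, sketched under explicit assumptions: route the LLVM view through a hook that dispatches on the active interpreter, so that a GPU backend can supply its own device-side codegen instead of the host code_llvm. All names below (HypotheticalInterp, HostInterp, DeviceInterp, view_llvm, kernel_body) are hypothetical and are not Cthulhu's API. The device callback merely stands in for whatever a backend such as Metal.jl/GPUCompiler would actually provide.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical interpreter tags; NOT Cthulhu's actual types.
abstract type HypotheticalInterp end
struct HostInterp <: HypotheticalInterp end
struct DeviceInterp <: HypotheticalInterp
    # Assumed backend-supplied callback (io, f, tt; kwargs...) that prints device-side IR.
    codegen::Function
end

# Host code: use the regular host pipeline.
view_llvm(io::IO, ::HostInterp, f, tt; kwargs...) = code_llvm(io, f, tt; kwargs...)

# Device code: never fall back to the host code_llvm; defer to the backend instead.
view_llvm(io::IO, interp::DeviceInterp, f, tt; kwargs...) =
    interp.codegen(io, f, tt; kwargs...)

# Usage with a placeholder backend that only prints a marker string.
kernel_body() = (rand(Float32); nothing)
host_ir = sprint(io -> view_llvm(io, HostInterp(), kernel_body, Tuple{}; debuginfo=:none))
fake_device = DeviceInterp((io, f, tt; kwargs...) -> print(io, "<device IR for ", nameof(f), ">"))
device_ir = sprint(io -> view_llvm(io, fake_device, kernel_body, Tuple{}))
```

Dispatching on the interpreter keeps the host path unchanged while making it impossible for device code to silently reach the host pipeline.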