Description
I'm having trouble compiling models with the latest Lux/Reactant versions. I'm on Rocky Linux 8.10.
The LSTM tutorial fails, as does this simple script:
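using Lux, Reactant, Random
rng = Random.default_rng()
dev = reactant_device()                      # Reactant device (CUDA GPU on this machine)
m = Chain(Conv((3, 3), 2 => 4, leakyrelu))   # single 3x3 conv, 2 -> 4 channels
ps, st = Lux.setup(rng, m) |> dev            # this step works
a = rand(Float32, 10, 10, 2, 1) |> dev       # 10x10 input, 2 channels, batch of 1
out, st = @jit Lux.apply(m, a, ps, st)       # this step aborts (see error below)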
Julia version: 1.11.4
Pkg versions:
julia> Pkg.status()
Status /path/Project.toml
[7da242da] Enzyme v0.13.109
[b2108857] Lux v1.27.1
[3bd65402] Optimisers v0.4.7
[3c362404] Reactant v0.2.185
julia> Reactant_jll.is_available()
true
julia> Reactant_jll.host_platform
Linux x86_64 {cxxstring_abi=cxx11, gpu=cuda, gpu_version=12.9, julia_version=1.11.4, libc=glibc, libgfortran_version=5.0.0, libstdcxx_version=3.4.30, mode=opt}
It finds the hardware and CUDA, and moving the parameters to the device works:
julia> ps, st = Lux.setup(rng,m) |> dev
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1766045551.603328 598355 service.cc:153] XLA service 0x63843a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1766045551.603366 598355 service.cc:161] StreamExecutor device (0): NVIDIA A40, Compute Capability 8.6
I0000 00:00:1766045551.604502 598355 se_gpu_pjrt_client.cc:1440] Using BFC allocator.
I0000 00:00:1766045551.604537 598355 gpu_helpers.cc:141] XLA backend allocating 35781869568 bytes on device 0 for BFCAllocator.
I0000 00:00:1766045551.604570 598355 gpu_helpers.cc:182] XLA backend will use up to 11927289856 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1766045551.634117 598355 cuda_dnn.cc:461] Loaded cuDNN version 91400
((layer_1 = (weight = ConcretePJRTArray{Float32, 4, 1}(Float32[0.2168731 -0.21829635 0.021970421; -0.20019113 -0.0011756442 0.07267339; -0.037982095 -0.18488145 -0.025027305;;; -0.08308116 -0.083196364 0.23397905; -0.13060631 -0.09887839 0.15277651; -0.22280036 0.009649015 0.047057267;;;; 0.05517559 -0.01687113 -0.012363862; 0.07931492 0.17914091 -0.1315057; 0.16668728 0.024479002 -0.011142334;;; 0.1636007 0.2022534 -0.118163295; -0.117578246 -0.22701325 -0.15792598; 0.17941107 -0.037423227 0.14115952;;;; 0.1687574 0.21836714 0.17667723; -0.23390144 -0.14125527 0.12690254; 0.09573904 -0.0916368 0.18706667;;; 0.17201677 0.0023923956 0.12816548; 0.008272274 -0.06775288 0.041589584; -0.1558286 -0.062512204 0.18791631;;;; 0.23415983 0.14652681 -0.15826698; -0.01860803 0.19823596 -0.15953152; -0.2299741 -0.1296339 -0.003415974;;; -0.16505618 -0.05167108 0.1538629; 0.19457056 -0.18084836 0.025250599; -0.045280356 -0.071394034 -0.176011]), bias = ConcretePJRTArray{Float32, 1, 1}(Float32[0.061203968, 0.032079708, -0.06641705, 0.028946258])),), (layer_1 = NamedTuple(),))
But the @jit call then aborts during compilation:
Error:
Invalid handle. Cannot load symbol cublasLtCreate
[598355] signal 6 (-6): Aborted
in expression starting at REPL[12]:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f241d8f2cf4)
unknown function (ip: 0x7f241e454c19)
unknown function (ip: 0x7f241e453391)
unknown function (ip: 0x7f241dcfc03d)
unknown function (ip: 0x7f241da56538)
_ZN5cudnn7backend6Engine17finalize_internalEv at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
_ZN5cudnn7backend10Descriptor8finalizeEv at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
_ZN5cudnn7backend12EngineConfig17finalize_internalEv at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
_ZN5cudnn7backend12EngcfgTmpVar7Results23emplace_back_on_successEONS0_12EngineConfigERNS1_10ErrorStackE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
_ZNK5cudnn7backend16EngineHeuristics12get_internalE27cudnnBackendAttributeName_t27cudnnBackendAttributeType_tlPlPv at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
cudnnBackendGetAttribute at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libcudnn_graph.so.9 (unknown line)
_ZN14cudnn_frontend19EngineHeuristics_v815getEngineConfigEl at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN14cudnn_frontendL24get_heuristics_list_implE22cudnnBackendHeurMode_tRNS_17OperationGraph_v8ESt8functionIFbPvEEiRSt6vectorISt10shared_ptrINS_20OpaqueBackendPointerEESaISA_EES8_IKNS_16DevicePropertiesEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN14cudnn_frontendL19get_heuristics_listERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EERNS_17OperationGraph_v8ESt8functionIFbPvEERS0_ISt10shared_ptrINS_20OpaqueBackendPointerEESaISJ_EEbiSH_IKNS_16DevicePropertiesEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN14cudnn_frontend19get_heuristics_listILm1EEESt6vectorI13cudnnStatus_tSaIS2_EESt5arrayINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEXT_EERNS_17OperationGraph_v8ESt8functionIFbPvEERS1_ISt10shared_ptrINS_20OpaqueBackendPointerEESaISL_EEb at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN15stream_executor3gpu12CudnnSupport18GetConvolveRunnersENS_3dnn15ConvolutionKindENS2_8DataTypeES4_PNS_6StreamERKNS2_15BatchDescriptorENS_17DeviceAddressBaseERKNS2_16FilterDescriptorESA_S9_SA_RKNS2_21ConvolutionDescriptorEbPNS_16ScratchAllocatorERKNS_13EngineOptionsEPSt6vectorISt10unique_ptrIKNS2_8OpRunnerIFvSA_SA_SA_EEESt14default_deleteISR_EESaISU_EE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu12_GLOBAL__N_113GetAlgorithmsEPN15stream_executor3dnn10DnnSupportENS3_15ConvolutionKindENS3_8DataTypeES7_PNS2_6StreamERKNS0_13GpuConvConfigERKNS2_13EngineOptionsEb at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu12CudnnBackend19GetSupportedConfigsERKNS_14HloInstructionE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla9Autotuner19GetSupportedConfigsEPNS_14HloInstructionE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla9Autotuner14TuneBestConfigEPNS_14HloInstructionE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla9Autotuner9GetConfigEPNS_14HloInstructionE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla9Autotuner8AutotuneEPNS_9HloModuleERKN4absl12lts_2025081411FunctionRefIFbRKNS_14HloInstructionEEEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu13AutotunerPass7RunImplEPNS_9HloModuleERKN4absl12lts_2025081413flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS5_18container_internal10StringHashENSB_8StringEqESaISA_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla16HloPassInterface3RunEPNS_9HloModuleERKN4absl12lts_2025081413flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS4_18container_internal10StringHashENSA_8StringEqESaIS9_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla15HloPassPipeline9RunHelperIPNS_9HloModuleEEEN4absl12lts_202508148StatusOrIbEEPNS_16HloPassInterfaceET_RKNS5_13flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS5_18container_internal10StringHashENSG_8StringEqESaISF_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla15HloPassPipeline17RunPassesInternalIPNS_9HloModuleEEEN4absl12lts_202508148StatusOrIbEET_RKNS_12DebugOptionsERKNS5_13flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS5_18container_internal10StringHashENSH_8StringEqESaISG_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla15HloPassPipeline7RunImplEPNS_9HloModuleERKN4absl12lts_2025081413flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS4_18container_internal10StringHashENSA_8StringEqESaIS9_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla16HloPassInterface3RunEPNS_9HloModuleERKN4absl12lts_2025081413flat_hash_setISt17basic_string_viewIcSt11char_traitsIcEENS4_18container_internal10StringHashENSA_8StringEqESaIS9_EEE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu11GpuCompiler31OptimizeHloPostLayoutAssignmentEPNS_9HloModuleEPN15stream_executor14StreamExecutorERKNS_8Compiler14CompileOptionsERKNS7_15GpuTargetConfigEPKNS0_12GpuAliasInfoEPN3tsl6thread10ThreadPoolE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu13NVPTXCompiler31OptimizeHloPostLayoutAssignmentEPNS_9HloModuleEPN15stream_executor14StreamExecutorERKNS_8Compiler14CompileOptionsERKNS7_15GpuTargetConfigEPKNS0_12GpuAliasInfoEPN3tsl6thread10ThreadPoolE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu11GpuCompiler17OptimizeHloModuleEPNS_9HloModuleEPN15stream_executor14StreamExecutorERKNS_8Compiler14CompileOptionsERKNS7_15GpuTargetConfigEPKNS0_12GpuAliasInfoE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla3gpu11GpuCompiler12RunHloPassesESt10unique_ptrINS_9HloModuleESt14default_deleteIS3_EEPN15stream_executor14StreamExecutorERKNS_8Compiler14CompileOptionsE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla7Service15BuildExecutableERKNS_14HloModuleProtoESt10unique_ptrINS_15HloModuleConfigESt14default_deleteIS5_EEPNS_7BackendEPN15stream_executor14StreamExecutorERKNS_8Compiler14CompileOptionsEb at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla12LocalService18CompileExecutablesERKNS_14XlaComputationEN4absl12lts_202508144SpanIKPKNS_5ShapeEEERKNS_22ExecutableBuildOptionsE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla11LocalClient7CompileERKNS_14XlaComputationEN4absl12lts_202508144SpanIKPKNS_5ShapeEEERKNS_22ExecutableBuildOptionsE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla24PjRtStreamExecutorClient15CompileInternalERKNS_14XlaComputationERKSt6vectorIPKNS_5ShapeESaIS7_EESt8functionIFN4absl12lts_202508148StatusOrISt4pairIS4_IS5_SaIS5_EES5_EEERKNS_9HloModuleEEENS_14CompileOptionsEb at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla24PjRtStreamExecutorClient7CompileEN4mlir8ModuleOpENS_14CompileOptionsEb at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla24PjRtStreamExecutorClient14CompileAndLoadEN4mlir8ModuleOpENS_14CompileOptionsE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
_ZN3xla23StreamExecutorGpuClient14CompileAndLoadEN4mlir8ModuleOpENS_14CompileOptionsE at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
ClientCompile at /hiddenpath/artifacts/377531aafc691817f8096dca88c7e7b26571dadb/lib/libReactantExtra.so (unknown line)
#25 at /hiddenpath/packages/Reactant/EDxwi/src/xla/PJRT/LoadedExecutable.jl:83
try_compile_dump_mlir at /hiddenpath/packages/Reactant/EDxwi/src/mlir/IR/Pass.jl:134
try_compile_dump_mlir at /hiddenpath/packages/Reactant/EDxwi/src/mlir/IR/Pass.jl:129 [inlined]
#compile#24 at /hiddenpath/packages/Reactant/EDxwi/src/xla/PJRT/LoadedExecutable.jl:82 [inlined]
compile at /hiddenpath/packages/Reactant/EDxwi/src/xla/PJRT/LoadedExecutable.jl:68
jfptr_compile_37042.1 at /hiddenpath/compiled/v1.11/Reactant/p9PzF_5JzdJ.so (unknown line)
#compile_xla#58 at /hiddenpath/packages/Reactant/EDxwi/src/Compiler.jl:3632
compile_xla at /hiddenpath/packages/Reactant/EDxwi/src/Compiler.jl:3560 [inlined]
#compile#59 at /hiddenpath/packages/Reactant/EDxwi/src/Compiler.jl:3664
compile at /hiddenpath/packages/Reactant/EDxwi/src/Compiler.jl:3661
unknown function (ip: 0x7f255d21237d)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/interpreter.c:666
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/interpreter.c:824
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10102.1 at /hiddenpath2/julia/julia-1.11.4/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14761.1 at /hiddenpath2/julia/julia-1.11.4/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73560.1 at /hiddenpath2/julia/julia-1.11.4/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 73429715 (Pool: 73423826; Big: 5889); GC: 398
Aborted (core dumped)
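One way to narrow this down is to check, from the same Julia session, whether the dynamic loader can find cuBLASLt and resolve cublasLtCreate. This is a minimal sketch that assumes the CUDA 12.x soname libcublasLt.so.12 (an assumption for this setup; adjust it if your toolkit differs). It uses only the Libdl standard library.

```julia
using Libdl

# Assumed soname for the CUDA 12.x cuBLASLt library; not taken from the logs above.
const CUBLASLT_SONAME = "libcublasLt.so.12"

# Returns true if the library loads and exports cublasLtCreate, false otherwise.
function check_cublaslt(soname::AbstractString = CUBLASLT_SONAME)
    handle = Libdl.dlopen_e(soname)                 # C_NULL instead of throwing on failure
    if handle == C_NULL
        println("could not dlopen ", soname)
        return false
    end
    sym = Libdl.dlsym_e(handle, :cublasLtCreate)    # C_NULL if the symbol is missing
    println(soname, " loaded from ", Libdl.dlpath(handle))
    println("cublasLtCreate resolved: ", sym != C_NULL)
    Libdl.dlclose(handle)
    return sym != C_NULL
end

check_cublaslt()
```

If this reports that the library cannot be opened, cuDNN (loaded from the Reactant artifact) most likely cannot find a cuBLASLt on the loader path either. That would be consistent with the "Cannot load symbol cublasLtCreate" message.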