
Make MPI.jl compatible with profilers that use LD_PRELOAD #450


Closed (wants to merge 1 commit)

Conversation

@lyon-fnal commented Dec 31, 2020

See #444. Use of ccall((func, lib), ...) is not compatible with applications that use LD_PRELOAD to inject their own code for profiling and tracing. Darshan is an example of an application that does this for MPI and HDF5 I/O profiling.

This PR explicitly loads the libmpi shared object with Libdl.dlopen in MPI.__init__. Most ccall statements are changed so that they no longer specify the MPI library. A few ccall statements in implementations.jl are left alone because they run before MPI.__init__ and only determine the MPI version and implementation details.
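For illustration, a minimal sketch of the idea (not the actual diff; the library path and helper names are placeholders): open the MPI library with RTLD_GLOBAL during MPI.__init__, then let ccall resolve the symbol through the process-global table so that a preloaded wrapper such as Darshan's can interpose.

```julia
using Libdl

# Placeholder name; MPI.jl discovers the real library path at configure/build time.
const libmpi = "libmpi"
const libmpi_handle = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    # Loading with RTLD_GLOBAL makes the MPI symbols visible to process-wide lookup.
    libmpi_handle[] = Libdl.dlopen(libmpi, RTLD_LAZY | RTLD_GLOBAL)
end

# Before: the library is named explicitly, so the call binds directly to libmpi and
# bypasses any wrapper injected via LD_PRELOAD:
#   ccall((:MPI_Initialized, libmpi), Cint, (Ref{Cint},), flag)

# After: only the symbol is given; it resolves through the global symbol table,
# which a preloaded profiler can intercept.
function mpi_initialized()
    flag = Ref{Cint}(0)
    ccall(:MPI_Initialized, Cint, (Ref{Cint},), flag)
    return flag[] != 0
end
```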

I tested on NERSC Cori Haswell nodes. The MPI tests pass except for test_spawn.jl, but that test also fails with the released MPI.jl. I was able to get Darshan to work with this PR.

Happy New Year too!

Commit: explicitly give library in ccall's
@lcw (Member) commented Dec 31, 2020

Thanks for the contribution! Looks like we just need to figure out why this is not working on Windows.

@simonbyrne (Member) commented:

@staticfloat is it possible to ccall global symbols on Windows?

@staticfloat (Contributor) commented:

@vtjnash correct me if I'm wrong, but I believe Windows doesn't support the concept of RTLD_GLOBAL-like symbol resolution, and LD_PRELOAD-like tricks don't work there.

I think the best way to support both of these is to have a macro that generates either Symbol(foo) or (Symbol(foo), libmpi), depending on the platform.
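A hedged sketch of what such a macro could look like (hypothetical names; not the macro MPI.jl actually adopted, and it assumes a libmpi global is defined in the calling module):

```julia
# Hypothetical: rewrite a ccall so that Windows names the library explicitly while
# other platforms use the bare symbol, which resolves through the global symbol table.
macro mpiccall(name, rettype, argtypes, args...)
    fn = Sys.iswindows() ? :(($(QuoteNode(name)), libmpi)) : QuoteNode(name)
    return esc(:(ccall($fn, $rettype, $argtypes, $(args...))))
end

# Usage (hypothetical):
#   flag = Ref{Cint}(0)
#   @mpiccall MPI_Initialized Cint (Ref{Cint},) flag
```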

@vtjnash commented Jan 4, 2021

It can be emulated (with Libdl.dllist; LLVMSupport even has an implementation), but it is neither recommended nor fast. Darwin has RTLD_GLOBAL (though it is not the default), and can do an LD_PRELOAD-like trick by setting DYLD_FORCE_FLAT_NAMESPACE=1 and DYLD_INSERT_LIBRARIES= before program startup.
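As a rough illustration of that emulation (and of why it is slow), here is a hedged sketch that walks Libdl.dllist and asks each loaded library for the symbol; the helper name is made up:

```julia
using Libdl

# Hypothetical helper: scan every loaded shared library for a symbol, mimicking
# RTLD_GLOBAL-style resolution on platforms where the loader won't do it for us.
function find_global_symbol(name::Symbol)
    for lib in Libdl.dllist()
        handle = Libdl.dlopen(lib; throw_error = false)
        handle === nothing && continue
        ptr = Libdl.dlsym(handle, name; throw_error = false)
        ptr !== nothing && return ptr
    end
    return nothing
end

# A pointer found this way can be passed to the Ptr form of ccall, e.g.
#   p = find_global_symbol(:MPI_Initialized)
#   p === nothing || ccall(p, Cint, (Ref{Cint},), flag)
```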

@mofeing commented Feb 2, 2021

What is the state of this?
@lyon-fnal I have tried your branch with Extrae but it still crashes. Below is the stack trace from my script. Its contents are irrelevant because it crashes on the first line (line 5 of test-args.jl), which corresponds to the MPI.Init() call.

Stacktrace:

...

signal (11): Segmentation fault
in expression starting at /gpfs/home/bsc21/bsc21106/QCMPS.jl/test-args.jl:5
__GI_____strtol_l_internal at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f6509852aa9)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
qsort_r at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f6509855b3e)
unknown function (ip: 0x7f6509856c13)
unknown function (ip: 0x7f65097ed11a)
MPI_Init at /apps/INTEL/2017.4/impi/2017.3.196/lib64/libmpi.so.12 (unknown line)
MPI_Init_C_Wrapper at /apps/BSCTOOLS/extrae/src/src-extrae-3.8.3/build-impi_2017_4/src/tracer/wrappers/MPI/../../../../../src/tracer/wrappers/MPI/mpi_wrapper.c:1894
MPI_Init at /apps/BSCTOOLS/extrae/src/src-extrae-3.8.3/build-impi_2017_4/src/tracer/interfaces/MPI/../../../../../src/tracer/interfaces/MPI/mpi_interface.c:877
#Init#30 at /home/bsc21/bsc21106/.julia/packages/MPI/WrP25/src/environment.jl:84
Init at /home/bsc21/bsc21106/.julia/packages/MPI/WrP25/src/environment.jl:84
unknown function (ip: 0x7f64eda642dc)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:819
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:872
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:877
include at ./Base.jl:377
exec_options at ./client.jl:288
_start at ./client.jl:484
jfptr__start_2075.clone_1 at /gpfs/apps/MN4/JULIA/1.4.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4015d4)
Allocations: 2505 (Pool: 2495; Big: 10); GC: 0
dump_buffer: Error writing to disk.
writev: Bad file descriptor

...

@vchuravy (Member) commented Feb 2, 2021

#451 got merged so this PR has been superseded. What ABI are you setting?

@mofeing commented Feb 2, 2021

What ABI are you setting?

I'm using the MPICH ABI.

   Building MPI → `~/.julia/packages/MPI/G4FaL/deps/build.log`
[ Info: using system MPI
┌ Info: Using implementation
│   libmpi = "/apps/INTEL/2017.4/impi/2017.3.196/lib64/libmpi.so"
│   mpiexec_cmd = `/apps/INTEL/2017.4/impi/2017.3.196/bin64/mpiexec`
└   MPI_LIBRARY_VERSION_STRING = "Intel(R) MPI Library 2017 Update 3 for Linux* OS\n"
┌ Info: MPI implementation detected
│   impl = IntelMPI::MPIImpl = 4
│   version = v"2017.3.0"
└   abi = "MPICH"

But I actually made some progress. Apparently I had an old MPI.jl release because I'm using MPIClusterManagers. After forcing MPI.jl to the master branch, it no longer crashes with a segmentation fault, but another error appears.
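For reference, a minimal sketch of that switch (standard Pkg calls; treat the JULIA_MPI_BINARY setting as an assumption about the build-time options of that era). The new error output follows below.

```julia
using Pkg

# Assumption: MPI.jl of that era read JULIA_MPI_BINARY at build time to select the system MPI.
ENV["JULIA_MPI_BINARY"] = "system"

# Track the development branch instead of the released version, then rebuild.
Pkg.add(PackageSpec(name = "MPI", rev = "master"))
Pkg.build("MPI"; verbose = true)
```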

Extrae: Warning! OMP_NUM_THREADS is set but OpenMP is not supported!
Welcome to Extrae 3.8.3
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on $EXTRAE_HOME
Extrae: Generating intermediate files for Paraver traces.
Extrae: Tracing 5 level(s) of Sampling callers: [ 1 2 3 4 5 ]
Extrae: Sampling enabled with a period of 10000 microseconds and a variability of 3000 microseconds.
Extrae: Tracing buffer can hold 500000 events
Extrae: Circular buffer disabled.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /gpfs/home/bsc21/bsc21106/QCMPS.jl
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Warning! OMP_NUM_THREADS is set but OpenMP is not supported!
Welcome to Extrae 3.8.3
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on $EXTRAE_HOME
Extrae: Generating intermediate files for Paraver traces.
Extrae: Tracing 5 level(s) of Sampling callers: [ 1 2 3 4 5 ]
Extrae: Sampling enabled with a period of 10000 microseconds and a variability of 3000 microseconds.
Extrae: Tracing buffer can hold 500000 events
Extrae: Circular buffer disabled.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /gpfs/home/bsc21/bsc21106/QCMPS.jl
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Warning! OMP_NUM_THREADS is set but OpenMP is not supported!
Welcome to Extrae 3.8.3
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on $EXTRAE_HOME
Extrae: Generating intermediate files for Paraver traces.
[mpiexec@s03r2b38] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@s03r2b38] send_cmd_to_proxies (../../pm/pmiserv/pmiserv_pmci.c:79): unable to send command to proxies
[mpiexec@s03r2b38] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:122): error checkpointing processes
[mpiexec@s03r2b38] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@s03r2b38] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@s03r2b38] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion
Extrae: Tracing 5 level(s) of Sampling callers: [ 1 2 3 4 5 ]
Extrae: Sampling enabled with a period of 10000 microseconds and a variability of 3000 microseconds.
Extrae: Tracing buffer can hold 500000 events
Extrae: Circular buffer disabled.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /gpfs/home/bsc21/bsc21106/QCMPS.jl
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Intermediate raw trace file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Intermediate raw sym file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
ERROR: failed process: Process(`/apps/INTEL/2017.4/impi/2017.3.196/bin64/mpiexec -np 2 julia test-args.jl 64 100`, ProcessExited(255)) [255]

Stacktrace:
 [1] pipeline_error at ./process.jl:525 [inlined]
 [2] run(::Cmd; wait::Bool) at ./process.jl:440
 [3] run(::Cmd) at process.jl:438
 [4] (::var"#3#4")(::Cmd) at none:4
 [5] (::MPI.var"#26#27"{var"#3#4"})(::Cmd) at /home/bsc21/bsc21106/.julia/packages/MPI/G4FaL/src/environment.jl:25
 [6] _mpiexec(::MPI.var"#26#27"{var"#3#4"}) at /home/bsc21/bsc21106/.julia/packages/MPI/G4FaL/deps/deps.jl:6
 [7] mpiexec(::var"#3#4") at /home/bsc21/bsc21106/.julia/packages/MPI/G4FaL/src/environment.jl:25
 [8] top-level scope at none:4
Extrae: Intermediate raw trace file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Intermediate raw sample file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Intermediate raw sym file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
Extrae: Intermediate raw trace file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Intermediate raw sample file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Intermediate raw sym file created : /gpfs/home/bsc21/bsc21106/QCMPS.jl/set-0/[email protected]
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
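For context, a hypothetical reconstruction of the launch implied by the failed-process message and stack trace above (the script name and arguments are copied from that message; this is not the actual contents of the user's code):

```julia
using MPI

# MPI.mpiexec passes the configured launcher command to the supplied function,
# matching the mpiexec(::var"#3#4") frame in the stack trace above.
MPI.mpiexec() do cmd
    run(`$cmd -np 2 julia test-args.jl 64 100`)
end
```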

@simonbyrne (Member) commented:

Can you run /apps/INTEL/2017.4/impi/2017.3.196/bin64/mpiexec -np 2 julia test-args.jl 64 100 directly and see if it works?

@mofeing commented Feb 2, 2021

@simonbyrne The problem persists.

Extrae: Warning! OMP_NUM_THREADS is set but OpenMP is not supported!
Welcome to Extrae 3.8.3
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on $EXTRAE_HOME
Extrae: Generating intermediate files for Paraver traces.
[mpiexec@s06r2b11] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@s06r2b11] send_cmd_to_proxies (../../pm/pmiserv/pmiserv_pmci.c:79): unable to send command to proxies
[mpiexec@s06r2b11] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:122): error checkpointing processes
[mpiexec@s06r2b11] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@s06r2b11] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@s06r2b11] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion
Extrae: Tracing 5 level(s) of Sampling callers: [ 1 2 3 4 5 ]
Extrae: Sampling enabled with a period of 10000 microseconds and a variability of 3000 microseconds.
Extrae: Tracing buffer can hold 500000 events
Extrae: Circular buffer disabled.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /gpfs/home/bsc21/bsc21106/QCMPS.jl
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Furthermore, if I force Extrae to intercept MPI calls (for recording the call stack), the old segmentation fault reappears.

...
signal (11): Segmentation fault
in expression starting at /gpfs/home/bsc21/bsc21106/QCMPS.jl/test-args.jl:6
__GI_____strtol_l_internal at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f5378df6aa9)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
msort_with_tmp.part.0 at /lib64/libc.so.6 (unknown line)
qsort_r at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f5378df9b3e)
unknown function (ip: 0x7f5378dfac13)
unknown function (ip: 0x7f5378d9111a)
MPI_Init at /apps/INTEL/2017.4/impi/2017.3.196/lib64/libmpi.so.12 (unknown line)
MPI_Init_C_Wrapper at /apps/BSCTOOLS/extrae/src/src-extrae-3.8.3/build-impi_2017_4/src/tracer/wrappers/MPI/../../../../../src/tracer/wrappers/MPI/mpi_wrapper.c:1894
MPI_Init at /apps/BSCTOOLS/extrae/src/src-extrae-3.8.3/build-impi_2017_4/src/tracer/interfaces/MPI/../../../../../src/tracer/interfaces/MPI/mpi_interface.c:877
#Init#30 at /home/bsc21/bsc21106/.julia/packages/MPI/G4FaL/src/environment.jl:84
Init at /home/bsc21/bsc21106/.julia/packages/MPI/G4FaL/src/environment.jl:84 [inlined]
#start_main_loop#19 at /home/bsc21/bsc21106/.julia/packages/MPIClusterManagers/R8qKB/src/mpimanager.jl:331
start_main_loop at /home/bsc21/bsc21106/.julia/packages/MPIClusterManagers/R8qKB/src/mpimanager.jl:331
unknown function (ip: 0x7f535cff8851)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:819
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:872
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:877
include at ./Base.jl:377
exec_options at ./client.jl:288
_start at ./client.jl:484
jfptr__start_2075.clone_1 at /gpfs/apps/MN4/JULIA/1.4.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4015d4)
Allocations: 3543575 (Pool: 3542529; Big: 1046); GC: 4
dump_buffer: Error writing to disk.
writev: Bad file descriptor
...

@simonbyrne (Member) commented:

The error looks like it can't write to a specific file. Are you able to run it on a single process?

@mofeing commented Feb 5, 2021

Nope, I receive the same error.
