Description
Hello, and apologies if this question is in the wrong place. We are upgrading from Debian 8 to Debian 11. I am a developer with no particular background in system administration or configuration. Several weeks into a cycle of install, google the error message, install something else, I have installed munge, slurm, slurm-drmaa, and bats(!). slurmctld and slurmd are now running, but calls to drmaa_run_job() result in segmentation faults. (The surrounding C++ code is copied from our Debian 8 host, where drmaa_run_job() runs successfully.) I'll print some debug output below, but what I'm really looking for is start-to-finish, step-by-step instructions for installing, configuring, and running whatever it takes to make SLURM usable on Debian 11. Thanks in advance.
Last few steps of debug output from drmaa_run_job:
d #597f9 [ 40.42] * finalizing job constraints
d #597f9 [ 40.42] * set min_cpus to ntasks: 1
t #597f9 [ 40.42] <- slurmdrmaa_parse_native
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=0] [si_ptr=(nil)] [si_addr=0x1656]
kpedbg_dmp_stack()+394<-kpeDbgCrash()+204<-kpeDbgSignalHandler()+113<-skgesig_sigactionHandler()+258<-__sighandler()<-0x00007F06CFEC9B71<-slurm_pack_selected_step()+1286<-slurm_send_node_msg()+505<-slurm_send_recv_msg()+66<-slurm_send_recv_controller_msg()+315<-slurm_submit_batch_job()+119<-slurmdrmaa_session_run_bulk()+518<-slurmdrmaa_session_run_job()+179<-drmaa_run_job()+374<-_ZN19custom_code::submit_jobERKN5boost10filesystem4pathES4_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_bb()+4407<-0x0000000000000009<-0x7453705F6D00626F
runscript.sh: line 62: 366577 Segmentation fault
Stack trace from gdb:
Stack trace of thread 366585:
#0 0x00007f06d1914fe1 raise (libpthread.so.0 + 0x13fe1)
#1 0x00007f06c254893f skgesigOSCrash (libclntsh.so + 0x267293f)
#2 0x00007f06c2c63cdd kpeDbgSignalHandler (libclntsh.so + 0x2d8dcdd)
#3 0x00007f06c2548c12 skgesig_sigactionHandler (libclntsh.so + 0x2672c12)
#4 0x00007f06d1915140 __restore_rt (libpthread.so.0 + 0x14140)
#5 0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)
#6 0x00007f06d0467cb3 n/a (libslurm.so.36 + 0xf8cb3)
#7 0x00007f06d047c646 n/a (libslurm.so.36 + 0x10d646)
#8 0x00007f06d0456cf9 slurm_send_node_msg (libslurm.so.36 + 0xe7cf9)
#9 0x00007f06d0457f72 slurm_send_recv_msg (libslurm.so.36 + 0xe8f72)
#10 0x00007f06d04580db slurm_send_recv_controller_msg (libslurm.so.36 + 0xe90db)
#11 0x00007f06d03b76e7 slurm_submit_batch_job (libslurm.so.36 + 0x486e7)
#12 0x00007f06d05414f1 slurmdrmaa_session_run_bulk (libdrmaa.so.1 + 0xb4f1)
#13 0x00007f06d054123b slurmdrmaa_session_run_job (libdrmaa.so.1 + 0xb23b)
#14 0x00007f06d055c133 drmaa_run_job (libdrmaa.so.1 + 0x26133)
#15 0x000056442ad0bf37 n/a (XXX + 0xd1f37)
#16 0x0000000000000009 n/a (n/a + 0x0)
Any advice would be greatly appreciated.