recipe for installing SLURM and friends on Debian 11 #70

@judith-ipac

Hello, and apologies if this question is in the wrong place. We are upgrading from Debian 8 to Debian 11. I am a developer with no particular background in system administration or configuration. Several weeks into a cycle of install/google-error-message/install-something-else, I have installed munge, slurm, slurm-drmaa, and bats(!). slurmctld and slurmd are now running, but calls to drmaa_run_job() result in segmentation faults. (The surrounding C++ code is copied from our Debian 8 host, where drmaa_run_job() runs successfully.) I'll print some debug output below, but what I'm really looking for is start-to-finish, step-by-step instructions for configuring, installing, and running whatever it takes to make SLURM usable on Debian 11. Thanks in advance.

Last few steps of debug output from drmaa_run_job:

d #597f9 [ 40.42] * finalizing job constraints
d #597f9 [ 40.42] * set min_cpus to ntasks: 1
t #597f9 [ 40.42] <- slurmdrmaa_parse_native
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=0] [si_ptr=(nil)] [si_addr=0x1656]
kpedbg_dmp_stack()+394<-kpeDbgCrash()+204<-kpeDbgSignalHandler()+113<-skgesig_sigactionHandler()+258<-__sighandler()<-0x00007F06CFEC9B71<-slurm_pack_selected_step()+1286<-slurm_send_node_msg()+505<-slurm_send_recv_msg()+66<-slurm_send_recv_controller_msg()+315<-slurm_submit_batch_job()+119<-slurmdrmaa_session_run_bulk()+518<-slurmdrmaa_session_run_job()+179<-drmaa_run_job()+374<-_ZN19custom_code::submit_jobERKN5boost10filesystem4pathES4_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_bb()+4407<-0x0000000000000009<-0x7453705F6D00626F

runscript.sh: line 62: 366577 Segmentation fault

Stack trace from gdb:

           Stack trace of thread 366585:
            #0  0x00007f06d1914fe1 raise (libpthread.so.0 + 0x13fe1)
            #1  0x00007f06c254893f skgesigOSCrash (libclntsh.so + 0x267293f)
            #2  0x00007f06c2c63cdd kpeDbgSignalHandler (libclntsh.so + 0x2d8dcdd)
            #3  0x00007f06c2548c12 skgesig_sigactionHandler (libclntsh.so + 0x2672c12)
            #4  0x00007f06d1915140 __restore_rt (libpthread.so.0 + 0x14140)
            #5  0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)
            #6  0x00007f06d0467cb3 n/a (libslurm.so.36 + 0xf8cb3)
            #7  0x00007f06d047c646 n/a (libslurm.so.36 + 0x10d646)
            #8  0x00007f06d0456cf9 slurm_send_node_msg (libslurm.so.36 + 0xe7cf9)
            #9  0x00007f06d0457f72 slurm_send_recv_msg (libslurm.so.36 + 0xe8f72)
            #10 0x00007f06d04580db slurm_send_recv_controller_msg (libslurm.so.36 + 0xe90db)
            #11 0x00007f06d03b76e7 slurm_submit_batch_job (libslurm.so.36 + 0x486e7)
            #12 0x00007f06d05414f1 slurmdrmaa_session_run_bulk (libdrmaa.so.1 + 0xb4f1)
            #13 0x00007f06d054123b slurmdrmaa_session_run_job (libdrmaa.so.1 + 0xb23b)
            #14 0x00007f06d055c133 drmaa_run_job (libdrmaa.so.1 + 0x26133)
            #15 0x000056442ad0bf37 n/a (XXX + 0xd1f37)
            #16 0x0000000000000009 n/a (n/a + 0x0)
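
One way to read the trace above is to skip the frames that only deliver and handle the signal and look at the first frame left over. The sketch below does that in Python on a few lines copied from the trace. The helper names (`parse_frames`, `first_frame_outside`) are made up for illustration, and treating the libpthread and libclntsh frames as signal handling is an assumption based on their symbol names (`raise`, `__restore_rt`, `skgesigOSCrash`), not something confirmed about the Oracle client.

```python
import re

# Matches frame lines such as:
#   "#5  0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)"
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x[0-9a-fA-F]+\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x[0-9a-fA-F]+\)"
)


def parse_frames(trace):
    """Return (index, symbol, module) tuples for each frame line in the trace text."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match["index"]), match["symbol"], match["module"]))
    return frames


def first_frame_outside(frames, skipped_modules):
    """Return the first frame whose module is not in skipped_modules, or None."""
    for frame in frames:
        if frame[2] not in skipped_modules:
            return frame
    return None


# A few frames copied verbatim from the trace above.
SAMPLE = """
#0  0x00007f06d1914fe1 raise (libpthread.so.0 + 0x13fe1)
#1  0x00007f06c254893f skgesigOSCrash (libclntsh.so + 0x267293f)
#4  0x00007f06d1915140 __restore_rt (libpthread.so.0 + 0x14140)
#5  0x00007f06cfec9b71 __strlen_avx2 (libc.so.6 + 0x15fb71)
#6  0x00007f06d0467cb3 n/a (libslurm.so.36 + 0xf8cb3)
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    # Assumption: libpthread/libclntsh frames are signal delivery and handling only.
    print(first_frame_outside(frames, {"libpthread.so.0", "libclntsh.so"}))
    # prints (5, '__strlen_avx2', 'libc.so.6')
```

Read this way, the faulting call is `strlen` in libc, reached from an unnamed function in libslurm.so.36 while the batch-job message is being sent to the controller. Whether the bad pointer originates in the job description built by slurm-drmaa or elsewhere cannot be told from the trace alone.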

Any advice would be greatly appreciated.
