Step 6: Set up Distributed Computing
SMRT Analysis provides support for distributed computation using an existing job management system. Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), LSF and PBS.
Note: Celera Assembler 7.0 will only work correctly with the SGE job management system. If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in common/protocols:
RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak
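The renames above can be scripted as follows (assuming $SEYMOUR_HOME points at the SMRT Analysis installation root):

```shell
# Deactivate the Celera Assembler protocols when not running under SGE.
# Assumes $SEYMOUR_HOME is set to the SMRT Analysis installation root.
cd "$SEYMOUR_HOME/common/protocols"
mv RS_CeleraAssembler.1.xml RS_CeleraAssembler.1.bak
mv filtering/CeleraAssemblerSFilter.1.xml filtering/CeleraAssemblerSFilter.1.bak
mv assembly/CeleraAssembler.1.xml assembly/CeleraAssembler.1.bak
```

To restore the protocols later, rename the files back to their .xml extensions.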
This section describes setup for SGE and gives guidance for extensions to other Job Management Systems.
Following are the options in the $SEYMOUR_HOME/analysis/etc/smrtpipe.rc file that you can set to execute distributed SMRT Pipe runs.
(See the SMRT Pipe section for the table of these options.)
The central components for setting up distributed computing in SMRT Analysis are the Job Management Templates (JMTs). JMTs provide a flexible format for specifying how SMRT Analysis communicates with the resident JMS. Two templates must be modified for your system:
- start.tmpl is the legacy template used for assembly algorithms.
- interactive.tmpl is the new template used for resequencing algorithms. The difference between the two is the additional requirement of a sync option in interactive.tmpl. (kill.tmpl is not used.)
Note: We are in the process of converting all protocols to use only interactive.tmpl.
To customize a JMS for a particular environment, edit or create start.tmpl and interactive.tmpl. For example, the installation includes the following sample start.tmpl and interactive.tmpl (respectively) for SGE:
qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}
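SMRT Pipe fills the ${...} placeholders at submission time. A minimal sketch of that substitution, using Python's string.Template (the variable values below are illustrative, not taken from a real run):

```python
from string import Template

# The interactive.tmpl line for SGE, as shipped with the installation.
tmpl = Template(
    "qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} "
    "-o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}"
)

# Illustrative values; the real ones are supplied by SMRT Pipe per task.
cmd = tmpl.substitute(
    JOB_ID="smrtpipe_0001",
    STDOUT_FILE="/tmp/job.stdout",
    STDERR_FILE="/tmp/job.stderr",
    NPROC="8",
    CMD="run_module.sh",
)
print(cmd)
```

The -sync y flag makes qsub block until the job finishes, which is what distinguishes interactive.tmpl from start.tmpl.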
- Create a new directory named NEW_NAME in etc/cluster/.
- In smrtpipe.rc, change the CLUSTER_MANAGER variable to NEW_NAME, as described in "Smrtpipe.rc Configuration".
- Once you have the new JMS directory specified, edit the interactive.tmpl and start.tmpl files for your particular setup.
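The steps above can be sketched as shell commands. This assumes the bundled samples live in per-JMS subdirectories (e.g. SGE/) under etc/cluster; check your installation for the actual layout, and substitute your own name for NEW_NAME:

```shell
# Register a new JMS configuration named NEW_NAME, starting from the
# SGE samples.  NEW_NAME and the SGE/ subdirectory are assumptions;
# inspect $SEYMOUR_HOME/analysis/etc/cluster for the real layout.
cd "$SEYMOUR_HOME/analysis/etc/cluster"
mkdir NEW_NAME
cp SGE/start.tmpl SGE/interactive.tmpl NEW_NAME/
# Then edit the copied templates, and set in smrtpipe.rc:
#   CLUSTER_MANAGER = NEW_NAME
```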
Sample SGE, LSF and PBS templates are included with the installation in $SEYMOUR_HOME/analysis/etc/cluster.
For this version (v1.4.0), you must still edit both interactive.tmpl and start.tmpl as follows:
- Change secondary to the queue name on your system. (This is the -q option.)
- Change smp to the parallel environment on your system. (This is the -pe option.)
PBS does not have a -sync option, so the interactive.tmpl file runs a script named qsw.py to simulate that functionality. You must edit both interactive.tmpl and start.tmpl:
- Change the queue name to one that exists on your system. (This is the -q option.)
- Change the parallel environment to one that exists on your system. (This is the -pe option.)
- Make sure that interactive.tmpl passes the -PBS option.
Create an interactive.tmpl file by copying the start.tmpl file and adding the -K flag to the bsub call. Alternatively, edit the sample LSF templates.
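bsub's -K flag submits the job and waits for it to complete, which supplies the synchronous behavior that -sync y provides under SGE. A hedged sketch of what the two LSF templates might look like; the queue name (secondary) and slot option (-n) are assumptions to adapt for your site, and the shipped samples may differ:

```shell
# start.tmpl (asynchronous submission) -- illustrative only
bsub -q secondary -n ${NPROC} -J ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${CMD}

# interactive.tmpl (synchronous: -K blocks until the job finishes)
bsub -K -q secondary -n ${NPROC} -J ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${CMD}
```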
We have not tested the -sync functionality on other systems. Find the equivalent of the -sync option for your JMS and create an interactive.tmpl file. If no -sync equivalent is available, you may need to edit the qsw.py script in $SEYMOUR_HOME/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/qsw.py to add additional options for wrapping jobs on your system.
The code for PBS and SGE looks like the following:
if '-PBS' in args:
    # PBS mode: consume the marker flag and use PBS-style status parsing.
    args.remove('-PBS')
    self.jobIdDecoder = PBS_JOB_ID_DECODER
    self.noJobFoundCode = PBS_NO_JOB_FOUND_CODE
    self.successCode = PBS_SUCCESS_CODE
    self.qstatCmd = "qstat"
else:
    # Default is SGE; "qstat -j <id>" queries a specific job's status.
    self.jobIdDecoder = SGE_JOB_ID_DECODER
    self.noJobFoundCode = SGE_NO_JOB_FOUND_CODE
    self.successCode = SGE_SUCCESS_CODE
    self.qstatCmd = "qstat -j"
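The pattern above is flag-based backend selection: a single -PBS argument switches the wrapper from its SGE defaults to PBS conventions. A minimal, standalone sketch of that logic (the class name and constant values here are illustrative stand-ins, not the real qsw.py definitions):

```python
# Illustrative constants; the real decoders and return codes live in qsw.py.
SGE_QSTAT_CMD = "qstat -j"
PBS_QSTAT_CMD = "qstat"

class QueueWatcher:
    """Selects queue-status settings based on an optional -PBS flag."""

    def __init__(self, args):
        self.args = list(args)
        if '-PBS' in self.args:
            # PBS mode: remove the flag so it is not passed downstream.
            self.args.remove('-PBS')
            self.qstatCmd = PBS_QSTAT_CMD
        else:
            # Default is SGE-style status queries.
            self.qstatCmd = SGE_QSTAT_CMD
```

Extending qsw.py to another JMS would mean adding a further branch with that system's status command, job-ID decoder, and return codes.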
Running jobs in distributed mode is disabled by default in SMRT Portal.
To enable distributed processing, set the jobsAreDistributed value in $SEYMOUR_HOME/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml to true:
<context-param>
<param-name>jobsAreDistributed</param-name>
<param-value>true</param-value>
</context-param>
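After editing web.xml, you can confirm the setting programmatically. A small sketch that reads a web.xml file and reports the jobsAreDistributed value (the function name is our own; matching on local tag names sidesteps any XML namespace the deployment descriptor may declare):

```python
import xml.etree.ElementTree as ET

def jobs_are_distributed(web_xml_path):
    """Return True if web.xml sets the jobsAreDistributed context-param to 'true'."""
    root = ET.parse(web_xml_path).getroot()
    for param in root.iter():
        # Match on local tag names so a namespaced web.xml still works.
        if param.tag.endswith('context-param'):
            name = value = None
            for child in param:
                if child.tag.endswith('param-name'):
                    name = (child.text or '').strip()
                elif child.tag.endswith('param-value'):
                    value = (child.text or '').strip()
            if name == 'jobsAreDistributed':
                return value == 'true'
    return False
```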
You will need to restart Tomcat for the change to take effect.