unirep_analysis is a ChRIS app that wraps the UniRep project (https://github.com/churchlab/UniRep).
This plugin is GPU-capable. The 64-unit model should be OK to run on any machine. The full-sized model will require a machine with more than 8GB of GPU RAM.
For full information about the underlying method, consult the UniRep publication:
Paper: https://www.nature.com/articles/s41592-019-0598-1
The source code of UniRep is available on GitHub: https://github.com/churchlab/UniRep.
unirep_analysis \
[--dimension <modelDimension>] \
[--batch_size <batchSize>] \
[--learning_rate <learningRate>] \
[--inputFile <inputFileToProcess>] \
[--inputGlob <inputGlobPattern>] \
[--modelWeightPath <pathToWeights>] \
[--outputFile <resultOutputFile>] \
[--topModelTraining] \
[--jointModelTraining] \
[--json] \
<inputDir> \
<outputDir>
unirep_analysis is a ChRIS-based "plugin" application that infers protein sequence representations and performs generative modelling, also known as "babbling".
Simply pull the docker image,
docker pull fnndsc/pl-unirep_analysis
and go straight to the examples section.
[--dimension <modelDimension>]
By default, the <modelDimension> is 64. However, the value can be changed
to 1900 (full) or 256 and the corresponding weights files (present inside
the container) will be used.
[--batch_size <batchSize>]
This represents the batch size of the babbler. Default value is 12.
[--learning_rate <learningRate>]
Needed to build the model. Default is 0.001.
[--inputFile <inputFileToProcess>]
The name of the input text file that contains your amino acid sequences.
The default file name is an empty string. The full path to the
<inputFileToProcess> is constructed by joining <inputDir> and the file name:
<inputDir>/<inputFileToProcess>
[--inputGlob <inputGlobPattern>]
A glob pattern string, default '**/*txt', that specifies the file containing
an amino acid sequence. This parameter allows the input space to be searched
dynamically for a sequence file; the first "hit" is used.
[--modelWeightPath <pathToWeights>]
A path to a directory containing model weight files to use for inference.
[--outputFile <resultOutputFile>]
The name of the formatted output 'txt' file. The default name is 'format.txt'.
[--topModelTraining]
If specified, run training that optimizes only the top model.
[--jointModelTraining]
If specified, jointly train the top model and the mLSTM.
[-h]
Display inline help
[--json]
If specified, print a JSON representation of the app.
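The flags and defaults listed above can be summarised in a short Python sketch. This is not the plugin's actual parser: `build_parser` and `resolve_input` are hypothetical names, the empty default for `--modelWeightPath` is a placeholder, restricting `--dimension` to the three documented values is an assumption, and so is the rule that a non-empty `--inputFile` takes precedence over `--inputGlob`. Sorting the glob hits is likewise an assumption made here so that the "first hit" is deterministic.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(prog="unirep_analysis")
    # 64 is the documented default; 256 and 1900 (full) are the alternatives.
    p.add_argument("--dimension", type=int, default=64, choices=[64, 256, 1900])
    p.add_argument("--batch_size", type=int, default=12)
    p.add_argument("--learning_rate", type=float, default=0.001)
    p.add_argument("--inputFile", default="")
    p.add_argument("--inputGlob", default="**/*txt")
    # The README does not state a default here; "" is a placeholder assumption.
    p.add_argument("--modelWeightPath", default="")
    p.add_argument("--outputFile", default="format.txt")
    p.add_argument("--topModelTraining", action="store_true")
    p.add_argument("--jointModelTraining", action="store_true")
    p.add_argument("--json", action="store_true")
    p.add_argument("inputDir")
    p.add_argument("outputDir")
    return p


def resolve_input(input_dir: str, input_file: str, input_glob: str) -> Path:
    """Return the sequence file to process.

    Assumed precedence: an explicit file name wins; otherwise the glob
    pattern is searched under input_dir and the first hit is returned.
    """
    root = Path(input_dir)
    if input_file:
        # <inputDir>/<inputFileToProcess>
        return root / input_file
    hits = sorted(p for p in root.glob(input_glob) if p.is_file())
    if not hits:
        raise FileNotFoundError(
            "no file matching %r under %s" % (input_glob, root)
        )
    return hits[0]
```

For example, `build_parser().parse_args(["/incoming", "/outgoing"])` yields the documented defaults, and `resolve_input("/incoming", "", "**/*txt")` returns the first text file found anywhere below /incoming.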
This plugin is executed via docker.
To run using docker, be sure to assign an "input" directory to /incoming and an output directory to /outgoing. Make sure that the $(pwd)/out directory is world writable!
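If the directories are prepared from a script rather than by hand, the same setup can be sketched in Python. The helper name `prepare_dirs` and the `in`/`out` layout under a base directory are illustrative assumptions; the only requirement taken from this README is that the output directory be world writable.

```python
import os
from pathlib import Path


def prepare_dirs(base: str = "."):
    """Create <base>/in and <base>/out and make out world writable."""
    in_dir = Path(base) / "in"
    out_dir = Path(base) / "out"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    # chmod sets the mode explicitly, so the process umask does not apply.
    os.chmod(out_dir, 0o777)
    return in_dir, out_dir
```

The returned paths can then be mounted on /incoming and /outgoing respectively.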
Now, prefix all calls with
docker run --rm -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis \
unirep_analysis \
Thus, getting inline help is:
mkdir in out && chmod 777 out
docker run --rm -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis \
unirep_analysis \
-h \
/incoming /outgoing
Assuming that the <inputDir> layout conforms to
<inputDir>
│
└──█ sequence.txt
to process this (by default on a GPU) do
docker run --rm --gpus all \
-v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis unirep_analysis \
--inputFile sequence.txt --outputFile formatted.txt \
/incoming /outgoing
(note that --gpus all is not necessarily required), which will create in the <outputDir>:
<outputDir>
│
└──█ formatted.txt
To perform in-line debugging of the container, do
docker run --rm -it --userns=host -u $(id -u):$(id -g) \
-v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
-v $PWD/src:/usr/local/lib/python3.5/dist-packages/src \
-v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing \
local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing
Note, if you want to use pudb for debugging, then omit the -u $(id -u):$(id -g):
docker run --rm -it --userns=host \
-v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
-v $PWD/src:/usr/local/lib/python3.5/dist-packages/src \
-v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing \
local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing
Of course, in both cases above, use appropriate CLI args if required.
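When the container is launched from a script, the docker invocations shown in this README can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The helper name `docker_command` and its parameters are hypothetical; the docker flags, image name, and mount points are the ones used in the examples above.

```python
import os


def docker_command(in_dir, out_dir, plugin_args=(),
                   image="fnndsc/pl-unirep_analysis", use_gpu=False):
    """Build the argv list for running the plugin in docker."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        # Optional, as noted in the examples above.
        cmd += ["--gpus", "all"]
    cmd += [
        "-v", "%s:/incoming" % os.path.abspath(in_dir),
        "-v", "%s:/outgoing" % os.path.abspath(out_dir),
        image, "unirep_analysis",
    ]
    # Plugin CLI args go before the two positional directories.
    cmd += list(plugin_args)
    cmd += ["/incoming", "/outgoing"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for instance with `plugin_args=["--inputFile", "sequence.txt", "--outputFile", "formatted.txt"]`.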
_-30-_