- AIM
- WARNINGS
- CONTENT
- HOW TO RUN
- OUTPUT
- VERSIONS
- LICENCE
- CITATION
- CREDITS
- ACKNOWLEDGEMENTS
- WHAT'S NEW IN
Returns homopolymer information for each DNA sequence in a batch of DNA sequences, as well as statistics about homopolymers for the whole batch.
Conceptually, the algorithm splits each input sequence into its successive homopolymers (runs of identical bases, including runs of length 1) and then returns the information on these runs.
With the input sequence ATTTAAGCGGG, the homopolymers are:
A
TTT
AA
G
C
GGG
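The splitting described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the code used by the pipeline: the function name `split_homopolymers` is hypothetical, and the sketch assumes a single sequence given as one string with no line breaks.

```shell
# Illustrative sketch only (hypothetical helper, not part of main.nf).
# Prints each run of identical characters of the input string on its own line.
split_homopolymers() (
    seq=$1      # remaining part of the sequence
    run=""      # homopolymer currently being built
    prev=""     # previous base
    while [ -n "$seq" ]; do
        rest=${seq#?}            # sequence without its first character
        base=${seq%"$rest"}      # first character of the sequence
        if [ -n "$prev" ] && [ "$base" != "$prev" ]; then
            printf '%s\n' "$run" # base changed: the current run is complete
            run=""
        fi
        run="$run$base"
        prev=$base
        seq=$rest
    done
    if [ -n "$run" ]; then
        printf '%s\n' "$run"     # last run of the sequence
    fi
)

split_homopolymers ATTTAAGCGGG
```

Run on ATTTAAGCGGG, the sketch prints the six homopolymers listed above, one per line. The function body is a subshell, so its working variables do not leak into the calling shell.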
| Files and folder | Description |
|---|---|
| main.nf | File that can be executed from a Linux terminal, a macOS terminal or Windows 10 WSL2. |
| nextflow.config | Parameter settings for the main.nf file. Users have to open this file, set the desired settings and save these modifications before execution. |
| bin folder | Contains files required by the main.nf file. |
| Licence.txt | Licence of the release. |
| Required files |
|---|
| A fasta file. |
The dataset used in the nextflow.config file, as example, is available at https://zenodo.org/records/10681460.
| File name | Description |
|---|---|
| test.fasta | Fasta file. Available here. |
Installation of:
- nextflow DSL2
- Graphviz (`sudo apt install graphviz` on Ubuntu Linux)
- Apptainer
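Before running the pipeline, it can help to confirm that the required tools are on the PATH. The snippet below is a minimal sketch: `check_tools` is a hypothetical helper, not part of this repository, and it assumes that Graphviz is detected through its `dot` executable.

```shell
# Hypothetical pre-flight helper (not part of the repository).
# Prints one line per command and returns 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# dot is the Graphviz executable
check_tools nextflow dot apptainer || echo "Install the missing tools before running main.nf"
```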
- Mount a server if required:
DRIVE="Z" # change the letter to fit the correct drive
sudo mkdir /mnt/share
sudo mount -t drvfs $DRIVE: /mnt/share
Warning: if the drive is not mounted, nextflow may do nothing, or display a message like:
Launching `main.nf` [loving_morse] - revision: d5aabe528b /mnt/share/Users
- Run the following command from where the main.nf and nextflow.config files are (example: \\wsl$\Ubuntu-20.04\home\gael):
nextflow run main.nf -c nextflow.config
with -c to specify the name of the config file used.
Run the following command from where you want the results:
nextflow run gael-millot/homopolymer # github, or nextflow run http://github.com/gael-millot/homopolymer
nextflow run -hub pasteur gmillot/homopolymer -r v1.0.0 # gitlab
Copy-paste this after having modified the EXEC_PATH variable:
EXEC_PATH="/pasteur/zeus/projets/p01/BioIT/gmillot/homopolymer" # where the bin folder of the main.nf script is located
export CONF_BEFORE=/opt/gensoft/exe # on maestro
export JAVA_CONF=java/13.0.2
export JAVA_CONF_AFTER=bin/java # on maestro
export APP_CONF=apptainer/1.3.5
export APP_CONF_AFTER=bin/apptainer # on maestro
export GIT_CONF=git/2.39.1
export GIT_CONF_AFTER=bin/git # on maestro
export GRAPHVIZ_CONF=graphviz/2.42.3
export GRAPHVIZ_CONF_AFTER=bin/graphviz # on maestro
MODULES="${CONF_BEFORE}/${JAVA_CONF}/${JAVA_CONF_AFTER},${CONF_BEFORE}/${APP_CONF}/${APP_CONF_AFTER},${CONF_BEFORE}/${GIT_CONF}/${GIT_CONF_AFTER},${CONF_BEFORE}/${GRAPHVIZ_CONF}/${GRAPHVIZ_CONF_AFTER}"
cd ${EXEC_PATH}
chmod 755 ${EXEC_PATH}/bin/*.*
module load ${JAVA_CONF} ${APP_CONF} ${GIT_CONF} ${GRAPHVIZ_CONF}
Modify the second line of the code below, and run from where the main.nf and nextflow.config files are (which has been set thanks to the EXEC_PATH variable above):
HOME_INI=$HOME
HOME="${HELIXHOME}/homopolymer/" # $HOME changed to allow the creation of .nextflow into /$HELIXHOME/homopolymer/, for instance. See NXF_HOME in the nextflow software script
nextflow run main.nf -c nextflow.config
HOME=$HOME_INI
Modify the first and third lines of the code below, and run (results will be where the EXEC_PATH variable has been set above):
VERSION="v1.0"
HOME_INI=$HOME
HOME="${HELIXHOME}/homopolymer/" # $HOME changed to allow the creation of .nextflow into /$HELIXHOME/homopolymer/, for instance. See NXF_HOME in the nextflow software script
nextflow run gael-millot/homopolymer -r $VERSION -c $HOME/nextflow.config #github, or nextflow run http://github.com/gael-millot/homopolymer -r $VERSION -c $HOME/nextflow.config
nextflow run -hub pasteur gmillot/homopolymer -r $VERSION -c $HOME/nextflow.config # gitlab
HOME=$HOME_INI
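In the two snippets above, HOME is restored by hand after the run. As a possible alternative (a sketch only, not the procedure used by the author), the change can be confined to a subshell so that the calling shell's HOME is never modified, even if the run fails. The helper name `run_with_home` is hypothetical.

```shell
# Hypothetical helper: run a command with HOME pointing at another directory.
# The function body is a subshell, so the caller's HOME is left untouched.
run_with_home() (
    HOME=$1
    export HOME
    shift
    "$@"    # the exit status of the command is the exit status of the function
)

# Harmless demonstration with a command that only prints the HOME it sees
run_with_home /tmp sh -c 'echo "HOME inside the call: $HOME"'
echo "HOME after the call: $HOME"

# Intended use on the cluster (assumes HELIXHOME is set, as in the snippets above):
# run_with_home "${HELIXHOME}/homopolymer/" nextflow run main.nf -c nextflow.config
```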
Unknown error accessing project `gmillot/homopolymer` -- Repository may be corrupted: /pasteur/sonic/homes/gmillot/.nextflow/assets/gmillot/homopolymer
Purge using:
rm -rf /pasteur/sonic/homes/gmillot/.nextflow/assets/gmillot*
WARN: Cannot read project manifest -- Cause: Remote resource not found: https://gitlab.pasteur.fr/api/v4/projects/gmillot%2Fhomopolymer
Contact Gael Millot (distant repository is not public).
permission denied
Use chmod to change the user rights. Example linked to files in the bin folder:
chmod 755 bin/*.*
An example of results is present at this address: https://zenodo.org/records/10681595/files/homopolymer_2_1708386120.zip.
| homopolymer_<HOMOPOLYMER_MIN_LENGTH>_<UNIQUE_ID> folder | Description |
|---|---|
| report.html | Report of the analysis. |
| reports | Folder containing all the reports of the different processes, including the nextflow.config file used. |
| figures | Folder containing the graphs in png format that are used in the report.html file, as well as the corresponding svg vectorial files if needed. |
| files | Folder containing the following files: |
The different releases are tagged here.
This package of scripts can be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details at https://www.gnu.org/licenses or in the attached Licence.txt file.
Not yet published.
Gael A. Millot, Hub, Institut Pasteur, Paris, France.
The developers & maintainers of the mentioned software and packages, including:
Special acknowledgement to Yoann Dufresne, Hub, Institut Pasteur, Paris, France.
- In the nextflow.config file, upgrade singularity -> apptainer (real one).
- In the nextflow.config file, upgrade singularity -> apptainer.
- Bugs fixed in report.
- README improved for github.
- nextflow.config input from zenodo.
Bug fixed in report.
Nextflow DSL1 -> DSL2.
Plot_raw_values.tsv file added and boxplot_stat_log.tsv file modified.
Plot and html report modified.
Boxplots modified.
Completely rewritten.
Minimum length of homopolymer added as parameter, among other things.
Many things improved.
Completely modified. The file is now a nextflow file, and outputs include tables, graphs and stats.
New features included in the result table.
Everything.