Commit 780b6b9

update the download script
1 parent 740c2ac commit 780b6b9

3 files changed

+9
-6
lines changed

benchmark/download/README.md

Lines changed: 2 additions & 3 deletions
@@ -8,13 +8,12 @@ For the input of a single FASTA file (each sequence means a genome), RabbitTClus
 ## download genomes from RefSeq
 The download script comes from [Bonsai](https://github.com/dnbaker/bonsai/tree/ac6f8c7ee1b2ae1128970a8f6dc01ddad19fdb37).
 
-RefSeq bacterial genomes can be downloaded by `download_refseq.py` as follows:
+The latest release of RefSeq bacterial genomes can be downloaded by `download_refseq.py` as follows:
 
 * `python3 download_genomes.py bacteria`
 * `python3 download_genomes.py -h` more details of help infos.
 
 ## download genomes from GenBank
-The FTP paths of the GenBank assembled bacterial genomes are listed in `bact_GenBank.list.gz`, which is generated from [assembly_summary_genbank.txt](https://ftp.ncbi.nlm.nih.gov/genomes/genbank/).
 
-GenBank bacterial genomes can be downloaded by `download_genbank.sh` as follows:
+The latest release of GenBank bacterial genomes can be downloaded by `download_genbank.sh` as follows:
 * `./download_genbank.sh`
-10.2 MB
Binary file not shown.
Lines changed: 7 additions & 3 deletions
@@ -1,15 +1,19 @@
 #!/bin/bash
+
+wget https://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt
+awk -F '\t' 'NR>2 {print $20}' assembly_summary.txt >ftp.list
+
 outputDir="genbankDir"
 echo $#
 if [ $# -ge 1 ]
 then
 outputDir=$1
 fi
 mkdir -p $outputDir
-zcat bact_GenBank.list.gz | while read -r line ;
+cat ftp.list | while read line
 do
-fname=$(echo $line | grep -o 'GCA_.*' | sed 's/$/_genomic.fna.gz/') ;
-#echo "$line/$fname" ;
+fname=$(echo $line | grep -o 'GCA_.*' | sed 's/$/_genomic.fna.gz/')
+#echo "$line/$fname"
 wget -c "$line/$fname" ;
 mv "$fname" $outputDir
 done
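
The updated script takes column 20 of `assembly_summary.txt` as the FTP directory of each assembly and appends `_genomic.fna.gz` to the `GCA_...` directory name to form the file to fetch. A minimal sketch of the same idea, split into small functions, might look like the following. The function names `list_ftp_paths`, `genome_url`, and `download_all` are hypothetical and not part of this commit, and the `na` filter assumes that rows without a usable path carry `na` in that column.

```shell
#!/bin/bash
# Sketch only: these helper names are hypothetical, not part of the commit.

# Print the FTP directory column (field 20) of an assembly summary file,
# skipping the two header lines and any row whose path is empty or "na"
# (assumption: "na" marks a missing path).
list_ftp_paths() {
    awk -F '\t' 'NR > 2 && $20 != "" && $20 != "na" { print $20 }' "$1"
}

# Turn one FTP directory path into the URL of its genomic FASTA archive.
# The file name is the last path component plus "_genomic.fna.gz".
genome_url() {
    local dir="${1%/}"        # drop a trailing slash, if any
    local name="${dir##*/}"   # last path component, e.g. GCA_..._ASM...
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$name"
}

# Download every genome listed in a summary file into an output directory
# (defaults to genbankDir, as in the script above).
download_all() {
    local summary="$1" out_dir="${2:-genbankDir}"
    mkdir -p "$out_dir"
    list_ftp_paths "$summary" | while IFS= read -r dir; do
        # -c resumes partial downloads; -P writes straight into out_dir,
        # so no separate mv step is needed.
        wget -c -P "$out_dir" "$(genome_url "$dir")"
    done
}
```

With the summary file already downloaded, `download_all assembly_summary.txt myDir` would fetch the listed genomes into `myDir`. Quoting the variables and using `read -r` keeps paths intact if they ever contain spaces or backslashes.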
