Description
I'm trying to build a pipeline around foldseek easy-search, and I'd like it to run on the GPU, both to generate 3Di structure sequences with ProstT5 and to run fast comparisons. However, the database padding step does not seem to work. I created a database first:
foldseek createdb db_concat.faa foldseek_db --prostt5-model ~/prostt5 --gpu 1 --threads 64
Then I ran the following command, for which I've also provided output:
foldseek easy-search benchmark_all_00/positive_query.faa benchmark_all_00/db/foldseek_db foldseek_test.tsv tmp --threads 64 --gpu 1 --prostt5-model ~/prostt5
easy-search benchmark_all_00/positive_query.faa benchmark_all_00/db/foldseek_db foldseek_test.tsv tmp --threads 64 --gpu 1 --prostt5-model /people/stey877/prostt5
MMseqs Version: 10.941cd33
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Max reject 2147483647
Max accept 2147483647
Add backtrace false
TMscore threshold 0
TMscore threshold mode 0
TMalign hit order 0
TMalign fast 1
Preload mode 0
Threads 64
Verbosity 3
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Alignment mode 3
Alignment mode 0
E-value threshold 10
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Compressed 0
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 9.5
k-mer length 6
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 1000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 0
Mask residues probability 0.999995
Mask lower case residues 1
Mask lower letter repeating N times 6
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Exhaustive search mode false
Search iterations 1
Remove temporary files true
MPI runner
Force restart with latest tmp false
Cluster search 0
Path to ProstT5 /people/stey877/prostt5
Chain name mode 0
Createdb extraction mode 0
Interface distance threshold 8
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Report mode 2
Greedy best hits false
createdb benchmark_all_00/positive_query.faa tmp/12571839706662637167/query --gpu 1 --prostt5-model /people/stey877/prostt5 --chain-name-mode 0 --db-extraction-mode 0 --distance-threshold 8 --write-mapping 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 64 -v 3
Converting sequences
Time for merging to query_h: 0h 0m 0s 198ms
Time for merging to query: 0h 0m 0s 234ms
Database type: Aminoacid
CUDA0
CPU
[=================================================================] 100.00% 96 3s 239ms
Time for merging to query_ss: 0h 0m 0s 448ms
Time for merging to query_ss_tmp: 0h 0m 0s 405ms
Time for processing: 0h 0m 6s 900ms
Create directory tmp/12571839706662637167/search_tmp
search tmp/12571839706662637167/query benchmark_all_00/db/foldseek_db tmp/12571839706662637167/result tmp/12571839706662637167/search_tmp --threads 64 --alignment-mode 3 -s 9.5 -k 6 --gpu 1 --remove-tmp-files 1
ungappedprefilter tmp/12571839706662637167/query_ss benchmark_all_00/db/foldseek_db_ss tmp/12571839706662637167/search_tmp/10720767736467567852/pref --sub-mat 'aa:3di.out,nucl:3di.out' -c 0 -e 1.79769e+308 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --min-ungapped-score 30 --max-seqs 1000 --db-load-mode 0 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 0 --threads 64 --compressed 0 -v 3
Database foldseek_db_ss is not a valid GPU database
Please call: makepaddedseqdb foldseek_db_ss foldseek_db_ss_pad
Error: Ungapped prefilter matching step died
Error: Search died
I get the following error when attempting to run foldseek makepaddedseqdb foldseek_db_ss foldseek_db_ss_pad manually:
foldseek makepaddedseqdb foldseek_db_ss foldseek_db_ss_padded
makepaddedseqdb foldseek_db_ss foldseek_db_ss_padded
MMseqs Version: 10.941cd33
Substitution matrix aa:3di.out,nucl:3di.out
Mask residues 0
Mask residues probability 0.999995
Write lookup file 1
Threads 32
Verbosity 3
Cluster search 0
Database foldseek_db_ss needs header information
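Since the error mentions header information, here is a minimal sketch for checking whether a database and its header companion are complete on disk. It assumes the usual MMseqs2 layout, where a database `X` comes with `X.index` and `X.dbtype` and its headers live in a sibling database `X_h`. The function name `check_db` is hypothetical; it is not a foldseek command.

```shell
#!/usr/bin/env bash
# check_db is a hypothetical helper, not part of foldseek.
# Assumption: a database X consists of X, X.index and X.dbtype,
# and its headers are stored in a sibling database X_h.
check_db() {
    local db="$1" missing=0 f
    for f in "$db" "$db.index" "$db.dbtype" \
             "${db}_h" "${db}_h.index" "${db}_h.dbtype"; do
        if [ -e "$f" ]; then
            echo "ok      $f"
        else
            echo "missing $f"
            missing=1
        fi
    done
    # Non-zero exit status if any expected file is absent.
    return "$missing"
}

# Usage: check_db foldseek_db_ss
```

If `check_db foldseek_db_ss` reports the `_h` files as missing, that would at least be consistent with the "needs header information" message.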
In addition, I thought easy-search handled padding on its own (cf. #399).
If I run the padding on the entire database rather than just the *_ss files, it completes successfully, but the easy-search error stays the same. It also produces some empty files and one unusual file that displays as a large blank area; wc -l reports 22 lines for it, so I assume it contains little more than 22 line breaks.
Any help would be much appreciated! I'm running the Mamba build, which was last updated in January, but I saw that makepaddeddb.sh was updated in August; is that related to this problem?
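To check what the blank-looking file actually contains, a small sketch that counts its bytes by kind. The helper name `byte_summary` is hypothetical. My assumption, which I have not verified for this file, is that the "empty space" is made of newlines and NUL bytes, since MMseqs2-style databases terminate entries with a NUL byte as far as I understand.

```shell
#!/usr/bin/env bash
# byte_summary is a hypothetical helper for inspecting a file that looks blank.
# It prints the total size and how many bytes are newlines or NUL bytes.
byte_summary() {
    local f="$1" total nl nul
    total=$(wc -c < "$f" | tr -d ' ')
    nl=$(tr -cd '\n' < "$f" | wc -c | tr -d ' ')
    nul=$(tr -cd '\000' < "$f" | wc -c | tr -d ' ')
    echo "total=$total newline=$nl nul=$nul other=$((total - nl - nul))"
}

# Usage: byte_summary <path to the blank-looking file>
```

If `other` comes out as 0, the file holds only empty entries, which would match what wc -l shows.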
Edit: I downloaded the AVX2 GPU version, since AVX2 is supported on my HPC:
grep -m1 -o 'avx2' /proc/cpuinfo
avx2
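The same check can be wrapped so it is easy to repeat for other instruction sets on different nodes. This is only a sketch: `cpu_has_flag` is a hypothetical helper, and it assumes an x86 Linux `/proc/cpuinfo` with a `flags` line.

```shell
#!/usr/bin/env bash
# cpu_has_flag is a hypothetical helper; the second argument defaults to
# /proc/cpuinfo and exists only so the check can be tried on a sample file.
cpu_has_flag() {
    local flag="$1" info="${2:-/proc/cpuinfo}"
    # Take the first "flags" line and look for the flag as a whole word.
    grep -m1 '^flags' "$info" | grep -qw -- "$flag"
}

# Usage: cpu_has_flag avx2 && echo "avx2 supported"
```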
I re-ran createdb and makepaddedseqdb:
foldseek createdb db_concat_01.faa foldseek_db --prostt5-model ~/prostt5 --gpu 1
createdb db_concat_01.faa foldseek_db --prostt5-model /people/stey877/prostt5 --gpu 1
MMseqs Version: d6204679ceef8a559be2e7a92e89760e31fbc21a
Use GPU 1
Path to ProstT5 /people/stey877/prostt5
Chain name mode 0
Model name mode 0
Createdb extraction mode 0
Interface distance threshold 10
Write mapping file 0
Write Foldcomp 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 256
Verbosity 3
Converting sequences
[304] 0s 324ms
Sort single files in 0h 0m 1s 142ms
Merge all files 0h 0m 0s 386ms
Database type: Aminoacid
CUDA0
CPU
[=================================================================] 100.00% 348 11s 137ms
Time for merging to foldseek_db_ss: 0h 0m 2s 75ms
Time for merging to foldseek_db_ss_tmp: 0h 0m 2s 57ms
Time for processing: 0h 0m 21s 233ms
foldseek makepaddedseqdb foldseek_db foldseek_db_pad
makepaddedseqdb foldseek_db foldseek_db_pad
MMseqs Version: d6204679ceef8a559be2e7a92e89760e31fbc21a
Substitution matrix aa:3di.out,nucl:3di.out
Mask residues 0
Mask residues probability 0.999995
Write lookup file 1
Threads 256
Verbosity 3
Cluster search 0
lndb foldseek_db_h foldseek_db_pad_tmp_ss_h
Time for processing: 0h 0m 0s 18ms
lndb foldseek_db_ss foldseek_db_pad_tmp_ss
Time for processing: 0h 0m 0s 14ms
makepaddedseqdb foldseek_db_pad_tmp_ss foldseek_db_pad_ss --sub-mat 'aa:3di.out,nucl:3di.out' --score-bias 0 --mask 0 --mask-prob 0.999995 --mask-lower-case 1 --mask-n-repeat 6 --write-lookup 1 --threads 256 -v 3
[=================================================================] 100.00% 348 0s 43ms
Time for merging to foldseek_db_pad_ss: 0h 0m 1s 552ms
Time for merging to foldseek_db_pad_ss_h: 0h 0m 1s 845ms
Time for processing: 0h 0m 8s 481ms
rmdb foldseek_db_pad_tmp_ss
Time for processing: 0h 0m 0s 8ms
rmdb foldseek_db_pad_tmp_ss_h
Time for processing: 0h 0m 0s 7ms
renamedbkeys foldseek_db_pad_ss.gpu_mapping1 foldseek_db foldseek_db_pad --subdb-mode 1 --threads 256 -v 3
Time for merging to foldseek_db_pad: 0h 0m 0s 10ms
Time for merging to foldseek_db_pad_h: 0h 0m 0s 92ms
Time for processing: 0h 0m 0s 183ms
foldseek_db_pad_h exists and will be overwritten
renamedbkeys foldseek_db_pad_ss.gpu_mapping1 foldseek_db_h foldseek_db_pad_h --subdb-mode 1 --threads 256 -v 3
Time for merging to foldseek_db_pad_h: 0h 0m 0s 11ms
Time for processing: 0h 0m 0s 26ms
As you can see, makepaddedseqdb now runs on the full database and appears to process the _ss files internally, although calling makepaddedseqdb on foldseek_db_ss directly still fails with the same header error.
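For reference, this is a sketch of the GPU workflow as I understand it should look. It rests on an assumption I have not confirmed: that the padded database (foldseek_db_pad), not the original one, is what should be passed as the target when --gpu 1 is set. `run` and `gpu_search` are hypothetical helpers; only the foldseek subcommands and flags already shown in the logs above are used.

```shell
#!/usr/bin/env bash
# Sketch only. run and gpu_search are hypothetical helpers.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: the padded database, not the original one, is the GPU target.
gpu_search() {
    local query="$1" db="$2" out="$3" tmp="$4" model="$5"
    run foldseek makepaddedseqdb "$db" "${db}_pad" || return 1
    run foldseek easy-search "$query" "${db}_pad" "$out" "$tmp" \
        --prostt5-model "$model" --gpu 1
}

# Usage: DRY_RUN=1 gpu_search db_concat_01.faa foldseek_db test.m8 tmp ~/prostt5
```

Running it with DRY_RUN=1 first only prints the two commands, so the paths can be checked before anything touches the GPU.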
When I now attempt to run foldseek easy-search, I get a new error:
foldseek easy-search db_concat_01.faa foldseek_db test.m8 tmp --prostt5-model ~/prostt5 --gpu 1
Create directory tmp
easy-search db_concat_01.faa foldseek_db test.m8 tmp --prostt5-model /people/stey877/prostt5 --gpu 1
MMseqs Version: d6204679ceef8a559be2e7a92e89760e31fbc21a
TMscore threshold 0
TMscore threshold mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Add backtrace false
Alignment mode 3
Alignment mode 0
E-value threshold 10
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias scale 1
Max reject 2147483647
Max accept 2147483647
Preload mode 0
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Threads 256
Compressed 0
Verbosity 3
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 9.5
k-mer length 6
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 1000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 0
Mask residues probability 0.999995
Mask lower case residues 1
Mask lower letter repeating N times 6
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
TMalign hit order 0
TMalign fast 1
MultiDomain Mode 1
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Cluster search 0
Exhaustive search mode false
Search iterations 1
Remove temporary files true
Force restart with latest tmp false
MPI runner
Path to ProstT5 /people/stey877/prostt5
Chain name mode 0
Model name mode 0
Createdb extraction mode 0
Interface distance threshold 10
Write mapping file 0
Write Foldcomp 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Report mode 2
Greedy best hits false
createdb db_concat_01.faa tmp/12495472389025653413/query --gpu 1 --prostt5-model /people/stey877/prostt5 --chain-name-mode 0 --model-name-mode 0 --db-extraction-mode 0 --distance-threshold 10 --write-mapping 0 --write-foldcomp 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 256 -v 3
Converting sequences
[304] 0s 456ms
Sort single files in 0h 0m 0s 846ms
Merge all files 0h 0m 0s 482ms
Database type: Aminoacid
CUDA0
CPU
[=================================================================] 100.00% 348 10s 429ms
Time for merging to query_ss: 0h 0m 2s 182ms
Time for merging to query_ss_tmp: 0h 0m 1s 888ms
Time for processing: 0h 0m 20s 923ms
Create directory tmp/12495472389025653413/search_tmp
search tmp/12495472389025653413/query foldseek_db tmp/12495472389025653413/result tmp/12495472389025653413/search_tmp --alignment-mode 3 -s 9.5 -k 6 --gpu 1 --remove-tmp-files 1
ungappedprefilter tmp/12495472389025653413/query_ss foldseek_db_ss tmp/12495472389025653413/search_tmp/16285961332583313377/pref --sub-mat 'aa:3di.out,nucl:3di.out' -c 0 -e 1.79769e+308 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 0.15 --min-ungapped-score 30 --max-seqs 1000 --db-load-mode 0 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 0 --threads 256 --compressed 0 -v 3
terminate called after throwing an instance of 'thrust::THRUST_200500_750_800_860_890_900_NS::system::system_error'
what(): __copy:: D->D: failed: cudaErrorMisalignedAddress: misaligned address
tmp/12495472389025653413/search_tmp/16285961332583313377/structuresearch.sh: line 53: 573305 Aborted (core dumped) $RUNNER "$MMSEQS" ungappedprefilter "${QUERY_PREFILTER}" "${TARGET_PREFILTER}${INDEXEXT}" "${TMP_PATH}/pref" ${UNGAPPEDPREFILTER_PAR}
Error: Ungapped prefilter matching step died
Error: Search died