Commit 9fab52a

Merge pull request #31 from ressy/release-0.0.9
Release 0.0.9
2 parents df7ca99 + 81dbc40 commit 9fab52a

18 files changed, +357 -211 lines changed

CHANGELOG.md

+21-1
@@ -1,5 +1,25 @@
 # Changelog
 
+## 0.0.9 - 2021-07-20
+
+### Added
+
+* `--outdir` argument to set output directory other than current working
+directory ([#24])
+* `--no-collapse` argument (and updates to `request` function) to disable
+automatic combining of results across batched submissions ([#25])
+
+### Fixed
+
+* Empty config files now result in the usual error message about required
+options ([#30])
+* All command-line options now match V-QUEST option names ([#28])
+
+[#30]: https://github.com/ressy/vquest/pull/30
+[#28]: https://github.com/ressy/vquest/pull/28
+[#25]: https://github.com/ressy/vquest/pull/25
+[#24]: https://github.com/ressy/vquest/pull/24
+
 ## 0.0.8 - 2021-07-13
 
 ### Fixed
@@ -33,7 +53,7 @@
 
 ### Added
 
-* `--align` argument (via `airr_to_fasta` function) for exraction of sequence
+* `--align` argument (via `airr_to_fasta` function) for extraction of sequence
 alignment FASTA from AIRR results ([#1])
 * Error messages sent by the server are now raised as an exception containing
 the server-provided message(s) ([#7])

README.md

+6-6
@@ -4,7 +4,7 @@
 
 [IMGT](http://imgt.org)'s [V-QUEST](http://www.imgt.org/IMGT_vquest/analysis)
 is only available via a web interface. This Python package automates V-QUEST
-usage by submitting request data like the web form does. Curently only the
+usage by submitting request data like the web form does. Currently only the
 "Download AIRR formatted results" option is supported.
 
 Example command-line usage, with rhesus sequences in seqs.fasta:
@@ -13,7 +13,7 @@ Example command-line usage, with rhesus sequences in seqs.fasta:
 vquest --species rhesus-monkey --receptorOrLocusType IG --fileSequences seqs.fasta
 
 The output is saved to `Parameters.txt` and `vquest_airr.tsv` (the files
-V-QUEST provides in a zip archive) in the working directory.
+V-QUEST provides in a zip archive) in the working directory by default.
 
 Or with `--align` to automatically extract the alignment as FASTA:
 
@@ -33,14 +33,14 @@ Here the output is a dictionary of filenames to contents.
 
 The only required options are species, receptorOrLocusType, and either
 fileSequences or sequences (to provide sequences directly as text). Options
-can be given via command-line arguemnts or one or more YAML configuration
+can be given via command-line arguments or one or more YAML configuration
 files. See [data/defaults.yml](data/defaults.yml) and `./vquest.py --help` for
 details.
 
 The web form will only accept 50 sequences at a time, so the sequences given
-here are grouped into chunks of 50, submitted, and the results combined. A
-delay (default 1 second) is used between submissions to avoid being impolite to
-the server.
+here are grouped into chunks of 50, submitted, and (by default) the results
+automatically combined. A delay (default 1 second) is used between submissions
+to avoid being impolite to the server.
 
 * V-QUEST: <http://www.imgt.org/IMGT_vquest/analysis>
 * V-QUEST docs: <http://www.imgt.org/IMGT_vquest/user_guide#intro>
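
The batching behavior described in the README text above (chunks of 50, a delay between submissions, results combined by default) can be sketched roughly as follows. This is an illustration only, not the package's code: `submit_in_batches`, `combine_tsv`, and the `submit` callable are hypothetical names, and the sketch assumes each submission returns TSV text that starts with a header line.

```python
import time
from itertools import islice


def chunker(iterable, chunksize):
    """Yield successive lists of up to chunksize items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def combine_tsv(tables):
    """Concatenate TSV texts, keeping the header line of the first one only."""
    lines = []
    for idx, table in enumerate(tables):
        rows = table.splitlines()
        lines.extend(rows if idx == 0 else rows[1:])
    return "\n".join(lines) + "\n" if lines else ""


def submit_in_batches(records, submit, chunksize=50, delay=1.0, collapse=True):
    """Submit records chunk by chunk, pausing between submissions.

    submit is a hypothetical callable that takes a list of records and
    returns TSV text.  With collapse=False the per-batch results are
    returned as a list instead of being combined.
    """
    results = []
    for idx, chunk in enumerate(chunker(records, chunksize)):
        if idx > 0:
            time.sleep(delay)  # be polite to the server between requests
        results.append(submit(chunk))
    return combine_tsv(results) if collapse else results
```

Passing `collapse=False` here plays the role that the changelog gives to `--no-collapse`: the caller receives one result per batch rather than a single combined table.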

test_vquest/data/test_vquest/TestVquestCustom/config.yml

+1-1
@@ -1,6 +1,6 @@
 species: rhesus-monkey
 receptorOrLocusType: IG
-v_regionsearchindel: true
+V_REGIONsearchIndel: true
 sequences: |
 >IGKV2-ACR*02
 GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
species: rhesus-monkey
2+
receptorOrLocusType: IG
3+
V_REGIONsearchIndel: true
4+
resultType: excel
5+
xv_outputtype: 3
6+
sequences: |
7+
>IGKV2-ACR*02
8+
GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
9+
GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
10+
ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
11+
AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
Date Wed Dec 02 19:18:14 CET 2020
2+
IMGT/V-QUEST program version 3.5.21
3+
IMGT/V-QUEST reference directory release 202049-2
4+
Species Macaca mulatta
5+
Receptor type or locus IG
6+
IMGT/V-QUEST reference directory set F+ORF+ in-frame P
7+
Search for insertions and deletions yes
8+
Nb of nucleotides to add (or exclude) in 3' of the V-REGION for the evaluation of the alignment score 0
9+
Nb of nucleotides to exclude in 5' of the V-REGION for the evaluation of the nb of mutations 0
10+
Analysis of scFv no
11+
Number of submitted sequences 1
12+
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
sequence_id sequence sequence_aa rev_comp productive complete_vdj vj_in_frame stop_codon locus v_call d_call j_call c_call sequence_alignment sequence_alignment_aa germline_alignment germline_alignment_aa junction junction_aa np1 np1_aa np2 np2_aa cdr1 cdr1_aa cdr2 cdr2_aa cdr3 cdr3_aa fwr1 fwr1_aa fwr2 fwr2_aa fwr3 fwr3_aa fwr4 fwr4_aa v_score v_identity v_support v_cigar d_score d_identity d_support d_cigar j_score j_identity j_support j_cigar c_score c_identity c_support c_cigar v_sequence_start v_sequence_end v_germline_start v_germline_end v_alignment_start v_alignment_end d_sequence_start d_sequence_end d_germline_start d_germline_end d_alignment_start d_alignment_end j_sequence_start j_sequence_end j_germline_start j_germline_end j_alignment_start j_alignment_end cdr1_start cdr1_end cdr2_start cdr2_end cdr3_start cdr3_end fwr1_start fwr1_end fwr2_start fwr2_end fwr3_start fwr3_end fwr4_start fwr4_end v_sequence_alignment v_sequence_alignment_aa d_sequence_alignment d_sequence_alignment_aa j_sequence_alignment j_sequence_alignment_aa c_sequence_alignment c_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_germline_alignment d_germline_alignment_aa j_germline_alignment j_germline_alignment_aa c_germline_alignment c_germline_alignment_aa junction_length junction_aa_length np1_length np2_length n1_length n2_length p3v_length p5d_length p3d_length p5j_length consensus_count duplicate_count cell_id clone_id rearrangement_id repertoire_id rearrangement_set_id sequence_analysis_category d_number 5prime_trimmed_n_nb 3prime_trimmed_n_nb insertions deletions junction_decryption
2+
IGKV2-ACR*02 gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagtgacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtttccaaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc F F IGK Macmul IGKV2S20*01 F gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP cagagcctcttggatagtgacgggtacacctgt QSLLDSDGYTC gaggtttcc EVS atgcaaagtatagagtttcctcc MQSIEFP gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagt DIVMTQTPLSLPVTPGEPASISCRSS ttggactggtacctgcagaagccaggccagtctccacagctcctgatctat LDWYLQKPGQSPQLLIY aaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgt NRVSGVPDRFSGSGSXTDFTLKISRVEAEDVGVYYC 1294 93.20 2=1X32=1X17=1X42=3D2=1X2=2X6=1X6=1X34=1X1=1X4=1X19=1X12=1X25=1M25=1X1=1X5=1X17=1X8=1X6=1X9=1X6= 1 302 1 335 1 335 79 111 163 171 280 302 1 78 112 162 172 279 
gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP 0 0 0 0 0 0 2 (indelcorr) 0 0 0 in CDR1-IMGT, from codon 33 of V-REGION: 3 nucleotides (from position 97 in the user submitted sequence), (do not cause frameshift)

test_vquest/data/test_vquest/TestVquestEmpty/config.yml

Whitespace-only changes.
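
The empty config fixture relates to the changelog entry about empty config files ([#30]). PyYAML's `safe_load` returns `None` for an empty document, so one plausible way to get "the usual error message about required options" is to treat a `None` config as an empty mapping before checking. The sketch below shows that idea under stated assumptions: `layer_configs` and `check_required` are hypothetical names, the configs are assumed to be already parsed, and the required options are taken from the README text. It is not the package's actual implementation.

```python
# Required options as listed in the README; sequences may come from
# either fileSequences or sequences.
REQUIRED = ("species", "receptorOrLocusType")


def layer_configs(*configs):
    """Merge parsed config mappings in order; later ones take priority.

    A config parsed from an empty YAML file is None, so it is treated
    as an empty mapping instead of causing a TypeError.
    """
    merged = {}
    for config in configs:
        merged.update(config or {})
    return merged


def check_required(options):
    """Raise ValueError naming any required options that are missing."""
    missing = [key for key in REQUIRED if not options.get(key)]
    if not (options.get("fileSequences") or options.get("sequences")):
        missing.append("fileSequences or sequences")
    if missing:
        raise ValueError("Missing required options: " + ", ".join(missing))
```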
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
species: rhesus-monkey
2+
receptorOrLocusType: antibody # not valid!
3+
sequences: |
4+
>IGKV2-ACR*02
5+
GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
6+
GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
7+
ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
8+
AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
species: rhesus-monkey
2+
receptorOrLocusType: antibody # not valid!
3+
resultType: excel
4+
xv_outputtype: 3
5+
sequences: |
6+
>IGKV2-ACR*02
7+
GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
8+
GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
9+
ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
10+
AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
species: rhesus-monkey
2+
receptorOrLocusType: IG
3+
resultType: excel
4+
xv_outputtype: 3
5+
sequences: |
6+
>IGKV2-ACR*02
7+
GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
8+
GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
9+
ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
10+
AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
Date Tue Dec 01 22:08:11 CET 2020
2+
IMGT/V-QUEST program version 3.5.21
3+
IMGT/V-QUEST reference directory release 202049-2
4+
Species Macaca mulatta
5+
Receptor type or locus IG
6+
IMGT/V-QUEST reference directory set F+ORF+ in-frame P
7+
Search for insertions and deletions no
8+
Nb of nucleotides to add (or exclude) in 3' of the V-REGION for the evaluation of the alignment score 0
9+
Nb of nucleotides to exclude in 5' of the V-REGION for the evaluation of the nb of mutations 0
10+
Analysis of scFv no
11+
Number of submitted sequences 1
12+
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
sequence_id sequence sequence_aa rev_comp productive complete_vdj vj_in_frame stop_codon locus v_call d_call j_call c_call sequence_alignment sequence_alignment_aa germline_alignment germline_alignment_aa junction junction_aa np1 np1_aa np2 np2_aa cdr1 cdr1_aa cdr2 cdr2_aa cdr3 cdr3_aa fwr1 fwr1_aa fwr2 fwr2_aa fwr3 fwr3_aa fwr4 fwr4_aa v_score v_identity v_support v_cigar d_score d_identity d_support d_cigar j_score j_identity j_support j_cigar c_score c_identity c_support c_cigar v_sequence_start v_sequence_end v_germline_start v_germline_end v_alignment_start v_alignment_end d_sequence_start d_sequence_end d_germline_start d_germline_end d_alignment_start d_alignment_end j_sequence_start j_sequence_end j_germline_start j_germline_end j_alignment_start j_alignment_end cdr1_start cdr1_end cdr2_start cdr2_end cdr3_start cdr3_end fwr1_start fwr1_end fwr2_start fwr2_end fwr3_start fwr3_end fwr4_start fwr4_end v_sequence_alignment v_sequence_alignment_aa d_sequence_alignment d_sequence_alignment_aa j_sequence_alignment j_sequence_alignment_aa c_sequence_alignment c_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_germline_alignment d_germline_alignment_aa j_germline_alignment j_germline_alignment_aa c_germline_alignment c_germline_alignment_aa junction_length junction_aa_length np1_length np2_length n1_length n2_length p3v_length p5d_length p3d_length p5j_length consensus_count duplicate_count cell_id clone_id rearrangement_id repertoire_id rearrangement_set_id sequence_analysis_category d_number 5prime_trimmed_n_nb 3prime_trimmed_n_nb insertions deletions junction_decryption
2+
IGKV2-ACR*02 gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagtgacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtttccaaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc F F IGK Macmul IGKV2S20*01 F gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP cagagcctcttggatagtgacgggtacacctgt QSLLDSDGYTC gaggtttcc EVS atgcaaagtatagagtttcctcc MQSIEFP gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagt DIVMTQTPLSLPVTPGEPASISCRSS ttggactggtacctgcagaagccaggccagtctccacagctcctgatctat LDWYLQKPGQSPQLLIY aaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgt NRVSGVPDRFSGSGSXTDFTLKISRVEAEDVGVYYC 1294 93.20 2=1X32=1X17=1X42=3D2=1X2=2X6=1X6=1X34=1X1=1X4=1X19=1X12=1X25=1M25=1X1=1X5=1X17=1X8=1X6=1X9=1X6= 1 302 1 335 1 335 79 111 163 171 280 302 1 78 112 162 172 279 
gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP 0 0 0 0 0 0 1 (noindelsearch) 0 0 0

test_vquest/test_util.py

+35
@@ -24,3 +24,38 @@ def test_chunker(self):
         for chunk in util.chunker(range(5), 5):
             chunks.append(chunk)
         self.assertEqual([[0, 1, 2, 3, 4]], chunks)
+
+class TestUnzip(unittest.TestCase):
+    """Basic test of the unzip helper."""
+
+    def test_unzip(self):
+        """Test that binary ZIP data with an empty file can be extracted."""
+        self.assertEqual(
+            util.unzip(bytes.fromhex(
+                "504b03040a0000000000ab6c"
+                "ef5200000000000000000000"
+                "000008001c00746573742e64"
+                "617455540900035272f06052"
+                "72f06075780b000104e90300"
+                "0004e9030000504b01021e03"
+                "0a0000000000ab6cef520000"
+                "000000000000000000000800"
+                "18000000000000000000b481"
+                "00000000746573742e646174"
+                "55540500035272f06075780b"
+                "000104e903000004e9030000"
+                "504b05060000000001000100"
+                "4e000000420000000000")),
+            {"test.dat": b""})
+
+class TestAirrToFasta(unittest.TestCase):
+    """Basic test of the airr_to_fastas helper."""
+
+    def test_airr_to_fasta(self):
+        """Test that FASTA is generated from AIRR TSV."""
+        expected = ">1\nACTG\n>2\nCGTA\n"
+        observed = util.airr_to_fasta(
+            "sequence_id\tsequence\tsequence_alignment\n"
+            "1\tACTG\tACTG\n"
+            "2\t\tCGTA\n")
+        self.assertEqual(observed, expected)
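
For readers without the `util` module at hand, here is a minimal sketch of helpers that would satisfy the two new tests. It uses only the standard library. The function bodies and the `seq_col` parameter are assumptions made for illustration; the real `util.unzip` and `util.airr_to_fasta` may differ in signature and behavior.

```python
import csv
import io
import zipfile


def unzip(data):
    """Map each file name in binary ZIP data to that file's bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def airr_to_fasta(airr_txt, seq_col="sequence_alignment"):
    """Build FASTA text from AIRR TSV text.

    Each row becomes one record, named by sequence_id and holding the
    value of seq_col (a hypothetical parameter for this sketch).
    """
    reader = csv.DictReader(io.StringIO(airr_txt), delimiter="\t")
    fasta = ""
    for row in reader:
        fasta += f">{row['sequence_id']}\n{row[seq_col]}\n"
    return fasta
```

Reading from `sequence_alignment` rather than `sequence` is what lets the second test row, which has an empty `sequence` field, still produce `CGTA`.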
