Commit 51926c4

feat: add a CLI entrypoint for find_in_gmail
1 parent 4f15064 commit 51926c4

8 files changed, +324 -11 lines changed

README.md

Lines changed: 110 additions & 3 deletions
@@ -18,13 +18,13 @@ The package can be installed with `pip`:
1818
pip install tp53
1919
```
2020

21-
## Upload a VCF to the Seshat TP53 Annotation Server
21+
## Upload a VCF to Seshat
2222

2323
Upload a VCF to the [Seshat TP53 annotation server](http://vps338341.ovh.net/) using a headless browser.
2424

2525
```bash
2626
❯ python -m tp53.seshat.upload_vcf \
27-
--input "input.vcf" \
27+
--input "sample.library.vcf" \
2828
2929
```
3030
```console
@@ -52,7 +52,9 @@ One solution that has worked in the past is to remove SVs.
5252
The following command will exclude all variants with a non-empty SVTYPE INFO key:
5353

5454
```bash
55-
❯ bcftools view in.vcf --exclude 'SVTYPE!="."' > out.noSV.vcf
55+
❯ bcftools view sample.library.vcf \
56+
--exclude 'SVTYPE!="."' \
57+
> sample.library.noSV.vcf
5658
```
5759

5860
###### Automation
@@ -75,6 +77,111 @@ This script relies on Google Chrome:
7577

7678
Some versions of macOS may require you to authenticate the Chrome driver ([link](https://stackoverflow.com/a/60362134)).
7779

80+
## Download a Seshat Annotation from Gmail
81+
82+
Download [Seshat](http://vps338341.ovh.net/) VCF annotations by awaiting a server-generated email.
83+
84+
```bash
85+
❯ python -m tp53.seshat.find_in_gmail \
86+
--input "sample.library.vcf" \
87+
--output "sample.library" \
88+
--credentials "~/.secrets/credentials.json"
89+
```
90+
```console
91+
INFO:tp53.seshat.find_in_gmail:Successfully logged into the Gmail service.
92+
INFO:tp53.seshat.find_in_gmail:Querying for a VCF named: sample.library.vcf
93+
INFO:tp53.seshat.find_in_gmail:Searching Gmail messages with: sample.library.vcf from:[email protected] newer_than:5h subject:"Results of batch analysis"
94+
INFO:tp53.seshat.find_in_gmail:Message found with the following metadata: {'id': '193c310d2714b389', 'threadId': '193c30b7244e2662'}
95+
INFO:tp53.seshat.find_in_gmail:Message contents are as follows:
96+
INFO:tp53.seshat.find_in_gmail: >>> Results of batch analysis
97+
INFO:tp53.seshat.find_in_gmail: >>> Analyzed batch file:
98+
INFO:tp53.seshat.find_in_gmail: >>> sample.library.vcf
99+
INFO:tp53.seshat.find_in_gmail: >>> Time taken to run the analysis:
100+
INFO:tp53.seshat.find_in_gmail: >>> 0 minutes 10 seconds
101+
INFO:tp53.seshat.find_in_gmail: >>> Summary:
102+
INFO:tp53.seshat.find_in_gmail: >>> The input file contained
103+
INFO:tp53.seshat.find_in_gmail: >>> 23 mutations out of which
104+
INFO:tp53.seshat.find_in_gmail: >>> 23 were TP53 mutations.
105+
INFO:tp53.seshat.find_in_gmail:Writing attachment to ZIP archive: sample.library.vcf.seshat.zip
106+
INFO:tp53.seshat.find_in_gmail:Extracting ZIP archive: sample.library.vcf.seshat.zip
107+
INFO:tp53.seshat.find_in_gmail:Output file renamed to: sample.library.seshat.short-20241214_034753_129732.tsv
108+
INFO:tp53.seshat.find_in_gmail:Output file renamed to: sample.library.seshat.long-20241214_034753_217420.tsv
109+
```
110+
111+
This tool is used to programmatically wait for, and retrieve, a batch results email from the Seshat TP53 annotation server.
112+
113+
The tool works by searching a user-controlled Gmail inbox for a recent Seshat email that contains the result annotations for a given VCF input file, by name.
114+
Be aware that there is no way to prove which annotation files, as they arrive via email, belong to which VCF file on disk.
115+
116+
This tool helps pair each VCF input file with its annotation files by letting you limit how many hours back in time the Gmail query searches (`--newer-than`).
117+
Limiting the time window in which the email must have arrived reduces the chance of picking up stale annotation files from an earlier Seshat run when VCF filenames are not unique.
118+
If the batch results email from the Seshat annotation server has not yet arrived, this tool will wait up to a set number of seconds (`--wait-for`) before exiting with an exception.
119+
It normally takes less than 1 minute for the Seshat server to annotate an average TP53-only VCF.
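
The waiting behavior described above can be pictured as a poll-until-deadline loop. The following is only a minimal sketch of that idea, not the tool's actual implementation: the function name `wait_for_result`, the `poll_every` interval, and the injectable `clock` and `sleep` parameters are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    fetch: Callable[[], Optional[T]],
    wait_for: float,
    poll_every: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it returns a non-None value or `wait_for` seconds elapse."""
    deadline = clock() + wait_for
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            # Mirrors the documented behavior of exiting with an exception.
            raise TimeoutError(f"No result after waiting {wait_for} seconds.")
        # Never sleep past the deadline.
        sleep(min(poll_every, remaining))
```

Here `fetch` stands in for whatever performs one Gmail search and returns `None` when no matching email exists yet.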
120+
121+
###### Search Criteria
122+
123+
The following rules are used to find annotation files:
124+
125+
1. The email contains the filename of the input VCF
126+
2. The email subject line must contain "Results of batch analysis"
127+
3. The email is at most `--newer-than` hours old
128+
4. The email is from the address [[email protected]](mailto:[email protected])
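
The four rules above map onto a single Gmail search string, as seen in the `Searching Gmail messages with:` log line earlier. Below is a minimal sketch of how such a query could be assembled. The function name `build_gmail_query` is hypothetical, and the sender address is passed in as a parameter because this sketch does not assume what the real address is.

```python
def build_gmail_query(vcf_name: str, sender: str, newer_than_hours: int) -> str:
    """Combine the four search rules into one Gmail search string."""
    return " ".join(
        [
            vcf_name,                               # rule 1: email mentions the VCF filename
            f"from:{sender}",                       # rule 4: email comes from the Seshat address
            f"newer_than:{newer_than_hours}h",      # rule 3: email is recent enough
            'subject:"Results of batch analysis"',  # rule 2: fixed subject line
        ]
    )


# "seshat@example.org" is a placeholder address used only for this example.
print(build_gmail_query("sample.library.vcf", "seshat@example.org", 5))
```

The order of the terms follows the log line above; Gmail treats space-separated terms as a conjunction, so every rule must match.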
129+
130+
###### Outputs
131+
132+
- `<output>.seshat.long-\d{8}_\d{6}_\d{6}.tsv`: The long format Seshat annotations for the input VCF
133+
- `<output>.seshat.short-\d{8}_\d{6}_\d{6}.tsv`: The short format Seshat annotations for the input VCF
134+
- `<output>.seshat.zip`: The original ZIP archive from Seshat
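
Because the TSV outputs carry a timestamp suffix, downstream code may want to recognize them by pattern rather than by exact name. The sketch below is one way to do that under the naming scheme listed above. The names `SESHAT_TSV` and `classify_output` are hypothetical helpers and not part of the package.

```python
import re
from typing import Optional, Tuple

# Matches "<output>.seshat.<long|short>-YYYYMMDD_HHMMSS_ffffff.tsv".
SESHAT_TSV = re.compile(
    r"^(?P<prefix>.+)\.seshat\.(?P<kind>long|short)-"
    r"(?P<stamp>\d{8}_\d{6}_\d{6})\.tsv$"
)


def classify_output(filename: str) -> Optional[Tuple[str, str, str]]:
    """Return (output prefix, 'long' or 'short', timestamp), or None if no match."""
    match = SESHAT_TSV.match(filename)
    if match is None:
        return None
    return match["prefix"], match["kind"], match["stamp"]


# Filename taken from the example log output above.
print(classify_output("sample.library.seshat.short-20241214_034753_129732.tsv"))
```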
135+
136+
###### Server Failures
137+
138+
If Seshat fails to annotate the VCF file but still emails the user a response, then this tool will emit the email body to STDERR and exit with a non-zero status.
139+
140+
#### Gmail Authentication
141+
142+
After installing all Python dependencies, you must create a Google developer's OAuth file.
143+
First-time 2FA may be required depending on the configuration of your Gmail service.
144+
If 2FA is required, then this script will block until you acknowledge your 2FA prompt.
145+
A 2FA prompt is often delivered through an auto-opening web browser.
146+
147+
To create a Google developer's OAuth file, navigate to the following URL and follow the instructions.
148+
149+
- [Authorize Credentials for a Desktop Application](https://developers.google.com/gmail/api/quickstart/python#authorize_credentials_for_a_desktop_application)
150+
151+
Ensure your OAuth file is configured as a "Desktop app" and then download the credentials as JSON.
152+
Save your credentials file somewhere safe, ideally in a secure user folder with restricted permissions (`chmod 700`).
153+
Set your OAuth file permissions to also restrict unwarranted access (`chmod 600`).
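
The same permissions can be applied from Python instead of calling `chmod` by hand. This is a minimal sketch for POSIX systems only. The helper name `secure_credentials` is hypothetical, and it assumes the credentials file sits in a directory dedicated to secrets, since the parent directory's mode is changed as well.

```python
import os
import stat
from pathlib import Path


def secure_credentials(credentials: Path) -> None:
    """Restrict the parent directory to mode 700 and the file itself to mode 600."""
    credentials = credentials.expanduser()
    # Owner may read, write, and enter the directory; nobody else may.
    os.chmod(credentials.parent, stat.S_IRWXU)
    # Owner may read and write the file; nobody else may.
    os.chmod(credentials, stat.S_IRUSR | stat.S_IWUSR)
```

For the example path used earlier, this would be called as `secure_credentials(Path("~/.secrets/credentials.json"))`.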
154+
155+
This script will store a cached token after first-time authentication is successful.
156+
This cached token can be found in the user's home directory within a hidden directory.
157+
Token caching greatly speeds up continued executions of this script.
158+
As of now, the token is cached at the following location:
159+
160+
```bash
161+
"~/.tp53/seshat/seshat-gmail-find-token.pickle"
162+
```
163+
164+
If the cached token is missing, or becomes stale, then you will need to provide your OAuth credentials file.
165+
166+
A typical Google developer's OAuth file is of the format:
167+
168+
```json
169+
{
170+
"installed": {
171+
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
172+
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
173+
"client_id": "272111863110-csldkfjlsdkfjlksdjflksdincie.apps.googleusercontent.com",
174+
"client_secret": "sdlfkjsdlkjfijciejijcei",
175+
"project_id": "gmail-access-2398293892838",
176+
"redirect_uris": [
177+
"urn:ietf:wg:oauth:2.0:oob",
178+
"http://localhost"
179+
],
180+
"token_uri": "https://oauth2.googleapis.com/token"
181+
}
182+
}
183+
```
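
Before running the tool, it can help to confirm that a downloaded file has the "Desktop app" shape shown above. The check below is only a sketch: the function name `check_oauth_config` is hypothetical, and the set of keys it treats as required is an assumption based on the example file, not a statement of what Google or this package enforces.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Assumed minimum set of keys, chosen from the example file above.
REQUIRED_KEYS = {"client_id", "client_secret", "auth_uri", "token_uri"}


def check_oauth_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the "installed" section, or raise ValueError if the shape is unexpected."""
    installed = config.get("installed")
    if not isinstance(installed, dict):
        raise ValueError('Expected a top-level "installed" key (a "Desktop app" client).')
    missing = REQUIRED_KEYS - installed.keys()
    if missing:
        raise ValueError(f"OAuth file is missing keys: {sorted(missing)}")
    return installed


def check_oauth_file(path: Path) -> Dict[str, Any]:
    """Load a credentials JSON file from disk and check its shape."""
    return check_oauth_config(json.loads(path.expanduser().read_text()))
```

A file created for a "Web application" client uses a different top-level key, which is one reason the first check may fail.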
184+
78185
## Development and Testing
79186

80187
See the [contributing guide](./CONTRIBUTING.md) for more information.
