Commit 5659bfb

feat: add a CLI entrypoint for upload_vcf (#4)
1 parent 6f79cba commit 5659bfb

12 files changed: +215 -15 lines changed


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
+.DS_Store
 .vscode/
+testdata/
 
 # Byte-compiled / optimized / DLL files
 __pycache__/

README.md

Lines changed: 61 additions & 0 deletions
@@ -18,6 +18,67 @@ The package can be installed with `pip`:
 pip install tp53
 ```
 
+## Upload a VCF to the Seshat TP53 Annotation Server
+
+Upload a VCF to the [Seshat TP53 annotation server](http://vps338341.ovh.net/) using a headless browser.
+
+```bash
+❯ python -m tp53.seshat.upload_vcf \
+    --input "input.vcf"
+
+```
+```console
+INFO:tp53.seshat.upload_vcf:Uploading 0 %...
+INFO:tp53.seshat.upload_vcf:Uploading 53%...
+INFO:tp53.seshat.upload_vcf:Uploading 53%...
+INFO:tp53.seshat.upload_vcf:Uploading 60%...
+INFO:tp53.seshat.upload_vcf:Uploading 60%...
+INFO:tp53.seshat.upload_vcf:Uploading 66%...
+INFO:tp53.seshat.upload_vcf:Uploading 66%...
+INFO:tp53.seshat.upload_vcf:Uploading 80%...
+INFO:tp53.seshat.upload_vcf:Uploading 80%...
+INFO:tp53.seshat.upload_vcf:Upload complete!
+```
+
+This tool programmatically configures and uploads batches of variants in VCF format to the Seshat annotation server.
+The tool works by building a headless Chrome browser instance and then interacting with the Seshat website directly through simulated key presses and mouse clicks.
+Unfortunately, Seshat does not provide a native programmatic API, and one could not be reverse engineered.
+Seshat also utilizes custom JavaScript in its form processing, so a lightweight approach of simply interacting with the HTML form elements was also not possible.
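As a rough illustration of the headless-browser approach described above, the sketch below starts a headless Chrome session and loads the Seshat page. It assumes Selenium (version 4 or later) drives the browser, which this diff does not confirm; only the `chromedriver-py` dependency appears in it. The names `chrome_arguments` and `open_seshat` are hypothetical and are not taken from the package.

```python
from typing import List

# URL of the Seshat server as linked in the README text above.
SESHAT_URL = "http://vps338341.ovh.net/"


def chrome_arguments(headless: bool = True) -> List[str]:
    """Return command-line switches for a Chrome session (hypothetical helper)."""
    arguments = ["--window-size=1280,1024", "--disable-gpu"]
    if headless:
        # Run Chrome without a visible window.
        arguments.insert(0, "--headless=new")
    return arguments


def open_seshat(url: str = SESHAT_URL) -> str:
    """Open a page in headless Chrome and return its title.

    Assumes `selenium` and `chromedriver-py` are installed. The imports are
    deferred so that the rest of this module works without them.
    """
    from chromedriver_py import binary_path
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for argument in chrome_arguments():
        options.add_argument(argument)

    driver = webdriver.Chrome(service=Service(executable_path=binary_path), options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

A real upload would go on to locate the form fields and send key presses and clicks to them; those element selectors are not shown in this diff, so they are left out here.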
+
+###### VCF Input Requirements
+
+Seshat will not let the user know why a VCF fails to annotate, but it has been observed that Seshat can fail to parse some of [VarDictJava](https://github.com/AstraZeneca-NGS/VarDictJava)'s structural variants (SVs) as valid variant records.
+One solution that has worked in the past is to remove SVs.
+The following command will exclude all variants with a non-empty SVTYPE INFO key:
+
+```bash
+❯ bcftools view in.vcf --exclude 'SVTYPE!="."' > out.noSV.vcf
+```
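For readers without `bcftools`, roughly the same filtering can be sketched in plain Python. This is a simplified stand-in and not part of the package: the function names are hypothetical, the example records are made up for illustration, and any record whose INFO column carries an `SVTYPE` key with a value other than `.` is treated as a structural variant.

```python
from typing import Iterable, Iterator


def has_svtype(info: str) -> bool:
    """Return True if a VCF INFO string has an SVTYPE key with a non-missing value."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "SVTYPE" and value not in ("", "."):
            return True
    return False


def drop_structural_variants(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every record that is not annotated with SVTYPE."""
    for line in lines:
        if line.startswith("#"):
            yield line  # Header lines pass through unchanged.
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the eighth tab-separated column of a VCF record.
        if len(columns) >= 8 and has_svtype(columns[7]):
            continue
        yield line


# Illustrative records only; they do not come from a real callset.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr17\t7674220\t.\tC\tT\t.\tPASS\tDP=100\n",
    "chr17\t7675000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=7675500\n",
]
kept = list(drop_structural_variants(example))
```

Unlike `bcftools`, this sketch does not validate the VCF or handle compressed input, so it is best suited to quick checks on small files.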
+
+###### Automation
+
+There are no terms and conditions posted on the Seshat annotation server's website, and there is no server-side `robots.txt` rule set.
+In lieu of usage terms, we strongly encourage all users of this script to respect the Seshat resource by adhering to the following best practices:
+
+- **Minimize Load**: Limit the rate of requests to the server
+- **Minimize Connections**: Limit the number of concurrent requests
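One way to follow both recommendations is to upload files strictly one at a time with a pause between uploads. The sketch below assumes a caller-supplied `upload` callable; the helper name `upload_sequentially` and the 60-second default pause are illustrative choices, not values recommended by Seshat or used by this package.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def upload_sequentially(
    vcfs: Iterable[Path],
    upload: Callable[[Path], None],
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Upload VCFs one at a time, pausing between consecutive uploads.

    Running the uploads in a single loop keeps the number of concurrent
    connections at one, and the pause limits the request rate.
    """
    uploaded: List[Path] = []
    for index, vcf in enumerate(vcfs):
        if index > 0:
            sleep(pause_seconds)  # Wait before every upload except the first.
        upload(vcf)
        uploaded.append(vcf)
    return uploaded
```

The `upload` argument could wrap the package's `upload_vcf` function, whose exact signature is not shown in this diff. The `sleep` parameter exists only so the pause can be replaced in tests.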
+
+If you need to batch process dozens, or hundreds, of VCF callsets, you may consider improving this underlying Python script to randomize the user agent and IP address of your headless browser session to avoid being labelled as a bot.
+
+###### Environment Setup
+
+This script relies on Google Chrome:
+
+```console
+brew install --cask google-chrome
+```
+
+Distributions of macOS may require you to authenticate the Chrome driver ([link](https://stackoverflow.com/a/60362134)).
+
 ## Development and Testing
 
 See the [contributing guide](./CONTRIBUTING.md) for more information.
+
+## References
+
+- [Soussi, Thierry, et al. “Recommendations for Analyzing and Reporting TP53 Gene Variants in the High-Throughput Sequencing Era.” Human Mutation, vol. 35, no. 6, 2014, pp. 766–778, doi:10.1002/humu.22561](https://doi.org/10.1002/humu.22561)

poetry.lock

Lines changed: 12 additions & 1 deletion
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -31,6 +31,7 @@ classifiers = [
 [tool.poetry.dependencies]
 python = "^3.11"
 beautifulsoup4 = "~4.12"
+chromedriver-py = "*"
 google-api-python-client = "~2.151"
 google-auth-httplib2 = "~0.2"
 google-auth-oauthlib = "~1.2.1"
@@ -123,7 +124,7 @@ exclude = [
 ]
 
 [[tool.mypy.overrides]]
-module = "defopt"
+module = "chromedriver_py"
 ignore_missing_imports = true
 
 [[tool.mypy.overrides]]

tests/seshat/test_upload.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from tp53.seshat import HumanGenomeAssembly
+from tp53.seshat.upload_vcf import HumanGenomeAssembly
 
 
 def test_human_genome_assembly() -> None:

tp53/seshat/__init__.py

Lines changed: 0 additions & 3 deletions
@@ -1,4 +1 @@
 from tp53.seshat._exceptions import SeshatError as SeshatError
-from tp53.seshat._gmail_find import find_in_gmail as find_in_gmail
-from tp53.seshat._upload import HumanGenomeAssembly as HumanGenomeAssembly
-from tp53.seshat._upload import upload_vcf as upload_vcf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from ._find_in_gmail import find_in_gmail as find_in_gmail

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+if __name__ == "__main__":
+    ...

tp53/seshat/_gmail_find.py renamed to tp53/seshat/find_in_gmail/_find_in_gmail.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 from google_auth_oauthlib.flow import InstalledAppFlow
 from googleapiclient.discovery import build as build_google_client
 
-from ._exceptions import SeshatError
+from .._exceptions import SeshatError
 
 logger: Logger = getLogger("tp53.seshat")
 
tp53/seshat/upload_vcf/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+from ._upload_vcf import HumanGenomeAssembly as HumanGenomeAssembly
+from ._upload_vcf import upload_vcf as upload_vcf
