Commit 18bda23

Merge pull request #49 from ygidtu/dev
Dev
2 parents 51700c2 + 624a7cd commit 18bda23

18 files changed

Lines changed: 97 additions & 56 deletions

README.md

Lines changed: 12 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ sashimi.py is a tool for visualizing various next-generation sequencing (NGS) da
1414
3. Visualize coverage by heatmap, including HiC diagram
1515
4. Visualize protein domains based on the given gene id
1616
5. Demultiplex single-cell RNA/ATAC-seq data into cell populations using cell barcodes
17-
6. Support visualizing individual full-length reads in IGV-like style
17+
6. Support visualizing individual full-length reads in read-by-read style
1818
7. Support visualizing circRNA sequencing data
1919

2020
## Input
@@ -26,7 +26,7 @@ sashimi.py supports almost NGS data format, including
2626
- bigBed
2727
- bigWig
2828
- Depth file generated by `samtools depth`
29-
- naive HiC format
29+
- naive Hi-C format
3030

3131

3232
## Output
@@ -37,17 +37,16 @@ and each track on output corresponds these datasets from config file.
3737
## Usage
3838

3939
The sashimi.py is written in Python, and user could install it in a variety of ways as follows
40-
1. install from pipy
41-
40+
1. install from PyPI
41+
4242
```bash
4343
pip install sashimi.py
44-
45-
# or install from source
46-
python setup.py install
47-
48-
sashimipy --help
4944
```
50-
2. using docker image
45+
2. install from bioconda
46+
```bash
47+
conda install -c bioconda sashimi-py
48+
```
49+
3. using docker image
5150
```bash
5251
docker pull ygidtu/sashimi
5352
docker run --rm ygidtu/sashimi --help
@@ -61,7 +60,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
6160
docker run --rm ygidtu/sashimi --help
6261
```
6362

64-
3. install from source code
63+
4. install from source code
6564

6665
```bash
6766
git clone https://github.com/ygidtu/sashimi.py sashimi
@@ -73,7 +72,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
7372
python main.py --help
7473
```
7574

76-
4. running from a local webserver
75+
5. running from a local webserver
7776

7877
```bash
7978
git clone https://github.com/ygidtu/sashimi.py sashimi
@@ -89,7 +88,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
8988
python server.py --help
9089
```
9190

92-
5. for `pipenv` users
91+
6. for `pipenv` users
9392

9493
```bash
9594
git clone https://github.com/ygidtu/sashimi.py

docs/command.md

Lines changed: 55 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -242,7 +242,7 @@ Options:
242242
243243
### Common options
244244
245-
1. `--color-factor`: the index of column to set colors
245+
1.`--color-factor`: the index of column to set colors
246246
247247
- basic usage: the input file list as follows,
248248
@@ -265,8 +265,8 @@ Then the `--color-factor 2` means sashimi assign red color to LUAD and "#000000"
265265
266266
### Output options
267267
268-
1. `-o, --output`: the path to output file, the common image format such as pdf, png, jpg and svg are supported.
269-
2. `--backend`: the backend is used to switch matplotlib plotting backend,
268+
1.`-o, --output`: the path to output file, the common image format such as pdf, png, jpg and svg are supported.
269+
2.`--backend`: the backend is used to switch matplotlib plotting backend,
270270
271271
**known issues: **
272272
@@ -283,7 +283,7 @@ The recommended combination of backend and image formats please check [matplotli
283283
284284
### Reference plot
285285
286-
1. `--domain`: fetch domain information from UniProt and Ensembl, then map amino acid coordinates to genomic coordinates.
286+
1.`--domain`: fetch domain information from UniProt and Ensembl, then map amino acid coordinates to genomic coordinates.
287287
288288
For each transcript, sashimi first gets the UniProt ID from the [uniprot website](https://rest.uniprot.org/uniprotkb/search?&query=ENST00000380276&format=xml) and checks whether the protein length is one third of the CDS length. If so, it then fetches the UniProt features from [ebi](https://www.ebi.ac.uk/proteins/api/features/U2AF35a).
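The length check described above can be sketched as follows. This is a minimal illustration, not sashimi's actual code: the function names and the `cds_includes_stop` switch are hypothetical, and whether sashimi counts the stop codon as part of the CDS is an assumption left open here.

```python
from typing import Tuple


def protein_matches_cds(protein_length: int, cds_length: int,
                        cds_includes_stop: bool = False) -> bool:
    """Return True if a CDS of cds_length nt encodes protein_length residues."""
    if cds_includes_stop:
        # Assumption: the annotated CDS ends with a stop codon,
        # which does not encode a residue.
        cds_length -= 3
    return protein_length > 0 and cds_length == 3 * protein_length


def residue_to_cds_range(residue: int) -> Tuple[int, int]:
    """Map a 1-based residue index to its 1-based, inclusive CDS nucleotide range."""
    if residue < 1:
        raise ValueError("residue index is 1-based")
    start = 3 * (residue - 1) + 1
    return start, start + 2


print(protein_matches_cds(240, 720))        # True
print(protein_matches_cds(240, 723, True))  # True
print(residue_to_cds_range(10))             # (28, 30)
```

Turning a CDS offset into a genomic coordinate additionally requires the exon structure and strand of the transcript, which this sketch does not cover.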
289289
@@ -292,15 +292,15 @@ The sashimi will present these domains from ['DOMAIN_AND_SITES', 'MOLECULE_PROCE
292292
293293
![](imgs/cmd/domain.png)
294294
295-
2. `--local-domain`: load domain information from a folder that contains bigbed files which download from [UCSC](https://hgdownload.soe.ucsc.edu/gbdb/hg38/uniprot/)
295+
2.`--local-domain`: load domain information from a folder that contains bigbed files which download from [UCSC](https://hgdownload.soe.ucsc.edu/gbdb/hg38/uniprot/)
296296
297297
To help users with limited network access, Sashimi also provides a local mode for domain visualization. First, the user must download the corresponding reference from UCSC and collect all bigBed files into a folder, which is then passed to Sashimi with `--local-domain`.
298298
299299
However, the bigBed files from UCSC do not provide transcript or UniProt IDs, so Sashimi cannot map the protein information to the corresponding transcript ID.
300300
301301
![](imgs/cmd/local_domain.png)
302302
303-
3. `--interval`: add additional feature track into reference.
303+
3.`--interval`: add additional feature track into reference.
304304
305305
In addition to fetching genomic features from a GTF or GFF file, Sashimi also provides a flexible way to load other features into the reference track.
306306
Users can record custom annotation information in a config file, like this
@@ -330,7 +330,7 @@ example/bws/2.bw bw bw green
330330
example/bams/sc.bam bam sc
331331
```
332332
333-
1. `--customized-junction`
333+
1.`--customized-junction`
334334
335335
This parameter is used to add user defined junctions
336336
@@ -344,7 +344,7 @@ chr1:1000-20000 100 200
344344
- the columns correspond to the input files in the file list.
345345
- the table is filled with junction counts.
346346
347-
2. `--show-site` and `--show-strand`
347+
2.`--show-site` and `--show-strand`
348348
349349
These two parameters are used to show the density of read starts on the forward and reverse strands separately.
350350
@@ -366,24 +366,24 @@ python main.py \
366366
367367
#### Single cell bam related parameters
368368
369-
1. `--barcode`
369+
1.`--barcode`
370+
371+
Provide a manually curated barcode list to separate bam files by cell types or other groups.
372+
373+
This barcode list as follows:
374+
375+
```bash
376+
#bam barcode cell_type(optional) cell_type(optional)
377+
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
378+
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
379+
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
380+
```
370381
371-
Provide a manually curated barcode list to separate bam files by cell types or other groups.
372-
373-
This barcode list as follows:
374-
375-
```bash
376-
#bam barcode cell_type(optional) cell_type(optional)
377-
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
378-
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
379-
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
380-
```
381-
382-
2. `--barcode-tag` and `--umi-tag`
382+
2.`--barcode-tag` and `--umi-tag`
383383
384-
3. The tag to extract barcode and umi from each reads record, here we take the 10x Genomics bam format as default.
384+
3.The tag to extract barcode and umi from each reads record, here we take the 10x Genomics bam format as default.
385385
386-
4. `--group-by-cell`
386+
4.`--group-by-cell`
387387
388388
Group by cell types in density/line plot.
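As a rough illustration of how a barcode list like the one shown above could be grouped by cell type, here is a minimal Python sketch. It is not sashimi's implementation: `group_barcodes` is a hypothetical helper, the first column is assumed to match the label of a BAM in the file list, and falling back to that label when no cell type is given is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def group_barcodes(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Return {bam_label: {cell_type: [barcode, ...]}} from a barcode list."""
    groups = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected at least a bam label and a barcode: {line!r}")
        bam_label, barcode = fields[0], fields[1]
        # Assumption: without a cell type column, group under the bam label.
        cell_type = fields[2] if len(fields) > 2 else bam_label
        groups[bam_label][cell_type].append(barcode)
    # Convert nested defaultdicts to plain dicts for a stable return type.
    return {bam: dict(cells) for bam, cells in groups.items()}


barcode_list = """\
#bam barcode cell_type(optional) cell_type(optional)
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
"""

print(group_barcodes(barcode_list))
```

The fourth column (a hex color in the example) is ignored here, since only the grouping is illustrated.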
389389
@@ -411,7 +411,7 @@ The line plot is simply another format of density plots.
411411
The input file list is the same as for density plots
412412
413413
414-
1. `--hide-legend`, `--legend-position` and `--legend-ncol`
414+
1.`--hide-legend`, `--legend-position` and `--legend-ncol`
415415
416416
These three parameters are used to hide the legend, set its position, and set its number of columns, respectively.
417417
@@ -502,9 +502,9 @@ example/bws/0.bw bw bw YlOrBr
502502
### Igv plot
503503
504504
505-
1. Sashimi.igv module support different format file as input.
505+
1.Sashimi.igv module support different format file as input.
506506
507-
An Igv-like plot provides a landscape of aligned reads in a straight and convenient way.
507+
A read-by-read plot provides a landscape of aligned reads in a straight and convenient way.
508508
509509
Users can pass BED and BAM files to Sashimi; the input config file list is as follows
510510
@@ -531,7 +531,7 @@ python main.py \
531531
532532
![](imgs/cmd/igv_plot.1.png)
533533
534-
2. Sashimi.igv module load and visualize features from bam tags.
534+
2.Sashimi.igv module load and visualize features from bam tags.
535535
536536
Here, Sashimi.igv loads the m6A modification (tag `ma:i`) and the poly(A) length (tag `pa:f`) from the BAM file and displays them on each read.
537537
@@ -563,7 +563,7 @@ here is the command line,
563563
In this picture, the red track and the blue dots represent the poly(A) length and the m6A modifications, respectively.
564564
![](imgs/cmd/igv_plot.2.png)
565565
566-
3. Sashimi.igv module also allow sort these reads by specific alternative exon
566+
3.Sashimi.igv module also allow sort these reads by specific alternative exon
567567
568568
User could modify the config file as follows,
569569
@@ -619,7 +619,7 @@ for each hic track, a bigger `depth` means a higher y-axis.
619619
620620
Because `Li_et_al_2015.h5` doesn't contain chromosome 1, users can download a new toy dataset and add it to the example plot.
621621
622-
1. download hic file and convert into h5 format
622+
1.download hic file and convert into h5 format
623623
624624
```bash
625625
wget https://encode-public.s3.amazonaws.com/2016/12/01/a241cba5-df2e-45fb-9a8f-5af5587fb02a/ENCFF121YPY.hic
@@ -639,13 +639,13 @@ cooler cload pairix -p 16 hg38.chrom.sizes:1000 ENCFF931NQV.pairs.gz ENCFF931NQV
639639
hicConvertFormat -m ENCFF121YPY_1000.cool --inputFormat cool --outputFormat h5 -o ENCFF121YPY.h5
640640
```
641641
642-
2. prepare the config file
642+
2.prepare the config file
643643
644644
```bash
645645
# filepath file_category label color transform depth
646646
example/ENCFF718AWL.h5 hic ENCFF718AWL RdYlBu_r log2 30000
647647
```
648-
3. run Sashimi
648+
3.run Sashimi
649649
650650
```bash
651651
python main.py \
@@ -661,6 +661,28 @@ python main.py \
661661
```
662662
here is the [results](https://github.com/ygidtu/sashimi/blob/dev/example/hic.example.pdf).
663663
664+
## circRNA plot
665+
666+
The linear and circRNA raw data were downloaded from [PRJNA541935](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA541935).
667+
668+
669+
The command for generating a circRNA coverage plot that highlights the back-splice junction:
670+
671+
```bash
672+
python sashimi.py/main.py \
673+
-e chr1:925421-944308:+ \
674+
--density example/circRNA.tsv \
675+
--stroke 937113-937713@blue \
676+
-o circRNA.pdf \
677+
--dpi 300 \
678+
--width 10 \
679+
--height 1 \
680+
-t 10 \
681+
-r Homo_sapiens.GRCh38.101.gtf \
682+
--link 925921-943808
683+
```
684+
![](imgs/cmd/circRNA.png)
685+
664686
665687
## Motif plot
666688
@@ -678,15 +700,15 @@ python main.py \
678700
--motif-region 1270756-1270760
679701
```
680702
The motif weight matrix should be customized bedGraph format as follows:
703+
681704
```bash
682705
# chromosome start end A_weight T_weight C_weight G_weight
683706
chr1 100 101 0.1 0.2 -0.3 -0.4
684707
```
685708
686709
Then compress the file with `bgzip` and index it with `tabix`.
687710
688-
here is the [results](imgs/cmd/motif.png).
689-
711+
here is the [result](imgs/cmd/motif.png).
690712
691713
692714
### Additional annotation

docs/imgs/cmd/circRNA.png

319 KB

docs/index.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,6 @@
66

77
---
88

9-
The full-featured example
9+
## Get started
1010

11-
![](imgs/example.png)
11+
To learn sashimi.py, please follow the [Tutorial](https://sashimi.readthedocs.io/en/latest/command/)

docs/installation.md

Lines changed: 22 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -18,18 +18,32 @@ python setup.py install
1818
sashimipy --help
1919
```
2020

21-
### Run as script
22-
1. using pipenv
21+
or
22+
2323
```bash
24-
pipenv install
25-
pipenv run python main.py --help
24+
## via Conda
25+
conda install sashimi-py
26+
27+
## via Docker
28+
docker pull quay.io/biocontainers/sashimi-py
29+
30+
## via PyPI
31+
pip install sashimi.py
32+
2633
```
2734

35+
### Run as script
36+
1. using pipenv
37+
```bash
38+
pipenv install
39+
pipenv run python main.py --help
40+
```
41+
2842
2. using python
29-
```bash
30-
pip install -r requirements.txt
31-
python main.py
32-
```
43+
```bash
44+
pip install -r requirements.txt
45+
python main.py
46+
```
3347

3448
** Note: **
3549
If there is any problem with installation of `cairocffi`

example/circRNA.tsv

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
example/circRNA_bam/SRR9029986.bam bam Linear_RNA_rep1 #00AFBB
2+
example/circRNA_bam/SRR9029988.bam bam Linear_RNA_rep2 #00AFBB
3+
example/circRNA_bam/SRR9029992.bam bam Linear_RNA_rep3 #00AFBB
4+
example/circRNA_bam/SRR9029993.bam bam Circular_RNA_rep1 #FC4E07
5+
example/circRNA_bam/SRR9029994.bam bam Circular_RNA_rep2 #FC4E07
6+
example/circRNA_bam/SRR9029995.bam bam Circular_RNA_rep3 #FC4E07
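For readers unfamiliar with the file-list format, the following Python sketch shows one way a list such as `example/circRNA.tsv` could be read. It assumes whitespace-separated columns in the order `filepath file_category label color`, as in the header used for the Hi-C config in `docs/command.md`, and treats label and color as optional. The `Track` class and `parse_file_list` function are hypothetical names for illustration, not part of sashimi.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    path: str
    category: str
    label: Optional[str] = None
    color: Optional[str] = None


def parse_file_list(text: str) -> List[Track]:
    """Parse a whitespace-separated file list into Track records."""
    tracks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and header/comment lines; a '#' later in the
        # line (e.g. a hex color) is left untouched.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected at least path and category: {line!r}")
        # Pad with None so that label and color are optional;
        # any extra columns (e.g. transform, depth) are ignored here.
        path, category, label, color = (fields + [None, None])[:4]
        tracks.append(Track(path, category, label, color))
    return tracks


example = """\
example/circRNA_bam/SRR9029986.bam bam Linear_RNA_rep1 #00AFBB
example/circRNA_bam/SRR9029993.bam bam Circular_RNA_rep1 #FC4E07
"""

for track in parse_file_list(example):
    print(track.label, track.color)
```

In this example the three linear and three circular replicates share one color per group, so a parser like this could also be used to check that labels and colors are consistent before plotting.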

example/circRNA_bam/SRR9029986.bam

67.1 KB
Binary file not shown.
2.07 KB
Binary file not shown.

example/circRNA_bam/SRR9029988.bam

138 KB
Binary file not shown.
2.07 KB
Binary file not shown.
