Commit b46bcee

created hess 0.5
1 parent 6ff73e9 commit b46bcee

76 files changed

Lines changed: 190002 additions & 1109 deletions

README.md

Lines changed: 4 additions & 230 deletions
@@ -1,232 +1,6 @@
## HESS (Heritability Estimation from Summary Statistics)

HESS estimates the amount of variance in a trait explained by typed SNPs at
each locus of the genome (local SNP-heritability) from GWAS summary
statistics, while accounting for linkage disequilibrium (LD).

---

#### Releases

[version 0.3-beta](https://github.com/huwenboshi/hess/releases/tag/v0.3-beta)

---

#### Software requirements

HESS requires [NumPy](http://www.numpy.org/) and
[Python 2.7](https://www.python.org/download/releases/2.7/).
We recommend using
[NumPy with Intel MKL](https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl)
for maximum speed.

---

#### <a name="input_file_format"></a> Input file format

HESS requires the following inputs:

1. GWAS summary statistics;
2. a reference panel matching the GWAS population;
3. bed files specifying the start and end positions of each locus.

###### Summary statistics

**To improve computational efficiency and parallelizability, HESS requires
that users split summary statistics by chromosome**. For each SNP, HESS
requires six pieces of information, in the following order: (1) rs ID,
(2) position, (3) reference allele, (4) alternative allele, (5) Z-score, and
(6) sample size. HESS internally filters out strand-ambiguous SNPs and flips
the signs of Z-scores based on the alleles in the reference panel. Users are
nevertheless strongly encouraged to be aware of these details. The following
is an example of a summary statistics file.

```
rsID pos A0 A1 Z-score N
rs1000 29321 G A -1.6434 89834
rs1001 29478 T C -0.0152 91021
rs1002 30500 G A 0.7238 95831
```
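
HESS performs this allele matching internally, so the following is only for readers who want to pre-process their own files. It is a minimal Python 3 sketch of the usual logic, not HESS's own code: the function names are hypothetical, and it assumes the A0/A1 columns are compared against the reference panel's two alleles, with a swapped pair meaning the Z-score sign must be flipped.

```python
# Hypothetical pre-processing helpers (not part of HESS): align one
# summary-statistics SNP to the reference panel's allele order.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a0, a1):
    """A/T and C/G SNPs look the same on both strands, so they cannot be aligned."""
    return COMPLEMENT.get(a0.upper()) == a1.upper()


def align_zscore(a0, a1, z, ref_a0, ref_a1):
    """Return z expressed relative to (ref_a0, ref_a1), or None to drop the SNP."""
    a0, a1 = a0.upper(), a1.upper()
    ref = (ref_a0.upper(), ref_a1.upper())
    if is_strand_ambiguous(a0, a1):
        return None
    # Try the alleles as given, then on the opposite strand.
    for pair in ((a0, a1), (COMPLEMENT.get(a0), COMPLEMENT.get(a1))):
        if pair == ref:
            return z        # same allele order: keep the sign
        if pair == ref[::-1]:
            return -z       # swapped alleles: flip the sign of the Z-score
    return None             # alleles do not match the reference panel


# First row of the example file, checked against two hypothetical panel orders.
print(align_zscore("G", "A", -1.6434, "G", "A"))  # -1.6434
print(align_zscore("G", "A", -1.6434, "A", "G"))  # 1.6434
```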

###### Input checklist

Although HESS can filter and sort SNPs itself, we recommend that users go
through the following checklist before running HESS.

1. Make sure that the SNP positions in the summary statistics file use the
same coordinates as the reference panel (NCBI build 37).
2. Make sure that strand-ambiguous SNPs (SNPs with alleles A/T or C/G)
are removed.
3. Make sure that the summary statistics are split by chromosome and
that SNPs are sorted by position.
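
Item 2 and the sorting part of item 3 in the checklist above can be checked mechanically for each per-chromosome file; item 1 (matching genome build) cannot be verified from the file alone. A minimal Python 3 sketch follows; `check_sumstats` is a hypothetical helper, and it assumes whitespace-separated columns in the order described above with an optional `rsID` header row.

```python
# Hypothetical pre-flight check (not part of HESS) for one per-chromosome
# summary-statistics file: strand-ambiguous SNPs and sort order.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def check_sumstats(lines):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    prev_pos = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or (lineno == 1 and fields[0] == "rsID"):
            continue  # skip blank lines and the header row
        if len(fields) != 6:
            problems.append("line %d: expected 6 columns, got %d" % (lineno, len(fields)))
            continue
        rsid, pos = fields[0], int(fields[1])
        alleles = frozenset((fields[2].upper(), fields[3].upper()))
        if alleles in AMBIGUOUS:
            problems.append("line %d: %s is strand-ambiguous" % (lineno, rsid))
        if prev_pos is not None and pos < prev_pos:
            problems.append("line %d: %s is not sorted by position" % (lineno, rsid))
        prev_pos = pos
    return problems


example = """rsID pos A0 A1 Z-score N
rs1000 29321 G A -1.6434 89834
rs1001 29478 T C -0.0152 91021
rs1002 30500 G A 0.7238 95831"""
print(check_sumstats(example.splitlines()))  # []
```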

###### Reference panel

A 1000 Genomes Project (phase 3) reference panel for SNPs with MAF > 5% in the
EUR population can be downloaded
[here](https://drive.google.com/open?id=0B0OmLzMQAvWqc3FPcVRDWkdvc2c).

###### Partition file (bed format)

Partition files defining the loci can be downloaded
[here](https://bitbucket.org/nygcresearch/ldetect-data/src).
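
To see which SNPs fall into which locus, a partition file can be read as a list of intervals and each SNP position looked up by binary search. This is a Python 3 sketch with hypothetical helper names; it assumes the usual BED convention of half-open `[start, end)` intervals and skips a header line whose first field is `chr`. How HESS itself treats SNPs exactly on a boundary is not documented here.

```python
import bisect


def read_partition(lines):
    """Parse 'chrom start end' lines for one chromosome into sorted (start, end) pairs."""
    loci = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "chr" or fields[0].startswith("#"):
            continue  # skip blank lines, a header row, and comments
        loci.append((int(fields[1]), int(fields[2])))
    return sorted(loci)


def assign_to_loci(positions, loci):
    """Map each SNP position to the index of the locus with start <= pos < end, or None."""
    starts = [start for start, _ in loci]
    result = []
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1  # last locus starting at or before pos
        result.append(i if i >= 0 and pos < loci[i][1] else None)
    return result


# Toy partition with two adjacent loci (values are made up).
loci = read_partition(["chr start stop", "chr22 100 200", "chr22 200 300"])
print(assign_to_loci([150, 200, 299, 300, 50], loci))  # [0, 1, 1, None, None]
```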

---

#### Pipeline

HESS estimates local SNP-heritability in two steps. In step 1, HESS computes
the eigenvalues of the LD matrices and the squared projections of the GWAS
effect-size vector onto the eigenvectors of the LD matrices. In step 2, HESS
uses the results of step 1 to compute local SNP-heritability estimates and
their standard errors.
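
Why are eigenvalues and squared projections enough? Writing an LD matrix as `V = Σᵢ λᵢ uᵢuᵢᵀ`, the quadratic form `βᵀV⁻¹β` equals `Σᵢ (uᵢᵀβ)² / λᵢ` (for a rank-deficient `V`, the pseudo-inverse sums over the positive eigenvalues only), which uses exactly the two quantities step 1 saves. The Python 3 sketch below checks this identity on a hypothetical two-SNP locus whose LD matrix has a closed-form eigendecomposition. Treating the effect sizes as standardized (z/√n) is our assumption, and this is not HESS code.

```python
import math

# Toy example, not HESS code: a 2-SNP LD matrix [[1, r], [r, 1]] has
# eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).


def eig_2x2_ld(r):
    s = 1.0 / math.sqrt(2.0)
    return [1.0 + r, 1.0 - r], [(s, s), (s, -s)]


def squared_projections(beta, eigvecs):
    """Squared projection of the effect-size vector onto each eigenvector."""
    return [(beta[0] * u[0] + beta[1] * u[1]) ** 2 for u in eigvecs]


def quad_form_direct(beta, r):
    """beta' V^-1 beta using the explicit inverse of [[1, r], [r, 1]]."""
    b0, b1 = beta
    return (b0 * b0 - 2.0 * r * b0 * b1 + b1 * b1) / (1.0 - r * r)


r = 0.6
beta = (0.010, 0.004)  # hypothetical standardized effect sizes, e.g. z / sqrt(n)
eigvals, eigvecs = eig_2x2_ld(r)
prjsq = squared_projections(beta, eigvecs)
via_eigen = sum(p / lam for p, lam in zip(prjsq, eigvals))
print(abs(via_eigen - quad_form_direct(beta, r)) < 1e-12)  # True
```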

###### Step 1 - compute eigenvalues and projections

In this step, HESS computes the eigenvalues of the LD matrices and the squared
projections of the GWAS effect-size vector onto the eigenvectors of the LD
matrices. The following code snippet illustrates step 1 of HESS.

```sh
# this for loop can be parallelized, i.e. one CPU for each chromosome
for i in $(seq 22)
do
    python hess.py \
        --chrom $i \
        --h2g zscore.chr"$i" \
        --reference-panel refpanel_genotype_chr"$i".gz \
        --legend-file refpanel_legend_chr"$i".gz \
        --partition-file partition_chr"$i".bed \
        --out step1
done
```

In the command above, `--chrom` specifies the chromosome number;
`--h2g` specifies the summary statistics file for SNPs on the
corresponding chromosome; `--reference-panel` specifies the genotype file
of the reference panel; `--legend-file` specifies the legend file of the
reference panel; `--partition-file` specifies the start and end positions
of the loci; and `--out` specifies the prefix of the output files for step 1.
For input file formats, please refer to
[Input file format](#input_file_format).
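
The comment in the step-1 loop notes that chromosomes can be processed in parallel. One way to do that from Python 3 is sketched below: it builds the same command, with the same flags and placeholder file names as the shell loop, for each chromosome and runs the jobs through a thread pool (each job is its own `python hess.py` process). The helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def step1_command(chrom):
    """Build the step-1 command for one chromosome, mirroring the shell loop."""
    return [
        "python", "hess.py",
        "--chrom", str(chrom),
        "--h2g", "zscore.chr%d" % chrom,
        "--reference-panel", "refpanel_genotype_chr%d.gz" % chrom,
        "--legend-file", "refpanel_legend_chr%d.gz" % chrom,
        "--partition-file", "partition_chr%d.bed" % chrom,
        "--out", "step1",
    ]


def run_step1(chroms=range(1, 23), max_workers=4):
    """Run step 1 for several chromosomes at once; return {chrom: exit code}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        jobs = {c: pool.submit(subprocess.call, step1_command(c)) for c in chroms}
        return {c: job.result() for c, job in jobs.items()}
```

For example, `run_step1(max_workers=8)` returns a dictionary of exit codes; any chromosome with a non-zero code should be rerun before moving on to step 2.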

After executing the command above, four files are created for each
chromosome (i.e. 88 files in total), taking up ~10MB of space for the entire
genome. The following examples were obtained for chromosome 22.

* step1\_chr22.info.gz - contains the information for each locus (start and
  end positions, number of SNPs, rank of the LD matrix, sample size)
  ```
  16050408 17674294 371 274 91273
  17674295 18296087 419 306 89182
  18296088 19912357 947 502 90231
  ... ... ... ... ...
  ```
* step1\_chr22.eig.gz - contains the positive eigenvalues of the LD matrix at
  each locus, one line per locus
  ```
  39.31792281 31.23990243 23.81549256 23.47296559 20.45343550 ...
  48.73186142 26.95692375 25.32769526 22.11750791 20.55766423 ...
  82.58157342 67.42588424 59.52766188 43.10471854 32.15181631 ...
  ... ... ... ... ... ...
  ```
* step1\_chr22.prjsq.gz - contains the squared projections of the effect-size
  vector onto the eigenvectors of the LD matrix at each locus, one
  line per locus
  ```
  0.00008940 0.00001401 0.00013805 0.00009906 0.00007841 ...
  0.00054948 0.00001756 0.00008532 0.00002303 0.00004706 ...
  0.00008693 0.00005737 0.00070234 0.00008411 0.00004001 ...
  ... ... ... ... ... ...
  ```
* step1\_chr22.log - contains logging information (e.g. number of SNPs,
  number of SNPs filtered, etc.)
  ```
  Command started at: ...
  Command issued: hess.py ...
  Number of SNPs in reference panel: ...
  Number of SNPs in Z-score file: ...
  Number of SNPs in Z-score file after filtering: ...
  Number of loci in partition file: ...
  Command finished at: ...
  ```
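
To inspect step-1 results outside HESS, the three gzipped files can be read back line by line, since line i of each file describes the same locus. A Python 3 sketch with a hypothetical `read_step1` helper; it assumes whitespace-separated values, the `<prefix>_chr<N>.<suffix>.gz` naming shown above, and the info-column order listed in the first bullet.

```python
import gzip


def read_step1(prefix, chrom):
    """Load one chromosome's step-1 output as a list of per-locus dicts."""
    def read_rows(suffix):
        path = "%s_chr%d.%s.gz" % (prefix, chrom, suffix)
        with gzip.open(path, "rt") as handle:
            # Keep every line (even an empty one) so line i stays aligned with locus i.
            return [line.split() for line in handle]

    info, eig, prjsq = read_rows("info"), read_rows("eig"), read_rows("prjsq")
    if not len(info) == len(eig) == len(prjsq):
        raise ValueError("step-1 files disagree on the number of loci")
    loci = []
    for fields, eigvals, projs in zip(info, eig, prjsq):
        start, end, num_snp, rank = (int(x) for x in fields[:4])
        loci.append({
            "start": start, "end": end, "num_snp": num_snp, "rank": rank,
            "n": float(fields[4]),  # sample size
            "eig": [float(x) for x in eigvals],
            "prjsq": [float(x) for x in projs],
        })
    return loci


# Usage, assuming step 1 was run with --out step1:
# loci = read_step1("step1", 22)
```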

###### Step 2 - compute local SNP heritability
**Step 2 should be run after step 1 finishes for all chromosomes.**
In this step, HESS uses the results of step 1 across all chromosomes
(step1\_chr{1..22}.info.gz, step1\_chr{1..22}.eig.gz,
step1\_chr{1..22}.prjsq.gz) to compute local SNP-heritability estimates
and their standard errors. The following command automatically looks for
the step-1 results across all chromosomes with the prefix "step1" to
obtain local SNP-heritability estimates.

```sh
python hess.py \
    --prefix step1 \
    --k 50 \
    --out step2.txt
```

In the command above, `--prefix` specifies the prefix of the files generated
during step 1 ("step1" in this case); `--k` (default 50) specifies the
maximum number of eigenvectors used to estimate local SNP-heritability;
and `--out` specifies the name of the output file.
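
As we read the HESS manuscript, the per-locus estimate combines step 1's outputs as `h2g = (n * βᵀV⁺β - q) / (n - q)`, where `q` is the number of eigenvectors kept (at most `--k`) and `βᵀV⁺β = Σᵢ prjsqᵢ / λᵢ` over those eigenvectors. The Python 3 sketch below implements only that point estimate for a single locus. It is not HESS's code: it ignores the genomic-control adjustment and the variance, and the way it applies an eigenvalue cutoff (a hypothetical `eig_threshold` parameter, by analogy with `--eig-threshold`) is an assumption.

```python
def local_h2g(eigvals, prjsq, n, k=50, eig_threshold=1.0):
    """Point estimate of local SNP-heritability for one locus (sketch).

    Assumptions (not taken from HESS's source): eigvals are sorted in
    decreasing order, prjsq[i] is the squared projection of the standardized
    effect-size vector onto eigenvector i, and eigenvalues below
    eig_threshold are dropped before keeping at most k of them.
    """
    kept = [(lam, p) for lam, p in zip(eigvals, prjsq) if lam >= eig_threshold][:k]
    q = len(kept)
    quad_form = sum(p / lam for lam, p in kept)  # beta' V^+ beta on the kept eigenvectors
    return (n * quad_form - q) / (n - q)


# Toy locus with made-up numbers: 2 of 3 eigenvalues pass the threshold.
print(round(local_h2g([4.0, 2.0, 0.5], [4e-3, 2e-3, 1e-5], n=10000), 6))  # 0.0018
```

Note that the estimate can be negative when the signal at a locus is weak relative to `q / n`.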

After executing the command above, two files are created.

* step2.txt - contains the local SNP-heritability estimates for loci across
  all chromosomes (chromosome number, locus start position, locus end
  position, number of SNPs in the locus, number of eigenvectors used, local
  SNP-heritability estimate, variance of the estimate)
  ```
  chr start end num_snp k local_h2g var
  1 10583 1892606 158 24 0.0001786340 0.000000011374
  1 1892607 3582735 814 40 0.0004164805 0.000000039661
  1 3582736 4380810 558 40 0.0001844619 0.000000027595
  1 4380811 5913892 1879 40 0.0000738749 0.000000032164
  ... ... ... ... ... ... ...
  22 46470495 47596317 899 50 0.0004263759 0.000000005798
  22 47596318 48903702 1580 50 0.0000899976 0.000000003539
  22 48903703 49824533 1344 50 0.0000695594 0.000000003439
  22 49824534 51243297 740 50 0.0001590363 0.000000004160
  ```
* step2.txt.log - contains logging information (e.g. estimated genomic
  control factor, total SNP heritability, etc.)
  ```
  Command started at: ...
  Command issued: ...
  Number of loci from step 1: ...
  Total number of SNPs: ...
  Using lambda gc: ...
  Estimated total h2g: ...
  Command finished at: ...
  ```
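
The output table is easy to post-process. The Python 3 sketch below (hypothetical helper names) parses `step2.txt`, adds a standard-error column as the square root of `var`, and sums the local estimates. It assumes the `chr` column holds plain integers as in the example. The plain sum is only a rough cross-check: we have not verified that it matches how HESS computes the `Estimated total h2g` line in the log.

```python
import math


def read_step2(lines):
    """Parse step2.txt lines into dicts, adding se = sqrt(var)."""
    rows, header = [], None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields  # chr start end num_snp k local_h2g var
            continue
        row = dict(zip(header, fields))
        for key in ("chr", "start", "end", "num_snp", "k"):
            row[key] = int(row[key])
        for key in ("local_h2g", "var"):
            row[key] = float(row[key])
        row["se"] = math.sqrt(row["var"])
        rows.append(row)
    return rows


def sum_local_h2g(rows):
    """Naive genome-wide total: the sum of the local estimates."""
    return sum(row["local_h2g"] for row in rows)


# Usage, assuming step2.txt from the command above is in the working directory:
# with open("step2.txt") as handle:
#     rows = read_step2(handle)
# print(len(rows), sum_local_h2g(rows))
```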

###### Additional flags for step 2

For step 2, HESS has the following additional flags:
* `--lambda_gc` allows users to specify their own genomic control factor for
  re-inflating the summary statistics; if it is not specified, HESS estimates
  the genomic control factor from the data.
* `--tot-h2g <h2g> <s.e.>` allows users to specify the total SNP heritability
  of the trait and its standard error.
* `--sense-threshold-joint` (default 2.0) allows users to control the standard
  error of the estimates when the total SNP heritability is not known; the
  smaller the threshold, the smaller the standard error (at the cost of
  downward bias).
* `--sense-threshold-indep` (default 0.5) allows users to control the standard
  error of the estimates when the total SNP heritability is available; the
  smaller the threshold, the smaller the standard error (at the cost of
  downward bias).
* `--eig-threshold` (default 1.0) allows users to filter eigenvectors
  based on the magnitude of their eigenvalues.
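
For reference, the conventional genomic control factor is `λ_GC = median(z²) / median(χ²₁)`, where `median(χ²₁) ≈ 0.455`; a study that applied genomic control divided its χ² statistics by `λ_GC`, i.e. its Z-scores by `√λ_GC`. The Python 3.8+ sketch below computes that conventional `λ_GC` and reverses such a correction. It shows the standard definition only; it is not necessarily how HESS estimates `λ_GC` or applies `--lambda_gc`, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square with 1 d.f.: the square of the standard normal 75th percentile.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def lambda_gc(zscores):
    """Conventional genomic control factor: median(z^2) / median(chi2_1)."""
    return median([z * z for z in zscores]) / CHI2_1_MEDIAN


def reinflate(zscores, lam):
    """Undo a genomic-control correction that divided z^2 by lam (z by sqrt(lam))."""
    scale = math.sqrt(lam)
    return [z * scale for z in zscores]
```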

---

#### Contact

Please contact Huwenbo Shi (shihuwenbo\_AT\_ucla.edu) with questions
related to HESS.

---

#### Reference

The manuscript describing HESS can be found
[here](http://www.cell.com/ajhg/abstract/S0002-9297(16)30148-3).

HESS is a Python package that provides utilities for estimating and analyzing
local SNP-heritability and genetic covariance from GWAS summary
association data.
[![](https://img.shields.io/badge/docs-latest-blue.svg)](https://huwenboshi.github.io/hess-0.5)

docs/analysis.md

Whitespace-only changes.

docs/contrast_polygenicity.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Contrast polygenicity

We provide a script (`misc/contrast_polygenicity.py`) that plots the
contrasting degrees of polygenicity between traits. The script can be
executed as follows.

## Input

* Local SNP-heritability estimates for a number of traits.

We recommend plotting fewer than 10 traits at a time.

## Example

The following is an example script that creates the contrast polygenicity plot.

```
python $src/contrast_polygenicity.py \
    --local-hsqg-est <local SNP-heritability output trait 1> <local SNP-heritability output trait 2> \
    --show-se --no-negative --trait-names TRAIT1 TRAIT2 \
    --out <output file name e.g. trait1_trait2_contrast.pdf>
```

Here `--no-negative` sets any negative local estimates to 0.0.
The standard-error shading can be turned off by removing the `--show-se` flag.
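
The plotting details of `contrast_polygenicity.py` are not described here. One common way to contrast polygenicity, which we assume purely for illustration, is to sort loci from largest to smallest estimate and plot the cumulative fraction of total SNP-heritability against the fraction of loci: a steep early rise means a few loci explain most of the heritability. The Python 3 sketch below computes such a curve from a list of local estimates; `polygenicity_curve` is hypothetical, and its `no_negative` argument imitates the `--no-negative` flag.

```python
def polygenicity_curve(local_h2g, no_negative=True):
    """Cumulative share of total SNP-heritability vs. share of loci.

    Negative estimates are set to 0.0 first when no_negative is True, then
    loci are sorted from largest to smallest estimate. Returns a list of
    (fraction_of_loci, fraction_of_h2g) points.
    """
    values = [max(v, 0.0) if no_negative else v for v in local_h2g]
    values.sort(reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total heritability must be positive")
    points, running = [], 0.0
    for i, v in enumerate(values, start=1):
        running += v
        points.append((i / len(values), running / total))
    return points
```

Each trait's points can then be drawn as one line, so traits whose curves rise more slowly are the more polygenic ones under this definition.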

The following is an example output figure.

![contrast polygenicity](img/trait1_trait2_contrast.svg)
