Commit 3fa37fe
Ty eqtl squash (#72)
* Squash ty_eqtl before rebase
* Moved main test script into inst/scripts.
Roxygen docs were not checked in.
* Moved python eqtl pipeline script to scripts dir.
* Modified pipeline to use already generated data if it's available.
* Fixed some unqualified calls to data.table functions
* A minimal package README is needed for installation.
* Refactoring of the python code to make the pipeline script minimally accessible.
* Updated .gitignore for Python.
* Trying to replicate the eQTL results plot that's in the slide deck
* Fix step ordering, update K=9/seed=119, remove TODOs
- Fix step 8/9 order: combine_expression first, then median_expression
- Remove Jim's TODO comments
- Update cluster_order to c(2,1,7,4,8,5,6,3,0) for K=9
- Update Python pipelines: K=9, random_state=119, desired_order=[2,1,7,4,8,5,6,3,0]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sort cluster assignments output for reproducibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Deduplicate egene pairs by lowest q-value instead of first occurrence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
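A minimal sketch of the deduplication rule: for each pair, keep the row with the smallest q-value rather than the first one seen. The key columns (`phenotype_id`, `variant_id`) and the `qval` column name are assumptions for illustration, rows are plain dicts, and `dedupe_by_min_qvalue` is a hypothetical helper, not the package's function.

```python
import math

def dedupe_by_min_qvalue(rows, key_cols=("phenotype_id", "variant_id"), q_col="qval"):
    """Keep one row per key: the row with the lowest q-value.

    Missing or NaN q-values are treated as +inf so they never beat a real value.
    On ties the earliest row wins (strict < comparison).
    """
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        q = row.get(q_col)
        if q is None or math.isnan(q):
            q = math.inf
        current = best.get(key)
        if current is None or q < current[0]:
            best[key] = (q, row)
    # Emit in sorted key order so the output does not depend on input order
    return [best[k][1] for k in sorted(best)]
```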
* Sort slope matrix by gene before k-means for reproducible clustering
K-means results depend on input row order. Sort by phenotype_id when
writing the index-SNP slope matrix (R) and when reading it back in
the Python k-means pipeline. Also improve handling of zero-variance
rows in gene expression ordering by adding a tiny deterministic trend
instead of dropping them.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
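The two reproducibility fixes can be sketched as follows, assuming the matrix is held as a dict from `phenotype_id` to a list of numeric values (one per column). `sort_rows_by_gene` and `add_trend_to_constant_rows` are hypothetical helpers and the epsilon is an assumed value; the trend is a tiny increasing ramp across columns, so constant rows gain nonzero variance instead of being dropped.

```python
import statistics

def sort_rows_by_gene(matrix):
    """Return (ids, rows) with rows ordered by phenotype_id, independent of input order."""
    ids = sorted(matrix)
    return ids, [list(matrix[i]) for i in ids]

def add_trend_to_constant_rows(rows, eps=1e-9):
    """Add eps * column_index to zero-variance rows; other rows pass through unchanged."""
    out = []
    for row in rows:
        if len(row) > 1 and statistics.pvariance(row) == 0:
            # Deterministic ramp: same input always yields the same perturbed row
            row = [x + eps * j for j, x in enumerate(row)]
        out.append(row)
    return out
```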
* Update plotting outputs to SVG, fix cluster order, and align KMeans params
- Switch plot_gene_snp, plot_cell_type_pairwise_cor, and kmeans_heatmap
output from PNG to SVG for publication-quality figures
- Update cluster_order to [5,0,6,2,7,8,10,1,9,4,3] across all scripts
- Align KMeans params (random_state=42, n_init=200, max_iter=20)
- Add human-readable cell type labels to pairwise correlation heatmap
- Flip Fisher exact plot to horizontal orientation (clusters on y-axis)
- Add variant_id column to cluster assignments output
- Clean up stale PNG references in docstrings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
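A minimal sketch of how a `cluster_order` like the one above could drive heatmap row order, assuming it lists cluster IDs in the order they should appear and that assignments are `(phenotype_id, cluster)` tuples. `order_for_heatmap` is a hypothetical helper; genes within a cluster are sorted by ID to keep the output deterministic.

```python
# K=11 order taken from the commit message
CLUSTER_ORDER = [5, 0, 6, 2, 7, 8, 10, 1, 9, 4, 3]

def order_for_heatmap(assignments, cluster_order):
    """Sort (phenotype_id, cluster) pairs by the cluster's position in cluster_order."""
    rank = {cluster: i for i, cluster in enumerate(cluster_order)}
    return sorted(assignments, key=lambda a: (rank[a[1]], a[0]))
```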
* Remove median imputation option, keep only zero imputation
Median imputation of missing eQTL slopes is not methodologically sound.
Remove the imputation_method/min_non_na parameters and .impute_slope_dt
helper, hardcoding zero imputation (NA -> 0). Update all references
across R/Python scripts, runner scripts, test pipelines, docs, and README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
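The hardcoded rule (NA -> 0) amounts to treating a missing slope as "no effect". A minimal sketch, assuming slopes arrive as a list where missing values are `None` or NaN; `impute_zero` is a hypothetical name.

```python
import math

def impute_zero(slopes):
    """Replace missing slopes (None or NaN) with 0.0; observed values pass through."""
    return [
        0.0 if s is None or (isinstance(s, float) and math.isnan(s)) else s
        for s in slopes
    ]
```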
* Pin KMeans algorithm to elkan for reproducibility across sklearn versions
The default KMeans algorithm changed from elkan (sklearn <1.3) to lloyd
(sklearn >=1.3), which can produce different cluster assignments.
Explicitly setting algorithm="elkan" ensures identical results
regardless of sklearn version.
Also relaxed version constraints in pyproject.toml (scikit-learn>=1.0,
numpy>=1.22, python>=3.9) to match the environment that produced the
validated clustering.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
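A minimal sketch of keeping the aligned KMeans settings in one place so every script passes identical arguments, including the pinned `algorithm="elkan"`. The values come from the commit messages (K=11, random_state=42, n_init=200, max_iter=20); `KMEANS_PARAMS` and `fit_kmeans` are hypothetical names, and scikit-learn is imported lazily so the module loads without it.

```python
# Values from the commit messages; n_clusters reflects the K=11 update.
KMEANS_PARAMS = {
    "n_clusters": 11,
    "random_state": 42,
    "n_init": 200,
    "max_iter": 20,
    "algorithm": "elkan",  # pinned explicitly instead of relying on the sklearn default
}

def fit_kmeans(slope_rows):
    """Cluster a row-sorted slope matrix (list of equal-length lists); returns labels."""
    from sklearn.cluster import KMeans  # lazy import: only needed when actually fitting
    model = KMeans(**KMEANS_PARAMS)
    return model.fit_predict(slope_rows)
```

Sorting rows by `phenotype_id` before calling `fit_kmeans` matters as much as the parameters, since the commit notes that k-means results depend on input row order.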
* Fix cluster_order in test_eqtl_pipeline.R to K=11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Sync test_eqtl_pipeline.py with K=11 params and infer plot format from extension
Update K (9->11), random_state (119->42), desired_order, and heatmap
output extension (.png->.svg) to match the K=11 clustering.
Remove hardcoded format="svg" from savefig so output format is inferred
from the file extension.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
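Matplotlib's `savefig` infers the format from the file extension when `format` is not given. If a script wants to fail early on an unexpected extension, a small check could look like this; `plot_format_from_path` and the accepted set of formats are assumptions, not part of the package.

```python
import os

# Assumed set of formats the plotting scripts accept
SUPPORTED_PLOT_FORMATS = {"svg", "png", "pdf"}

def plot_format_from_path(path):
    """Return the plot format implied by the file extension, e.g. 'svg'."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_PLOT_FORMATS:
        raise ValueError(f"cannot infer a supported plot format from {path!r}")
    return ext
```

The check runs before plotting, after which `savefig(path)` can be called with no `format=` argument.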
* Sync cli/test_eqtl_pipeline.py with K=11 params
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Switch all plot outputs to SVG
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add distance-to-TSS scripts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
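A minimal sketch of a strand-aware distance from a variant to a gene's TSS, assuming the TSS is the gene start on the + strand and the gene end on the - strand, with positive values meaning downstream in the direction of transcription. `distance_to_tss` is hypothetical; the package's `get_index_snp_start_distance` may define the distance or its sign differently.

```python
def distance_to_tss(variant_pos, gene_start, gene_end, strand):
    """Signed distance (bp) from a variant to the TSS; positive = downstream of the TSS."""
    if strand == "+":
        return variant_pos - gene_start  # TSS at the lower coordinate
    if strand == "-":
        return gene_end - variant_pos    # TSS at the higher coordinate; transcription runs leftward
    raise ValueError(f"unknown strand: {strand!r}")
```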
* Add new end-to-end pipeline (0308) and update boxplot script
New pipeline runs R Part 1 → Python K-means → R Part 2 via system2(),
no manual intervention needed. Boxplot improvements: horizontal
orientation option, smaller beeswarm points, Neuron (6) label fix,
format inferred from output path extension.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
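The R pipeline chains its steps with `system2()`. A Python analogue of the same "run in order, stop on first failure" control flow can be sketched with `subprocess.run(check=True)`, which raises `CalledProcessError` on a non-zero exit. The step list and script names below are placeholders, not files from this repository.

```python
import subprocess
import sys

# Placeholder steps mirroring R Part 1 -> Python K-means -> R Part 2 (not executed here)
EXAMPLE_STEPS = [
    ("R part 1", ["Rscript", "part1.R"]),
    ("Python k-means", [sys.executable, "kmeans.py"]),
    ("R part 2", ["Rscript", "part2.R"]),
]

def run_pipeline(steps):
    """Run each (name, command) step in order and return the names that finished.

    A failing step raises CalledProcessError, so later steps never run.
    """
    finished = []
    for name, cmd in steps:
        print(f"[pipeline] {name}: {' '.join(cmd)}", flush=True)
        subprocess.run(cmd, check=True)
        finished.append(name)
    return finished
```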
* Convert standalone scripts to package functions, remove hardcoded paths
- Move get_index_snp_start_distance and plot_eqtl_distance_to_tss_boxplot
from standalone scripts (inst/scripts/) to exported package functions (R/)
- Pipeline now uses bican.mccarroll.eqtl:: calls instead of system2(Rscript)
- Inline Python K-means call, remove separate test_eqtl_pipeline_0308.py
- Remove hardcoded script_dir and local machine paths
- Update NAMESPACE with new exports and imports
- Use /broad/ server paths instead of local mount paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Restore separate Python K-means script, fix base_dir to /broad/
Revert Step 10 to call the Python script via system2() instead of
inlining Python code. Fix base_dir in Python script from local mount
path to /broad/ server path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* All of the scripts that shouldn't be run are commented out.
The command line interface around the k-means plot has been expanded to allow for different K settings, cluster orderings, and input/output directory.
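A minimal sketch of a k-means CLI exposing K, a cluster ordering, and input/output directories. The flag names (`--k`, `--cluster-order`, `--input-dir`, `--output-dir`) and defaults are hypothetical and may not match the actual `cli` module.

```python
import argparse

def int_list(text):
    """Parse a comma-separated list such as '5,0,6,2' into [5, 0, 6, 2]."""
    return [int(x) for x in text.split(",") if x.strip()]

def build_parser():
    parser = argparse.ArgumentParser(description="K-means clustering of eQTL slopes (sketch)")
    parser.add_argument("--k", type=int, default=11, help="number of clusters")
    parser.add_argument("--cluster-order", type=int_list, default=None,
                        help="comma-separated cluster IDs in display order")
    parser.add_argument("--input-dir", required=True)
    parser.add_argument("--output-dir", required=True)
    return parser
```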
* Add more validation for cluster reordering.
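One way to validate a cluster ordering, assuming cluster IDs are 0..K-1 (as scikit-learn's KMeans labels are): the ordering must be a permutation of that range. `validate_cluster_order` is a hypothetical helper; it reports duplicates, missing IDs, and out-of-range IDs separately.

```python
from collections import Counter

def validate_cluster_order(cluster_order, k):
    """Raise ValueError unless cluster_order is a permutation of range(k)."""
    counts = Counter(cluster_order)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(set(range(k)) - set(cluster_order))
    unexpected = sorted(set(cluster_order) - set(range(k)))
    if duplicates or missing or unexpected:
        raise ValueError(
            f"invalid cluster order for K={k}: duplicates={duplicates}, "
            f"missing={missing}, unexpected={unexpected}"
        )
    return list(cluster_order)
```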
* Updated the 0308 pipeline code to call k-means from the installed package.
* Updated the defaults. The R script should be parameterized.
* Polished the eQTL pipeline workflow and documentation.
* Deleted old scripts that are no longer needed.
* Updated k-means plot to allow sequential cluster labels
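One plausible reading of "sequential cluster labels": relabel clusters so the first cluster in the display order becomes 1, the next 2, and so on. A minimal sketch under that assumption; `sequential_labels` and its `start` parameter are hypothetical.

```python
def sequential_labels(labels, cluster_order, start=1):
    """Map raw k-means labels to sequential labels following cluster_order."""
    new_id = {old: start + i for i, old in enumerate(cluster_order)}
    return [new_id[c] for c in labels]
```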
* Finished fixing up eQTL plots.
* Some more eQTL plot cleanup.
---------
Co-authored-by: tracyyuan123 <tyuan@college.harvard.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent e7cf37b, commit 3fa37fe
File tree: 46 files changed (+4129 / -1 lines)
- R/bican.mccarroll.eqtl
- R
- man
- python/bican_mccarroll_eqtl
- scripts
- src/bican_mccarroll_eqtl
- cli