This repository was archived by the owner on Sep 13, 2024. It is now read-only.

Commit 966b025

Merge pull request #16 from shntnu/docs
Document how to use local files
2 parents: ea02f65 + 74e9946

12 files changed: 118 lines added, 7 lines removed

matric/README.md

+30-6
@@ -24,7 +24,6 @@ The docker image has all the dependencies installed and the `evalzoo` repo clone
 `~/Desktop/input` is the folder where your input data is stored.
 In the container, this is mapped to `/input`.
 You can change this to any folder on your computer.
-The example below does not need any input data.
 
 Open <http://localhost:8787/> in your browser and log in using the credentials `rstudio` / `rstudio`.
 
@@ -43,22 +42,47 @@ run_param("params/params_cellhealth.yaml")
 Knitted notebooks and outputs, including metrics, are written to a configuration-specific subfolder of `results/`.
 See `5.inspect-metrics` for how to access them.
 
-### Optionally change output location
+The example parameter file `params/params_cellhealth.yaml` reads the input directly from a public GitHub repo.
 
-Use `results_root_dir` to specify the folder where you want the results to be stored.
+Instead, your input might live on your local machine (and you might also want to store the results in some other folder).
 
-In the example below, we set it to `/input` which is the folder that we mapped to `~/Desktop/input` on the host machine when we started the docker container.
+In that case, the mapping (`~/Desktop/input:/input`) that you've set up in the docker command above will be useful.
+
+To mock this up, we'll use the same input file as in the example parameter file above.
+
+First download the file locally to `~/Desktop/input`:
+
+```bash
+mkdir -p ~/Desktop/input
+cd ~/Desktop/input
+url=https://github.com/broadinstitute/grit-benchmark/raw/main/1.calculate-metrics/cell-health/data/cell_health_merged_feature_select.csv.gz
+curl -L -o cell_health_merged_feature_select.csv.gz $url
+```
+
+Next, edit the parameter file `params/params_cellhealth.yaml` to point to the local file:
+
+```yaml
+data_path: "/input"
+```
+
+and save it as `params/params_cellhealth_local.yaml` (the parameter file can live anywhere; it doesn't have to be in `params`)
+
+Then, run the following command in the R console:
 
 ```r
-run_param("params/params_cellhealth.yaml", results_root_dir = "/input")
+setwd("matric")
+source("run_param.R")
+run_param("params/params_cellhealth_local.yaml", results_root_dir = "/input")
 ```
 
+Here, we have additionally used `results_root_dir` to specify the folder where we want the results to be stored.
+
 TODO: Document the configuration file
 
 ## Addendum
 
 <details>
-
+
 
 ### Notebooks
 
 - `1.prepare_data.Rmd` prepares the datasets.
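The README steps above download the input with `curl -L -o`. Without `--fail` (`-f`), `curl` exits successfully even on an HTTP error and saves the error page under the requested file name, so a bad download would only surface later, when the notebooks try to read it. Adding `--fail` to that command is one fix; another is a quick check of the file itself. A minimal sketch, assuming bash with `gzip` and `awk` available; `check_gz_csv` is a hypothetical helper, not part of `evalzoo`:

```bash
# check_gz_csv FILE (hypothetical helper, not part of evalzoo):
# succeed only if FILE is an intact gzip archive with a non-empty first line,
# and print the number of comma-separated fields in that first (header) line.
check_gz_csv() {
  local file="$1" header
  # gzip -t tests the archive's integrity without writing decompressed output
  if ! gzip -t "$file" 2>/dev/null; then
    echo "not a valid gzip file: $file" >&2
    return 1
  fi
  # Decompress to stdout and keep only the header row
  header=$(gzip -dc "$file" | head -n 1)
  if [ -z "$header" ]; then
    echo "empty header row: $file" >&2
    return 1
  fi
  # Count fields, assuming a plain comma-separated header (no quoted commas)
  printf '%s\n' "$header" | awk -F',' '{ print NF }'
}
```

For example, `check_gz_csv ~/Desktop/input/cell_health_merged_feature_select.csv.gz` should print a column count rather than an error. The count is only meaningful if no header field contains a quoted comma.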

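The README asks you to edit `data_path` by hand and save the result under a new name. The same edit can be scripted. A minimal sketch, assuming bash, a POSIX `sed`, and a parameter file in which `data_path:` begins its own line; `make_local_params` is a hypothetical helper, not part of `evalzoo`:

```bash
# make_local_params SRC DEST DATA_PATH (hypothetical helper, not part of evalzoo):
# write a copy of SRC to DEST in which every line that starts with an
# (optionally indented) `data_path:` key gets DATA_PATH as its value.
# The key's indentation is kept, so the YAML nesting is unchanged.
make_local_params() {
  local src="$1" dest="$2" data_path="$3"
  # `|` is the sed delimiter, so DATA_PATH may contain `/`
  # (but not `|`, `&` or `\`, which sed would interpret)
  sed "s|^\([[:space:]]*data_path:\).*|\1 ${data_path}|" "$src" > "$dest"
}

# Usage, from the matric directory:
# make_local_params params/params_cellhealth.yaml params/params_cellhealth_local.yaml /input
```

Because the pattern is anchored at the start of the line, a key such as `input_structure`, whose value merely mentions `{data_path}`, is left untouched.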
matric/params/params_cellhealth.yaml

+2
@@ -49,6 +49,8 @@ experiment:
 - Metadata_cell_line
 - Metadata_gene_name
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_gene_name

matric/params/params_cellhealth_group.yaml

+2
@@ -49,6 +49,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_gene_name
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep: NULL
 all_same_cols_non_rep: NULL

+67
@@ -0,0 +1,67 @@
+knit_output_format: github_document
+experiment:
+data_path: /input
+input_structure: "{data_path}/cell_health_merged_feature_select.{extension}"
+extension: csv.gz
+external_metadata:
+add_dummy_metadata_column: FALSE
+split_by_column:
+significance_threshold: 0.05
+parallel_workers: 8
+aggregate_by:
+- Metadata_cell_line
+- Metadata_gene_name
+- Metadata_pert_name
+filter_by:
+reference_set:
+Metadata_gene_name:
+- Chr2
+- Luc
+- LacZ
+random_seed: 42
+background_type: non_rep
+shuffle: FALSE
+shuffle_bad_groups_threshold: 0.1
+shuffle_group: Metadata_gene_name
+shuffle_strata: NULL
+shuffle_exclude:
+Metadata_gene_name:
+- Chr2
+- Luc
+- LacZ
+- EMPTY
+subsample_fraction: 1
+subsample_pert_strata:
+- Metadata_gene_name
+subsample_reference_strata:
+- Metadata_Well
+similarity_method: cosine
+sim_params:
+drop_group:
+Metadata_gene_name:
+- EMPTY
+reference:
+Metadata_reference_or_other:
+reference
+all_same_cols_ref:
+- Metadata_cell_line
+all_same_cols_rep:
+- Metadata_cell_line
+- Metadata_gene_name
+- Metadata_reference_or_other
+# all_different_cols_rep:
+# any_different_cols_rep:
+all_same_cols_rep_ref: NULL
+any_different_cols_non_rep:
+- Metadata_gene_name
+all_same_cols_non_rep:
+- Metadata_cell_line
+all_different_cols_non_rep:
+- Metadata_gene_name
+all_same_cols_group: NULL
+any_different_cols_group: NULL
+annotation_cols:
+- Metadata_cell_line
+- Metadata_pert_name
+- Metadata_gene_name
+- Metadata_reference_or_other
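In the new 67-line parameter file, `input_structure` is a template that refers to the `data_path` and `extension` keys. Assuming the `{name}` placeholders are filled in by plain string substitution (an assumption about how `evalzoo` reads the file, not something this commit shows), the template should resolve to `/input/cell_health_merged_feature_select.csv.gz`, which is where the README's `curl` command puts the file as seen from inside the container. The hypothetical helper below performs that substitution so the two paths can be compared:

```bash
# resolve_input_path TEMPLATE DATA_PATH EXTENSION (hypothetical helper):
# substitute the {data_path} and {extension} placeholders in TEMPLATE.
# In a POSIX basic regular expression an unescaped `{` is a literal brace.
resolve_input_path() {
  printf '%s\n' "$1" |
    sed -e "s|{data_path}|$2|g" -e "s|{extension}|$3|g"
}

resolve_input_path \
  '{data_path}/cell_health_merged_feature_select.{extension}' /input csv.gz
# prints /input/cell_health_merged_feature_select.csv.gz
```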

matric/params/params_cellhealth_shuffle.yaml

+2
@@ -49,6 +49,8 @@ experiment:
 - Metadata_cell_line
 - Metadata_gene_name
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_gene_name

matric/params/params_cpjump1_prod_biological_no_split.yaml

+2
@@ -50,6 +50,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_target_list
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_target_list

matric/params/params_cpjump1_prod_biological_single_target.yaml

+2
@@ -45,6 +45,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_gene
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_gene

matric/params/params_cpjump1_prod_biological_split.yaml

+3-1
@@ -19,7 +19,7 @@ experiment:
 split_by_column:
 split_column: Metadata_target_list
 element_column: Metadata_broad_sample
-compact_splits: FALSE
+compact_splits: FALSE
 significance_threshold: 0.05
 parallel_workers: 8
 aggregate_by:
@@ -53,6 +53,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_target_list_split
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_broad_sample

matric/params/params_cpjump1_prod_biological_split_compact.yaml

+2
@@ -53,6 +53,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_target_list_split_compact
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_target_list_split_compact

matric/params/params_cpjump1_prod_biological_split_compact_no_filter.yaml

+2
@@ -49,6 +49,8 @@ experiment:
 all_same_cols_rep:
 - Metadata_target_list_split_compact
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep:
 - Metadata_target_list_split_compact

matric/params/params_cpjump1_prod_technical.yaml

+2
@@ -46,6 +46,8 @@ experiment:
 - Metadata_broad_sample
 - Metadata_control_type
 - Metadata_reference_or_other
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref: NULL
 any_different_cols_non_rep: NULL
 all_same_cols_non_rep: NULL

matric/params/params_luad.yaml

+2
@@ -33,6 +33,8 @@ experiment:
 - Metadata_cell_line
 - Metadata_pert_name
 - Metadata_gene_name
+all_different_cols_rep:
+any_different_cols_rep:
 all_same_cols_rep_ref:
 any_different_cols_non_rep:
 - Metadata_cell_line
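This commit adds the same two keys, `all_different_cols_rep` and `any_different_cols_rep`, with empty values (which the YAML core schema resolves to null) to ten parameter files, while the new 67-line file carries them only as comments. A minimal sketch for checking that every parameter file in a directory declares a given set of keys, assuming bash and `grep -E`; `missing_keys` is a hypothetical helper, not part of `evalzoo`. It only looks for `key:` at the start of a line (after indentation), so a commented-out key counts as missing:

```bash
# missing_keys DIR KEY... (hypothetical helper, not part of evalzoo):
# print "FILE: missing KEY" for each *.yaml file in DIR that has no line
# declaring KEY; return 1 if anything is missing, 0 otherwise.
missing_keys() {
  local dir="$1" f key status=0
  shift
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    for key in "$@"; do
      # A declaration is optional indentation, the key, then a colon;
      # a line starting with `#` (a comment) does not match.
      if ! grep -Eq "^[[:space:]]*${key}:" "$f"; then
        printf '%s: missing %s\n' "$f" "$key"
        status=1
      fi
    done
  done
  return "$status"
}

# Usage, from the matric directory:
# missing_keys params all_different_cols_rep any_different_cols_rep
```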
