Commit 0bbe6c9

reformat run_batch_general to CLI, add CPG structure (#179)
1 parent e2d444a commit 0bbe6c9

2 files changed

+712
-199
lines changed

documentation/DCP-documentation/step_2_submit_jobs.md

+71 -4
@@ -38,18 +38,85 @@ Job files that don't include it will use the default structure.
3838
For large numbers of groups, it may be helpful to create this list separately as a .txt file you can then append into the job's JSON file.
3939
You may create this yourself in your favorite scripting language.
4040
Alternatively, you can use the following additional tools to help you create and format this list:
41-
* `batches.sh` allows you to provide a list of all the individual metadata components (plates, columns, rows, etc).
41+
* `batches.sh` allows you to provide a list of all the individual metadata components (plates, columns, rows, etc).
4242
It then uses [GNU parallel](https://www.gnu.org/software/parallel/parallel_tutorial.html) to create a formatted text file with all the possible combinations of the components you provided.
4343
This approach is best when you have a large number of groups and the group structure is uniform.
4444

4545
Example: for a 96-well plate experiment where there are 3 plates and the experiment is grouped by Plate and Well, `batches.sh` would read:
46-
`parallel echo '{\"Metadata\": \"Metadata_Plate={1},Metadata_Well={2}{3}\"},' ::: Plate1 Plate2 Plate3 ::: A B C D E F G H ::: 01 02 03 04 05 06 07 08 09 10 11 12 | sort > batches.txt`
47-
* You may also use the list of groupings created by calling `cellprofiler --print-groups` from the command line (see [here](https://github.com/CellProfiler/CellProfiler/wiki/Adapting-CellProfiler-to-a-LIMS-environment#cmd) and [here](https://github.com/CellProfiler/Distributed-CellProfiler/issues/52) for more information).
48-
Note that for job files that specify groupings in this way, the `output_structure` variable is NOT optional - it must be specified or an error will be returned.
46+
`parallel echo '{\"Metadata\": \"Metadata_Plate={1},Metadata_Well={2}{3}\"},' ::: Plate1 Plate2 Plate3 ::: A B C D E F G H ::: 01 02 03 04 05 06 07 08 09 10 11 12 | sort > batches.txt`
47+
* You may also use the list of groupings created by calling `cellprofiler --print-groups` from the command line (see [here](https://github.com/CellProfiler/CellProfiler/wiki/Adapting-CellProfiler-to-a-LIMS-environment#cmd) and [here](https://github.com/CellProfiler/Distributed-CellProfiler/issues/52) for more information).
48+
Note that for job files that specify groupings in this way, the `output_structure` variable is NOT optional - it must be specified or an error will be returned.
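If GNU parallel is not available, the same list of combinations can be produced in a scripting language. The sketch below is a Python approximation of the `batches.sh` example above and is not part of Distributed-CellProfiler; the function name `make_group_lines` is an illustrative assumption.

```python
from itertools import product


def make_group_lines(plates, rows, columns):
    """Return one group line per plate/row/column combination, sorted."""
    lines = [
        f'{{"Metadata": "Metadata_Plate={plate},Metadata_Well={row}{col}"}},'
        for plate, row, col in product(plates, rows, columns)
    ]
    return sorted(lines)


plates = ["Plate1", "Plate2", "Plate3"]
rows = list("ABCDEFGH")
columns = [f"{i:02d}" for i in range(1, 13)]  # 01 .. 12

group_lines = make_group_lines(plates, rows, columns)
print(len(group_lines), "groups")
print(group_lines[0])
```

The resulting lines can be written to a text file and pasted into the job's JSON file in the same way as the output of `batches.sh`.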
4949

5050
## Alternate job submission: run_batch_general.py
5151

5252
We also support an alternate path besides `submitJobs` to create the list of jobs: the `run_batch_general.py` file.
5353
This file serves as a "shortcut" for many common types of stereotyped experiments we run in our lab.
5454
If your data follows a regular structure (such as N rows, N columns, N grouping, a particular structure for output, etc.), you may find it useful to take and modify this file for your own usage.
5555
We recommend new users use the `submitJobs` pathway, as it will help users understand the kinds of information Distributed-CellProfiler needs in order to run properly, but once they are comfortable with it they may find `run_batch_general.py` helps them create jobs faster in the future.
56+
57+
As of Distributed-CellProfiler 2.2.0, `run_batch_general.py` has been reformatted as a CLI tool with greatly enhanced customizability.
58+
`run_batch_general.py` must be passed the following pieces of information:
59+
60+
### Required inputs
61+
62+
* `step` is the step that you would like to make jobs for.
63+
Supported steps are `zproj`, `illum`, `qc`, `qc_persite`, `assaydev`, and `analysis`.
64+
* `identifier` is the project identifier (e.g. "cpg0000-jump-pilot" or "2024_11_07_Collaborator_Cell_Painting")
65+
* `batch` is the name of the data batch (e.g. "2020_11_04_CPJUMP1")
66+
* `platelist` is the list of plates to process.
67+
Format the list in quotes with individual plates separated by commas and no spaces (e.g. "Plate1,Plate2,Plate3")
68+
69+
A minimal `run_batch_general.py` command may look like:
70+
```bash
71+
run_batch_general.py analysis 2024_05_16_Segmentation_Project 2024_10_10_Batch1 "Plate1,Plate2,Plate3"
72+
```
73+
74+
### Required input for Cell Painting Gallery
75+
76+
Runs that use data from the Cell Painting Gallery require two additional flags:
77+
78+
* `--source <value>` to specify the identifier-specific source of the data.
79+
* `--path-style cpg` sets the input and output paths to match how data is structured in the Cell Painting Gallery.
80+
All paths can be overwritten with flags (see below).
81+
82+
A minimal `run_batch_general.py` command for a dataset on the Cell Painting Gallery may look like:
83+
```bash
84+
run_batch_general.py analysis cpg0000-jump-pilot 2020_11_04_CPJUMP1 "BR00116991,BR00116992" --path-style cpg --source broad
85+
```
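When many batches need to be submitted, it can help to assemble the command from a script rather than typing it by hand. The following is a minimal sketch under stated assumptions: `build_command` and `SUPPORTED_STEPS` are hypothetical names introduced here for illustration, and only the positional arguments and flags described above are used.

```python
import shlex

# Steps listed as supported in the documentation above.
SUPPORTED_STEPS = {"zproj", "illum", "qc", "qc_persite", "assaydev", "analysis"}


def build_command(step, identifier, batch, plates, extra_flags=()):
    """Hypothetical helper: assemble an argument list for run_batch_general.py."""
    if step not in SUPPORTED_STEPS:
        raise ValueError(f"unsupported step: {step}")
    # platelist is passed as one comma-separated argument with no spaces.
    platelist = ",".join(plates)
    return ["run_batch_general.py", step, identifier, batch, platelist, *extra_flags]


command = build_command(
    "analysis",
    "cpg0000-jump-pilot",
    "2020_11_04_CPJUMP1",
    ["BR00116991", "BR00116992"],
    extra_flags=["--path-style", "cpg", "--source", "broad"],
)
print(shlex.join(command))
```

The returned list could be handed to `subprocess.run(command, check=True)`; building a list rather than a single string avoids shell quoting problems with the plate list.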
86+
87+
### Plate layout flags
88+
89+
* `--plate-format <value>`: if used, can be `96` or `384` and will overwrite `rows` and `columns` to produce standard 96- or 384-well plate well names (e.g. A01, A02, etc.)
90+
* `--rows <value>`: a custom list of row labels.
91+
Will be combined with `columns` to generate well names.
92+
Separate values with commas and no spaces and surround with quotation marks (e.g. `"A,B,C,D,E,F,G"`)
93+
* `--columns <value>`: a custom list of column labels.
94+
Will be combined with `rows` to generate well names.
95+
Separate values with commas and no spaces and surround with quotation marks (e.g. `"1,2,3,4,5,6,7,8,9,10"`)
96+
* `--wells <value>`: a custom list of wells.
97+
Overwrites `rows` and `columns`.
98+
Separate values with commas and no spaces and surround with quotation marks (e.g. `"C02,D04,E04,N12"`)
99+
* `--no-well-digit-pad`: Formats wells without well digit padding.
100+
Applies to wells generated with `--plate-format` or with `--rows` and `--columns`, but not to wells passed with `--wells`.
101+
(e.g. `A1` NOT `A01`)
102+
* `--sites <value>`: a custom list of sites (fields of view) to be analyzed.
103+
Separate values with commas and no spaces and surround with quotation marks (e.g. `"1,2,3,4,5,6"`)
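To make the interaction between the layout flags concrete, here is a rough sketch of how well names could be derived from them. This is not the code in `run_batch_general.py`: `build_wells` is a hypothetical function, and the precedence shown (an explicit well list first, then the plate format, then custom rows and columns) is an assumption based on the descriptions above.

```python
import string


def build_wells(plate_format=None, rows=None, columns=None, wells=None, pad=True):
    """Hypothetical helper: return a list of well names from layout options."""
    if wells:
        # An explicit well list is used exactly as given; no padding is applied.
        return wells.split(",")
    if plate_format == "96":
        row_labels, col_labels = list(string.ascii_uppercase[:8]), range(1, 13)
    elif plate_format == "384":
        row_labels, col_labels = list(string.ascii_uppercase[:16]), range(1, 25)
    elif rows and columns:
        row_labels, col_labels = rows.split(","), columns.split(",")
    else:
        raise ValueError("provide plate_format, wells, or both rows and columns")
    names = []
    for row in row_labels:
        for col in col_labels:
            col_text = str(col)
            # Pad column labels to two digits unless padding is disabled.
            names.append(f"{row}{col_text.zfill(2) if pad else col_text}")
    return names


print(build_wells(plate_format="96")[:3])
print(build_wells(rows="A,B", columns="1,2", pad=False))
```

Here `pad=False` plays the role of `--no-well-digit-pad`, producing names such as `A1` instead of `A01`.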
104+
105+
### Overwrite structural defaults
106+
107+
* `--output-structure <value>`: overwrite default output structure
108+
* `--output-path <value>`: overwrite default output path
109+
* `--input-path <value>`: overwrite the default path to input files
110+
111+
### Overwrite defaults (for runs using load data .csv's and .cppipe)
112+
113+
* `--pipeline <value>`: overwrite the default pipeline name
114+
* `--pipeline-path <value>`: overwrite the default path to pipelines
115+
* `--datafile-name <value>`: overwrite the default load data .csv name
116+
* `--datafile-path <value>`: overwrite the default path to load data files
117+
118+
### Overwrite defaults (for runs using .h5 batch files)
119+
120+
* `--use-batch`: use h5 batch files instead of load data csv and .cppipe files
121+
* `--batchfile-name <value>`: overwrite default batchfile name
122+
* `--batchfile-path <value>`: overwrite default path to the batchfile