This pipeline uses Python image processing to segment cell nuclei in 2D on the basis of DAPI intensity, and then to segment spots within those nuclei.
It has been optimized for parallel processing using swarm/slurm jobs, such that each step of the process should take about 5 minutes.
For nuclear segmentation, we use CellPose (a trained neural network-based algorithm): https://www.cellpose.org/
For spot segmentation, we use the Laplacian of the Gaussian and define the edges of the spot as those pixels above 1/2 max intensity over nuclear background.
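The spot segmentation idea above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name, the `sigma` and `min_response` parameters, and the use of the median nuclear intensity as "nuclear background" are all assumptions made for the example. It finds candidate spots as local maxima of the negated Laplacian of Gaussian response, then keeps the connected pixels around each peak that lie above half of the peak's height over background.

```python
import numpy as np
from scipy import ndimage as ndi


def segment_spots(image, nucleus_mask, sigma=2.0, min_response=10.0):
    """Label spots inside one nucleus (illustrative sketch only).

    sigma and min_response are hypothetical parameter names.
    """
    image = np.asarray(image, dtype=float)
    nucleus_mask = np.asarray(nucleus_mask, dtype=bool)

    # Assumption: nuclear background is the median intensity in the nucleus.
    background = np.median(image[nucleus_mask])

    # The LoG is negative at bright blobs, so flip the sign. Multiplying by
    # sigma**2 makes responses comparable across filter scales.
    response = -ndi.gaussian_laplace(image, sigma=sigma) * sigma**2

    # Candidate spots: local maxima of the response inside the nucleus.
    size = 2 * int(np.ceil(2 * sigma)) + 1
    is_peak = (
        (response == ndi.maximum_filter(response, size=size))
        & (response > min_response)
        & nucleus_mask
    )
    peaks = np.argwhere(is_peak)

    labels = np.zeros(image.shape, dtype=int)
    for spot_id, (row, col) in enumerate(peaks, start=1):
        # Half of the peak's height above background defines the spot edge.
        half_max = background + 0.5 * (image[row, col] - background)
        components, _ = ndi.label((image >= half_max) & nucleus_mask)
        if components[row, col] == 0:
            continue  # peak pixel is dimmer than background; skip it
        region = components == components[row, col]
        labels[region & (labels == 0)] = spot_id
    return labels, peaks
```

Here `labels` is an integer image in which each spot carries its own ID, and `peaks` holds the (row, column) position of each detected maximum. Tying the threshold to each spot's own peak keeps the measured spot size from depending on how bright the spot is.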
We use pipenv (https://pipenv.pypa.io/en/latest/) to manage dependencies and their versions, and jupyter (https://jupyter.org/) to run the pipeline. Install pipenv and jupyter before you install this pipeline. To install pipenv, we recommend:
pip install pipenv
To install and set up the kernel for jupyter, use the following steps:
- Get all the files locally:
git clone https://github.com/elfinn/cell-and-spot-segmentation
- From the cell-and-spot-segmentation folder, use pipenv to install all dependencies:
cd cell-and-spot-segmentation
pipenv install
- Create the jupyter kernel associated with the pipenv environment:
pipenv run python -m ipykernel install --user --name=cell-and-spot-segmentation
- Launch the jupyter notebook to run the pipeline:
jupyter notebook
This pipeline processes .tif images with filenames and folder structures automatically generated by either Yokogawa CellVoyager (CV7000 or similar) microscopes or Zeiss LSM microscopes (with output as single files per field per channel).
The Python code is meant to be run from within the jupyter notebooks, and some options must be specified manually. Follow these steps to process your images:
- Run the "run_nuclear_segmentations_and_crop.ipynb" notebook to segment nuclei on the basis of the DAPI channel:
    - Specify file type in chunk 1 ("CV" for CellVoyager is the default; "LSM" for LSM).
    - Specify run strategy in chunk 3 ("LOCAL" for single-stream testing of small datasets, "SWARM" for parallelized processing of batches).
    - Specify file location, DAPI channel number, and approximate size of nuclei in chunk 4.
    - Run each chunk sequentially. If individual chunks take much longer than 10 minutes, consider using parallel processing next time.
- (Optional) Check the quality of the nuclear segmentations with the "test_nuclear_segmentation.ipynb" notebook.
- (REQUIRED) Determine the best parameters for segmenting spots with the "test_spot_segmentation.ipynb" notebook. This is done iteratively, one channel at a time. Spot segmentation will not run until every channel has been optimized and its parameters have been written to the global config file.
    - Specify file type in chunk 1 ("CV" for CellVoyager is the default; "LSM" for LSM).
    - Specify image locations, channel of interest, and a representative well/field in chunk 3.
    - Specify working parameters in chunk 4.
    - Display images in chunk 5. If spots do not appear well-segmented, go back to chunk 4 and alter the parameters.
    - Once one representative field is well-segmented, repeat the process to verify the parameters for multiple fields and wells (go back to chunk 3 and choose different wells or fields).
    - Once one channel is well-segmented in many wells and fields, run chunk 6 to output the parameters for that channel into the config file. Note that the config file is global and is added to for each channel. Do not rename it or make a separate config file for each channel, even though you are testing the channels sequentially.
    - Go back to chunk 3 and repeat the entire process for your next channel. Repeat until all non-DAPI channels have been optimized.
- Run the "run_spot_segmentations_and_compile.ipynb" notebook.
    - Specify file type in chunk 1 ("CV" for CellVoyager is the default; "LSM" for LSM).
    - Specify run strategy in chunk 3 ("LOCAL" for single-stream testing of small datasets, "SWARM" for parallelized processing of batches).
    - Specify file location for cropped images and the spot segmentation config file in chunk 4.
    - Run each chunk sequentially. If individual chunks take much longer than 10 minutes, consider using parallel processing next time.
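The reason for keeping a single global config file, rather than one per channel, can be pictured with a short sketch. Everything here is hypothetical: the JSON format, the function name, the file name, and the parameter keys are assumptions for illustration, and the notebooks may store parameters differently. The point is only that each channel's entry is merged into the existing file instead of replacing it, so the spot segmentation step can read the parameters for every channel from one place.

```python
import json
from pathlib import Path


def save_channel_parameters(config_path, channel, parameters):
    """Merge one channel's parameters into a shared JSON config file.

    Hypothetical helper: the file format and key names are assumptions.
    """
    path = Path(config_path)
    # Start from the existing config so earlier channels are preserved.
    config = json.loads(path.read_text()) if path.exists() else {}
    # Add or overwrite the entry for this channel only.
    config[str(channel)] = dict(parameters)
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return config
```

Calling this once per optimized channel, always with the same `config_path`, leaves a file that holds one entry per channel. Re-running it for a channel updates that channel's entry and leaves the others untouched.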
The pipeline outputs the following parameters for each nucleus in each channel, to a file with the suffix "cell_intensities.csv":
- Unique nuclear ID
- Summed intensity
- Total area in pixels
The pipeline outputs the following parameters for each spot within the nucleus in each non-DAPI channel, to a file with the suffix "spot_positions.csv":
- Unique nuclear ID
- Unique spot ID
- Position (center of gravity) in X, Y, and Z
- Radial position normalized to 0 = central and 1 = peripheral
- Total area
- Eccentricity of fitted ellipse
- Solidity of spot (% of bounding box taken up by spot)
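To make the normalized radial position in the list above concrete, here is one possible way to compute such a value. This is a sketch under stated assumptions, not necessarily the definition the pipeline uses: it takes the Euclidean distance from the spot to the nearest pixel outside the nucleus, divides by the largest such distance found anywhere in the nucleus, and subtracts the result from 1. The function name and arguments are hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi


def normalized_radial_position(nucleus_mask, row, col):
    """Return ~0 for the most interior point and close to 1 at the edge.

    Illustrative definition only; the pipeline may define this differently.
    """
    nucleus_mask = np.asarray(nucleus_mask, dtype=bool)
    # Distance from every nuclear pixel to the nearest non-nuclear pixel.
    depth = ndi.distance_transform_edt(nucleus_mask)
    deepest = depth.max()
    if deepest == 0:
        raise ValueError("nucleus_mask contains no nuclear pixels")
    # Spot positions are centers of gravity, so round to the nearest pixel.
    r, c = int(round(row)), int(round(col))
    return 1.0 - depth[r, c] / deepest
```

With this definition a pixel on the nuclear boundary scores slightly below 1, because its distance to the outside is one pixel rather than zero. A distance-transform approach also works for non-circular nuclei, where distance from the centroid alone would be misleading.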
For questions or more information, please contact Elizabeth Finn (elysabeth.finn@gmail.com)