Commit 61bbc41

docs: text review
1 parent 6b57466 commit 61bbc41

4 files changed

Lines changed: 35 additions & 30 deletions

docs/source/Data_processing/Overview.rst

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ Overview
44
``snazzy_processing`` is a Python package to automate the extraction of primary data: Ventral Nerve Cord (VNC) length and signal intensity from fluorescence imaging of Drosophila embryos.
55
The package is an image processing pipeline that can be divided into three main stages:
66

7-
* Crop movies for individual specimens from raw data
7+
* Crop raw data containing many samples into smaller movies (one per sample)
88
* Calculate signal intensity inside ROIs
99
* Measure the VNC length
1010

@@ -16,16 +16,16 @@ This means that you can feed either ``.tif`` or ``.nd2`` files into the pipeline
1616
If your raw data is in another format, you must first convert it to ``.tif``.
1717
ImageJ, for example, provides several plugins to convert files to ``.tif``, including the excellent `BioFormats extension <https://imagej.net/formats/bio-formats>`__.
1818

19-
Before running the pipeline, which is the last cell of the jupyter notebook, we must determine from where to crop each movie in the raw data.
19+
Before running the pipeline, which is the last cell of the jupyter notebook, we must determine where to crop the raw data to separate samples.
2020

21-
The bounding boxes for each individual sample are determined via thresholding.
22-
Because the image histograms might vary, some manual adjustment might be necessary.
21+
Thresholding is used to distinguish each sample from its background and create bounding boxes.
22+
Because thresholding depends on image histograms, and these histograms might vary between movies, some manual adjustment may be necessary.
2323
Inspect where the bounding boxes will be created in the jupyter notebook.
2424
They should cover the entire sample.
25-
If there is a bounding box that covers more than one specimen, because maybe they were touching each other, those specimens must be ignored or further manually processed.
26-
We can't use it directly because the resulting ROI will not match a single sample and the length and activity data will be wrong.
25+
If a bounding box covers more than one specimen (e.g., because samples are touching each other), those specimens must be ignored or further manually processed.
26+
Boxes that contain several samples cannot be used directly because the resulting ROI will not match a single sample, and the length and activity data will be wrong.
2727

28-
After the bounding boxes are determined, you can run the pipeline.
28+
After the bounding boxes are determined, you can select which embryos to include in the analysis and run the pipeline.
2929
By default, the methods in the pipeline will not overwrite any data.
3030
If data for a given sample is found in the output directory, the pipeline will simply skip that sample.
3131
If you want to recalculate any data, first remove or rename the current files.
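
The skip-if-present behavior described above can be pictured with a minimal sketch. The function name ``process_if_missing``, the ``.csv`` naming scheme, and the ``compute`` callback are all hypothetical and are not taken from ``snazzy_processing``; the sketch only illustrates the idea of never overwriting existing output.

```python
from pathlib import Path
import tempfile


def process_if_missing(output_dir, sample_id, compute):
    """Run `compute` for a sample only when no output file exists yet.

    Hypothetical helper: the real pipeline may name and store files differently.
    """
    out_path = Path(output_dir) / f"{sample_id}.csv"  # assumed naming scheme
    if out_path.exists():
        return False  # existing data is left untouched
    out_path.write_text(compute(sample_id))
    return True


with tempfile.TemporaryDirectory() as tmp:
    # First call writes the file, second call finds it and skips the sample.
    first = process_if_missing(tmp, "emb1", lambda s: "frame,signal\n0,1.5\n")
    second = process_if_missing(tmp, "emb1", lambda s: "frame,signal\n0,9.9\n")
    kept = (Path(tmp) / "emb1.csv").read_text()
```

With this design, recalculating a sample means removing or renaming its file first, which matches the instruction above.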

docs/source/Data_processing/Process_raw_data.rst

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
Process raw images
22
==================
33

4-
Since the imaging is done with a large Field of View microscope, usually during 6 hours or more, the raw images tend to be in the range of 50 ~ 200 GiB.
5-
The simplest way to handle the raw data is to crop it in individual movies.
6-
There is a considerable amount of background pixels that can be ignored in the raw data. After cropping the embryos, all individual movies combined take about 40% of the original memory space.
7-
This already saves considerable ROM memory but most importantly, it means we can easily load individual movies in the RAM of a regular computer (16-32 GB RAM), without needing to use memory mapped files.
4+
Since the imaging is performed with a large field-of-view microscope, typically as a time-lapse of 6 hours or more, the raw images tend to be in the range of 50 to 200 GiB.
5+
The simplest way to handle the raw data is to first crop it into individual movies for each sample, as there is a considerable amount of background pixels that can be ignored in the raw data.
6+
After cropping the embryos, all individual movies combined take about 40% of the original storage space.
7+
This already saves considerable disk space but, most importantly, it means individual movies can be easily loaded into the RAM of a regular computer (16-32 GB RAM) without needing memory-mapped files.
88

9-
The algorithm to process the raw image can be resumed as:
9+
The algorithm to process the raw image can be summarized as:
1010

1111
1. Get the maximum projection of each pixel for the first 10 frames
1212
2. Automatic threshold (Triangle method)
@@ -19,11 +19,11 @@ The algorithm to process the raw image can be resumed as:
1919
To calculate the bounding boxes of each embryo, we first take the maximum projection of each pixel for the first 10 frames, and then use the Triangle threshold method to binarize the image.
2020
The Triangle threshold is a good choice here because the image has a lot of background pixels, resulting in a unimodal histogram that is centered around the average value of the background pixels.
2121

22-
Once we have the binary image, we traverse it to identify each embryo.
23-
Whenever a foreground pixel is found, we mark all connecting foreground pixels, and also keep track of the amount of pixels marked and the extreme points (minimum and maximum coordinates in both dimensions).
24-
The pixel count is used to determine if the marked area really corresponds to an embryo, or just a smaller artifact that was erroneously considered a foreground.
22+
Once the binary image is created, it is traversed to identify each embryo.
23+
Whenever a foreground pixel is found, we mark all connected foreground pixels and keep track of: (1) the number of pixels marked and (2) the extreme points (minimum and maximum coordinates in both dimensions).
24+
The pixel count is used to determine whether the marked area actually corresponds to an embryo or is just a smaller artifact that was erroneously classified as foreground.
2525
The minimum pixel count might change depending on the type of sample being processed, and can be adjusted in ``slice_img.get_bbox_boundaries``.
26-
Regions with high signal intensity, for example corresponding to fly embryo's eyes or gut are examples of smaller artifacts that sometimes are included in the binary image, but can easily be removed due to its size.
26+
Regions of high signal intensity that represent small artifacts (e.g., fly embryo eyes or gut) may be included in the binary image, but they are easily distinguished from the embryo due to their size.
2727
The extreme points are then used to generate the bounding boxes, which will determine the positions where the image will be cropped.
2828

2929
The raw image is opened as a memory map using ``numpy``, and the individual embryos are cropped and saved as tif files.
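
The traversal described above (mark connected foreground pixels while tracking the pixel count and the extreme coordinates) can be sketched in plain Python. This is an illustration only, not the code in ``slice_img.get_bbox_boundaries``: the function name ``find_bounding_boxes``, the ``min_pixels`` parameter, and the choice of 4-connectivity are assumptions.

```python
from collections import deque


def find_bounding_boxes(binary, min_pixels=3):
    """Return (min_row, min_col, max_row, max_col) for each connected region.

    Assumes 4-connectivity; regions with fewer than `min_pixels` are discarded.
    """
    n_rows, n_cols = len(binary), len(binary[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    boxes = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not binary[r0][c0] or visited[r0][c0]:
                continue
            # New foreground region: flood fill it with a breadth-first search.
            queue = deque([(r0, c0)])
            visited[r0][c0] = True
            count = 0
            min_r = max_r = r0
            min_c = max_c = c0
            while queue:
                r, c = queue.popleft()
                count += 1
                min_r, max_r = min(min_r, r), max(max_r, r)
                min_c, max_c = min(min_c, c), max(max_c, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and binary[nr][nc] and not visited[nr][nc]:
                        visited[nr][nc] = True
                        queue.append((nr, nc))
            # Keep the region only if it is large enough to be an embryo.
            if count >= min_pixels:
                boxes.append((min_r, min_c, max_r, max_c))
    return boxes


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
boxes = find_bounding_boxes(image, min_pixels=3)
```

In this toy image the 2x2 block is kept and the isolated pixel on the right is rejected as an artifact because it falls below the minimum pixel count.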

docs/source/Data_processing/ROI_length.rst

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,16 @@
1+
Neurodevelopmental Time
2+
-----------------------
3+
Together, the ROI length and the full embryo size are used as a proxy to measure the embryonic neurodevelopmental progression:
4+
5+
developmental_progression = embryo_length / ROI_length
6+
17
ROI length
28
==========
39

4-
The ROI length is used as a proxy to measure the embryonic neurodevelopmental progression.
5-
It is calculated by center line estimation.
6-
The idea is to measure the line that will pass through the center of the ROI.
7-
This will correspond to the ventral nerve cord length.
10+
The ROI length is calculated by center line estimation.
11+
The general approach is to measure the line that will pass through the center of the ROI; this will correspond to the ventral nerve cord length.
812

9-
To determine this line, we go over the following steps:
13+
To determine the ROI length, the following steps are used:
1014

1115
1. Binarize the image
1216
2. Apply a 'chessboard' distance transform
@@ -21,7 +25,7 @@ Embryo Full size
2125

2226
The full specimen size is calculated by approximating the entire sample shape as an ellipse, and measuring this ellipse's diameter.
2327

24-
The steps to calculate the sample size are:
28+
The steps to calculate the embryo's size are:
2529
1. Equalize the image histogram
2630
2. Automatic threshold (Triangle method)
2731
3. Binarize the image

docs/source/Data_processing/ROIs_and_signal_intensity.rst

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,29 @@
11
ROIs and signal intensity
22
=========================
33

4-
When running the pipeline from the jupyter notebook, the ROIs are calculated for each frame.
5-
The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, based on the fact that the sample signal won't change considerably within this interval.
6-
This is a good approximation and useful for quick analyses, at the cost of the eventual errors in readings caused by movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
4+
When running the pipeline from the jupyter notebook, a single ROI is calculated for each frame.
5+
The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, instead of a single ROI per frame.
6+
This is possible if the sample signal doesn't change considerably within this interval.
7+
Therefore, calculating one ROI per group of frames is a good approximation that can be useful for quick analyses, at the cost of occasional errors in readings caused by movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
78

8-
The ROI algorithm can be resumed as:
9+
The ROI algorithm can be summarized as:
910

1011
1. Average the group of frames into a single 2D matrix
1112
2. Automatic threshold (Otsu's method)
1213
3. Binarize the image
13-
4. Remove small holes inside the VNC
14+
4. Seal small holes inside the VNC
1415
5. Select the largest group of connected foreground pixels
1516
6. Return a mask that matches the largest label
1617

1718
For some datasets with lower VNC signal, a higher threshold value tends to provide better results.
1819
To change the threshold method, change the ``threshold_method`` parameter in the ``pipeline.measure_vnc_length`` function to ``otsu``.
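
For readers unfamiliar with Otsu's method, the sketch below shows the idea on a flat list of integer pixel values: pick the threshold that maximizes the between-class variance. It is a dependency-free illustration and not the implementation used by the package, which most likely relies on a library routine; ``otsu_threshold`` and ``n_levels`` are names chosen for this example.

```python
def otsu_threshold(pixels, n_levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels <= t are treated as background and pixels > t as foreground.
    """
    hist = [0] * n_levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * n for level, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of background pixels so far
    sum_bg = 0  # sum of background pixel values so far
    for t in range(n_levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


pixels = [10, 12, 11, 13, 200, 210, 205]
t = otsu_threshold(pixels)
roi_mask = [p > t for p in pixels]  # binarize: True marks foreground
```

Binarizing with the returned value separates the dim pixels from the bright ones in this toy example.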
1920

2021
Even though the signal from pixels inside the VNC is stable, the selected threshold value is not always perfect.
21-
After binarizing the image, the VNC might contain small regions that were lower than the calculated threshold.
22+
After binarizing the image, the VNC might contain small regions whose intensity is below the calculated threshold (i.e., holes).
2223
These regions are merged back into the VNC if they are completely contained inside the VNC binary component.
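
One way to picture the merge step is a flood fill of the background starting at the image border: any background pixel that cannot be reached is fully enclosed by foreground and can be filled. The sketch below is an assumption about how such a step might look, not the package's code. ``fill_enclosed_holes`` is a hypothetical name, and unlike the pipeline it fills enclosed holes of any size.

```python
from collections import deque


def fill_enclosed_holes(mask):
    """Set to 1 every background pixel that cannot reach the image border."""
    n_rows, n_cols = len(mask), len(mask[0])
    reachable = [[False] * n_cols for _ in range(n_rows)]
    queue = deque()
    # Seed the search with all background pixels on the border.
    for r in range(n_rows):
        for c in range(n_cols):
            on_border = r in (0, n_rows - 1) or c in (0, n_cols - 1)
            if on_border and not mask[r][c]:
                reachable[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if inside and not mask[nr][nc] and not reachable[nr][nc]:
                reachable[nr][nc] = True
                queue.append((nr, nc))
    # Foreground stays foreground; unreachable background becomes foreground.
    return [
        [1 if mask[r][c] or not reachable[r][c] else 0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


vnc = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed_holes(vnc)
```

The center pixel is enclosed by the component and gets merged, while the outer background is left unchanged.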
2324

24-
To calculate the signal intensity, we apply the mask to the embryo and calculate the mean pixel value.
25-
The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using the code from ``snazzy_analysis``.
25+
To calculate the signal intensity, a mask is applied to the embryo and the mean pixel value is calculated.
26+
The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using ``snazzy_analysis``.
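
As a small illustration of the masked mean described above, the sketch below averages only the pixels selected by a binary mask. ``mean_signal`` is a hypothetical name, and nested lists stand in for the image arrays the pipeline would actually use.

```python
def mean_signal(frame, mask):
    """Return the mean of the frame pixels where the mask is nonzero."""
    values = [
        px
        for frame_row, mask_row in zip(frame, mask)
        for px, inside in zip(frame_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    return sum(values) / len(values)


frame = [
    [10, 20, 30],
    [40, 50, 60],
]
mask = [
    [0, 1, 1],
    [0, 1, 0],
]
signal = mean_signal(frame, mask)  # averages 20, 30 and 50
```

Repeating this for every frame, once per channel, would yield the time series that is exported to ``.csv``.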
2627

2728
Visualizing calculated ROIs
2829
~~~~~~~~~~~~~~~~~~~~~~~~~~~
