docs: text review

alanaevora · alanaevora · commit 61bbc415b0cc · 2025-10-07T16:07:16.000-04:00
diff --git a/docs/source/Data_processing/Overview.rst b/docs/source/Data_processing/Overview.rst
@@ -4,7 +4,7 @@ Overview
 ``snazzy_processing`` is a Python package to automate the extraction of primary data: Ventral Nerve Cord (VNC) length and signal intensity from fluorescence imaging of Drosophila embryos.
 The package is an image processing pipeline that can be divided into three main stages:
 
-* Crop movies for individual specimens from raw data
+* Crop raw data containing many samples into smaller movies (one per sample)
 * Calculate signal intensity inside ROIs
 * Measure the VNC length
 
@@ -16,16 +16,16 @@ This means that you can feed either ``.tif`` or ``.nd2`` files into the pipeline
 If your raw data is in another format, you must first convert it to ``.tif``.
 ImageJ for example provides several plugins to convert files to tif, including the excellent `BioFormats extension <https://imagej.net/formats/bio-formats>`__.
 
-Before running the pipeline, which is the last cell of the jupyter notebook, we must determine from where to crop each movie in the raw data.
+Before running the pipeline, which is the last cell of the jupyter notebook, we must determine where to crop the raw data to separate samples.
 
-The bounding boxes for each individual sample are determined via thresholding.
-Because the image histograms might vary, some manual adjustment might be necessary.
+Thresholding is used to distinguish each sample from its background and create bounding boxes.
+Because thresholding depends on image histograms, and these histograms might vary between movies, some manual adjustment may be necessary.
 Inspect where the bounding boxes will be created in the jupyter notebook.
 They should cover the entire sample.
-If there is a bounding box that covers more than one specimen, because maybe they were touching each other, those specimens must be ignored or further manually processed.
-We can't use it directly because the resulting ROI will not match a single sample and the length and activity data will be wrong.
+If there is a bounding box that covers more than one specimen (e.g., samples are touching each other), those specimens must be ignored or further manually processed.
+Boxes that contain several samples cannot be used directly because the resulting ROI will not match a single sample and the length and activity data will be wrong.
 
-After the bounding boxes are determined, you can run the pipeline.
+After the bounding boxes are determined, you can select which embryos to include in the analysis and run the pipeline.
 By default, the methods in the pipeline will not overwrite any data.
 If data for a given sample is found in the output directory, it will simply skip that sample.
 If you want to recalculate any data, first remove or rename the current files.
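Review note: the skip-if-present behaviour described above can be illustrated with a minimal sketch. The function name `samples_to_process`, the one-file-per-sample layout, and the `.csv` suffix are assumptions made for illustration only; they are not taken from ``snazzy_processing``.

```python
import tempfile
from pathlib import Path


def samples_to_process(sample_names, output_dir, suffix=".csv"):
    """Return the samples that have no output file yet (hypothetical layout).

    Assumes one output file per sample, named '<sample><suffix>'.
    Existing files are never overwritten; their samples are skipped.
    """
    output_dir = Path(output_dir)
    return [
        name
        for name in sample_names
        if not (output_dir / f"{name}{suffix}").exists()
    ]


# Demo: 'emb1' already has an output file, so only 'emb2' remains.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "emb1.csv").touch()
    pending = samples_to_process(["emb1", "emb2"], tmp)
    print(pending)
```

Under this assumed layout, removing or renaming `emb1.csv` would make `emb1` appear in the pending list again, which mirrors the recalculation advice in the text.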
diff --git a/docs/source/Data_processing/Process_raw_data.rst b/docs/source/Data_processing/Process_raw_data.rst
@@ -1,12 +1,12 @@
 Process raw images
 ==================
 
-Since the imaging is done with a large Field of View microscope, usually during 6 hours or more, the raw images tend to be in the range of 50 ~ 200 GiB.
-The simplest way to handle the raw data is to crop it in individual movies.
-There is a considerable amount of background pixels that can be ignored in the raw data. After cropping the embryos, all individual movies combined take about 40% of the original memory space.
-This already saves considerable ROM memory but most importantly, it means we can easily load individual movies in the RAM of a regular computer (16-32 GB RAM), without needing to use memory mapped files.
+Since the imaging is performed with a large field-of-view microscope, typically as a time-lapse of 6 hours or more, the raw images tend to be in the range of 50-200 GiB.
+The simplest way to handle the raw data is to first crop it into individual movies for each sample, as there is a considerable number of background pixels that can be ignored in the raw data.
+After cropping the embryos, all individual movies combined take about 40% of the original storage space.
+This already saves considerable disk space but, most importantly, it means individual movies can be easily loaded in the RAM of a regular computer (16-32 GB RAM), without needing to use memory-mapped files.
 
-The algorithm to process the raw image can be resumed as:
+The algorithm to process the raw image can be summarized as:
 
 1. Get the maximum projection of each pixel for the first 10 frames
 2. Automatic threshold (Triangle method)
@@ -19,11 +19,11 @@ The algorithm to process the raw image can be resumed as:
 To calculate the bounding boxes of each embryo, we first take the maximum projection of each pixel for the first 10 frames, and then use the Triangle threshold method to binarize the image.
 The Triangle threshold is a good choice here because the image has a lot of background pixels, resulting in a unimodal histogram that is centered around the average value of the background pixels.
 
-Once we have the binary image, we traverse it to identify each embryo.
-Whenever a foreground pixel is found, we mark all connecting foreground pixels, and also keep track of the amount of pixels marked and the extreme points (minimum and maximum coordinates in both dimensions).
-The pixel count is used to determine if the marked area really corresponds to an embryo, or just a smaller artifact that was erroneously considered a foreground.
+Once the binary image is created, it is traversed to identify each embryo.
+Whenever a foreground pixel is found, we mark all connected foreground pixels and keep track of: (1) the number of pixels marked and (2) the extreme points (minimum and maximum coordinates in both dimensions).
+The pixel count is used to determine if the marked area actually corresponds to an embryo, or just to a smaller artifact that was erroneously classified as foreground.
 The minimum pixel count might change depending on the type of sample being processed, and can be adjusted in ``slice_img.get_bbox_boundaries``.
-Regions with high signal intensity, for example corresponding to fly embryo's eyes or gut are examples of smaller artifacts that sometimes are included in the binary image, but can easily be removed due to its size.
+Regions of high signal intensity that represent small artifacts (e.g., fly embryo eyes or gut) may be included in the binary image, but they are easily distinguished from the embryo by their size.
 The extreme points are then used to generate the bounding boxes, which will determine the positions where the image will be cropped.
 
 The raw image is opened as a memory map using ``numpy``, and the individual embryos are cropped and saved as tif files.
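Review note: the traversal described in this file (mark connected foreground pixels, count them, track the extreme points, discard small regions) can be sketched as follows. This is an illustration under stated assumptions, not the code in ``slice_img.get_bbox_boundaries``: the function name `find_bounding_boxes`, the 4-connectivity, and the `min_pixels` value are hypothetical, and a fixed threshold stands in for the Triangle method, which is not implemented here.

```python
from collections import deque

import numpy as np


def find_bounding_boxes(binary, min_pixels=50):
    """Return (row_min, row_max, col_min, col_max) per large foreground region.

    Regions are grown with a breadth-first flood fill using 4-connectivity
    (an assumption). Regions with fewer than `min_pixels` pixels are treated
    as artifacts and dropped.
    """
    n_rows, n_cols = binary.shape
    visited = np.zeros(binary.shape, dtype=bool)
    boxes = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not binary[r0, c0] or visited[r0, c0]:
                continue
            # New region found: flood fill it while tracking size and extremes.
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            count = 0
            rmin = rmax = r0
            cmin = cmax = c0
            while queue:
                r, c = queue.popleft()
                count += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and binary[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            if count >= min_pixels:
                boxes.append((rmin, rmax, cmin, cmax))
    return boxes


# Synthetic movie: one embryo-sized region and one small bright artifact.
frames = np.zeros((12, 30, 40), dtype=np.uint16)
frames[:, 5:15, 4:20] = 500    # 10 x 16 = 160 pixels, kept
frames[:, 22:24, 30:33] = 800  # 2 x 3 = 6 pixels, rejected by size

projection = frames[:10].max(axis=0)  # maximum projection of first 10 frames
binary = projection > 100             # fixed threshold, stand-in for Triangle
boxes = find_bounding_boxes(binary, min_pixels=50)
print(boxes)
```

The bright artifact passes the threshold but is discarded by the pixel count, which is the size-based filtering the text describes.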
diff --git a/docs/source/Data_processing/ROI_length.rst b/docs/source/Data_processing/ROI_length.rst
@@ -1,12 +1,16 @@
+Neurodevelopmental Time
+-----------------------
+Together, the ROI length and the full embryo size are used as a proxy for embryonic neurodevelopmental progression:
+
+developmental_progression = embryo_length / ROI_length
+
 ROI length
 ==========
 
-The ROI length is used as a proxy to measure the embryonic neurodevelopmental progression.
-It is calculated by center line estimation.
-The idea is to measure the line that will pass through the center of the ROI.
-This will correspond to the ventral nerve cord length.
+The ROI length is calculated by center line estimation.
+The general approach is to measure the line that will pass through the center of the ROI; this will correspond to the ventral nerve cord length.
 
-To determine this line, we go over the following steps:
+To determine the ROI length, the following steps are used:
 
 1. Binarize the image
 2. Apply a 'chessboard' distance transform 
@@ -21,7 +25,7 @@ Embryo Full size
 
 The full specimen size is calculated by approximating the entire sample shape as an ellipse, and measuring this ellipse's diameter.
 
-The steps to calculate the sample size are:
+The steps to calculate the embryo's size are:
 1. Equalize the image histogram
 2. Automatic threshold (Triangle method)
 3. Binarize the image
diff --git a/docs/source/Data_processing/ROIs_and_signal_intensity.rst b/docs/source/Data_processing/ROIs_and_signal_intensity.rst
@@ -1,28 +1,29 @@
 ROIs and signal intensity
 =========================
 
-When running the pipeline from the jupyter notebook, the ROIs are calculated for each frame.
-The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, based on the fact that the sample signal won't change considerably within this interval.
-This is a good approximation and useful for quick analyses, at the cost of the eventual errors in readings caused by movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
+When running the pipeline from the jupyter notebook, a single ROI is calculated for each frame.
+The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, instead of a single ROI per frame.
+This is possible if the sample signal doesn't change considerably within this interval.
+Therefore, calculating one ROI per group of frames is a good approximation, and it can be useful for quick analyses, at the cost of occasional errors in readings caused by sample movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
 
-The ROI algorithm can be resumed as:
+The ROI algorithm can be summarized as:
 
 1. Average the group of frames into a single 2D matrix
 2. Automatic threshold (Otsu's method)
 3. Binarize the image 
-4. Remove small holes inside the VNC
+4. Seal small holes inside the VNC
 5. Select the largest group of connected foreground pixels
 6. Return a mask that matches the largest label
 
 For some datasets with lower VNC signal, a higher threshold value tends to provide better results.
 To change the threshold method, change the ``threshold_method`` parameter in the ``pipeline.measure_vnc_length`` function to ``otsu``.
 
 Even though the signal from pixels inside the VNC is stable, the selected threshold value is not always perfect.
-After binarizing the image, the VNC might contain small regions that were lower than the calculated threshold.
+After binarizing the image, the VNC might contain small regions that were lower than the calculated threshold (i.e., holes).
 These regions are merged back into the VNC if they are completely contained inside the VNC binary component.
 
-To calculate the signal intensity, we apply the mask to the embryo and calculate the mean pixel value.
-The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using the code from ``snazzy_analysis``.
+To calculate the signal intensity, a mask is applied to the embryo and the mean pixel value is calculated.
+The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using ``snazzy_analysis``.
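Review note: a partial sketch of the grouped-ROI idea and the masked mean may help readers of this page. It covers only steps 1 to 3 of the list (average a group of frames, Otsu threshold, binarize) plus the mean pixel value inside the mask; sealing holes and selecting the largest connected component are deliberately omitted. The function names and the NumPy-only Otsu implementation are assumptions for illustration, not the package's code.

```python
import numpy as np


def otsu_threshold(image, nbins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    if image.min() == image.max():
        return float(image.min())  # constant image, nothing to separate
    counts, edges = np.histogram(image, bins=nbins)
    counts = counts.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    # Cumulative weights and means of the low and high intensity classes.
    weight_low = np.cumsum(counts)
    weight_high = np.cumsum(counts[::-1])[::-1]
    mean_low = np.cumsum(counts * centers) / weight_low
    mean_high = (np.cumsum((counts * centers)[::-1]) / weight_high[::-1])[::-1]
    between = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(between)]


def roi_masks_and_intensity(frames, group_size=5):
    """Compute one mask per group of frames and the per-frame mean inside it."""
    n_frames = frames.shape[0]
    intensity = np.empty(n_frames)
    masks = []
    for start in range(0, n_frames, group_size):
        group = frames[start:start + group_size]
        mean_img = group.mean(axis=0)                # step 1: average the group
        mask = mean_img > otsu_threshold(mean_img)   # steps 2-3: threshold, binarize
        masks.append(mask)
        # Apply the shared mask to every frame of the group.
        intensity[start:start + group_size] = group[:, mask].mean(axis=1)
    return masks, intensity


# Synthetic movie: uniform background with a bright region that gets brighter.
frames = np.full((10, 20, 20), 10.0)
for t in range(10):
    frames[t, 5:15, 8:12] = 100.0 + t

masks, intensity = roi_masks_and_intensity(frames, group_size=5)
print(len(masks), intensity)
```

With `group_size=1` the same function computes one ROI per frame, so the two modes described above differ only in how many frames share a mask.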
 
 Visualizing calculated ROIs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~