Merge pull request #21 from ACRLab/alanaedits

cdpaiva · web-flow · commit 617f510b44fb · 2025-10-08T10:35:52.000-04:00
Documentation review
diff --git a/docs/source/Data_analysis/Graphical_User_Interface.rst b/docs/source/Data_analysis/Graphical_User_Interface.rst
@@ -45,7 +45,7 @@ You can load more than one dataset to a Group, or have multiple Groups to visual
 When more than one dataset is loaded, you cannot change the analysis parameters anymore.
 In this mode, the analysis results are read-only.
 The same parameters should be used to comparing different datasets.
-To make sure that the chosen parameters work with each dataset, load each one separately and verify the peak detection first.
+To make sure that the chosen parameters are appropriate for each dataset, load each one separately and verify the peak detection first.
 The comparison plots in the Plot menu will show results by Group.
 In the upper left corner of the GUI, a dropdown menu can be used to change the Group that is currently being visualized.
 
@@ -81,7 +81,7 @@ The ``emb_numbers.png`` file represents a snapshot of the microscope's field of
 Config parameters
 -----------------
 
-When loading a dataset the code will look for a config file named ``peak_detection_params.json`` inside th dataset directory and will use its data for the analysis.
+When loading a dataset the code will look for a config file named ``peak_detection_params.json`` inside the dataset directory and will use its data for the analysis.
 If not found, a file with default parameters is created.
 The default parameters can be found inside ``config.py``. 
 If you change any of the parameters, they will be recorded in this file.
@@ -95,7 +95,7 @@ From this window it's possible to set:
 * Group name: name of the group that contains this dataset
 * First peak threshold: minimum time in minutes that has to pass before any peak happens. Used to make sure that the first peak caught at the imaging session is really the activity onset.
 * To_exclude: embryo numbers that will be excluded from the analysis. These embryos will be excluded from the analysis.
-* To_remove: embryo numbers that will be analyzed, but will show up in the 'Removed' group.
+* To_remove: embryo numbers that will be analyzed, but will appear in the 'Removed' group.
 * Embryos that have it's first peak before the first peak threshold or that were marked by the user as removed will also be at the to_remove category.
 * Has_transients: if selected the code will try to identify and skip the first peak if it's likely just a transient.
 * Has_dsna: if selected the code will try to determine dSNA and ignore all peaks that happen after dSNA start.
@@ -109,8 +109,8 @@ Visualizing traces
 
 The description here refers to the image on the top of the file.
 
-The top app bar has buttons to change the data presentation.
-Below the top app bar there are two sliders.
+The top bar has buttons to change the data presentation.
+Below the top bar there are two sliders.
 The first is for the frequency cutoff, which controls how much the signal is smoothed for the finding peaks algorithm.
 The second is the peak width parameter, used to determine the start and end times of each peak.
 The sidebar presents which embryos are currently considered for plots and analysis, and which ones should be removed.
diff --git a/docs/source/Data_processing/Overview.rst b/docs/source/Data_processing/Overview.rst
@@ -4,7 +4,7 @@ Overview
 ``snazzy_processing`` is a Python package to automate the extraction of primary data: Ventral Nerve Cord (VNC) length and signal intensity from fluorescence imaging of Drosophila embryos.
 The package is an image processing pipeline, that can be divided intro three main stages:
 
-* Crop movies for individual specimens from raw data
+* Crop raw data containing many samples into smaller movies (one per sample)
 * Calculate signal intensity inside ROIs
 * Measure the VNC length
 
@@ -16,16 +16,16 @@ This means that you can feed either ``.tif`` or ``.nd2`` files into the pipeline
 If your raw data is in another format, you must first convert it to ``.tif``.
 ImageJ for example provides several plugins to convert files to tif, including the excellent `BioFormats extension <https://imagej.net/formats/bio-formats>`__.
 
-Before running the pipeline, which is the last cell of the jupyter notebook, we must determine from where to crop each movie in the raw data.
+Before running the pipeline, which is the last cell of the jupyter notebook, we must determine where to crop the raw data to separate samples.
 
-The bounding boxes for each individual sample are determined via thresholding.
-Because the image histograms might vary, some manual adjustment might be necessary.
+Thresholding is used to distinguish each sample from its background and create bounding boxes.
+Because thresholding depends on image histograms, and these histograms might vary between movies, some manual adjustment may be necessary.
 Inspect where the bounding boxes will be created in the jupyter notebook.
 They should cover the entire sample.
-If there is a bounding box that covers more than one specimen, because maybe they were touching each other, those specimens must be ignored or further manually processed.
-We can't use it directly because the resulting ROI will not match a single sample and the length and activity data will be wrong.
+If there is a bounding box that covers more than one specimen (e.g., because samples are touching each other), those specimens must be ignored or further manually processed.
+Boxes that contain several samples cannot be used directly because the resulting ROI will not match a single sample and the length and activity data will be wrong.
 
-After the bounding boxes are determined, you can run the pipeline.
+After the bounding boxes are determined, you can select which embryos to include in the analysis and run the pipeline.
 By default, the methods in the pipeline will not overwrite any data.
 If data for a given sample is found in the output directory, it will simply skip that sample.
 If you want to recalculate any data, first remove or rename the current files.
diff --git a/docs/source/Data_processing/Process_raw_data.rst b/docs/source/Data_processing/Process_raw_data.rst
@@ -1,12 +1,12 @@
 Process raw images
 ==================
 
-Since the imaging is done with a large Field of View microscope, usually during 6 hours or more, the raw images tend to be in the range of 50 ~ 200 GiB.
-The simplest way to handle the raw data is to crop it in individual movies.
-There is a considerable amount of background pixels that can be ignored in the raw data. After cropping the embryos, all individual movies combined take about 40% of the original memory space.
-This already saves considerable ROM memory but most importantly, it means we can easily load individual movies in the RAM of a regular computer (16-32 GB RAM), without needing to use memory mapped files.
+Since the imaging is performed with a large field-of-view microscope, typically as a time lapse of 6 hours or more, the raw images tend to be in the range of 50 to 200 GiB.
+The simplest way to handle the raw data is to first crop it into individual movies for each sample, as there is a considerable amount of background pixels that can be ignored in the raw data.
+After cropping the embryos, all individual movies combined take about 40% of the original disk space.
+This already saves considerable disk space but, most importantly, it means individual movies can be easily loaded in the RAM of a regular computer (16-32 GB RAM), without needing to use memory-mapped files.
 
-The algorithm to process the raw image can be resumed as:
+The algorithm to process the raw image can be summarized as:
 
 1. Get the maximum projection of each pixel for the first 10 frames
 2. Automatic threshold (Triangle method)
@@ -19,11 +19,11 @@ The algorithm to process the raw image can be resumed as:
 To calculate the bounding boxes of each embryo, we first take the maximum projection of each pixel for the first 10 frames, and then use the Triangle threshold method to binarize the image.
 The Triangle threshold is a good choice here because the image has a lot of background pixels, resulting in an unimodal histogram that is centered around the background pixels average value.
 
-Once we have the binary image, we traverse it to identify each embryo.
-Whenever a foreground pixel is found, we mark all connecting foreground pixels, and also keep track of the amount of pixels marked and the extreme points (minimum and maximum coordinates in both dimensions).
-The pixel count is used to determine if the marked area really corresponds to an embryo, or just a smaller artifact that was erroneously considered a foreground.
+Once the binary image is created, it is traversed to identify each embryo.
+Whenever a foreground pixel is found, we mark all connected foreground pixels and keep track of: (1) the number of pixels marked and (2) the extreme points (minimum and maximum coordinates in both dimensions).
+The pixel count is used to determine if the marked area actually corresponds to an embryo, or just a smaller artifact that was erroneously classified as foreground.
 The minimum pixel count might change depending on the type of sample being processed, and can be adjusted in ``slice_img.get_bbox_boundaries``.
-Regions with high signal intensity, for example corresponding to fly embryo's eyes or gut are examples of smaller artifacts that sometimes are included in the binary image, but can easily be removed due to its size.
+Regions of high signal intensity that represent small artifacts (e.g., fly embryo eyes or gut) may be included in the binary image, but they are easily distinguished from the embryo by their size.
 The extreme points are then used to generate the bounding boxes, which will determine the positions where the image will be cropped.
 
 The raw image is opened as a memory map using ``numpy``, and the individual embryos are cropped and saved as tif files.
diff --git a/docs/source/Data_processing/ROI_length.rst b/docs/source/Data_processing/ROI_length.rst
@@ -1,12 +1,16 @@
+Neurodevelopmental Time
+-----------------------
+Together, the ROI length and the full embryo size are used as a proxy to measure the embryonic neurodevelopmental progression:
+
+developmental_progression = embryo_length / ROI_length
+
 ROI length
 ==========
 
-The ROI length is used as a proxy to measure the embryonic neurodevelopmental progression.
-It is calculated by center line estimation.
-The idea is to measure the line that will pass through the center of the ROI.
-This will correspond to the ventral nerve cord length.
+The ROI length is calculated by center line estimation.
+The general approach is to measure the line that will pass through the center of the ROI; this will correspond to the ventral nerve cord length.
 
-To determine this line, we go over the following steps:
+To determine the ROI length, the following steps are used:
 
 1. Binarize the image
 2. Apply a 'chessboard' distance transform 
@@ -21,7 +25,7 @@ Embryo Full size
 
 The full specimen size is calculated by approximating the entire sample shape as an ellipse, and measuring this ellipse's diameter.
 
-The steps to calculate the sample size are:
+The steps to calculate the embryo's size are:
 1. Equalize the image histogram
 2. Automatic threshold (Triangle method)
 3. Binarize the image
diff --git a/docs/source/Data_processing/ROIs_and_signal_intensity.rst b/docs/source/Data_processing/ROIs_and_signal_intensity.rst
@@ -1,28 +1,29 @@
 ROIs and signal intensity
 =========================
 
-When running the pipeline from the jupyter notebook, the ROIs are calculated for each frame.
-The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, based on the fact that the sample signal won't change considerably within this interval.
-This is a good approximation and useful for quick analyses, at the cost of the eventual errors in readings caused by movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
+When running the pipeline from the jupyter notebook, a single ROI is calculated for each frame.
+The processing can be sped up by calculating a single ROI for groups of 5 or 10 frames, instead of a single ROI per frame. 
+This is possible if the sample signal doesn't change considerably within this interval.
+Therefore, calculating one ROI per group of frames is a good approximation, and it can be useful for quick analyses, at the cost of occasional errors in readings caused by movement (see ``activity.ipynb`` for details about the error in activity caused by downsampling).
 
-The ROI algorithm can be resumed as:
+The ROI algorithm can be summarized as:
 
 1. Average the group of frames into a single 2D matrix
 2. Automatic threshold (Otsu's method)
 3. Binarize the image 
-4. Remove small holes inside the VNC
+4. Seal small holes inside the VNC
 5. Select the largest group of connected foreground pixels
 6. Return a mask that matches the largest label
 
 For some datasets with lower VNC signal, a higher threshold value tends to provide better results.
 To change the threshold method, change the ``threshold_method`` parameter in the ``pipeline.measure_vnc_length`` function to ``otsu``.
 
 Even though the signal from pixels inside the VNC is stable, the selected threshold value is not always perfect.
-After binarizing the image, the VNC might contain small regions that were lower than the calculated threshold.
+After binarizing the image, the VNC might contain small regions that were lower than the calculated threshold (i.e., holes).
 These regions are merged back into the VNC if they are completely contained inside the VNC binary component.
 
-To calculate the signal intensity, we apply the mask to the embryo and calculate the mean pixel value.
-The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using the code from ``snazzy_analysis``.
+To calculate the signal intensity, a mask is applied to the embryo and the mean pixel value is calculated.
+The dynamic and structural channel measurements are exported as a ``.csv`` file and further processed using ``snazzy_analysis``.
 
 Visualizing calculated ROIs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~