Docs #18 (Merged)

46 changes: 46 additions & 0 deletions README.md
@@ -0,0 +1,46 @@
# SNAzzy: an image processing pipeline for investigating global Synchronous Network Activity

## Getting Started

### Installation

The project uses [conda](https://docs.conda.io) to manage dependencies.
If you don’t already have conda, you can download and install it from the official website.

Make a copy of the repo (e.g. with `git clone`), then `cd` into the root folder of the repo.

Recreate the conda environment with the dependencies listed in `environment.yml` in the repo's root:

```
conda env create -f=environment.yml
```
Activate the environment:

```
conda activate snazzy-env
```
## Contributing

Thank you for being interested in `snazzy`!

If you are interested in contributing, we accept contributions of all sorts: improving documentation, submitting bug reports, adding feature requests or writing code.
Feel free to create an issue or a pull request!

If you are new to open source and need help creating a pull request, we recommend these friendly tutorials: http://makeapullrequest.com/ and http://www.firsttimersonly.com/

### How to report a bug

Please open an issue for any bugs or requests for help analyzing your data.

When filing an issue, please add the following information:

1. What operating system are you using?
2. What did you expect to see?
3. What did you see instead?

> Did you have a problem analyzing data? If possible, please provide an example dataset.

### How to suggest a feature or enhancement

Please file an issue explaining the desired feature or enhancement.
@@ -1,38 +1,49 @@
Example Analysis
================

An example of how to use the GUI to analyze the data from the ``snazzy_processing`` pipeline.

Open the GUI
------------

Open a terminal window and activate the conda environment.

.. code:: bash

   conda activate snazzy-env

Then ``cd`` into the snazzy_analysis folder, and run the following command to open the GUI:

.. code:: bash

   python3 snazzy_analysis/gui/gui.py

Refer to the Getting Started documentation if you haven't installed conda or haven't created an environment yet.

Load data
---------

To load data in the GUI, select an entire folder that has snazzy_processing output.

The data from a folder is inspected and loaded as an Experiment object.
There are several configurable parameters that change how data is processed.
The parameters that change more often are presented as a dialog window as soon as we select a directory.
For the example dataset, we are not going to change any of these parameters.
For more details about these parameters, refer to the GUI guide, section `Config Parameters <Graphical_User_Interface.html#config-parameters>`__.

Visualizing data
----------------

Once the data is loaded, the GUI presents a sidebar with the accepted and removed embryos, and the currently selected embryo.
The sidebar can be used to select other embryos.
For the selected trace, we can see the identified peaks.
The signal from each channel can be inspected by clicking the button to the right of the trace plot.
Only the selected embryos are considered in any of the plots generated in the GUI.

Adjusting peaks
---------------

The first option to change peaks is to change the frequency filter value.
Lower frequency values will result in more denoising, which will help if the signal has many fast oscillations that should be ignored.
A recommended workflow is to change the frequency slider and see how the selected trace looks.
Then click 'Apply Changes' and change the presentation mode to see All Embryos.
Inspect the new peaks for every embryo and stop once peaks are precise enough.
@@ -41,12 +52,16 @@

The peak width can be controlled with the peak width slider.
The value of 0.98 works well for the majority of samples.
To better understand this parameter, refer to the `scipy.signal.peak_widths documentation <https://docs.scipy.org/doc/scipy-1.16.2/reference/generated/scipy.signal.peak_widths.html#scipy.signal.peak_widths>`__.
To inspect the peak widths, click the 'View Widths' button.
Increasing the value in the slider will increase the peak width, while decreasing the slider makes the peaks narrower.
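As an illustration only: the 0.98 default suggests the slider maps to the ``rel_height`` argument of ``scipy.signal.peak_widths``, which measures each peak's width at a fraction of its prominence. The sketch below assumes that mapping (it is not confirmed here); ``peak_bounds`` is a hypothetical helper, and the synthetic Gaussian trace stands in for real DFF data.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths


def peak_bounds(trace, peak_idxes, rel_height=0.98):
    """Return fractional start/end indices of each peak, measured at rel_height.

    ASSUMPTION: the GUI width slider corresponds to peak_widths' rel_height.
    """
    _, _, left_ips, right_ips = peak_widths(trace, peak_idxes, rel_height=rel_height)
    return left_ips, right_ips


# Synthetic trace with two Gaussian bursts (placeholder for a real DFF trace).
x = np.arange(200)
trace = np.exp(-((x - 60) / 6.0) ** 2) + np.exp(-((x - 140) / 6.0) ** 2)
peaks, _ = find_peaks(trace, height=0.5)

for rel in (0.5, 0.98):
    left, right = peak_bounds(trace, peaks, rel)
    print(rel, np.round(right - left, 1))
```

Raising ``rel_height`` moves the boundaries further down each peak, so the measured widths grow, which is consistent with the slider behavior described above.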

.. NOTE:: Changes in the slider values must be applied to all samples by clicking "Apply Changes", otherwise they will be discarded.

Once all peak data looks good, we can open other directories as another Group, to compare trace properties between them.

Comparing with another Experiment
---------------------------------

We can combine data from multiple experiments in two ways: either by adding more data from another experiment to the same Group, or by adding another Group and comparing the loaded Groups.
In both modes, it's not possible to change the peak detection parameters; that's possible only when a single experiment is loaded.
151 changes: 151 additions & 0 deletions docs/source/Data_analysis/Graphical_User_Interface.rst
@@ -0,0 +1,151 @@
GUI
===

The graphical user interface is written using ``PyQt6`` and ``pyqtgraph``.

The GUI's main functionalities are:

1. Visualize and adjust peak data.
2. Combine multiple experiments as a group.
3. Compare multiple groups.
4. Inspect TIF movies in sync with the DFF signal.
5. Inspect all parameters used in the analysis.

Loading the GUI
---------------

The first step to use the GUI is to activate the conda environment.
Refer to the `Getting Started <../Getting_Started.html>`__ section if you haven't created an environment yet.

.. code:: bash

   conda activate snazzy-env

Then, from the snazzy_analysis directory, run the GUI:

.. code:: bash

   python3 snazzy_analysis/gui/gui.py

Using the GUI
-------------

There are two primary modes to use the GUI:

1. Open a single experiment:

   Allows you to visualize peak detection results and adjust parameters if needed.

2. Compare experiments:

   You can load more than one experiment into a Group, or have multiple Groups to visualize comparisons.
   When more than one experiment is loaded, you cannot change the analysis parameters anymore.
   In this mode, the analysis results are read-only.
   Therefore, the general workflow is to first open each experiment separately and make sure all parameters are correct for peak detection.
   The comparison plots in the Plot menu will show results by Group.
   In the upper left corner of the GUI, a dropdown menu can be used to change the Group that is currently being visualized.

Loading an Experiment
---------------------

To load an Experiment, select a directory that has ``snazzy_processing`` output.
The directory structure should look like:

.. code:: text

   |-- project_folder
       |-- data
           |-- 20240501
               |-- activity
               |   |-- emb1.csv
               |   |-- ...
               |-- lengths
               |   |-- emb1.csv
               |   |-- ...
               |-- embs
               |   |-- emb1-ch1.tif
               |   |-- emb1-ch2.tif
               |-- full-length.csv
               |-- emb_numbers.png

The ``activity`` and ``lengths`` directories, and the ``full-length.csv`` file are required.
If any of these is not found, the GUI will abort loading with an error message.

The ``embs`` directory will hold individual embryo movies in ``.tif`` format, and can be used to visualize embryo movies in sync with the DFF trace.
The ``emb_numbers.png`` file represents a snapshot of the microscope's field of view at the start of the imaging session, and also shows the embryo id of each embryo.

Config parameters
-----------------

When loading an experiment, the code looks for a config file named ``peak_detection_params.json`` inside the experiment directory and uses its data for the analysis.
If the file is not found, one with default parameters is created.
The default parameters can be found inside ``config.py``.
If you change any of the parameters, they will be recorded in this file.
To restore the original settings, simply delete ``peak_detection_params.json`` from the corresponding directory.
Sharing the config file allows someone else to reproduce your results on another machine.
Keep in mind that each directory should have its own ``peak_detection_params.json`` file.
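A minimal sketch of the load-or-create behavior described above, using only the standard library. ``load_params``, ``save_params`` and the contents of ``DEFAULT_PARAMS`` are hypothetical illustrations; the real defaults live in ``config.py`` and the real key names may differ.

```python
import json
from pathlib import Path

# HYPOTHETICAL defaults for illustration; the real ones live in config.py.
DEFAULT_PARAMS = {
    "freq_cutoff": 0.0025,
    "embryos": {},  # manual peak edits are stored under this key
}

CONFIG_NAME = "peak_detection_params.json"


def load_params(exp_dir):
    """Read the experiment's params, creating the file with defaults if missing."""
    config_path = Path(exp_dir) / CONFIG_NAME
    if not config_path.exists():
        # No config yet: write the defaults so the run can be reproduced later.
        config_path.write_text(json.dumps(DEFAULT_PARAMS, indent=2))
    # Reading the file back returns an independent copy of the parameters.
    return json.loads(config_path.read_text())


def save_params(exp_dir, params):
    """Record changed parameters in this experiment's own config file."""
    config_path = Path(exp_dir) / CONFIG_NAME
    config_path.write_text(json.dumps(params, indent=2))
```

Deleting the JSON file and calling ``load_params`` again recreates it from the defaults, mirroring the "restore the original settings" step.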

The parameters that are most frequently changed are presented in the GUI when an Experiment is loaded.
From this window it's possible to set:

* Group name: name of the group that contains this experiment dataset.
* First peak threshold: minimum time, in minutes, that has to pass before any peak happens. Used to make sure that the first peak caught in the imaging session is really the activity onset.
* To_exclude: embryo numbers that will be excluded from the analysis.
* To_remove: embryo numbers that will be analyzed, but will show up in the 'Removed' group. Embryos whose first peak happens before the first peak threshold, or that were marked by the user as removed, are also placed in this category.
* Has_transients: if selected, the code will try to identify and skip the first peak if it's likely just a transient.
* Has_dsna: if selected, the code will try to determine the dSNA start and ignore all peaks that happen after it.
* Dff_strategy: combo box with the baseline strategy methods. ``local_minima`` picks the 11 lowest points within ``baseline_window_size`` and uses their average as the baseline. ``baseline`` splits the DFF values into bins and uses the average of the most frequent bin as the baseline. This method assumes that bursts of activity are sparse, so that in every window the most frequent bin falls on baseline values.
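The two baseline strategies can be sketched with NumPy as below. This is an illustration under assumptions, not the package's implementation: windows are assumed to be centered on each sample, the histogram is assumed to use 10 bins, and the function names are hypothetical. Only the 11-point count comes from the description above.

```python
import numpy as np


def local_minima_baseline(signal, window_size, n_points=11):
    """Per-sample baseline: mean of the n_points lowest values in a centered window."""
    signal = np.asarray(signal, dtype=float)
    half = window_size // 2
    baseline = np.empty_like(signal)
    for i in range(signal.size):
        window = signal[max(0, i - half): i + half + 1]
        k = min(n_points, window.size)
        # np.partition moves the k smallest values into the first k slots.
        baseline[i] = np.partition(window, k - 1)[:k].mean()
    return baseline


def histogram_baseline(signal, window_size, n_bins=10):
    """Per-sample baseline: mean of the values in the most populated histogram bin."""
    signal = np.asarray(signal, dtype=float)
    half = window_size // 2
    baseline = np.empty_like(signal)
    for i in range(signal.size):
        window = signal[max(0, i - half): i + half + 1]
        counts, edges = np.histogram(window, bins=n_bins)
        j = int(np.argmax(counts))
        # np.histogram bins are half-open, except the last one, which is closed.
        if j == n_bins - 1:
            upper = window <= edges[j + 1]
        else:
            upper = window < edges[j + 1]
        baseline[i] = window[(window >= edges[j]) & upper].mean()
    return baseline


def dff(signal, baseline):
    """ΔF/F = (F - F0) / F0."""
    signal = np.asarray(signal, dtype=float)
    return (signal - baseline) / baseline
```

Both strategies give the same result on a flat trace with a few short bursts; they diverge when bursts are dense, which is where the sparsity assumption of ``baseline`` matters.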

Inside the File menu there is an option to open the ``json`` file and change any of its parameters.
Updating the file causes the entire Experiment to be recreated with the new configuration data.

Visualizing traces
------------------

Once the data is loaded, you should see something similar to this:

.. image:: /_static/gui-screenshot.png
   :alt: GUI screenshot with loaded data

The top app bar has buttons to change the data presentation.
Below the top app bar there are two sliders.
The first is for the frequency cutoff, which controls how much the signal is smoothed for the peak-finding algorithm.
The second is the peak width parameter, used to determine the start and end times of each peak.
The sidebar presents which embryos are currently considered for plots and analysis, and which ones should be removed.
You can toggle the embryo status between these two categories.
In the main view you will see the DFF trace of the currently selected embryo.
The pink dots represent the peak indices.

You can also visualize the signal from each channel by clicking the button on the right of the screen.
This window will present the signal from each channel and also the hatching point, which can be changed manually by dragging the line.

Manually changing peak data
~~~~~~~~~~~~~~~~~~~~~~~~~~~

By pressing ``shift`` + ``left mouse click`` you can add a new peak to the plot.
Because there are usually many points along the X axis, it can be hard to click exactly where the peak index should land.
To help with this, the new peak is placed at the local maximum within a small window around the clicked point.
By pressing ``CTRL`` + ``left mouse click`` you can remove a peak.
Removal also works on a small X axis range around the click, just like adding a peak.
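A small sketch of the click-snapping idea, assuming peaks are stored as sample indices. ``half_window`` and both function names are hypothetical; the GUI's actual window size is not stated here, and removing the closest peak inside the window is an assumption about how removal picks its target.

```python
import numpy as np


def snap_to_local_max(trace, clicked_idx, half_window=5):
    """Index of the maximum of `trace` within +/- half_window samples of the click."""
    trace = np.asarray(trace)
    start = max(0, clicked_idx - half_window)
    stop = min(trace.size, clicked_idx + half_window + 1)
    return start + int(np.argmax(trace[start:stop]))


def remove_peak_near(peak_idxes, clicked_idx, half_window=5):
    """Remove the peak closest to the click, if it lies within +/- half_window."""
    if not peak_idxes:
        return list(peak_idxes)
    closest = min(peak_idxes, key=lambda p: abs(p - clicked_idx))
    if abs(closest - clicked_idx) > half_window:
        return list(peak_idxes)  # nothing close enough: leave peaks unchanged
    return [p for p in peak_idxes if p != closest]
```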

The peak width can also be adjusted.
Click the button 'Adjust widths' to display handles on the peak boundaries.
To change the width, just drag the line to the desired position.

The manual data is saved in ``peak_detection_params.json``, in a key named ``embryos``.
Click 'Clear manual data' to remove the manual data for the current sample or for all samples at once.


View embryo movies
------------------

If you haven't removed the individual movies that were cropped from raw data, you can visualize them in the GUI.

.. NOTE:: When running the pipeline, set the variable ``clean_up_data = False`` to keep the cropped movies.

The embryo movies must be placed inside the experiment directory, in a directory named ``embs``.

If there are no files available to show, the GUI will simply display an error message.

If there are files, you can select one and see the movie in sync with the DFF trace.
@@ -1,14 +1,15 @@
Hatching point calculation
===========================

Hatching happens when the fruit-fly embryo leaves its egg.
Determining hatching time is somewhat straightforward, because when the embryo hatches it moves out of the field of view.
In terms of the signal recorded, this is observed as an abrupt drop in both active and structural channel signals.

To identify this signal drop we use the structural channel signal, because it's more stable than the active channel.
The structural channel is first smoothed using ``scipy.signal.savgol_filter`` and zscored.
Then we calculate the baseline of this signal, as the average of the most frequent bin in the signal's histogram.
The signal used to calculate hatching is the structural channel signal minus the baseline.
As a default threshold we use ``Z = 0.35``.
The hatching point is then marked as the first point that reaches the Z score.
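The steps above can be sketched with SciPy as follows. ``window_length``, ``polyorder`` and ``n_bins`` are hypothetical values, not the package's settings, and the helper names are illustrative. The text does not state the sign convention, so this sketch assumes the drop appears as the baseline-subtracted signal falling to ``-Z`` or below.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.stats import zscore


def mode_bin_mean(x, n_bins=20):
    """Average of the values that fall in the most frequent histogram bin."""
    counts, edges = np.histogram(x, bins=n_bins)
    j = int(np.argmax(counts))
    upper = x <= edges[j + 1] if j == n_bins - 1 else x < edges[j + 1]
    return x[(x >= edges[j]) & upper].mean()


def hatching_point(struct, window_length=51, polyorder=3, z_thres=0.35, n_bins=20):
    """First sample where the smoothed, z-scored, baseline-subtracted signal drops."""
    smooth = zscore(savgol_filter(np.asarray(struct, dtype=float), window_length, polyorder))
    centered = smooth - mode_bin_mean(smooth, n_bins)
    # ASSUMPTION: hatching shows up as the centered signal reaching -z_thres or lower.
    below = np.flatnonzero(centered <= -z_thres)
    return int(below[0]) if below.size else None
```

Because the baseline is the most frequent bin, this sketch also assumes most of the recording happens before hatching.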

Notice that all data after the hatching point should be ignored.
@@ -22,4 +23,4 @@
In these cases, you can drag the line that indicates the hatching point to a more accurate position, or remove that embryo.

If the default Z-score of 0.35 does not work in your case, you can adjust it to another value.
Inside the GUI, open the config file through ``Menu... View pd_params`` and change the value of the Z-score variable.
64 changes: 64 additions & 0 deletions docs/source/Data_analysis/Peak_Detection.rst
@@ -0,0 +1,64 @@
Peak Detection
==============

Peak detection is one of the core features of ``snazzy_analysis``.
From the detected peaks, we derive most of the metrics used in this package: peak widths, amplitudes, rise times, decay times, and more.
The algorithm consists of several steps, each with parameters that can be fine-tuned for optimal detection.

Each step is implemented as a single function that takes a params dictionary and changes the data in ``Trace._peak_idxes``.
If you want to change or extend the peak detection steps, just add another function that follows this interface as a new stage inside ``Trace.detect_peaks``.
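A minimal sketch of that stage interface. The ``Trace`` class here is a hypothetical stand-in that only keeps a DFF list and ``_peak_idxes``; the ``detect_peaks(params, stages)`` signature, the stage names and the params keys are illustrative assumptions, not the package's API.

```python
class Trace:
    """HYPOTHETICAL stand-in for snazzy_analysis's Trace, reduced to what the sketch needs."""

    def __init__(self, dff):
        self.dff = list(dff)
        self._peak_idxes = []

    def detect_peaks(self, params, stages):
        # Each stage reads params and rewrites self._peak_idxes in place.
        for stage in stages:
            stage(self, params)
        return self._peak_idxes


def find_local_maxima(trace, params):
    """Example first stage: strict local maxima above a minimum height."""
    d = trace.dff
    trace._peak_idxes = [
        i for i in range(1, len(d) - 1)
        if d[i - 1] < d[i] >= d[i + 1] and d[i] >= params["min_height"]
    ]


def drop_early_peaks(trace, params):
    """Example extra stage: discard peaks before a given sample index."""
    trace._peak_idxes = [i for i in trace._peak_idxes if i >= params["first_peak_idx"]]
```

Adding a new step then amounts to writing one more function with the same ``(trace, params)`` shape and appending it to the list of stages.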

To understand how the different parameters influence the peak detection, it's important to first understand the entire peak detection algorithm.

1. Peak Detection on Low-Pass Filtered Signal
-----------------------------------------------

The ΔF/F (DFF) trace is filtered in the frequency domain using a ``freq_cutoff`` parameter: all frequencies above this value are removed, and the remaining low-frequency components are used to reconstruct the filtered trace.
This acts as a smoothing step, which almost completely removes oscillations and short-duration peaks that do not correspond to actual activity bursts.

The ``freq_cutoff`` can be adjusted in the GUI using a slider, and the reconstructed signal is updated in real time.
The default value of ``0.0025 Hz`` works well for many traces, but traces with high-frequency noise may require a lower cutoff.
Different types of samples will likely produce different traces, so this value may need to be adjusted.

Once we have the filtered trace, peaks are detected using the parameters ``fft_height`` and ``fft_prominence``.
The ``fft_height`` parameter is especially important because the reconstructed signal often contains minor ripples before the first real burst.
These are easy to identify, as they usually do not correspond to peaks in the original ΔF/F trace.
``fft_prominence`` complements ``fft_height`` by setting how much a point must stand out from its surrounding baseline in order to be marked as a peak.
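The filtering and detection steps can be sketched with NumPy's real FFT and ``scipy.signal.find_peaks``. This assumes a uniformly sampled trace with sampling rate ``fs`` in Hz (needed to express the cutoff in Hz) and a hard cutoff that zeroes every frequency bin above ``freq_cutoff``; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def lowpass_fft(dff, fs, freq_cutoff):
    """Zero every frequency above freq_cutoff (Hz) and rebuild the signal."""
    dff = np.asarray(dff, dtype=float)
    spectrum = np.fft.rfft(dff)
    freqs = np.fft.rfftfreq(dff.size, d=1.0 / fs)
    spectrum[freqs > freq_cutoff] = 0
    return np.fft.irfft(spectrum, n=dff.size)


def detect_on_filtered(dff, fs, freq_cutoff, fft_height, fft_prominence):
    """Filter the trace, then detect peaks with height and prominence thresholds."""
    filtered = lowpass_fft(dff, fs, freq_cutoff)
    peak_idxes, _ = find_peaks(filtered, height=fft_height, prominence=fft_prominence)
    return filtered, peak_idxes


# Synthetic example: one frame every 6 s, a slow 0.001 Hz oscillation
# plus a fast 0.05 Hz component that the 0.0025 Hz cutoff should remove.
fs = 1 / 6.0
t = np.arange(2000) * 6.0
slow = np.sin(2 * np.pi * 0.001 * t)
sig = slow + 0.5 * np.sin(2 * np.pi * 0.05 * t)
filtered, peaks = detect_on_filtered(sig, fs, 0.0025, 0.9, 0.5)
```

A hard cutoff is the simplest reading of "all frequencies above this value are removed"; it can introduce ringing near sharp edges, which is one reason the detected indices are later mapped back to the original trace.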

2. Align Peaks in the Original Signal
-------------------------------------

After detecting peaks in the filtered signal, the peak indices are mapped back to the original ΔF/F trace.
This step is necessary because low-pass filtering shifts peak positions.
The bursts of activity have a sharp rise and are followed closely by shorter oscillations.
To properly mark bursts, we use the leftmost peak in each burst as the peak index.

The window size used to search for the leftmost peak is given by ``port_peaks_window_size``.
Since the leftmost peak can have an amplitude very different from the local maximum, the parameter ``port_peaks_thres`` specifies the percentage of the local maximum that a peak must reach to be accepted.
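A sketch of the porting step, under assumptions: the search window is centered on the filtered peak, ``port_peaks_thres`` is read as a fraction of the window's maximum, and peak amplitudes are positive. ``port_peaks`` is a hypothetical name, and the package's exact window placement may differ.

```python
import numpy as np
from scipy.signal import find_peaks


def port_peaks(dff, filtered_idxes, port_peaks_window_size, port_peaks_thres):
    """Map each filtered peak to the leftmost sufficiently tall peak in the raw trace."""
    dff = np.asarray(dff, dtype=float)
    half = port_peaks_window_size // 2
    ported = []
    for p in filtered_idxes:
        start = max(0, p - half)
        window = dff[start: p + half + 1]
        local_max = window.max()
        # Candidate peaks inside the window that reach the threshold fraction.
        cand, _ = find_peaks(window)
        cand = cand[window[cand] >= port_peaks_thres * local_max]
        if cand.size == 0:
            # No interior local maximum: fall back to the window's argmax.
            ported.append(start + int(np.argmax(window)))
        else:
            ported.append(start + int(cand[0]))  # leftmost accepted peak
    return ported
```

A lower threshold lets smaller, earlier oscillations win, moving the ported index further left toward the burst onset.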

3. Filter peaks by local threshold
----------------------------------

As the embryos develop, peak amplitudes tend to rise and then stabilize before hatching.
We use this fact to perform an extra validation step for the calculated peaks.
Each peak is compared against its neighboring peaks, and peaks that are too high or too low are discarded.
For example, if a peak close to baseline level is misidentified between bursts, it will likely be discarded due to all other peaks having higher values nearby.

The window size used to compare each peak with its neighbors is controlled using ``local_thres_window_size``.
The minimum value for accepting a peak is given by ``local_thres_value``.
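One way to implement this check, as a hypothetical sketch: the window is measured in samples, the neighborhood statistic is the median amplitude of the other peaks in the window, and only the lower bound from ``local_thres_value`` is applied. The actual statistic and any upper bound used by ``snazzy_analysis`` are not specified here.

```python
import numpy as np


def filter_by_local_threshold(dff, peak_idxes, local_thres_window_size, local_thres_value):
    """Keep peaks reaching local_thres_value times the median of nearby peak amplitudes."""
    dff = np.asarray(dff, dtype=float)
    peak_idxes = np.asarray(peak_idxes)
    amps = dff[peak_idxes]
    half = local_thres_window_size // 2
    kept = []
    for i, p in enumerate(peak_idxes):
        # Neighbors: other peaks within +/- half samples of this one.
        near = (np.abs(peak_idxes - p) <= half) & (np.arange(peak_idxes.size) != i)
        # A peak with no neighbors cannot be compared, so it is kept.
        if not near.any() or amps[i] >= local_thres_value * np.median(amps[near]):
            kept.append(int(p))
    return kept
```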

4. Optional Post-Processing
---------------------------

Some post-processing operations can further improve peak detection for specific types of traces.

Certain traces may exhibit a large burst at the beginning of the imaging session.
This is an artifact that should be removed.
In such cases, the ``remove_transients`` function can be applied.
It detects and removes initial bursts if their interval is significantly longer than the average of subsequent bursts.
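A sketch of that rule, assuming "significantly longer" means longer than a hypothetical ``factor`` times the mean of the later inter-peak intervals. The default ``factor=2.0`` is illustrative, and ``drop_leading_transients`` is not the package's ``remove_transients``.

```python
import numpy as np


def drop_leading_transients(peak_idxes, factor=2.0):
    """Drop leading peaks whose gap to the next peak is much longer than later gaps."""
    peaks = list(peak_idxes)
    # At least three peaks are needed: one gap to test and one or more to compare with.
    while len(peaks) >= 3:
        first_gap = peaks[1] - peaks[0]
        later_gaps = np.diff(peaks[1:])
        if first_gap > factor * later_gaps.mean():
            peaks.pop(0)  # likely a transient burst, not the activity onset
        else:
            break
    return peaks
```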

Another post-processing step removes low-amplitude peaks that are likely false positives.
Peaks below a specified percentage of the maximum peak amplitude are discarded.
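A minimal sketch of the amplitude filter; ``min_fraction`` is a hypothetical name for the percentage parameter, expressed here as a fraction of the largest peak amplitude.

```python
import numpy as np


def remove_low_amplitude(dff, peak_idxes, min_fraction):
    """Keep only peaks at or above min_fraction of the largest peak amplitude."""
    dff = np.asarray(dff, dtype=float)
    peak_idxes = np.asarray(peak_idxes, dtype=int)
    if peak_idxes.size == 0:
        return []
    amps = dff[peak_idxes]
    return peak_idxes[amps >= min_fraction * amps.max()].tolist()
```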

Finally, in the GUI, you can manually add or remove peaks.
These manual edits are used to update the set of calculated peaks.
11 changes: 11 additions & 0 deletions docs/source/Data_analysis/index.rst
@@ -0,0 +1,11 @@
Data Analysis
=============

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   Example
   Graphical_User_Interface
   Peak_Detection
   Hatching_Point