Analysis code for "Spatial learning in feature-impoverished environments in Drosophila" (Yang Chen, Robert Alfredson, Dorsa Motevalli, Ulrich Stern, Chung-Hui Yang, bioRxiv 2024.09.28.615625; doi: https://doi.org/10.1101/2024.09.28.615625)
This document provides instructions for setting up the data analysis environment for this project. The software has been tested on Ubuntu 22.04 and Windows 11, but the instructions should be broadly applicable to other OS versions. No special hardware is required beyond a standard desktop or laptop computer.
- Python 3.10 (either system-installed or provided via Conda)
- A C compiler (e.g.,
gccon Linux, Xcode command line tools on macOS, or Visual Studio Build Tools on Windows)
You can set up this project using either of two recommended methods:
- Method 1: Virtualenv (requires system-wide Python installation)
- Method 2: Conda (provides isolated Python environments and package management) Choose the method most suitable for your needs.
Follow this method if you already have a system-wide Python installation and prefer using virtualenv.
Use your OS package manager (apt, brew, etc.) or download from the official Python website.
pip install virtualenv-
Linux/macOS
python3 -m venv myenv
-
Windows
python -m venv myenv
-
Linux/macOS
source myenv/bin/activate -
Windows
.\myenv\Scripts\activate
pip install -r requirements.txtpython scripts/setup.py build_ext --inplaceFollow this method for a more isolated environment, particularly if you do not have Python 3.10 installed system-wide or prefer easy package management.
Download and install from the official Miniconda website, selecting the appropriate installer for your OS.
conda create -n nvsl-analysis python=3.10
conda activate nvsl-analysisWith the environment activated, install packages listed in requirements.txt:
pip install -r requirements.txtpython scripts/setup.py build_ext --inplaceBy default, the script will use OS-specific path conventions (e.g., /home/you/your_video.avi vs. C:\Users\you\your_video.avi). To universally use Unix-style paths on Windows, create a JSON file drive_mapping.json:
{
"/media/Synology4/": "Y:\\\\"
}If you want a machine-local override for selected analyze.py defaults without checking those preferences into Git, create a file named .analyze.local.env in the repository root. This file is ignored by Git.
At present, this local config supports toggling the default one-shot between-reward SLI-bracketed plots:
# Local-only defaults for analyze.py
ENABLE_DEFAULT_BETWEEN_REWARD_SLI_PLOTS=false
SHOW_AUC_P_VALUES=falseAccepted boolean values are 1, true, yes, on, 0, false, no, and off.
Behavior:
- If
.analyze.local.envis absent, the repository default behavior is used. - If
ENABLE_DEFAULT_BETWEEN_REWARD_SLI_PLOTS=false, the default one-shot between-reward SLI-bracketed plots are skipped for your local runs only. - If
SHOW_AUC_P_VALUES=false, AUC labels show only significance stars /ns, without numeric p-values.
If Python 3.10 is already installed, setting up the virtual environment and installing dependencies (including building Cython extensions) typically takes about 2 minutes on a normal desktop computer. If you need to install Python or Miniconda from scratch, plan for an additional 3–5 minutes, depending on your internet speed and operating system.
Exit your environment:
- Virtualenv:
deactivate
- Conda:
conda deactivate
- Dependency errors: Double-check you have Python 3.10 and your virtual environment is activated before installing.
- Cython compilation issues: Confirm that you have a working C compiler (e.g., gcc, Xcode CLT, or VS Build Tools).
Once your environment is ready, you can run a standard analysis with:
python analyze.py -v "/actual/path/to/your_video.avi" -f "0-9"Where:
-vis required and specifies the path to your video file.-fis required if analyzing a chamber that supports multiple flies; use a range like0-9.
You can detect turn events in different areas with the --turn flag:
circleagaroseboundary
For example:
python analyze.py -v "/actual/path/to/your_video.avi" -f "0-9" --turn circleFor more details on optional flags, see the top of analyze.py or run python analyze.py --help.
Most analyses in this repository follow one of four patterns:
- One-shot plots: run
analyze.pywith the relevant flag and it writes figures, CSVs, logs, or debug files directly. - Analyze import overlay: run
analyze.pyonce per cohort to export a specialized.npz, then callanalyze.pyagain with matching*-import-*flags to make the overlay. In these plotting-stage commands, the-vvideo is just a required placeholder; the plotted data come from the imported NPZ files. - Bundle then plot: run
analyze.pyonce per cohort to export an.npzbundle, then call ascripts/plot_*script to overlay bundles across cohorts, SLI extremes, or conditions. - Script-only utility: call a helper in
scripts/directly for stats, contrast bundles, or figure variants that do not need to reopen video files.
The common base command is:
python analyze.py -v "/actual/path/to/videos/c*.avi" -f "0-9" --gl "Group A"For cross-group bundle workflows, repeat the analyze.py export command for each group, storing each bundle under a distinct filename. The plotting scripts usually accept either comma-separated bundles:
python scripts/plot_com_sli_bundles.py \
--bundles exports/group_a_com_sli.npz,exports/group_b_com_sli.npz \
--labels "Group A,Group B" \
--metric commag \
--out imgs/group_com_sli_overlay.pngor repeatable Label=path inputs:
python scripts/plot_overlay_training_metric_hist.py \
--input "Group A=exports/group_a_reward_count_hist.npz" \
--input "Group B=exports/group_b_reward_count_hist.npz" \
--out imgs/reward_count_hist_overlay.pngFor analyze.py import-overlay workflows, keep the normal required -v and -f arguments but point them at any convenient valid video/fly selection. A small representative or demo video is usually enough because the overlay values are loaded from LABEL:PATH import NPZs:
python analyze.py \
-v "/path/to/any/valid/video.avi" \
-f "0-9" \
--btw-rwd-conditioned-com \
--btw-rwd-conditioned-com-import-npz "Group A:exports/group_a_com_bins.npz" \
--btw-rwd-conditioned-com-import-npz "Group B:exports/group_b_com_bins.npz" \
--btw-rwd-conditioned-com-import-out imgs/com_binned_overlay.pngThese metrics can be run directly from analyze.py; outputs are usually written under imgs/ plus any CSV/NPZ paths named by the flag.
| Metric or view | Main analyze.py flags |
Typical output | Notes |
|---|---|---|---|
| Standard learning/reward summaries | default command, often with --rpd |
learning_stats.csv, reward and reward-per-distance plots |
Good first pass for a new experiment. |
| SLI/reward PI by sync bucket | --rpd; optional --best-worst-sli |
imgs/rewards_per_dist..., SLI-bracketed reward plots |
--best-worst-sli also enables top/bottom SLI overlays for supported reward metrics. |
| Walking speed, stopped fraction, rewards/min | default post-analysis metrics | log summaries and standard plots | These are part of the core per-training summary path. |
| Circular motion, angular velocity, turn radius | --circle |
angular velocity and turn-radius figures | Use with circular-motion analyses; --jab selects the JAABA classifier. |
| Turn events by area | --turn circle, --turn agarose, or --turn boundary |
turn plots and turn statistics | Area-specific turn detection. |
| Turn probability by distance | --turn-prob-by-dist 2,3,4,5,6 |
distance-binned turn probability plots | Commonly used with --turn circle. |
| Outside-circle duration | --outside-circle-radii ... |
outside-circle duration plots | Radius-thresholded residence outside the reward circle. |
| Reward raster | --reward-raster |
imgs/reward_raster.png by default |
Supports training selection, first-N rewards, SLI subset, and sorting flags. |
| First-N reward diagnostics | --first-n-reward-diagnostics |
CSV plus imgs/first_n_reward_diagnostics.png |
Anchors each fly to its nth reward; can also write an optional NPZ. |
| First-N reward SLI comparison | --first-n-reward-sli-comparison |
CSV plus imgs/first_n_reward_sli_comparison.png |
Compares a selected reward-timing/count metric against SLI. |
| Between-reward trajectory examples | --btw-rwd-plots |
sampled trajectory panels under imgs/ |
Use --btw-rwd-mode to choose random, first-N, last-N, or first/last panels. |
| Between-reward distance histogram | --btw-rwd-dist-hist |
imgs/btw_rwd_dists.png |
Can export histogram data with --btw-rwd-dist-export-hist. |
| Normalized between-reward distance histogram | --btw-rwd-norm-dist-hist |
imgs/btw_rwd_norm_dists.png |
Supports normalization mode, transform, SLI subset, and wall/nonwalking filters. |
| Between-reward tortuosity histogram | --btw-rwd-tortuosity-hist |
imgs/btw_rwd_tortuosity_hist.png |
Can also export histogram data for overlay plotting. |
| Between-reward tortuosity by max radius | --btw-rwd-tortuosity-box |
imgs/btw_rwd_tortuosity_by_max_radius_box.png |
Add --btw-rwd-tortuosity-box-export for later group overlays. |
| Between-reward COM magnitude histogram | --btw-rwd-com-mag-hist |
imgs/btw_rwd_com_mag_hist.png |
Uses between-reward center-of-mass magnitude. |
| Distance-binned COM | --btw-rwd-conditioned-com |
imgs/btw_rwd_dist_binned_com.png |
Optional export/import NPZ flags make this reusable across groups. |
| Distance-binned distance traveled | --btw-rwd-conditioned-disttrav |
imgs/btw_rwd_dist_binned_disttrav.png |
Optional export/import NPZ flags support cached overlays. |
| Max distance vs. distance traveled | --btw-rwd-conditioned-dmax-vs-disttrav |
imgs/between_reward_dmax_vs_disttrav.png |
Optional export/import NPZ flags support cached overlays. |
| Between-reward polar occupancy | --btw-rwd-polar |
imgs/btw_rwd_polar__*.png |
Supports 1D angular, 2D theta-radius, pooled trainings, per-fly plots, and debug TSVs. |
| Reward count histogram | --reward-count-hist |
imgs/rwd_count_hist.png |
Add --reward-count-export-hist for script-based overlays. |
| Reward total bars | --reward-count-total-bars |
imgs/rwd_totals.png |
Add --reward-count-total-export for script-based overlays and stats. |
| Wall-contact PMF/totals | --wall, --wall-contacts-pmf-*, --wall-contacts-per-reward-interval-total-bars |
wall-contact plots, optional NPZ exports | Some wall-contact outputs can be imported or plotted later. |
These are two-stage workflows where analyze.py handles both stages: first it exports cached data from real cohort videos, then a later analyze.py command imports those cached files and draws the overlay. The plotting-stage video path is arbitrary as long as it is valid enough for the command to run.
| Metric family | Export from cohort run | Plot with analyze.py import flags |
Notes |
|---|---|---|---|
| Distance-binned COM | --btw-rwd-conditioned-com --btw-rwd-conditioned-com-export-npz exports/group_a_com_bins.npz |
--btw-rwd-conditioned-com --btw-rwd-conditioned-com-import-npz "Group A:exports/group_a_com_bins.npz" |
Repeat the import flag for each group; use --btw-rwd-conditioned-com-import-out for the output image. |
| Distance-binned distance traveled | --btw-rwd-conditioned-disttrav --btw-rwd-conditioned-disttrav-export-npz exports/group_a_disttrav_bins.npz |
--btw-rwd-conditioned-disttrav --btw-rwd-conditioned-disttrav-import-npz "Group A:exports/group_a_disttrav_bins.npz" |
Repeatable LABEL:PATH imports; output path via --btw-rwd-conditioned-disttrav-import-out. |
| Max distance vs. distance traveled | --btw-rwd-conditioned-dmax-vs-disttrav --btw-rwd-conditioned-dmax-vs-disttrav-export-npz exports/group_a_dmax_vs_disttrav.npz |
--btw-rwd-conditioned-dmax-vs-disttrav --btw-rwd-conditioned-dmax-vs-disttrav-import-npz "Group A:exports/group_a_dmax_vs_disttrav.npz" |
Supports overlay stats with --btw-rwd-conditioned-dmax-vs-disttrav-stats. |
| Wall-contact PMF | --export-wall-contacts-per-sync-bkt-npz exports/group_a_wall_sync.npz or --export-wall-contacts-per-reward-interval-npz exports/group_a_wall_interval.npz |
--wall-contacts-pmf-import-npz "Group A:exports/group_a_wall_sync.npz" |
Imports cached wall-contact counts and plots a PMF; output path via --wall-contacts-pmf-import-out. |
| Wall contacts per reward interval totals | --wall-contacts-per-reward-interval-total-bars --wall-contacts-per-reward-interval-total-export exports/group_a_wall_interval_totals.npz |
--wall-contacts-per-reward-interval-total-import-npz "Group A:exports/group_a_wall_interval_totals.npz" |
Imports cached scalar-bar NPZs; output path via --wall-contacts-per-reward-interval-total-import-out. |
These workflows are useful when you want cohort overlays, top/bottom SLI splits at plot time, stats annotations, or consistent axis handling across multiple runs.
| Metric family | Export from analyze.py |
Follow-up script | Plot-time metric/options |
|---|---|---|---|
| COM magnitude + SLI | --export-com-sli-bundle exports/group_a_com_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric commag or --metric sli; supports --sli-extremes. |
| Between-reward max distance + SLI | --export-between-reward-maxdist-sli-bundle exports/group_a_maxdist_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric between_reward_maxdist. |
| Between-reward return-leg distance + SLI | --export-btw-rwd-return-leg-dist-sli-bundle exports/group_a_return_leg_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric between_reward_return_leg_dist. |
| Cumulative-reward-aligned SLI | --export-cum-reward-sli-bundle exports/group_a_cum_reward_sli.npz |
scripts/plot_cum_reward_sli_bundles.py |
--metric sli, reward_pi, reward_pi_exp, or reward_pi_yoked. |
| Turnback dual-circle ratio + SLI | --export-turnback-sli-bundle exports/group_a_turnback_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric turnback; choose --turnback-mode exp, ctrl, or exp_minus_ctrl. |
| Turnback by outer radius | --export-turnback-outer-radius-sli-bundle exports/group_a_turnback_radius.npz |
scripts/plot_turnback_outer_radius_sli_bundles.py |
Radius-response bars/curves with optional stats. |
| Turnback by excursion bin | --export-turnback-excursion-bin-sli-bundle exports/group_a_turnback_excursion.npz |
scripts/plot_turnback_excursion_bin_sli_bundles.py |
Realized excursion-bin ratio/success/failure/total plots. |
| Return probability by outer radius | --export-return-prob-outer-radius-sli-bundle exports/group_a_return_prob_radius.npz |
scripts/plot_return_prob_outer_radius_sli_bundles.py |
--metric ratio, success, failure, total, or stacked. |
| Return probability by excursion bin | --export-return-prob-excursion-bin-sli-bundle exports/group_a_return_prob_excursion.npz |
scripts/plot_return_prob_excursion_bin_sli_bundles.py |
Realized excursion-bin return probability. |
| Agarose dual-circle avoidance + SLI | --export-agarose-sli-bundle exports/group_a_agarose_sli.npz |
scripts/plot_com_sli_bundles.py; stats via scripts/stats_agarose_sli_bundles.py or scripts/stats_agarose_sli_anova.py |
--metric agarose; add --agarose-sli-include-pre when pre windows matter. |
| Wall-contact percent + SLI | --export-wallpct-sli-bundle exports/group_a_wallpct_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric wallpct; useful with plot-time SLI filtering. |
| Weaving per exit + SLI | --export-weaving-sli-bundle exports/group_a_weaving_sli.npz |
scripts/plot_com_sli_bundles.py |
--metric weaving. |
| Large-turn start distance + SLI | --export-lgturn-startdist-sli-bundle exports/group_a_lgturn_startdist.npz |
scripts/plot_com_sli_bundles.py |
--metric lgturn_startdist. |
| Reward-anchored large-turn path length/prevalence + SLI | --export-reward-lgturn-pathlen-sli-bundle exports/group_a_reward_lgturn.npz |
scripts/plot_com_sli_bundles.py |
--metric reward_lgturn_pathlen or reward_lgturn_prevalence. |
| Reward local variation + SLI | --reward-lv --export-reward-lv-sli-bundle exports/group_a_reward_lv.npz |
scripts/plot_com_sli_bundles.py |
--metric reward_lv. |
| Between-reward shortest-tail efficiency | --btw-rwd-shortest-tail --btw-rwd-shortest-tail-export-npz exports/group_a_shortest_tail.npz |
scripts/plot_overlay_btw_rwd_shortest_tail.py |
Per-training shortest-tail overlays and stats. |
| Between-reward tortuosity histograms | --btw-rwd-tortuosity-hist --btw-rwd-tortuosity-export-hist exports/group_a_tort_hist.npz |
scripts/plot_overlay_training_metric_hist.py or scripts/plot_overlay_training_metric_filled_hist.py |
PDF/CDF or filled-distribution overlays. |
| Reward count histograms | --reward-count-hist --reward-count-export-hist exports/group_a_reward_count_hist.npz |
scripts/plot_overlay_training_metric_hist.py or scripts/plot_overlay_training_metric_filled_hist.py |
PDF/CDF or filled-distribution overlays. |
| Reward total bars | --reward-count-total-bars --reward-count-total-export exports/group_a_reward_totals.npz |
scripts/plot_overlay_training_metric_scalar_bars.py |
Scalar bar overlays with optional stats. |
| Wall contacts per reward interval totals | --wall-contacts-per-reward-interval-total-bars --wall-contacts-per-reward-interval-total-export exports/group_a_wall_interval_totals.npz |
scripts/plot_overlay_training_metric_scalar_bars.py |
Scalar bar overlays with optional stats. |
| Tortuosity mean swarm | --btw-rwd-tortuosity-mean-swarm --btw-rwd-tortuosity-mean-swarm-export exports/group_a_tort_swarm.npz |
scripts/plot_between_reward_tortuosity_mean_swarm_bundles.py |
Per-fly swarm overlays. |
| Tortuosity by radius box data | --btw-rwd-tortuosity-box --btw-rwd-tortuosity-box-export exports/group_a_tort_box.npz |
scripts/plot_between_reward_tortuosity_box_bundles.py |
Cross-group box plots by radius bin. |
| Tortuosity vs. wall-contact scatter data | --btw-rwd-tortuosity-wall-scatter-export exports/group_a_tort_wall.npz |
scripts/plot_between_reward_tortuosity_wall_scatter_bundles.py or scripts/plot_between_reward_tortuosity_wall_flycorr_bundles.py |
Segment-level scatter or per-fly correlation overlays. |
Some between-reward SLI plots are also emitted automatically by analyze.py for one or two groups: COM magnitude, between-reward max distance, and return-leg distance. This behavior can be locally disabled with .analyze.local.env:
ENABLE_DEFAULT_BETWEEN_REWARD_SLI_PLOTS=falseUse explicit bundle exports when you need reproducible filenames, more than two groups, plot-time SLI extremes, or a figure that combines independently analyzed cohorts.
Below is a short demo to verify the pipeline with real data.
- Download the sample data Grab the files from this Figshare dataset.
- Run an example command
python analyze.py \ -v /path/to/your/demo/files/c*.avi \ -f 0-9 \ --turn circle \ --turn-prob-by-dist 2,3,4,5,6- The analysis takes about 4 minutes to complete on a typical desktop.
- This produces:
__analyze.log: A log summarizing results per fly.learning_stats.csv: Metrics such as walking speed, number of rewards, etc.imgs/: A folder with around 48 plots covering various metrics.
- Compare outputs to expected
- Download the reference outputs from this Figshare dataset.
- Compare each file (log, CSV, images) to confirm the results match.
A regression test checks whether the latest code matches prior reference outputs:
python scripts/regression_test.pyIt:
- Parses log/CSV commands from past runs
- Executes them on your current code
- Compares generated outputs (logs, CSVs, images) with the reference data
- Prints any discrepancies
Logs like __analyze_13F02.log or __analyze_blind.log and CSVs like learning_stats_alt.csv or learning_stats_blind.csv are checked, and images in test-imgs/key.json are compared via binary checks.
Thank you for using nvsl-analysis! If you have any feedback or questions, please open an issue on GitHub or email the authors.