Commit c49b580

rly and calderast authored
Update testing to download test data and run on GitHub Actions (calderast#13)

* Add small test data files to repo, update test data scripts
* Add download script and github action to run tests
* Add scikit-learn to deps for photometry preproc
* Move downloaded data folder

Co-authored-by: Stephanie Crater <[email protected]>

1 parent 06cbbf9 commit c49b580

File tree

17 files changed: +649 −46 lines

.github/workflows/test_package_build.yml

Lines changed: 122 additions & 0 deletions

```yaml
name: Test building package and publish

on:
  push:
    branches:
      - main
      - maint/*
    tags:
      - "*"
  pull_request:
    branches:
      - main
      - maint/*

defaults:
  run:
    shell: bash

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: 3
      - run: pip install --upgrade build twine
      - name: Build sdist and wheel
        run: python -m build
      - run: twine check dist/*
      - name: Upload sdist and wheel artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
      - name: Build git archive
        run: mkdir archive && git archive -v -o archive/archive.tgz HEAD
      - name: Upload git archive artifact
        uses: actions/upload-artifact@v4
        with:
          name: archive
          path: archive/

  test-package:
    runs-on: ubuntu-latest
    needs: [build]
    strategy:
      matrix:
        package: ['wheel', 'sdist', 'archive', 'editable']
    steps:
      - name: Download sdist and wheel artifacts
        if: matrix.package != 'archive'
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - name: Download git archive artifact
        if: matrix.package == 'archive'
        uses: actions/download-artifact@v4
        with:
          name: archive
          path: archive/
      - name: Checkout repo
        if: matrix.package == 'editable'
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Display Python version
        run: python -c "import sys; print(sys.version)"
      - name: Update pip
        run: pip install --upgrade pip
      - name: Install wheel
        if: matrix.package == 'wheel'
        run: pip install dist/*.whl
      - name: Install sdist
        if: matrix.package == 'sdist'
        run: pip install dist/*.tar.gz
      - name: Install archive
        if: matrix.package == 'archive'
        run: pip install archive/archive.tgz
      - name: Install editable
        if: matrix.package == 'editable'
        run: pip install -e .
      - name: Install test extras
        run: pip install .[test]
      - name: Download test data
        env:
          BOX_USERNAME: ${{ secrets.BOX_USERNAME }}
          BOX_PASSWORD: ${{ secrets.BOX_PASSWORD }}
        run: |
          python tests/download_test_data.py
          tree tests/test_data
      - name: Run tests without coverage
        if: matrix.package != 'editable'
        run: pytest -v jdb_to_nwb
      - name: Run tests on editable install with coverage
        if: matrix.package == 'editable'
        run: pytest --cov=src --cov-report=xml -v jdb_to_nwb
      - name: Upload coverage reports to Codecov
        if: matrix.package == 'editable'
        uses: codecov/codecov-action@v5
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

  # pypi-publish:
  #   name: Upload release to PyPI
  #   runs-on: ubuntu-latest
  #   needs: [test-package]
  #   environment:
  #     name: pypi
  #     url: https://pypi.org/p/jdb-to-nwb
  #   permissions:
  #     id-token: write  # IMPORTANT: this permission is mandatory for trusted publishing
  #   if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
  #   steps:
  #     - uses: actions/download-artifact@v4
  #       with:
  #         name: dist
  #         path: dist/
  #     - name: Publish package distributions to PyPI
  #       uses: pypa/gh-action-pypi-publish@release/v1
```

.gitignore

Lines changed: 3 additions & 2 deletions

```diff
@@ -169,7 +169,8 @@ _version.py
 
 # Large test data
 tests/test_data/photometry/*
-tests/test_data/raw_ephys/*
-tests/test_data/processed_ephys/*
 tests/test_data/behavior/IM-1478*
+tests/test_data/downloaded/*
 
+# Box credentials
+.env
```

README.md

Lines changed: 32 additions & 0 deletions

````diff
@@ -26,6 +26,38 @@ cp tests/metadata_full.yaml .
 jdb_to_nwb metadata_full.yaml out.nwb
 ```
 
+## Downloading test data
+
+The large test data files are stored in a shared UCSF Box account. To get access to the test data,
+please contact the repo maintainers.
+
+Create a new file called `.env` in the root directory of the repository and add your Box credentials:
+```bash
+BOX_USERNAME=<your_box_username>
+BOX_PASSWORD=<your_box_password>
+```
+Or set the environment variables in your shell:
+```bash
+export BOX_USERNAME=<your_box_username>
+export BOX_PASSWORD=<your_box_password>
+```
+
+Then run the download script:
+```bash
+python tests/download_test_data.py
+```
+
+Notes:
+- Run `python tests/test_data/create_raw_ephys_test_data.py` to re-create the test data for `raw_ephys`.
+- Run `python tests/test_data/create_processed_ephys_test_data.py` to re-create the test data for `processed_ephys`.
+- `tests/test_data/processed_ephys/impedance.csv` was manually created for testing purposes.
+- `tests/test_data/processed_ephys/geom.csv` was manually created for testing purposes.
+- Some files (`settings.xml`, `structure.oebin`) nested within `tests/test_data/raw_ephys/2022-07-25_15-30-00`
+  were manually created for testing purposes.
+
+The GitHub Actions workflow (`.github/workflows/test_package_build.yml`) will automatically download the test data and run the tests.
+
+
 ## Versioning
 
 Versioning is handled automatically using [hatch-vcs](https://github.com/ofek/hatch-vcs) using the latest
````
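The README above points to `tests/download_test_data.py`, whose contents are not part of this diff. A minimal sketch of how such a script might resolve the Box credentials, honoring both the `.env` file and shell environment variables the README describes (the function name `get_box_credentials` and the parsing logic are assumptions for illustration, not the script's actual code):

```python
import os
from pathlib import Path


def get_box_credentials(env_file: Path = Path(".env")) -> tuple[str, str]:
    """Resolve Box credentials from a .env file, falling back to the environment."""
    values: dict[str, str] = {}
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments; parse simple KEY=VALUE pairs
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    username = values.get("BOX_USERNAME") or os.environ.get("BOX_USERNAME")
    password = values.get("BOX_PASSWORD") or os.environ.get("BOX_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set BOX_USERNAME and BOX_PASSWORD in .env or the environment")
    return username, password
```

On GitHub Actions, the workflow injects `BOX_USERNAME`/`BOX_PASSWORD` from repository secrets as environment variables, so no `.env` file is needed there.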

pyproject.toml

Lines changed: 3 additions & 1 deletion

```diff
@@ -25,10 +25,11 @@ classifiers = [
 dependencies = [
     "spikeinterface >= 0.101.0",
     "tqdm",
-    "neuroconv == 0.6.0",
+    "neuroconv == 0.6.5",
     "pynwb >= 2.8.1",
     "ndx_fiber_photometry",
     "ndx_franklab_novela",
+    "scikit-learn",
 ]
 dynamic = ["version"]
 
@@ -40,6 +41,7 @@ dev = [
     "ruff",
     "codespell",
 ]
+test = ["pytest", "pytest-cov"]
 
 [project.urls]
 "Homepage" = "https://github.com/calderast/jdb_to_nwb/"
```

tests/create_spike_test_data.py renamed to tests/create_processed_ephys_test_data.py

Lines changed: 15 additions & 10 deletions

```diff
@@ -3,34 +3,39 @@
 
 # SpikeInterface can read this format easily.
 
-# In Tim's data, the MDA file contains 145 units and 3,040,410 spikes.
+# In Tim's data for IM-1478/2022-07-25_15-30-00, the `firings.mda` file contains 145 units and 3,040,410 spikes.
 
 # To create test data of a reasonable size, we will trim the spike times to only those in the first 30,000 samples.
 # This results in 63 units and 462 spikes.
 
-# To run this script, copy Tim's MountainSort output file "firing.mda" to "../data/ephys/mntsort_output/firings.mda"
-# or change the paths in this script to point to the location of Tim's data.
+# To run this script, copy Tim's MountainSort output file `firings.mda` for IM-1478/2022-07-25_15-30-00
+# to your computer and adjust the path in this script to point to the location of the data on your computer.
+
+# The `firings.mda` file should be 72,969,860 bytes. The test checks specific properties of the file generated by this
+# script.
 
 # Then run this script from the command line from the root of the repo:
-# python tests/test_data/create_spike_test_data.py
+# python tests/test_data/create_processed_ephys_test_data.py
 
 from pathlib import Path
 
 from spikeinterface.extractors import read_mda_sorting, MdaSortingExtractor
 
-# Create a new directory to store the trimmed data
-new_data_dir = Path("./tests/test_data/processed_ephys")
-new_data_dir.mkdir(parents=True, exist_ok=True)
-output_file_path = new_data_dir / "firings.mda"
+# NOTE: Adjust this path to point to the location of Tim's sorted data for IM-1478/2022-07-25_15-30-00
+firings_mda_file_path = Path("/Users/rly/Documents/NWB/berke-lab-to-nwb/data/ephys/mntsort_output/firings.mda")
+sampling_frequency = 30_000
 
 # Read the .mda file
-firings_mda_file_path = Path("../data/ephys/mntsort_output/firings.mda")
-sampling_frequency = 30_000
 sorting = read_mda_sorting(firings_mda_file_path, sampling_frequency=sampling_frequency)
 
 # Trim the spike times to only those in the first 30,000 samples
 sorting_trimmed = sorting.frame_slice(start_frame=0, end_frame=30_000)
 
+# Create a new directory to store the trimmed data
+new_data_dir = Path("./tests/test_data/processed_ephys")
+new_data_dir.mkdir(parents=True, exist_ok=True)
+output_file_path = new_data_dir / "firings.mda"
+
 # Write the trimmed spike sorting data to a new .mda file
 MdaSortingExtractor.write_sorting(sorting=sorting_trimmed, save_path=output_file_path)
```
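`frame_slice` restricts a sorting to a window of sample frames. The trimming step can be sketched with plain numpy on fabricated spike trains (illustration only; the real script operates on the `firings.mda` sorting through SpikeInterface, and the sample values below are made up):

```python
import numpy as np

# Hypothetical spike trains: unit id -> spike sample indices at 30 kHz
# (NOT the real firings.mda contents)
spike_trains = {
    0: np.array([100, 29_999, 45_000]),
    1: np.array([31_000, 60_000]),
}

# Keep only spikes in the first 30,000 samples, analogous to
# sorting.frame_slice(start_frame=0, end_frame=30_000)
end_frame = 30_000
trimmed = {unit: times[times < end_frame] for unit, times in spike_trains.items()}

# Drop units left with no spikes in the window; this is how trimming can shrink
# the unit count (the script's comment notes 145 units became 63)
trimmed = {unit: times for unit, times in trimmed.items() if times.size > 0}
```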

tests/create_raw_ephys_test_data.py

Lines changed: 17 additions & 16 deletions

```diff
@@ -3,32 +3,32 @@
 
 # Each continuous directory contains the following files:
 
-# - continuous.dat: A simple binary file containing N channels x M samples 16-bit integers in little-endian format.
+# - `continuous.dat`: A simple binary file containing N channels x M samples 16-bit integers in little-endian format.
 #   Data is saved as ch1_samp1, ch2_samp1, ... chN_samp1, ch1_samp2, ch2_samp2, ..., chN_sampM. The value of the
 #   least significant bit needed to convert the 16-bit integers to physical units is specified in the bitVolts
 #   field of the relevant channel in the structure.oebin JSON file. For "headstage" channels, multiplying by
 #   bitVolts converts the values to microvolts, whereas for "ADC" channels, bitVolts converts the values to volts.
 
-# - timestamps.npy: A numpy array containing M 64-bit integers that represent the index of each sample in the
+# - `timestamps.npy`: A numpy array containing M 64-bit integers that represent the index of each sample in the
 #   .dat file since the start of acquisition.
 
 # We could use SpikeInterface to read this data, but manipulating the data is easier with numpy since the data
 # is a flat binary file.
 
-# In Tim's data, the continuous.dat file contains 264 channels. The first 256 channels are the headstage (neural)
+# In Tim's data, the `continuous.dat` file contains 264 channels. The first 256 channels are the headstage (neural)
 # channels, and the last 8 channels are the ADC channels.
 
-# The structure.oebin JSON file and settings.xml contains metadata for the recording.
+# The `structure.oebin` JSON file and `settings.xml` contain metadata for the recording.
 
 # To create test data of a reasonable size, we will trim the existing data and timestamps to 30,000 samples
 # (one second of data) and 6 channels and save it to a new directory.
 
-# We will manually edit the structure.oebin JSON file to remove the events and TTL channels and extra headstage
-# and ADC channels. We will also manually edit the settings.xml file to remove the events and TTL channels and
+# We will manually edit the `structure.oebin` JSON file to remove the events and TTL channels and extra headstage
+# and ADC channels. We will also manually edit the `settings.xml` file to remove the events and TTL channels and
 # extra headstage and ADC channels.
 
-# To run this script, copy Tim's ephys data directory "2022-07-25_15-30-00" and place it in "../data"
-# or change the paths in this script to point to the location of Tim's data.
+# To run this script, copy Tim's open ephys data directory for IM-1478/2022-07-25_15-30-00 to your computer
+# and adjust the paths in this script to point to the location of the data on your computer.
 
 # Then run this script from the command line from the root of the repo:
 # python tests/test_data/create_raw_ephys_test_data.py
@@ -37,23 +37,24 @@
 
 import numpy as np
 
-# Create a new directory to store the trimmed data
-new_data_root = Path("./tests/test_data/raw_ephys")
-new_data_dir = new_data_root / "2022-07-25_15-30-00/experiment1/recording1/continuous/Rhythm_FPGA-100.0"
-new_data_dir.mkdir(parents=True, exist_ok=True)
+# NOTE: Adjust this path to point to the location of Tim's raw data for IM-1478/2022-07-25_15-30-00
+open_ephys_data_root = Path("/Users/rly/Documents/NWB/berke-lab-to-nwb/data/2022-07-25_15-30-00")
+continuous_dat_file_path = open_ephys_data_root / "experiment1/recording1/continuous/Rhythm_FPGA-100.0/continuous.dat"
+timestamps_file_path = open_ephys_data_root / "experiment1/recording1/continuous/Rhythm_FPGA-100.0/timestamps.npy"
 
 # Set the properties of the source data and parameters for the trimmed data
 num_channels = 264
 sampling_rate_in_hz = 30_000
-continuous_dat_file_path = (
-    "../data/2022-07-25_15-30-00/experiment1/recording1/continuous/Rhythm_FPGA-100.0/continuous.dat"
-)
-timestamps_file_path = "../data/2022-07-25_15-30-00/experiment1/recording1/continuous/Rhythm_FPGA-100.0/timestamps.npy"
 
 # Specify the number of seconds and channels of the original data to keep
 num_seconds_to_keep = 1.0
 num_channels_to_keep = 6
 
+# Create a new directory to store the trimmed data
+new_data_root = Path("./tests/test_data/raw_ephys")
+new_data_dir = new_data_root / "2022-07-25_15-30-00/experiment1/recording1/continuous/Rhythm_FPGA-100.0"
+new_data_dir.mkdir(parents=True, exist_ok=True)
+
 # Load the data from the continuous.dat file into a memory-mapped numpy array
 data = np.memmap(continuous_dat_file_path, dtype=np.int16, mode="r")
 assert len(data) % num_channels == 0, f"Data length is not divisible by num_channels: {num_channels}"
```
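Given the interleaved layout described in the script's comments (ch1_samp1, ch2_samp1, ..., chN_samp1, ch1_samp2, ...), the trim itself is a plain numpy reshape-and-slice. A minimal sketch on fabricated values (the real script memory-maps the 264-channel file and keeps 30,000 samples and 6 channels):

```python
import numpy as np

# Fabricated stand-in for continuous.dat: interleaved int16 values
num_channels = 4   # the real recording has 264
num_samples = 10
flat = np.arange(num_channels * num_samples, dtype=np.int16)

# The interleaved layout means the flat buffer reshapes directly to
# (samples, channels): each row is one time point across all channels
data = flat.reshape(-1, num_channels)

# Trim to the first samples and channels, as the script does
num_samples_to_keep = 5
num_channels_to_keep = 2
trimmed = data[:num_samples_to_keep, :num_channels_to_keep]

# trimmed.tofile(...) would write the result back in the same interleaved binary layout
```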
