Merged
Changes from 16 commits
Commits
23 commits
3f72eb0
Modernize pyproject.toml with comprehensive tooling configuration
edeno Sep 25, 2025
d121e93
Remove pre-commit dependency
edeno Sep 25, 2025
3c9478d
Fix ruff and mypy configuration issues
edeno Sep 25, 2025
1f73e92
Update Python version requirement to 3.10+ and auto-fix ruff issues
edeno Sep 25, 2025
ff91eb2
Add PLAN.md for systematic ruff issue fixing
edeno Sep 25, 2025
ced766b
Fix Priority 1: Critical ruff issues (7 fixes)
edeno Sep 25, 2025
f5e7888
Fix Priority 2: Auto-fix 37 code quality issues
edeno Sep 25, 2025
69182fb
Fix Priority 3: Additional style improvements (3 fixes)
edeno Sep 25, 2025
f401777
Fix formatting
edeno Sep 25, 2025
ebd3f12
Address GitHub PR review feedback
edeno Sep 25, 2025
ba7bc49
Update review plan - zip() uses safer strict=True default
edeno Sep 25, 2025
1126f42
Fix handling of empty nwb_hw_channel_order
edeno Sep 25, 2025
526931b
Improve docstrings and type annotations across modules
edeno Sep 26, 2025
5c3a4af
Standardize docstrings for improved clarity
edeno Sep 26, 2025
883c26b
Delete REVIEW_PLAN.md
edeno Sep 26, 2025
f431303
Use strict mode in zip for NWB creation loop
edeno Sep 26, 2025
dfa4834
Update src/trodes_to_nwb/tests/test_convert_analog.py
edeno Sep 26, 2025
24d457c
Update src/trodes_to_nwb/convert_position.py
edeno Sep 26, 2025
3803819
Update src/trodes_to_nwb/convert_position.py
edeno Sep 26, 2025
831052a
Enforce strict zip in add_dios channel mapping
edeno Sep 26, 2025
49e9788
Merge branch 'update-pyproject-toml' of https://github.com/LorenFrank…
edeno Sep 26, 2025
a9caf6d
Delete PLAN.md
edeno Sep 26, 2025
2e0e15c
Update based on code comments
edeno Sep 26, 2025
97 changes: 97 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,97 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Python package that converts SpikeGadgets .rec files (electrophysiology data) to NWB 2.0+ format. The conversion includes ephys data, position tracking, video files, DIO events, and behavioral metadata, with validation for DANDI archive compatibility.

## Development Setup Commands

**Environment Setup:**

```bash
mamba env create -f environment.yml
mamba activate trodes_to_nwb
pip install -e .
```

**Testing:**

```bash
pytest --cov=src --cov-report=xml --doctest-modules -v --pyargs trodes_to_nwb
```

**Linting:**

```bash
black .
```

**Build Package:**

```bash
python -m build
twine check dist/*
```

## Architecture

### Core Conversion Pipeline

The main conversion happens in `src/trodes_to_nwb/convert.py` with the `create_nwbs()` function which orchestrates:

1. **File Discovery** (`data_scanner.py`): Scans directories for .rec files and associated data files
2. **Metadata Loading** (`convert_yaml.py`): Loads and validates YAML metadata files
3. **Header Processing** (`convert_rec_header.py`): Extracts device configuration from .rec file headers
4. **Data Conversion**: Modular converters for different data types:
   - `convert_ephys.py`: Raw electrophysiology data
   - `convert_position.py`: Position tracking and video
   - `convert_dios.py`: Digital I/O events
   - `convert_analog.py`: Analog signals
   - `convert_intervals.py`: Epoch and behavioral intervals
   - `convert_optogenetics.py`: Optogenetic stimulation data

### File Structure Requirements

Input files must follow naming convention: `{YYYYMMDD}_{animal}_{epoch}_{tag}.{extension}`

Required files per session:

- `.rec`: Main recording file
- `{date}_{animal}.metadata.yml`: Session metadata
- Optional: `.h264`, `.videoPositionTracking`, `.cameraHWSync`, `.stateScriptLog`
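
As an illustration of the naming convention above, here is a minimal sketch that splits a file name into its parts. The regular expression, the `RecordingFileName` type, and `parse_file_name` are hypothetical names introduced for this example; the package's own parsing in `data_scanner.py` may work differently.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for {YYYYMMDD}_{animal}_{epoch}_{tag}.{extension};
# this is an assumption for illustration, not the package's actual parser.
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<animal>[^_]+)_(?P<epoch>[^_]+)_(?P<tag>[^.]+)"
    r"\.(?P<extension>.+)$"
)


class RecordingFileName(NamedTuple):
    date: str
    animal: str
    epoch: str
    tag: str
    extension: str


def parse_file_name(name: str) -> Optional[RecordingFileName]:
    """Return the parsed parts, or None if the name does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return RecordingFileName(**match.groupdict())


parsed = parse_file_name("20230622_sample_01_a1.rec")
metadata = parse_file_name("20230622_sample.metadata.yml")
```

The session metadata file has no epoch or tag, so under this pattern it returns `None` and would need separate handling.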

### Metadata System

- Uses YAML metadata files validated against JSON schema (`nwb_schema.json`)
- Probe configurations stored in `device_metadata/probe_metadata/`
- Virus metadata in `device_metadata/virus_metadata/`
- See `docs/yaml_mapping.md` for complete metadata field mapping

### Key Data Processing

- Uses Neo library (`spike_gadgets_raw_io.py`) for .rec file I/O
- Implements chunked data loading (`RecFileDataChunkIterator`) for memory efficiency
- Parallel processing support via Dask for batch conversions
- NWB validation using nwbinspector after conversion

## Testing

- Unit tests in `src/trodes_to_nwb/tests/`
- Integration tests in `tests/integration-tests/`
- Test data downloaded from secure UCSF Box in CI
- Coverage reports uploaded to Codecov

## Release Process

1. Tag release commit (e.g. `v0.1.0`)
2. Push tag to GitHub (triggers PyPI upload)
3. Create GitHub release

## Important Notes

- Package supports Python >=3.8
- Requires `ffmpeg` for video conversion
- Uses hatch for build system with VCS-based versioning
- Main branch protected, requires PR reviews
78 changes: 78 additions & 0 deletions PLAN.md
@@ -0,0 +1,78 @@
# Ruff Issues Fix Plan

This document tracks the plan to fix the remaining 56 ruff issues (excluding notebook issues).

## 🔴 Priority 1: Critical Fixes (7 issues) - ✅ COMPLETED

### Immediate Action Required

- [x] **Mutable Default Argument** (`convert_ephys.py:42`)
  - Change `nwb_hw_channel_order=[]` to `nwb_hw_channel_order=None`
  - Add `if nwb_hw_channel_order is None: nwb_hw_channel_order = []` inside function

- [x] **Missing Raise Statements** (2 issues)
  - `spike_gadgets_raw_io.py:170, 1210` - Add `raise` keyword before exception instantiation

- [x] **Exception Chaining** (`convert_position.py:134, 602`)
  - Change `raise SomeException(...)` to `raise SomeException(...) from err`

- [x] **Top-Level Imports** (`convert_optogenetics.py` - 4 locations)
  - Move `import` statements from inside functions to module top
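
The first three fixes above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names `collect_channels`, `find_header_size`, and `parse_timestamp` are hypothetical, and only the `nwb_hw_channel_order` parameter and the `</Configuration>` error message come from the diff.

```python
def collect_channels(nwb_hw_channel_order=None):
    # A None sentinel replaces the mutable default `[]`, which would be
    # shared between calls and silently accumulate entries.
    if nwb_hw_channel_order is None:
        nwb_hw_channel_order = []
    nwb_hw_channel_order.append(len(nwb_hw_channel_order))
    return nwb_hw_channel_order


def find_header_size(header_bytes: bytes) -> int:
    marker = b"</Configuration>"
    index = header_bytes.find(marker)
    if index == -1:
        # Without `raise`, the exception object is created and discarded,
        # and execution continues as if nothing went wrong.
        raise ValueError("the xml header does not contain '</Configuration>'")
    return index + len(marker)


def parse_timestamp(text: str) -> int:
    try:
        return int(text)
    except ValueError as err:
        # `from err` keeps the original exception as __cause__ in the traceback.
        raise RuntimeError(f"invalid timestamp: {text!r}") from err
```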

## 🟡 Priority 2: Code Quality (25 issues) - ✅ COMPLETED

### Quick Wins - Auto-fixable patterns

- [x] **Dictionary/List Inefficiencies** (11 issues)
  - Replace `key in dict.keys()` with `key in dict` (8 instances)
  - Replace `dict()` with `{}` literals (2 instances)
  - Replace list comprehension with set comprehension (1 instance)

- [x] **Logic Simplification** (6 issues)
  - Use ternary operators for simple if/else blocks
  - Use `.get()` method instead of if/else for dict access
  - Replace `not a == b` with `a != b`

- [x] **Unused Variables** (6 issues)
  - Remove unused assignments in tests
  - Replace unused loop variables with `_`

- [x] **Unnecessary Comprehensions** (6 issues)
  - Convert list comprehensions to generators where appropriate
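
For reference, the patterns listed above look roughly like this. The `config` and `values` data are made up for the example and do not come from the repository.

```python
config = {"name": "ECU", "numBytes": "32"}  # made-up example data
values = [3, 0, 7]

# Membership test directly on the dict instead of `"name" in config.keys()`
has_name = "name" in config

# Literal instead of `dict()`
lookup = {}

# Set comprehension instead of `set([len(key) for key in config])`
unique_lengths = {len(key) for key in config}

# `.get()` with a default instead of an if/else around `config["numBytes"]`
num_bytes = int(config.get("numBytes", 0))

# Ternary instead of a four-line if/else block
label = "active" if values else "empty"

# `!=` instead of `not values[0] == values[1]`
is_different = values[0] != values[1]

# `_` marks a loop variable that is intentionally unused
count = 0
for _ in values:
    count += 1

# Generator expression instead of building a throwaway list inside any()
any_positive = any(value > 0 for value in values)
```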

## 🟠 Priority 3: Style & Performance (9 issues remaining) - PARTIALLY COMPLETED

### Consider for future refactoring

- [ ] **Magic Numbers** (`convert_position.py` - 4 instances)
  - Extract constants: `MIN_TIMESTAMPS = 2`, `DEFAULT_TIMEOUT = 2000`, `MIN_TICKS = 100`
  - **Note**: These are context-specific values that may be better left as literals

- [ ] **Memory Optimization** (`spike_gadgets_raw_io.py` - 4 instances)
  - Replace `@lru_cache` with `@cached_property` or manual caching for methods
  - **Note**: These require careful analysis to avoid breaking performance

- [x] **Variable Naming** (2 instances)
  - Rename single-letter variables to descriptive names

- [x] **Other Improvements** (6 issues)
  - Add stacklevel to warnings
  - Use contextlib.suppress() for clean exception handling
  - Remove unused imports
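
A minimal sketch of three of the items above, under stated assumptions: `PacketReader`, `deprecated_helper`, and `remove_if_present` are hypothetical names, not code from `spike_gadgets_raw_io.py`. The concern with `@lru_cache` on a method is that the cache holds a reference to `self`, which can keep instances alive; `@cached_property` stores the value on the instance instead, but only suits methods that take no arguments.

```python
import contextlib
import os
import warnings
from functools import cached_property


class PacketReader:
    """Hypothetical reader used only to illustrate per-instance caching."""

    def __init__(self, packet_size: int, file_size: int):
        self.packet_size = packet_size
        self.file_size = file_size
        self.compute_calls = 0

    @cached_property
    def num_packets(self) -> int:
        # Computed on first access, then stored on the instance, so the
        # cached value is released together with the instance.
        self.compute_calls += 1
        return self.file_size // self.packet_size


def deprecated_helper() -> None:
    # stacklevel=2 attributes the warning to the caller's line rather
    # than to this function.
    warnings.warn("deprecated_helper is deprecated", DeprecationWarning, stacklevel=2)


def remove_if_present(path: str) -> None:
    # Equivalent to try/except FileNotFoundError: pass, stated in one line.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Methods that take arguments would still need a manual per-instance dictionary, which is presumably why the plan flags these cases for careful analysis.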

## Progress Tracking

**Total Issues**: 56 (excluding notebooks)

- **Fixed**: 47 (7 Priority 1 + 37 Priority 2 + 3 Priority 3)
- **Remaining**: 9 (4 magic numbers + 4 memory optimizations + 1 unused import)

**Estimated Timeline**:

- Phase 1 (Critical): 30 minutes
- Phase 2 (Quality): 45 minutes
- Phase 3 (Style): As needed during regular development

## Commit Strategy

Each priority level will be committed separately with detailed commit messages explaining the fixes applied.
9 changes: 5 additions & 4 deletions notebooks/explore_rec_file_neo.ipynb
@@ -60,7 +60,7 @@
" break\n",
"\n",
" if header_size is None:\n",
- " ValueError(\"SpikeGadgets: the xml header does not contain '</Configuration>'\")\n",
+ " raise ValueError(\"SpikeGadgets: the xml header does not contain '</Configuration>'\")\n",
"\n",
" f.seek(0)\n",
" header_txt = f.read(header_size).decode('utf8')\n",
@@ -118,7 +118,7 @@
"# The raw data block consists of N packets.\n",
"# Each packet consists of:\n",
"# First byte is 0x55\n",
- "# Some number of bytes for each device (e.g., Controller_DIO has 1 byte, \n",
+ "# Some number of bytes for each device (e.g., Controller_DIO has 1 byte,\n",
"# ECU has 32 bytes, Multiplexed has 8 bytes, SysClock has 8 bytes)\n",
"# Timestamp (uint32) which has 4 bytes\n",
"# Ephys data (int16) which has 2 * num_ephy_channels bytes\n",
@@ -182,6 +182,7 @@
"source": [
"# read the binary part lazily\n",
"import numpy as np\n",
+ "\n",
"raw_memmap = np.memmap(rec_file_path, mode='r', offset=header_size, dtype='<u1')\n",
"\n",
"num_packet = raw_memmap.size // packet_size\n",
@@ -325,10 +326,10 @@
"for device in hconf:\n",
" stream_id = device.attrib['name']\n",
" print(stream_id)\n",
- " \n",
+ "\n",
" for channel in device:\n",
" print(channel.attrib)\n",
- " \n",
+ "\n",
" if 'interleavedDataIDByte' in channel.attrib:\n",
" if 'interleavedDataIDByte' in channel.attrib:\n",
" # TODO LATER: deal with \"headstageSensor\" which have interleaved\n",
" continue\n",
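
The packet layout described in the notebook comments (a leading `0x55` byte, per-device bytes, a `uint32` timestamp, then `int16` ephys samples) implies a fixed packet size. The sketch below computes it, assuming the fields are laid out back to back in the order listed. The device sizes are the examples from the comment, `num_ephys_channels = 128` is an arbitrary illustrative value, and `compute_packet_size` is a hypothetical helper, not a function from the notebook.

```python
# Device sizes copied from the examples in the notebook comment.
device_num_bytes = {"Controller_DIO": 1, "ECU": 32, "Multiplexed": 8, "SysClock": 8}
num_ephys_channels = 128  # assumed value, for illustration only


def compute_packet_size(device_num_bytes: dict, num_ephys_channels: int) -> int:
    sync_byte = 1                          # leading 0x55 byte
    timestamp_bytes = 4                    # one uint32
    ephys_bytes = 2 * num_ephys_channels   # one int16 per channel
    return sync_byte + sum(device_num_bytes.values()) + timestamp_bytes + ephys_bytes


packet_size = compute_packet_size(device_num_bytes, num_ephys_channels)
# Under the same assumption, the timestamp starts right after the device bytes.
timestamp_offset = 1 + sum(device_num_bytes.values())
```

With a packet size in hand, the notebook's `raw_memmap.size // packet_size` gives the number of packets in the file.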