LorenFrankLab · edeno · Sep 26, 2025 · Sep 25, 2025 · Sep 25, 2025 · Sep 25, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,97 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a Python package that converts SpikeGadgets .rec files (electrophysiology data) to NWB 2.0+ format. The conversion includes ephys data, position tracking, video files, DIO events, and behavioral metadata, with validation for DANDI archive compatibility.
+
+## Development Setup Commands
+
+**Environment Setup:**
+
+```bash
+mamba env create -f environment.yml
+mamba activate trodes_to_nwb
+pip install -e .
+```
+
+**Testing:**
+
+```bash
+pytest --cov=src --cov-report=xml --doctest-modules -v --pyargs trodes_to_nwb
+```
+
+**Formatting:**
+
+```bash
+black .
+```
+
+**Build Package:**
+
+```bash
+python -m build
+twine check dist/*
+```
+
+## Architecture
+
+### Core Conversion Pipeline
+
+The main conversion happens in `src/trodes_to_nwb/convert.py`, where the `create_nwbs()` function orchestrates:
+
+1. **File Discovery** (`data_scanner.py`): Scans directories for .rec files and associated data files
+2. **Metadata Loading** (`convert_yaml.py`): Loads and validates YAML metadata files
+3. **Header Processing** (`convert_rec_header.py`): Extracts device configuration from .rec file headers
+4. **Data Conversion**: Modular converters for different data types:
+   - `convert_ephys.py`: Raw electrophysiology data
+   - `convert_position.py`: Position tracking and video
+   - `convert_dios.py`: Digital I/O events
+   - `convert_analog.py`: Analog signals
+   - `convert_intervals.py`: Epoch and behavioral intervals
+   - `convert_optogenetics.py`: Optogenetic stimulation data
+
+### File Structure Requirements
+
+Input files must follow the naming convention: `{YYYYMMDD}_{animal}_{epoch}_{tag}.{extension}`
+
+Files per session:
+
+- `.rec`: Main recording file (required)
+- `{date}_{animal}.metadata.yml`: Session metadata (required)
+- Optional: `.h264`, `.videoPositionTracking`, `.cameraHWSync`, `.stateScriptLog`
+
+### Metadata System
+
+- Uses YAML metadata files validated against JSON schema (`nwb_schema.json`)
+- Probe configurations stored in `device_metadata/probe_metadata/`
+- Virus metadata in `device_metadata/virus_metadata/`
+- See `docs/yaml_mapping.md` for complete metadata field mapping
+
+### Key Data Processing
+
+- Uses Neo library (`spike_gadgets_raw_io.py`) for .rec file I/O
+- Implements chunked data loading (`RecFileDataChunkIterator`) for memory efficiency
+- Parallel processing support via Dask for batch conversions
+- NWB validation using nwbinspector after conversion
+
+## Testing
+
+- Unit tests in `src/trodes_to_nwb/tests/`
+- Integration tests in `tests/integration-tests/`
+- Test data downloaded from secure UCSF Box in CI
+- Coverage reports uploaded to Codecov
+
+## Release Process
+
+1. Tag release commit (e.g. `v0.1.0`)
+2. Push tag to GitHub (triggers PyPI upload)
+3. Create GitHub release
+
+## Important Notes
+
+- Package supports Python >=3.8
+- Requires `ffmpeg` for video conversion
+- Uses hatch as the build system, with VCS-based versioning
+- Main branch is protected and requires PR reviews
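The naming convention documented in CLAUDE.md, `{YYYYMMDD}_{animal}_{epoch}_{tag}.{extension}`, can be illustrated with a small parser. This is a minimal sketch and not code from `trodes_to_nwb`: `SessionFile` and `parse_session_filename` are hypothetical names. It assumes that the date, animal, and epoch fields contain no underscores, that the tag may contain them, and that everything after the first dot is the extension.

```python
from typing import NamedTuple


class SessionFile(NamedTuple):
    """Fields of a `{YYYYMMDD}_{animal}_{epoch}_{tag}.{extension}` name (hypothetical type)."""

    date: str
    animal: str
    epoch: str
    tag: str
    extension: str


def parse_session_filename(filename: str) -> SessionFile:
    """Split a file name into its convention fields (hypothetical helper)."""
    # Assumption: everything after the first dot is the extension.
    stem, dot, extension = filename.partition(".")
    if not dot or not extension:
        raise ValueError(f"{filename!r} has no extension")

    # Assumption: only the tag (the fourth field) may contain underscores.
    fields = stem.split("_", 3)
    if len(fields) != 4:
        raise ValueError(f"{filename!r} does not have four underscore-separated fields")

    date, animal, epoch, tag = fields
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"{filename!r} does not start with a YYYYMMDD date")

    return SessionFile(date, animal, epoch, tag, extension)


if __name__ == "__main__":
    print(parse_session_filename("20230622_sample_01_a1.rec"))
```

Under these assumptions the session metadata file (`{date}_{animal}.metadata.yml`) is rejected, because it follows its own shorter pattern and would need separate handling.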
diff --git a/PLAN.md b/PLAN.md
@@ -0,0 +1,78 @@
+# Ruff Issues Fix Plan
+
+This document tracks the plan to fix the remaining 56 ruff issues (excluding notebook issues).
+
+## 🔴 Priority 1: Critical Fixes (7 issues) - ✅ COMPLETED
+
+### Immediate Action Required
+
+- [x] **Mutable Default Argument** (`convert_ephys.py:42`)
+  - Change `nwb_hw_channel_order=[]` to `nwb_hw_channel_order=None`
+  - Add `if nwb_hw_channel_order is None: nwb_hw_channel_order = []` inside function
+
+- [x] **Missing Raise Statements** (2 issues)
+  - `spike_gadgets_raw_io.py:170, 1210` - Add `raise` keyword before exception instantiation
+
+- [x] **Exception Chaining** (`convert_position.py:134, 602`)
+  - Change `raise SomeException(...)` to `raise SomeException(...) from err`
+
+- [x] **Top-Level Imports** (`convert_optogenetics.py` - 4 locations)
+  - Move `import` statements from inside functions to module top
+
+## 🟡 Priority 2: Code Quality (25 issues) - ✅ COMPLETED
+
+### Quick Wins - Auto-fixable patterns
+
+- [x] **Dictionary/List Inefficiencies** (11 issues)
+  - Replace `key in dict.keys()` with `key in dict` (8 instances)
+  - Replace `dict()` with `{}` literals (2 instances)
+  - Replace list comprehension with set comprehension (1 instance)
+
+- [x] **Logic Simplification** (6 issues)
+  - Use ternary operators for simple if/else blocks
+  - Use `.get()` method instead of if/else for dict access
+  - Replace `not a == b` with `a != b`
+
+- [x] **Unused Variables** (6 issues)
+  - Remove unused assignments in tests
+  - Replace unused loop variables with `_`
+
+- [x] **Unnecessary Comprehensions** (6 issues)
+  - Convert list comprehensions to generators where appropriate
+
+## 🟠 Priority 3: Style & Performance (9 issues remaining) - PARTIALLY COMPLETED
+
+### Consider for future refactoring
+
+- [ ] **Magic Numbers** (`convert_position.py` - 4 instances)
+  - Extract constants: `MIN_TIMESTAMPS = 2`, `DEFAULT_TIMEOUT = 2000`, `MIN_TICKS = 100`
+  - **Note**: These are context-specific values that may be better left as literals
+
+- [ ] **Memory Optimization** (`spike_gadgets_raw_io.py` - 4 instances)
+  - Replace `@lru_cache` with `@cached_property` or manual caching for methods
+  - **Note**: These require careful analysis to avoid regressing performance
+
+- [x] **Variable Naming** (2 instances)
+  - Rename single-letter variables to descriptive names
+
+- [x] **Other Improvements** (6 issues)
+  - Add `stacklevel` to warnings
+  - Use `contextlib.suppress()` for cleaner exception handling
+  - Remove unused imports
+
+## Progress Tracking
+
+**Total Issues**: 56 (excluding notebooks)
+
+- **Fixed**: 47 (7 Priority 1 + 37 Priority 2 + 3 Priority 3)
+- **Remaining**: 9 (4 magic numbers + 4 memory optimizations + 1 unused import)
+
+**Estimated Timeline**:
+
+- Phase 1 (Critical): 30 minutes
+- Phase 2 (Quality): 45 minutes
+- Phase 3 (Style): As needed during regular development
+
+## Commit Strategy
+
+Each priority level will be committed separately with detailed commit messages explaining the fixes applied.
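The three Priority 1 patterns in PLAN.md (mutable default argument, missing `raise`, and exception chaining) can be sketched in isolation. The functions below are hypothetical stand-ins written for illustration, not the actual code in `convert_ephys.py`, `spike_gadgets_raw_io.py`, or `convert_position.py`. Only the `</Configuration>` error message is taken from the notebook hunk in this diff.

```python
from typing import List, Optional


def order_channels(
    channel_ids: List[int], hw_channel_order: Optional[List[int]] = None
) -> List[int]:
    """Mutable-default fix: default to None and build a fresh list per call.

    Hypothetical function; with `hw_channel_order=[]` as the default, the same
    list object would be shared and would grow across calls.
    """
    if hw_channel_order is None:
        hw_channel_order = []
    hw_channel_order.extend(channel_ids)
    return hw_channel_order


def find_header_size(header_bytes: bytes) -> int:
    """Missing-raise fix: the exception must be raised, not just instantiated.

    Hypothetical function; returns the offset just past '</Configuration>'.
    """
    marker = b"</Configuration>"
    index = header_bytes.find(marker)
    if index == -1:
        # Without the `raise` keyword this line would create the exception
        # object, discard it, and let execution continue silently.
        raise ValueError(
            "SpikeGadgets: the xml header does not contain '</Configuration>'"
        )
    return index + len(marker)


def parse_timestamp(text: str) -> int:
    """Exception-chaining fix: `from err` keeps the original error as __cause__.

    Hypothetical function used only to show the pattern.
    """
    try:
        return int(text)
    except ValueError as err:
        raise RuntimeError(f"invalid timestamp: {text!r}") from err


if __name__ == "__main__":
    print(order_channels([1, 2]), order_channels([3]))
    print(find_header_size(b"<Configuration></Configuration>payload"))
```

With the `None` default, repeated calls to `order_channels` no longer share state, and a failure in `parse_timestamp` reports both the `RuntimeError` and the underlying `ValueError` in the traceback.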
diff --git a/notebooks/explore_rec_file_neo.ipynb b/notebooks/explore_rec_file_neo.ipynb
@@ -60,7 +60,7 @@
     "            break\n",
     "\n",
     "    if header_size is None:\n",
-    "        ValueError(\"SpikeGadgets: the xml header does not contain '</Configuration>'\")\n",
+    "        raise ValueError(\"SpikeGadgets: the xml header does not contain '</Configuration>'\")\n",
     "\n",
     "    f.seek(0)\n",
     "    header_txt = f.read(header_size).decode('utf8')\n",
@@ -118,7 +118,7 @@
     "# The raw data block consists of N packets.\n",
     "# Each packet consists of:\n",
     "# First byte is 0x55\n",
-    "# Some number of bytes for each device (e.g., Controller_DIO has 1 byte, \n",
+    "# Some number of bytes for each device (e.g., Controller_DIO has 1 byte,\n",
     "# ECU has 32 bytes, Multiplexed has 8 bytes, SysClock has 8 bytes)\n",
     "# Timestamp (uint32) which has 4 bytes\n",
     "# Ephys data (int16) which has 2 * num_ephy_channels bytes\n",
@@ -182,6 +182,7 @@
    "source": [
     "# read the binary part lazily\n",
     "import numpy as np\n",
+    "\n",
     "raw_memmap = np.memmap(rec_file_path, mode='r', offset=header_size, dtype='<u1')\n",
     "\n",
     "num_packet = raw_memmap.size // packet_size\n",
@@ -325,10 +326,10 @@
     "for device in hconf:\n",
     "    stream_id = device.attrib['name']\n",
     "    print(stream_id)\n",
-    "    \n",
+    "\n",
     "    for channel in device:\n",
     "        print(channel.attrib)\n",
-    "        \n",
+    "\n",
     "        if 'interleavedDataIDByte' in channel.attrib:\n",
     "            # TODO LATER: deal with \"headstageSensor\" which have interleaved\n",
     "            continue\n",