Add AGENTS.md for LLM guidance on Spyglass standards and conventions #1407
Draft · Copilot wants to merge 2 commits into `master` from `copilot/fix-1406`
# AGENTS.md - Spyglass Standards for LLM Agents

This document provides essential information for Large Language Models (LLMs) and AI agents working with the Spyglass neuroscience data analysis framework. It outlines key standards, patterns, and conventions to ensure consistent and correct assistance when helping developers create new Spyglass tables, pipelines, and analyses.

## Overview

Spyglass is a neuroscience data analysis framework that:

- Uses **DataJoint** for reproducible data pipeline management
- Stores data in **NWB (Neurodata Without Borders)** format
- Provides standardized pipelines for common neuroscience analyses
- Ensures reproducible research through automated dependency tracking
- Supports collaborative data sharing and analysis

## Core Architecture

### DataJoint Foundation

- All tables inherit from DataJoint table types: `Manual`, `Lookup`, `Computed`, `Imported`
- Tables are organized into **schemas** (MySQL databases) by topic
- Primary keys establish relationships and dependencies between tables
- The `@schema` decorator associates tables with their schema

### Spyglass Mixin

- **All Spyglass tables MUST inherit from `SpyglassMixin`**
- Import: `from spyglass.utils import SpyglassMixin`
- Provides Spyglass-specific functionality like NWB file handling and permission checks
- Standard inheritance pattern: `class MyTable(SpyglassMixin, dj.TableType)`

### Schema Organization

```python
import datajoint as dj
from spyglass.utils import SpyglassMixin

schema = dj.schema("module_name")  # e.g., "common_ephys", "spikesorting"

@schema
class TableName(SpyglassMixin, dj.TableType):
    definition = """
    # Table description
    primary_key: datatype
    ---
    secondary_key: datatype
    """
```

## Table Types and Patterns

### 1. Parameters Tables

Store analysis parameters for reproducible computations.

**Conventions:**

- Name ends with `Parameters` or `Params`
- Usually `dj.Lookup` (for predefined values) or `dj.Manual`
- Primary key: `{pipeline}_params_name: varchar(32)`
- Parameters stored as: `{pipeline}_params: blob` (Python dictionary)

**Example:**

```python
@schema
class MyAnalysisParameters(SpyglassMixin, dj.Lookup):
    definition = """
    analysis_params_name: varchar(32)
    ---
    analysis_params: blob
    """

    contents = [
        ["default", {"threshold": 0.5, "window_size": 100}],
        ["strict", {"threshold": 0.8, "window_size": 50}],
    ]

    @classmethod
    def insert_default(cls):
        cls().insert(rows=cls.contents, skip_duplicates=True)
```

### 2. Selection Tables

Pair data with parameters for analysis.

**Conventions:**

- Name ends with `Selection`
- Usually `dj.Manual`
- Foreign keys to data source and parameters tables
- Sets up what will be computed

**Example:**

```python
@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
    definition = """
    -> DataSourceTable
    -> MyAnalysisParameters
    ---
    """
```

### 3. Computed Tables

Store analysis results.

**Conventions:**

- Usually `dj.Computed`
- Foreign key to corresponding Selection table
- Include `analysis_file_name` from `AnalysisNwbfile`
- Implement `make()` method for computation

**Example:**

```python
@schema
class MyAnalysisOutput(SpyglassMixin, dj.Computed):
    definition = """
    -> MyAnalysisSelection
    ---
    -> AnalysisNwbfile
    """

    def make(self, key):
        # Get parameters and data
        params = (MyAnalysisParameters & key).fetch1("analysis_params")
        data = ...  # retrieve the source data from upstream tables here

        # Perform analysis (placeholder helper)
        result = perform_analysis(data, **params)

        # Save to NWB file and get analysis_file_name (placeholder helper)
        analysis_file_name = create_analysis_nwb_file(result, key)

        # Insert result
        self.insert1({**key, "analysis_file_name": analysis_file_name})
```

### 4. NWB Ingestion Tables

Import data from NWB files.

**Conventions:**

- Usually `dj.Imported`
- Include `object_id` to track NWB objects
- Implement `make()` and `fetch_nwb()` methods

### 5. Merge Tables

Combine outputs from different pipeline versions.

**Conventions:**

- Name ends with `Output`
- Inherit from custom `Merge` class
- Enable unified downstream processing
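
A rough sketch of the shape of a merge table; the `_Merge` import path and the exact part-table layout are assumptions here and should be checked against an existing merge table such as `LFPOutput`:

```python
from spyglass.utils.dj_merge_tables import _Merge  # import path may differ

@schema
class MyAnalysisOutput(_Merge, SpyglassMixin):
    definition = """
    merge_id: uuid
    ---
    source: varchar(32)
    """

    class MyAnalysisV1(SpyglassMixin, dj.Part):  # one part per pipeline version
        definition = """
        -> master
        ---
        -> MyAnalysisV1Table
        """
```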

## Development Standards

### Code Organization

- **Modules**: Group related schemas (e.g., `common`, `spikesorting`, `lfp`)
- **Schemas**: Group tables by topic within modules
- **Common module**: Shared tables across pipelines (use sparingly)
- **Custom analysis**: Create your own schema for project-specific work

### Naming Conventions

- **Classes**: PascalCase (e.g., `SpikeWaveforms`)
- **Schemas**: snake_case with module prefix (e.g., `spikesorting_recording`)
- **Variables**: snake_case
- **Primary keys**: descriptive names (e.g., `recording_id`, `sort_group_id`)

### Required Methods

- **Parameters tables**: `insert_default()` classmethod for `dj.Lookup`
- **Computed tables**: `make(self, key)` method
- **NWB tables**: `make()` and `fetch_nwb()` methods

### Documentation

- Use numpy-style docstrings
- Include clear table definition comments
- Document parameter meanings and expected formats

## Common Workflows

### Creating a New Pipeline

1. Define a Parameters table with default values
2. Create a Selection table linking data and parameters
3. Implement a Computed table with analysis logic
4. Add a Merge table if multiple pipeline versions exist
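
Putting the steps together, populating such a pipeline typically looks like this (table names reuse the hypothetical examples from earlier sections; `populate()` is standard DataJoint):

```python
# 1. Ensure default parameters exist
MyAnalysisParameters().insert_default()

# 2. Pair a data source with a parameter set
MyAnalysisSelection().insert1(
    {**data_key, "analysis_params_name": "default"}, skip_duplicates=True
)

# 3. Run make() for all pending selections
MyAnalysisOutput.populate()
```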

### File Handling

- Use the `AnalysisNwbfile` table for storing analysis results
- Follow NWB format standards
- Include proper metadata and provenance
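
Inside a `make()` method this usually follows a create/add pattern. The method names below (`create`, `add_nwb_object`, `add`) reflect the common `AnalysisNwbfile` workflow but should be verified against the current API before use:

```python
def make(self, key):
    # Create a new analysis file derived from the session's raw NWB file
    analysis_file_name = AnalysisNwbfile().create(key["nwb_file_name"])

    # Write the result into the analysis file and keep the returned object id
    object_id = AnalysisNwbfile().add_nwb_object(analysis_file_name, result)

    # Register the analysis file before inserting the row
    AnalysisNwbfile().add(key["nwb_file_name"], analysis_file_name)
    self.insert1({**key, "analysis_file_name": analysis_file_name})
```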

### Testing

- Test parameter insertion and defaults
- Verify selection table constraints
- Test the computation pipeline end-to-end
- Check NWB file creation and retrieval
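
Table classes need a live database, but the logic around them can be unit-tested in isolation. For instance, a hypothetical pure-Python helper that applies the defaults from the Parameters example above runs without any DataJoint connection:

```python
# Hypothetical helper mirroring the defaults in MyAnalysisParameters;
# pure Python, so it can run in CI without a database connection.
DEFAULT_PARAMS = {"threshold": 0.5, "window_size": 100}

def validate_params(params: dict) -> dict:
    """Merge user params with defaults and sanity-check the values."""
    merged = {**DEFAULT_PARAMS, **params}
    if not 0.0 <= merged["threshold"] <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    if merged["window_size"] <= 0:
        raise ValueError("window_size must be positive")
    return merged

print(validate_params({"threshold": 0.8}))
# {'threshold': 0.8, 'window_size': 100}
```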

## Best Practices for LLM Assistance

### When Creating Tables:

1. **Always inherit from `SpyglassMixin` first**: `class MyTable(SpyglassMixin, dj.TableType)`
2. **Use appropriate table types**: `Lookup` for parameters, `Manual` for selections, `Computed` for analysis
3. **Follow naming conventions**: Clear, descriptive names with appropriate suffixes
4. **Include proper foreign keys**: Establish clear data dependencies
5. **Add comprehensive docstrings**: Explain purpose and usage

### When Suggesting Code:

1. **Check existing patterns**: Look at similar tables in the same module
2. **Verify schema imports**: Ensure proper schema declaration and imports
3. **Include error handling**: Especially for file operations and computations
4. **Consider permissions**: Use `SpyglassMixin` methods for data access control
5. **Follow dependency order**: Parameters → Selection → Computed → Merge

### Common Mistakes to Avoid:

- Missing `SpyglassMixin` inheritance
- Incorrect table type selection
- Missing `@schema` decorator
- Improper foreign key relationships
- Not implementing required methods (`make`, `insert_default`)
- Inconsistent naming conventions

## Key Imports and Dependencies

```python
import datajoint as dj
from spyglass.utils import SpyglassMixin, logger
from spyglass.common import AnalysisNwbfile, IntervalList
```

## Resources

- [Developer Documentation](docs/src/ForDevelopers/)
- [Table Types Guide](docs/src/ForDevelopers/TableTypes.md)
- [Schema Design](docs/src/ForDevelopers/Schema.md)
- [Contributing Guide](docs/src/ForDevelopers/Contribute.md)

This document should be referenced when providing assistance with Spyglass development to ensure consistency with established patterns and standards.
Integrate the following information using prompting best practices:
**Table-Joining Protocol**

When connecting two tables:

- Use `.describe()` or `.heading` to view primary keys (PK) and foreign keys (FK).
- Use `.parents()` / `.children()` to list linked upstream/downstream tables.
- Preview a row with `.fetch(limit=1, as_dict=True)`.

**Knowledge Base**
CORE API - use these commands to access data in Spyglass:

- `&` (restriction): `(Session & {'nwb_file_name': 'j1620210710_.nwb'}).fetch1()`
- `.proj()`: `(Session * Subject).proj(session_date='session_start_time').fetch(limit=3)`
- `.fetch()` / `.fetch1()`: `(IntervalList & key).fetch('valid_times', limit=2)`
- `.describe()`: `PositionOutput.describe()`
- `.aggr()`: `Spikes.aggr(IntervalList, n='count(*)').fetch(limit=1)`
- `dj.U()`: `task_or_sleep = dj.U((IntervalList & 'interval_list_name="task"'), (IntervalList & 'interval_list_name="sleep"'))`
- `dj.AndList()`: `files = Session & dj.AndList([{'nwb_file_name': 'fileA.nwb'}, {'nwb_file_name': 'fileB.nwb'}])`
- `.heading`: `Session.heading`
- `<<` (up-stream; shorthand for `restrict_by(direction="up")`): `PositionOutput() << "nwb_file_name = 'j1620210710_.nwb'"`
- `>>` (down-stream; shorthand for `restrict_by(direction="down")`): `Session() >> 'trodes_pos_params_name="default"'`
- `.restrict_by()` with `direction="up"` / `"down"`: `PositionOutput().restrict_by("nwb_file_name = 'j1620210710_.nwb'", direction="up")`
- `.merge_restrict()`: `PositionOutput.merge_restrict()`
- `.merge_fetch()`: `PositionOutput.merge_fetch({'nwb_file_name': 'j1620210710_.nwb'})`
- `.fetch_nwb()` (returns an `h5py.File`-like object): `(LFPOutput & part_key).fetch_nwb()`
- `.fetch1_dataframe()` (returns a `pandas.DataFrame`): `(PositionOutput & part_key).fetch1_dataframe()`
- `.fetch_pose_dataframe()`: `(PositionOutput & part_key).fetch_pose_dataframe(bodypart='nose')`
- `fetch_results`: `results = (DecodingOutput & part_key).fetch_results()`
- `get_restricted_merge_ids`: `ids = SpikeSortingOutput.get_restricted_merge_ids(key)`

**Spyglass common-schema "must-know" tables & key fields**

*(Table flattened in rendering; the recoverable key fields include `nwb_file_name`, `analysis_file_abs_path`, `interval_list_name`, `lab_member_name`, `subject_id`, `electrode_group_name`, `merge_id`, and `analysis_file_name`.)*

**Mini Glossary (symbols tutor should recognize)**
- `*Output`: single, versioned endpoint for downstream analysis. A 'master' table with a DataJoint 'part' table connected to the endpoint of each available pipeline, e.g. `LFPOutput`.
- `*Group`: table that groups rows of another table for easy usage, e.g. `SortedSpikesGroup` groups a set of spike sorting analyses.
- `<<` (up-stream), `>>` (down-stream): operators that filter based on attributes several joins away.
- `fetch_nwb`: returns an `h5py.File`-like NWBFile; auto-detects raw vs analysis files.
- `fetch1_dataframe`: returns a `pandas.DataFrame` for the first matching row.

**Database Exploration (for tutor use)**
For tables outside the knowledge base, you can internally use:

- `table.children(as_objects=bool)`: returns a list of all tables (or table names) with a foreign key reference to `table`
- `table.parents(as_objects=bool)`: returns a list of all upstream tables (or table names) on which `table` depends through a foreign key reference
- `table.heading`: returns the list of keys defining the entries of a table
- `dj.FreeTable(dj.conn(), full_table_name)`: returns a table object corresponding to the database table `full_table_name`, allowing access to table operations without requiring access to the Python class

**Common Linkers (memorize!)** ("via" = join through these)
**Schema pre-fixes**

The Spyglass pipeline is organized into schemas, each with a specific focus. The main schemas and their purposes:

- `common_*`: common data structures and utilities used across the pipeline, such as raw electrophysiology data
- `lfp_*`: tables related to local field potential (LFP) analysis
- `position_*`: tables related to position data analysis, including raw and processed position data
- `spikesorting_*`: tables related to spike sorting, including raw spike data
- `decoding_*`: tables related to decoding analyses, such as decoding neural activity
- `position_linearization_*`: tables related to linearizing position data for analysis along a track
- `ripple_*`: tables related to ripple analysis, often used in conjunction with LFP data

**Merge tables**
Analogy: `PositionOutput` (any `*Output`) is a table of contents; each part table is a chapter, and `merge_id` is the bookmark.

1. Discover chapters: `dir(PositionOutput)`; pick one, then `PositionOutput.TrodesPosV1.describe()` to see required keys.
2. Build your search: `key = {"nwb_file_name": "...", ...}`.
3. Get the bookmark: `merge_key = PositionOutput.merge_get_part(key).fetch1("KEY")`; fetch data with `(PositionOutput & merge_key).fetch1_dataframe()`.
4. Never type `merge_id` by hand; look it up with `merge_get_part` (or `get_restricted_merge_ids`) and preview everything with `merge_view()`.
- `merge_view()`: "To quickly peek at the combined data from all pipelines and see what columns are available, use `merge_view()`."
- `get_restricted_merge_ids()`: "For some tables like `SpikeSortingOutput`, there are powerful shortcuts that do the lookup for you. `get_restricted_merge_ids(key)` can often replace steps 3 and 4."

**Quick examples for grabbing data (for tutor use)**