@samuelbray32 samuelbray32 commented Aug 27, 2025

Export Improvements

While running the export pipeline for publication, I came across two common barriers that were not previously handled:

  1. Compound restrictions: Table1 & (Table2 & key)
  2. Excessively long keys

This PR solves both with the following:

In _log_fetch:
If either case applies:
- fetch the entry keys of the restricted table
- chunk the keys into groups whose compiled restriction string stays under the 2048-character limit
- make an entry in ExportSelection.Table for each of these chunks.

I think this minimizes the adjustments users need to make to their code for exporting, while preserving the ability to query the ExportSelection entries.
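The chunking step above can be sketched roughly as follows. This is a minimal illustration, not the actual Spyglass implementation: `chunk_restriction_keys` and the `render` callable are hypothetical names standing in for the real `_log_fetch` logic and DataJoint's `make_condition`.

```python
def chunk_restriction_keys(entry_keys, render, limit=2048):
    """Split fetched primary-key entries into groups whose rendered
    restriction string stays under the varchar limit.

    entry_keys: list of dicts (primary-key entries)
    render: callable turning a list of keys into a restriction string
            (a stand-in for datajoint's make_condition)
    """
    chunks, current = [], []
    for key in entry_keys:
        candidate = current + [key]
        # start a new chunk once the rendered candidate would be too long;
        # a single oversized key is kept alone (and must be handled upstream)
        if current and len(render(candidate)) > limit:
            chunks.append(current)
            current = [key]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then become its own ExportSelection.Table entry, so the union of chunks reproduces the original restriction.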

Other

  • Fixes #1390: Formatting of restriction string in _log_fetch_nwb
  • Fixes #1392: Logging restrictions on projected tables
    • If a projected table is restricted, leading to a _log_fetch call:
      • apply the restrictions to the projected table
      • undo the projection, restoring column names to their original values
      • fetch keys in the original heading namespace and log these entries using the new methods above
  • Resolves #1393: Globally restrict export to a defined set of nwb_file_names
    • Adds new intersect method for restrGraph (callable via graph1 & graph2)
    • During export allows user to provide a list of nwb_files_included
      • These are used to make a restriction graph cascading down from Nwbfile and limited to these files
      • This is intersected with the restriction graph generated by ExportSelection to get the vertical database slice only dependent on the selected nwb files
    • Helps prevent inclusion of unintended files captured during the Export Selection process due to non-specific restrictions
  • Misc.
    • Enable multiprocessing during unpacking of linked files
    • Condense restrictions for all ExportSelection.Table entries for a given table prior to creating restrGraph
      • For a large project, the number of leaves that need to be cascaded drops from ~50k to ~50.
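The condensing step in the last bullet can be sketched as below. This is an illustrative stand-in, assuming log entries arrive as (table_name, restriction_string) pairs; `condense_restrictions` is a hypothetical name, not the actual ExportSelection method.

```python
from collections import defaultdict


def condense_restrictions(log_entries):
    """Collapse many per-fetch restriction strings into one OR-joined
    restriction per table, so the restriction graph has one leaf per
    table instead of one leaf per logged fetch.

    log_entries: iterable of (table_name, restriction_string) pairs
    """
    by_table = defaultdict(list)
    for table_name, restr in log_entries:
        if restr not in by_table[table_name]:  # drop exact duplicates
            by_table[table_name].append(restr)
    return {
        table_name: "(" + ") OR (".join(restrs) + ")"
        for table_name, restrs in by_table.items()
    }
```

With thousands of logged fetches against a handful of tables, this is what shrinks the leaf count before the restrGraph cascade.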

Dandi Export Improvements

In progress

Checklist:

  • If this PR should be accompanied by a release, I have updated the CITATION.cff
  • If this PR edits table definitions, I have included an alter snippet for release notes.
  • If this PR makes changes to position, I ran the relevant tests locally.
  • If this PR makes user-facing changes, I have added/edited docs/notebooks to reflect the changes
  • I have updated the CHANGELOG.md with PR number and description.

@samuelbray32 samuelbray32 requested a review from CBroz1 August 27, 2025 22:25
@samuelbray32
Collaborator Author

@CBroz1 I'm leaving this as a draft while I try it out, but I would appreciate it if you could flag any issues with the approach when you have time.


codecov bot commented Aug 27, 2025

Codecov Report

❌ Patch coverage is 48.71795% with 180 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.09%. Comparing base (7d39224) to head (11d7349).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/spyglass/utils/h5py_helper_fn.py 13.58% 70 Missing ⚠️
src/spyglass/utils/dj_helper_fn.py 10.16% 53 Missing ⚠️
src/spyglass/common/common_dandi.py 6.45% 29 Missing ⚠️
src/spyglass/common/common_usage.py 83.58% 11 Missing ⚠️
src/spyglass/utils/mixins/export.py 82.00% 9 Missing ⚠️
src/spyglass/utils/dj_graph.py 87.30% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1387      +/-   ##
==========================================
+ Coverage   69.79%   70.09%   +0.30%     
==========================================
  Files         104      105       +1     
  Lines       12728    12917     +189     
==========================================
+ Hits         8883     9054     +171     
- Misses       3845     3863      +18     


Member

@CBroz1 CBroz1 left a comment

Thanks for putting this together! Seems on track, I just had some questions about subqueries and chunking

Comment on lines +229 to +230
if restriction
else self
Member

If you have neither a passed restriction nor a self.restriction attr, we might save time by setting the restriction to True and returning early
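A toy sketch of that early return, using stand-in names (`LogFetchMixin`, `_insert_log`) rather than the actual Spyglass mixin:

```python
class LogFetchMixin:
    """Toy stand-in for the export mixin (all names illustrative)."""

    def __init__(self):
        self.restriction = None  # DataJoint-style instance restriction
        self.logged = []

    def _insert_log(self, restr_str):
        self.logged.append(restr_str)

    def _log_fetch(self, restriction=None):
        # Early return: with no restriction from either source, the fetch
        # touches the whole table, so log the trivial "True" restriction
        # without building or measuring a restriction string.
        if not restriction and not self.restriction:
            self._insert_log("True")
            return
        self._insert_log(str(restriction or self.restriction))
```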

if not (
(
isinstance(restr_str, str)
and (len(restr_str) > 2048)
Member

It's still possible we'll bump this 2048 in the future. If so, it would be helpful to have a comment under ExportSelection.Table saying this was hard-coded here

It's also possible to fetch the value with self._export_table.Table.heading.attributes['restriction'].type[8:-1], but I would want to save that as a cached_property like self._export_restr_limit in that case to prevent that kind of table definition fetch on each iteration.
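A hedged sketch of that cached_property idea. The class name and the `_restriction_attr_type` hook below are hypothetical; in Spyglass the type string would come from the heading attribute access quoted above.

```python
from functools import cached_property


class ExportMixinSketch:
    """Illustrative only: parse the varchar length from the stored
    table definition once, instead of hard-coding 2048 everywhere."""

    @cached_property
    def _export_restr_limit(self):
        # e.g. "varchar(2048)" -> 2048; the slice [8:-1] drops the
        # "varchar(" prefix and the trailing ")"
        attr_type = self._restriction_attr_type()
        return int(attr_type[8:-1])

    def _restriction_attr_type(self):
        # Hypothetical hook; in Spyglass this would read
        # self._export_table.Table.heading.attributes["restriction"].type
        return "varchar(2048)"
```

Because it is a cached_property, the table-definition lookup happens once per instance rather than on every _log_fetch call.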

Collaborator Author

I'll also add a raised error in _insert_log if too long a restriction makes it there.

if restriction
else self
)
restricted_entries = restricted_table.fetch("KEY", log_export=False)
Member

@CBroz1 CBroz1 Aug 28, 2025

Just to state the logic: If a restriction exceeds our character limit, we will fetch the restricted table and recompile the restriction from chunks of entries. Yeah?

I wanted to confirm that resulting entries were treated as cumulative, so I dug through the export logic -> RestrGraph logic -> _set_restr method ... and yes, _set_restr by default merges passed entries.

Collaborator Author

Yeah. The new entries should be equivalent to those made by accessing the chunks of file entries in a sequential manner.

It loses information about what the restriction in the code was, but the entries specified should be the same

if chunk_size is None:
# estimate appropriate chunk size
chunk_size = max(
int(2048 // (len(restr_str) / len(restricted_entries))), 1
Member

What kinds of speed gains would we see from increasing the varchar? I would imagine the merging later takes more time than fetching the longer varchar.

Empirical question, likely not worth the deep dive required to answer

i * chunk_size : (i + 1) * chunk_size
]
chunk_restr_str = make_condition(self, chunk_entries, set())
self._insert_log(chunk_restr_str)
Member

There's probably a way to make this recursive to avoid ever hitting the varchar limit

def log_fetch(self, restr, ...):
    if len(restr) < self._export_restr_limit:
        self._insert_log(restr)
        return
    all_entries = (self & restr).fetch('KEY', log_export=False)
    for chunk_entries in all_entries[ ... chunk slicing ... ]:
        _ = (self & chunk_entries).fetch("KEY", log_export=True)

This way, if chunking fails for whatever reason, it'll get re-chunked. The embedded case will be smaller than we want, probably slower, but that seems preferable to hitting this error

Collaborator Author

Reasonable, I'll try something like that. It's also still technically possible to hit the limit if the primary-key restriction for a single entry is >2048.

@samuelbray32 samuelbray32 added the enhancement New feature or request label Sep 8, 2025
@samuelbray32 samuelbray32 marked this pull request as ready for review September 12, 2025 17:54
@edeno edeno requested a review from CBroz1 September 13, 2025 15:14
Member

@CBroz1 CBroz1 left a comment

Thanks for this work @samuelbray32 - esp the speed gains. Had some questions around motivation, and a desire to avoid globals where we can

), "table_to_undo must be a projection of table"

anti_alias_dict = {
attr.attribute_expression.strip("`"): attr.name
Member

There's a property for this here, such that you could ...

Suggested change
attr.attribute_expression.strip("`"): attr.name
attr.original_name : attr.name

or just run this dict for all attrs and run the project for a full attr map: {'a':'a', 'orig':'name', 'c':'c'}
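A minimal sketch of building that full attribute map. `full_attr_map` is a hypothetical helper; it assumes attribute objects expose `name` and `original_name`, with `original_name` equal to `name` for unaliased attributes, so a single `.proj(**mapping)` call would restore the original heading.

```python
from types import SimpleNamespace


def full_attr_map(attributes):
    """Map each attribute's original (pre-projection) name to its
    current, possibly aliased, name for *all* attributes, yielding
    the full map the suggestion describes: {'a': 'a', 'orig': 'name',
    'c': 'c'}. DataJoint's proj(new='old') convention means
    projected_table.proj(**full_attr_map(...)) undoes the rename.
    """
    return {attr.original_name: attr.name for attr in attributes}


# Usage with stand-in attribute objects:
attrs = [
    SimpleNamespace(name="a", original_name="a"),
    SimpleNamespace(name="name", original_name="orig"),
    SimpleNamespace(name="c", original_name="c"),
]
mapping = full_attr_map(attrs)
```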

if not (
isinstance(restr_str, str)
and (
(len(restr_str) > 2048)
Member

Keep 2048 hard coded?

"Export cannot handle subquery restrictions. Please submit a "
+ "bug report on GitHub with the code you ran and this "
+ f"restriction:\n\t{restr_str}"
"Single entry restriction exceeds 2048 characters.\n\t"
Member

Maybe adjust to say 'cannot store' so that a user has more context if they hit it

)
_ = trodes_pos_v1 * (
common.IntervalList & "interval_list_name = 'pos 0 valid times'"
) # Note for PR: table and restriction change because join no longer logs empty results
Member

👌🏼

(pos_merge & merge_key).fetch_nwb()

ExportSelection.start_export(paper_id=1, analysis_id=5)
trodes_pos_v1._export_cache.clear() # Clear cache to ensure proj table is captured
Member

Is this something start_export should do?

Member

@CBroz1 CBroz1 left a comment

Work still ongoing?

dandi_api_key: Optional[str] = None,
dandi_instance: Optional[str] = "dandi",
skip_raw_files: Optional[bool] = False,
n_compile_processes: Optional[int] = 1,
Member

I feel like it's safe to assume a user will want n_processes to match across these cases, no? No harm in parsing them out, but I think tighter signatures are more likely to be leveraged

shutil.copy(file, f"{destination_dir}/{os.path.basename(file)}")
else:
os.symlink(file, f"{destination_dir}/{os.path.basename(file)}")
# for file in source_files:
Member

Can remove?

def _make_file_in_dandi_dir(file, destination_dir, skip_raw_files):
if os.path.exists(f"{destination_dir}/{os.path.basename(file)}"):
return
if skip_raw_files and raw_dir in file:
Member

Here, raw_dir refers to the path fetched from spyglass.settings, yeah? Not now, but I think it's worth adopting a caps convention for those to make it clear they're constants, and not missing func args

Collaborator Author

To confirm, you're saying to use this import:
from spyglass.settings import raw_dir as RAW_DIR

and propagate changes accordingly?

Member

I'm saying that a future PR should edit settings to raw_dir -> RAW_DIR for the whole codebase. This is fine for now

@@ -0,0 +1,247 @@
from __future__ import annotations
Member

Worth adding to h5_helper_fn? Might be confusing to import from both. Or, can rename the existing one to reflect recompute goals

hits: List[str] = []

def _visit(name: str, obj) -> None:
if not isinstance(obj, h5py.Dataset):
Member

Some of the current h5 code is doing this 'if is type, visit' logic, though embedded in the Comparison class



def convert_dataset_type(file: h5py.File, dataset_path: str, target_dtype: str):
"""Convert a dataset to a different dtype 'in place' (-ish).
Member

Please add note on the 'ish' - what makes it not totally in place?

Collaborator Author

Thanks. The nwb file remains in place, but the hdmf dataset inside it is replaced with a new one (which has attributes set to match). There may be a signature left in the hdmf level. I'll clarify in the doc

new_dset.attrs[k] = v


def find_dynamic_tables_missing_id(nwb_path: str | Path) -> List[str]:
Member

Does pipe operator type hinting work if a user is still on 3.9? I'd like to migrate to this, but I think we need to wait until we require 3.10
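For what it's worth, under `from __future__ import annotations` (PEP 563, which the new module imports), annotations are stored as strings and never evaluated, so PEP 604 unions parse fine even on 3.9. The quoted annotation below shows the equivalent effect explicitly; `find_tables` is a hypothetical stand-in, and note that runtime introspection such as `typing.get_type_hints` would still fail on 3.9, so waiting for a 3.10 floor before migrating wholesale is indeed the safer call.

```python
from pathlib import Path


# A quoted annotation behaves the same way PEP 563 makes all
# annotations behave: stored as a string, never evaluated, so the
# 3.10+ `str | Path` syntax causes no error on 3.9.
def find_tables(nwb_path: "str | Path") -> "list[str]":
    # hypothetical stand-in for find_dynamic_tables_missing_id
    return []
```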

