
Improved benchmarks + NDIndex expanded sel options + better slicing with sorting #13

Open
ianhi wants to merge 20 commits into main from feature/performance-benchmarks

Conversation

@ianhi
Owner

@ianhi ianhi commented Dec 20, 2025

No description provided.

@netlify

netlify bot commented Dec 20, 2025

Deploy Preview for xarray-linked-indexes ready!

Name Link
🔨 Latest commit 6a2e7de
🔍 Latest deploy log https://app.netlify.com/projects/xarray-linked-indexes/deploys/6949d959e2db6b0008230fa7
😎 Deploy Preview https://deploy-preview-13--xarray-linked-indexes.netlify.app

@codecov

codecov bot commented Dec 20, 2025

Codecov Report

❌ Patch coverage is 99.33687% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.23%. Comparing base (d812baa) to head (6a2e7de).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
src/linked_indices/nd_index.py | 95.57% | 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #13      +/-   ##
==========================================
+ Coverage   95.90%   99.23%   +3.33%     
==========================================
  Files           7       10       +3     
  Lines        1685     2486     +801     
  Branches      113      142      +29     
==========================================
+ Hits         1616     2467     +851     
+ Misses         36        4      -32     
+ Partials       33       15      -18     


@ianhi
Owner Author

ianhi commented Dec 22, 2025

There's a decent bit of Claude-ese that I'm still cleaning up here, but @keewis you might be interested in this: https://deploy-preview-13--xarray-linked-indexes.netlify.app/alt-ndpoint/

It starts with a blob of moderately directionless human writing, followed by a fairly tight human explanation (which I think is right) of the difference to NDPointIndex.

the other difference I realized is that if I have trials as a string I don't think I can reasonably use NDPointIndex here.

trying to figure out where to work in your great description of selecting a plane out of a surface

@keewis

keewis commented Dec 23, 2025

NDPointIndex handles the case of allowing selecting by a distance metric for points distributed in N-dimensional space.

I would replace "by a distance metric" with "using a spatial data structure", as the point of the NDPointIndex is to use tree-like structures (like KDTrees or BallTrees) to accelerate searching in a generalized set of points. These data structures can have different query modes that often (but not always) need a metric to work. For example, the most common query mode, the nearest neighbour query, needs a metric, but a range query would instead perform a set of interval intersections.
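
To make the distinction between the two query modes concrete, here is a minimal brute-force sketch in plain Python. It is not a tree and not NDPointIndex itself: the point list and the helper names `nearest` and `in_box` are made up for illustration. A KDTree or BallTree would answer the same two queries, just faster.

```python
import math

# Hypothetical 2-D point set; a spatial tree would index these same points.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 4.0)]


def nearest(points, query):
    """Nearest-neighbour query: ranking candidates requires a metric
    (Euclidean distance here)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


def in_box(points, lower, upper):
    """Range query: each axis is a plain interval test, no metric involved."""
    return [
        i
        for i, p in enumerate(points)
        if all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))
    ]


nearest_idx = nearest(points, (2.6, 1.2))
box_idx = in_box(points, lower=(0.5, 0.5), upper=(3.5, 2.5))
```

Swapping the metric changes what `nearest` returns, while `in_box` is unaffected, which is the sense in which only some query modes need a metric.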

Looking at your comparison table:

Coordinates | Multiple 2D coords that together define position | Single N-D coord with derived values

I'd cross out the "2D", the point is that you have n coords for n dimensions (even though xarray's implementation currently has a restriction on 2D coords).

For the other case you also have n coords for n dimensions, but implicitly: you have rows, columns, and the associated values (so three coords). This is very similar to how the COO sparse matrix encoding works, and my preferred real-world example is a digital elevation model that approximates the earth's surface using x/y/height coordinates. The search then neatly becomes an "isohypse" or, more generally, the "contour lines" that you would see in maps and that mark cross sections of equal height (another example is "isobars", i.e. lines of equal pressure in weather maps).
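
As a rough sketch of that idea, the elevation model can be written as three parallel coordinate lists (x, y, height), much like COO triplets, and a contour becomes a selection on the value coordinate. The data and the `contour` helper below are invented for illustration and do not reflect any existing index API.

```python
# Toy digital elevation model stored as COO-style triplets:
# one entry per grid cell, with its x, y and height.
xs = [0, 1, 2, 0, 1, 2]
ys = [0, 0, 0, 1, 1, 1]
height = [100.0, 150.0, 200.0, 150.0, 200.0, 250.0]


def contour(xs, ys, values, level, tol=0.0):
    """Return the (x, y) positions whose value lies within `tol` of `level`."""
    return [(x, y) for x, y, v in zip(xs, ys, values) if abs(v - level) <= tol]


# Cells on the 150 m isohypse.
line_150 = contour(xs, ys, height, 150.0)

# A tolerance turns the exact level into a band of heights.
band = contour(xs, ys, height, 175.0, tol=25.0)
```

Selecting by value returns positions in the other two coordinates, which is the reverse of the usual "select by position, get values" lookup.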

@ianhi
Owner Author

ianhi commented Dec 23, 2025

Notes from discussion with @dcherian:

DimensionInterval

  • normalize to a consistent representation (pd.IntervalIndex only?)
  • call it annotationindex? generalize to any pd.Index, not just pd.IntervalIndex?
  • set_xindex is not your only constructor option. Use classmethods
  • Might want to raise nice errors for other Xarray indexing modes.
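
One way to read the "use classmethods" and "normalize to a consistent representation" points together is sketched below. `IntervalSet` and its method names are hypothetical stand-ins, not the package's actual classes; the only idea taken from the notes is that several alternate constructors all funnel into one internal form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalSet:
    """Hypothetical container: always stores (start, stop) pairs internally."""

    bounds: tuple

    @classmethod
    def from_breaks(cls, breaks):
        # Contiguous intervals defined by their shared edges.
        return cls(tuple(zip(breaks[:-1], breaks[1:])))

    @classmethod
    def from_onset_duration(cls, onsets, durations):
        # Decode an onset/duration encoding into the internal form.
        return cls(tuple((o, o + d) for o, d in zip(onsets, durations)))

    def to_onset_duration(self):
        # Encode back, e.g. for writing to disk.
        onsets = [start for start, _ in self.bounds]
        durations = [stop - start for start, stop in self.bounds]
        return onsets, durations


a = IntervalSet.from_breaks([0, 1, 2])
b = IntervalSet.from_onset_duration([0, 1], [1, 1])
```

Because both constructors normalize to the same representation, `a == b` holds here, and everything downstream only has to handle one form.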

There are lots of ways to construct intervals; see the various from_* methods here: https://pandas.pydata.org/docs/reference/api/pandas.IntervalIndex.html
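
For reference, a short sketch of three of those pandas constructors follows. The onset and duration values are made up, and the choice of `closed="left"` is an assumption for the example rather than something decided in the discussion.

```python
import pandas as pd

# Made-up event onsets and durations, in seconds.
onsets = [0.0, 2.0, 5.0]
durations = [1.0, 1.5, 0.5]
stops = [o + d for o, d in zip(onsets, durations)]

# Same intervals, built two different ways.
from_arrays = pd.IntervalIndex.from_arrays(onsets, stops, closed="left")
from_tuples = pd.IntervalIndex.from_tuples(list(zip(onsets, stops)), closed="left")

# Contiguous intervals from shared edges: [0, 1), [1, 2), [2, 3).
from_breaks = pd.IntervalIndex.from_breaks([0, 1, 2, 3], closed="left")

# Which interval contains each time? -1 means "no interval".
positions = from_arrays.get_indexer([0.5, 2.5, 4.0])
```

Whichever constructor is used, the result is the same `pd.IntervalIndex` type, which is what makes it a candidate for the single normalized representation.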

can't just

NDIndex

see also pydata/xarray#3646

Ideas about how to represent the data in memory:

Make the user come to your package's internal data structure; everything else can be thought of as encode/decode to disk. E.g. onset/duration is a way to store intervals, but do users think about things in terms of onset and duration, or do they mentally convert to intervals to think things through? Question for @prlabu: do you need to keep the onset/duration in memory, or would being able to select based on intervals be fine?

various class methods for creation examples: https://rasterix.readthedocs.io/en/latest/autoapi/rasterix/lib/index.html#rasterix.lib.affine_from_tiepoint_and_scale

for time-locking don't pollute the coords with lots of derived coordinates (more memory for a very cheap thing)
e.g. instead of creating event_locked_time have .sel(rel_time=EventLocked(speech_onset=slice(-0.5,1)))
e.g. https://rolodex.readthedocs.io/forecast.html#bestestimate
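
A very rough sketch of how such a selector could be resolved on the fly, without storing an event_locked_time coordinate, is shown below. Only the call form `EventLocked(speech_onset=slice(-0.5, 1))` comes from the notes; the class body, the `resolve` helper, and the sample data are assumptions made for illustration.

```python
class EventLocked:
    """Hypothetical selector: one event name mapped to a relative-time window."""

    def __init__(self, **window_by_event):
        if len(window_by_event) != 1:
            raise ValueError("expected exactly one event=slice(...) argument")
        ((self.event, self.window),) = window_by_event.items()


def resolve(selector, event_times, time):
    """For each occurrence of the event, return the positions in `time`
    whose offset from that occurrence falls inside the window."""
    lo, hi = selector.window.start, selector.window.stop
    return [
        [i for i, t in enumerate(time) if lo <= t - t0 <= hi]
        for t0 in event_times[selector.event]
    ]


# Made-up data: samples every 0.5 s and two speech onsets.
time = [i * 0.5 for i in range(9)]          # 0.0, 0.5, ..., 4.0
event_times = {"speech_onset": [1.0, 3.0]}

selector = EventLocked(speech_onset=slice(-0.5, 1))
windows = resolve(selector, event_times, time)
```

The relative times are computed only when a selection is made, so nothing beyond the absolute time axis and the event times needs to live in memory.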

