ianhi
diff --git a/.gitignore b/.gitignore
@@ -39,3 +39,5 @@ htmlcov/
 # OS
 .DS_Store
 examples
+
+docs/expt
diff --git a/docs/alt-ndpoint.ipynb b/docs/alt-ndpoint.ipynb
@@ -0,0 +1,348 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "cell-0",
+   "metadata": {},
+   "source": [
+    "# NDPointIndex Approach\n",
+    "\n",
+    "xarray includes [`NDPointIndex`](https://xarray-indexes.readthedocs.io/blocks/ndpoint.html) for **unstructured point data** (e.g., irregular grids, scattered observations). It uses a KD-tree for spatial nearest-neighbor queries.\n",
+    "\n",
+    "This notebook explores whether `NDPointIndex` can solve the same problem as `NDIndex` for trial-based data with derived coordinates.\n",
+    "\n",
+    "## Setup"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import xarray as xr\n",
+    "from linked_indices.example_data import trial_based_dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-2",
+   "metadata": {},
+   "source": [
+    "## What NDPointIndex is designed for\n",
+    "\n",
+    "`NDPointIndex` is designed for **curvilinear grids** and **unstructured point clouds** where you have multiple coordinate variables that together define a point in N-dimensional space.\n",
+    "\n",
+    "The classic example is a 2D grid with latitude and longitude coordinates that vary in both dimensions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a toy dataset laid out like curvilinear model output (e.g., an ocean model)\n",
+    "# lat/lon vary in BOTH dimensions; values are random, so results differ between runs\n",
+    "shape = (5, 10)\n",
+    "lon = xr.DataArray(np.random.uniform(-180, 180, size=shape), dims=(\"y\", \"x\"))\n",
+    "lat = xr.DataArray(np.random.uniform(-90, 90, size=shape), dims=(\"y\", \"x\"))\n",
+    "temperature = xr.DataArray(np.random.uniform(0, 30, size=shape), dims=(\"y\", \"x\"))\n",
+    "\n",
+    "ds_curvilinear = xr.Dataset(\n",
+    "    data_vars={\"temperature\": temperature}, coords={\"lon\": lon, \"lat\": lat}\n",
+    ")\n",
+    "ds_curvilinear"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Apply NDPointIndex - requires BOTH lon and lat together\n",
+    "ds_indexed = ds_curvilinear.set_xindex([\"lon\", \"lat\"], xr.indexes.NDPointIndex)\n",
+    "ds_indexed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we can query: \"Find the grid cell nearest to lat=45, lon=-120\"\n",
+    "# This is a SPATIAL query - both coordinates together define a point\n",
+    "ds_indexed.sel(lat=45.0, lon=-120.0, method=\"nearest\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-6",
+   "metadata": {},
+   "source": [
+    "## Trying NDPointIndex with trial-based data\n",
+    "\n",
+    "Now let's see what happens when we try to use `NDPointIndex` with our trial-based dataset where we have a single 2D `abs_time` coordinate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds = trial_based_dataset(mode=\"stacked\").drop_vars(\"trial_onset\")\n",
+    "print(ds)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-8",
+   "metadata": {},
+   "source": [
+    "### Problem: `NDPointIndex` requires matching numbers of variables and dimensions\n",
+    "\n",
+    "`NDPointIndex` expects one coordinate variable per dimension. Our `abs_time` is a single 2D variable, not two variables that together define points in 2D space."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# This fails! NDPointIndex expects 2 variables for 2 dimensions\n",
+    "try:\n",
+    "    ds.set_xindex([\"abs_time\"], xr.indexes.NDPointIndex)\n",
+    "except ValueError as e:\n",
+    "    print(f\"ValueError: {e}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-10",
+   "metadata": {},
+   "source": [
+    "### Why this matters\n",
+    "\n",
+    "The fundamental difference is:\n",
+    "\n",
+    "| Aspect | NDPointIndex | NDIndex |\n",
+    "|--------|--------------|----------|\n",
+    "| **Coordinates** | Multiple N-D coords (one per dimension) that together define position | Single N-D coord with derived values |\n",
+    "| **Query type** | Spatial: \"find point at (x, y)\" | Value: \"find cell where value ≈ target\" |\n",
+    "| **Use case** | Curvilinear grids, scattered observations | Structured arrays with computed coordinates |\n",
+    "\n",
+    "**NDPointIndex** answers: \"Which grid cell is nearest to coordinates (lat=45, lon=-120)?\"\n",
+    "\n",
+    "**NDIndex** answers: \"Which (trial, time) cell has `abs_time` closest to 7.5?\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-11",
+   "metadata": {},
+   "source": [
+    "### Could we reshape the data to use NDPointIndex?\n",
+    "\n",
+    "One might try to flatten the data and treat `(trial, rel_time)` as coordinate dimensions for NDPointIndex. Let's see what that looks like:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-12",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Flatten the dataset to 1D\n",
+    "ds_flat = ds.stack(point=(\"trial\", \"rel_time\"))\n",
+    "print(f\"Original shape: {dict(ds.sizes)}\")\n",
+    "print(f\"Flattened shape: {dict(ds_flat.sizes)}\")\n",
+    "ds_flat"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create separate coordinate arrays for trial index and rel_time\n",
+    "# to use with NDPointIndex\n",
+    "trial_idx = xr.DataArray(np.repeat(np.arange(ds.sizes[\"trial\"]), ds.sizes[\"rel_time\"]), dims=[\"point\"])\n",
+    "rel_time_flat = xr.DataArray(np.tile(ds.rel_time.values, ds.sizes[\"trial\"]), dims=[\"point\"])\n",
+    "\n",
+    "ds_for_ndpoint = xr.Dataset(\n",
+    "    data_vars={\"data\": ([\"point\"], ds_flat.data.values)},\n",
+    "    coords={\n",
+    "        \"trial_idx\": trial_idx,\n",
+    "        \"rel_time_flat\": rel_time_flat,\n",
+    "        \"abs_time\": ([\"point\"], ds_flat.abs_time.values),\n",
+    "    },\n",
+    ")\n",
+    "ds_for_ndpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-14",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we could apply NDPointIndex with trial_idx and rel_time_flat\n",
+    "ds_ndpoint = ds_for_ndpoint.set_xindex(\n",
+    "    [\"trial_idx\", \"rel_time_flat\"], xr.indexes.NDPointIndex\n",
+    ")\n",
+    "ds_ndpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-15",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Query: find point nearest to trial_idx=1, rel_time=2.5\n",
+    "result = ds_ndpoint.sel(trial_idx=1, rel_time_flat=2.5, method=\"nearest\")\n",
+    "print(\n",
+    "    f\"Found point at trial_idx={result.trial_idx.item()}, rel_time={result.rel_time_flat.item():.2f}\"\n",
+    ")\n",
+    "print(f\"abs_time at this point: {result.abs_time.item():.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-16",
+   "metadata": {},
+   "source": [
+    "### But this doesn't solve our problem!\n",
+    "\n",
+    "With this approach:\n",
+    "1. **We can't select by `abs_time` directly** - `NDPointIndex` uses the indexed coordinates (`trial_idx`, `rel_time_flat`), not derived values like `abs_time`\n",
+    "2. **We lose the structured array** - the data is now 1D instead of `(trial, rel_time)`\n",
+    "3. **We lose trial labels** - `trial_idx` is numeric, not the original string labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We CANNOT do this - abs_time is not an indexed coordinate:\n",
+    "try:\n",
+    "    ds_ndpoint.sel(abs_time=7.5, method=\"nearest\")\n",
+    "except KeyError as e:\n",
+    "    print(f\"KeyError: {e}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-18",
+   "metadata": {},
+   "source": [
+    "### Could we use abs_time with KDTree directly?\n",
+    "\n",
+    "Another approach might be to build a KDTree on the `abs_time` values directly. scipy's `KDTree` expects an array of points with shape `(n_points, n_dims)`, so the scalar values first have to be reshaped into 1D points:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-19",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from scipy.spatial import KDTree\n",
+    "\n",
+    "# KDTree expects (n_points, n_dims) array\n",
+    "# Our abs_time is shape (3, 500) = 1500 scalar values\n",
+    "# Reshaping to (1500, 1) treats each value as a 1D point\n",
+    "abs_time_flat = ds.abs_time.values.ravel().reshape(-1, 1)\n",
+    "tree = KDTree(abs_time_flat)\n",
+    "\n",
+    "# Query for abs_time ≈ 7.5\n",
+    "distance, flat_idx = tree.query([[7.5]])\n",
+    "# Convert the flat index back to (trial, time) indices without hardcoding the shape\n",
+    "trial_idx, time_idx = np.unravel_index(flat_idx[0], ds.abs_time.shape)\n",
+    "\n",
+    "print(\n",
+    "    f\"Found: trial={ds.trial.values[trial_idx]}, rel_time={ds.rel_time.values[time_idx]:.2f}\"\n",
+    ")\n",
+    "print(f\"abs_time at this point: {ds.abs_time.values[trial_idx, time_idx]:.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-20",
+   "metadata": {},
+   "source": [
+    "This works, but:\n",
+    "1. It's not integrated with xarray's indexing system\n",
+    "2. You have to manually convert between flat indices and (trial, time) indices\n",
+    "3. It doesn't support slices or other advanced indexing\n",
+    "4. The data structure is lost\n",
+    "\n",
+    "**This is essentially what `NDIndex` does internally, but with proper xarray integration.**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-21",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "| Feature | NDPointIndex | NDIndex |\n",
+    "|---------|--------------|----------|\n",
+    "| **Use case** | Unstructured point clouds, curvilinear grids | Structured arrays with derived coordinates |\n",
+    "| **Query type** | Spatial: find nearest (x, y) point | Value: find cell where `abs_time ≈ 7.5` |\n",
+    "| **Coordinates** | Multiple N-D coords (one per dimension) | Single N-D coord with computed values |\n",
+    "| **Data structure** | Points in N-D coordinate space | N-D array of scalar values |\n",
+    "| **Returns** | Single nearest point | Dimensional slices |\n",
+    "| **Slice support** | No | Yes (bounding box) |\n",
+    "\n",
+    "`NDPointIndex` and `NDIndex` solve different problems:\n",
+    "\n",
+    "```python\n",
+    "# NDPointIndex: \"Find the grid cell nearest to lat=45.2, lon=-122.5\"\n",
+    "ds.sel(lat=45.2, lon=-122.5, method=\"nearest\")  # Spatial query\n",
+    "\n",
+    "# NDIndex: \"Find which (trial, time) has abs_time closest to 7.5\"\n",
+    "ds.sel(abs_time=7.5, method=\"nearest\")  # Value lookup in N-D array\n",
+    "```\n",
+    "\n",
+    "Use `NDPointIndex` when your coordinates define positions in space (or similar multi-dimensional coordinate systems).\n",
+    "\n",
+    "Use `NDIndex` when you have derived coordinates computed from dimension coordinates (like `abs_time = trial_onset + rel_time`)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}