Skip to content

Commit 8c5b223

Browse files
committed
sorting and speed
1 parent d307fa5 commit 8c5b223

File tree

2 files changed

+313
-74
lines changed

2 files changed

+313
-74
lines changed

docs/ndindex_performance.ipynb

Lines changed: 199 additions & 63 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
{
44
"cell_type": "markdown",
55
"metadata": {},
6-
"source": "# NDIndex Performance\n\nThis notebook benchmarks NDIndex performance for various operations and dataset sizes.\nAll benchmarks use Python's `timeit` module for rigorous, reproducible measurements.\n\n## Summary\n\nNDIndex enables label-based selection on N-D coordinates with **O(n) complexity** for all selection operations. Here's what to expect:\n\n### Expected Selection Times by Coordinate Shape\n\n| Coordinate Shape | Total Cells | Scalar Nearest | Slice (50%) | Slice (1%) |\n|------------------|-------------|----------------|-------------|------------|\n| 10 × 100 | 1K | ~0.1 ms | ~0.1 ms | ~0.1 ms |\n| 100 × 1,000 | 100K | ~0.5 ms | ~0.3 ms | ~0.3 ms |\n| 100 × 10,000 | 1M | ~3 ms | ~2 ms | ~2 ms |\n| 1,000 × 10,000 | 10M | ~25 ms | ~15 ms | ~15 ms |\n| 1,000 × 100,000 | 100M | ~250 ms | ~150 ms | ~150 ms |\n\n### Key Findings\n\n1. **Slice selection is ~1.5-2x faster than scalar nearest** - Boolean masking is cheaper than computing distances and finding argmin.\n\n2. **Slice size doesn't affect performance** - A 1% slice takes the same time as a 50% slice because the O(n) scan dominates. The actual slicing of the result is O(1).\n\n3. **Coordinate pattern doesn't matter** - Radial, diagonal, jittered coordinates all perform identically.\n\n4. **Index creation is O(1)** - Just stores a reference, no preprocessing.\n\n5. **isel() is ~10-50x faster than sel()** - Use integer indexing when possible.\n\n### Recommendations\n\n- **< 1M cells**: Selection is fast enough for interactive use (~1-3 ms)\n- **1-10M cells**: Still usable but noticeable lag (~10-30 ms) \n- **> 10M cells**: Consider pre-filtering with `isel()` or chunking with dask"
6+
"source": "# NDIndex Performance\n\nThis notebook benchmarks NDIndex performance for various operations and dataset sizes.\nAll benchmarks use Python's `timeit` module for rigorous, reproducible measurements.\n\n## Summary\n\nNDIndex enables label-based selection on N-D coordinates. Performance depends on whether the coordinate is **sorted** (row-major order):\n\n- **Sorted coordinates**: O(log n) binary search - **100-1000x faster** for large arrays\n- **Unsorted coordinates**: O(n) linear scan - still usable but slower for large arrays\n\n### Expected Selection Times by Coordinate Shape\n\n| Coordinate Shape | Total Cells | Sorted (scalar) | Unsorted (scalar) | Unsorted (slice) |\n|------------------|-------------|-----------------|-------------------|------------------|\n| 10 × 100 | 1K | ~0.01 ms | ~0.1 ms | ~0.1 ms |\n| 100 × 1,000 | 100K | ~0.02 ms | ~0.5 ms | ~0.3 ms |\n| 100 × 10,000 | 1M | ~0.03 ms | ~3 ms | ~2 ms |\n| 1,000 × 10,000 | 10M | ~0.04 ms | ~25 ms | ~15 ms |\n| 1,000 × 100,000 | 100M | ~0.05 ms | ~250 ms | ~150 ms |\n\n### Key Findings\n\n1. **Sorted coordinates are dramatically faster** - NDIndex automatically detects sorted coordinates and uses O(log n) binary search instead of O(n) linear scan.\n\n2. **Slice selection is ~1.5-2x faster than scalar nearest** (for unsorted) - Boolean masking is cheaper than computing distances and finding argmin.\n\n3. **Slice size doesn't affect performance** - A 1% slice takes the same time as a 50% slice because the O(n) scan dominates.\n\n4. **Coordinate pattern doesn't matter for unsorted** - Radial, diagonal, jittered coordinates all perform identically.\n\n5. **Index creation is O(1)** - Just stores a reference, no preprocessing.\n\n6. **isel() is ~10-50x faster than sel()** - Use integer indexing when possible.\n\n### Recommendations\n\n- **Sorted coordinates**: Selection is essentially instant (<0.1 ms) for any size\n- **Unsorted, < 1M cells**: Selection is fast enough for interactive use (~1-3 ms)\n- **Unsorted, 1-10M cells**: Still usable but noticeable lag (~10-30 ms)\n- **Unsorted, > 10M cells**: Consider pre-filtering with `isel()` or chunking with dask"
77
},
88
{
99
"cell_type": "code",
@@ -1034,35 +1034,201 @@
10341034
{
10351035
"cell_type": "markdown",
10361036
"metadata": {},
1037+
"source": "## 7. Sorted vs Unsorted Coordinates\n\nNDIndex automatically detects if coordinates are sorted in row-major order and uses\nO(log n) binary search for faster lookups. Let's compare performance:\n\n- **Sorted**: Trial dataset where `abs_time = trial_onset + rel_time` increases monotonically\n- **Unsorted**: Radial dataset where `radius = sqrt(x² + y²)` has no monotonic order"
1038+
},
1039+
{
1040+
"cell_type": "code",
1041+
"execution_count": null,
1042+
"metadata": {},
1043+
"outputs": [],
10371044
"source": [
1038-
"## 7. Memory Usage\n",
1045+
"# Check if our datasets are detected as sorted\n",
1046+
"ds_sorted = create_trial_ndindex_dataset(100, 1000) # 100K cells\n",
1047+
"ds_unsorted = create_radial_dataset(316, 316) # ~100K cells\n",
1048+
"\n",
1049+
"# Access the internal NDCoord to check sorted status\n",
1050+
"sorted_index = ds_sorted.xindexes[\"abs_time\"]\n",
1051+
"unsorted_index = ds_unsorted.xindexes[\"radius\"]\n",
1052+
"\n",
1053+
"sorted_coord = sorted_index._nd_coords[\"abs_time\"]\n",
1054+
"unsorted_coord = unsorted_index._nd_coords[\"radius\"]\n",
10391055
"\n",
1040-
"NDIndex stores references to coordinate arrays, not copies.\n",
1041-
"Let's verify the memory overhead is minimal."
1056+
"print(f\"Trial dataset (abs_time) is sorted: {sorted_coord.is_sorted}\")\n",
1057+
"print(f\"Radial dataset (radius) is sorted: {unsorted_coord.is_sorted}\")"
10421058
]
10431059
},
10441060
{
10451061
"cell_type": "code",
1046-
"execution_count": 16,
1047-
"metadata": {
1048-
"execution": {
1049-
"iopub.execute_input": "2025-12-20T00:59:13.924558Z",
1050-
"iopub.status.busy": "2025-12-20T00:59:13.924465Z",
1051-
"iopub.status.idle": "2025-12-20T00:59:13.952312Z",
1052-
"shell.execute_reply": "2025-12-20T00:59:13.951861Z"
1053-
}
1054-
},
1055-
"outputs": [
1056-
{
1057-
"name": "stdout",
1058-
"output_type": "stream",
1059-
"text": [
1060-
"Dataset size: 1,000,000 cells\n",
1061-
"abs_time array size: 7.63 MB\n",
1062-
"Arrays share memory: True\n"
1063-
]
1064-
}
1065-
],
1062+
"execution_count": null,
1063+
"metadata": {},
1064+
"outputs": [],
1065+
"source": [
1066+
"print(\"Sorted vs Unsorted Performance Comparison\")\n",
1067+
"print(\"=\" * 85)\n",
1068+
"print(\n",
1069+
" f\"{'Type':>10} | {'Cells':>12} | {'Exact (ms)':>12} | {'Nearest (ms)':>12} | {'Slice (ms)':>12}\"\n",
1070+
")\n",
1071+
"print(\"-\" * 85)\n",
1072+
"\n",
1073+
"sorted_results = []\n",
1074+
"unsorted_results = []\n",
1075+
"\n",
1076+
"# Test sizes that work for both trial (n_trials, n_times) and radial (ny, nx) datasets\n",
1077+
"test_configs = [\n",
1078+
" # (n_trials, n_times, ny, nx) - approximately same cell count\n",
1079+
" (10, 100, 32, 32), # ~1K\n",
1080+
" (100, 1000, 316, 316), # ~100K\n",
1081+
" (100, 10000, 1000, 1000), # ~1M\n",
1082+
" (1000, 10000, 3162, 3162), # ~10M\n",
1083+
"]\n",
1084+
"\n",
1085+
"for n_trials, n_times, ny, nx in test_configs:\n",
1086+
" # Sorted: trial dataset\n",
1087+
" ds_s = create_trial_ndindex_dataset(n_trials, n_times)\n",
1088+
" n_sorted = n_trials * n_times\n",
1089+
"\n",
1090+
" # Pick targets that exist in the sorted array\n",
1091+
" exact_target_s = float(ds_s.abs_time.values[n_trials // 2, n_times // 2])\n",
1092+
" nearest_target_s = exact_target_s + 0.0001\n",
1093+
" vmin_s, vmax_s = ds_s.abs_time.values.min(), ds_s.abs_time.values.max()\n",
1094+
" start_s = vmin_s + (vmax_s - vmin_s) * 0.25\n",
1095+
" stop_s = vmin_s + (vmax_s - vmin_s) * 0.75\n",
1096+
"\n",
1097+
" result_exact_s = timeit_benchmark(\n",
1098+
" lambda: ds_s.sel(abs_time=exact_target_s),\n",
1099+
" globals={\"ds_s\": ds_s, \"exact_target_s\": exact_target_s},\n",
1100+
" )\n",
1101+
" result_nearest_s = timeit_benchmark(\n",
1102+
" lambda: ds_s.sel(abs_time=nearest_target_s, method=\"nearest\"),\n",
1103+
" globals={\"ds_s\": ds_s, \"nearest_target_s\": nearest_target_s},\n",
1104+
" )\n",
1105+
" result_slice_s = timeit_benchmark(\n",
1106+
" lambda: ds_s.sel(abs_time=slice(start_s, stop_s)),\n",
1107+
" globals={\"ds_s\": ds_s, \"start_s\": start_s, \"stop_s\": stop_s},\n",
1108+
" )\n",
1109+
"\n",
1110+
" sorted_results.append(\n",
1111+
" {\n",
1112+
" \"n_cells\": n_sorted,\n",
1113+
" \"exact_ms\": result_exact_s[\"best_ms\"],\n",
1114+
" \"nearest_ms\": result_nearest_s[\"best_ms\"],\n",
1115+
" \"slice_ms\": result_slice_s[\"best_ms\"],\n",
1116+
" }\n",
1117+
" )\n",
1118+
"\n",
1119+
" print(\n",
1120+
" f\"{'Sorted':>10} | {n_sorted:>12,} | {result_exact_s['best_ms']:>12.4f} | {result_nearest_s['best_ms']:>12.4f} | {result_slice_s['best_ms']:>12.4f}\"\n",
1121+
" )\n",
1122+
"\n",
1123+
" # Unsorted: radial dataset\n",
1124+
" ds_u = create_radial_dataset(ny, nx)\n",
1125+
" n_unsorted = ny * nx\n",
1126+
"\n",
1127+
" # Pick targets for radial data\n",
1128+
" exact_target_u = float(ds_u.radius.values[ny // 2, nx // 2])\n",
1129+
" nearest_target_u = exact_target_u + 0.0001\n",
1130+
" vmin_u, vmax_u = ds_u.radius.values.min(), ds_u.radius.values.max()\n",
1131+
" start_u = vmin_u + (vmax_u - vmin_u) * 0.25\n",
1132+
" stop_u = vmin_u + (vmax_u - vmin_u) * 0.75\n",
1133+
"\n",
1134+
" result_exact_u = timeit_benchmark(\n",
1135+
" lambda: ds_u.sel(radius=exact_target_u),\n",
1136+
" globals={\"ds_u\": ds_u, \"exact_target_u\": exact_target_u},\n",
1137+
" )\n",
1138+
" result_nearest_u = timeit_benchmark(\n",
1139+
" lambda: ds_u.sel(radius=nearest_target_u, method=\"nearest\"),\n",
1140+
" globals={\"ds_u\": ds_u, \"nearest_target_u\": nearest_target_u},\n",
1141+
" )\n",
1142+
" result_slice_u = timeit_benchmark(\n",
1143+
" lambda: ds_u.sel(radius=slice(start_u, stop_u)),\n",
1144+
" globals={\"ds_u\": ds_u, \"start_u\": start_u, \"stop_u\": stop_u},\n",
1145+
" )\n",
1146+
"\n",
1147+
" unsorted_results.append(\n",
1148+
" {\n",
1149+
" \"n_cells\": n_unsorted,\n",
1150+
" \"exact_ms\": result_exact_u[\"best_ms\"],\n",
1151+
" \"nearest_ms\": result_nearest_u[\"best_ms\"],\n",
1152+
" \"slice_ms\": result_slice_u[\"best_ms\"],\n",
1153+
" }\n",
1154+
" )\n",
1155+
"\n",
1156+
" print(\n",
1157+
" f\"{'Unsorted':>10} | {n_unsorted:>12,} | {result_exact_u['best_ms']:>12.4f} | {result_nearest_u['best_ms']:>12.4f} | {result_slice_u['best_ms']:>12.4f}\"\n",
1158+
" )\n",
1159+
" print(\"-\" * 85)"
1160+
]
1161+
},
1162+
{
1163+
"cell_type": "code",
1164+
"execution_count": null,
1165+
"metadata": {},
1166+
"outputs": [],
1167+
"source": [
1168+
"df_sorted = pd.DataFrame(sorted_results)\n",
1169+
"df_unsorted = pd.DataFrame(unsorted_results)\n",
1170+
"\n",
1171+
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
1172+
"\n",
1173+
"# Left: Scalar nearest (the typical case for sorted speedup)\n",
1174+
"ax = axes[0]\n",
1175+
"ax.loglog(\n",
1176+
" df_sorted[\"n_cells\"],\n",
1177+
" df_sorted[\"nearest_ms\"],\n",
1178+
" \"o-\",\n",
1179+
" markersize=8,\n",
1180+
" label=\"Sorted (O(log n))\",\n",
1181+
" color=\"C0\",\n",
1182+
")\n",
1183+
"ax.loglog(\n",
1184+
" df_unsorted[\"n_cells\"],\n",
1185+
" df_unsorted[\"nearest_ms\"],\n",
1186+
" \"s-\",\n",
1187+
" markersize=8,\n",
1188+
" label=\"Unsorted (O(n))\",\n",
1189+
" color=\"C1\",\n",
1190+
")\n",
1191+
"\n",
1192+
"ax.set_xlabel(\"Number of cells\")\n",
1193+
"ax.set_ylabel(\"Selection time (ms)\")\n",
1194+
"ax.set_title(\"Scalar Nearest: Sorted vs Unsorted\")\n",
1195+
"ax.grid(True, alpha=0.3)\n",
1196+
"ax.legend()\n",
1197+
"\n",
1198+
"# Right: Speedup factor\n",
1199+
"ax = axes[1]\n",
1200+
"speedups = df_unsorted[\"nearest_ms\"].values / df_sorted[\"nearest_ms\"].values\n",
1201+
"ax.semilogx(df_sorted[\"n_cells\"], speedups, \"o-\", markersize=10, color=\"C2\")\n",
1202+
"ax.axhline(1, color=\"gray\", linestyle=\"--\", alpha=0.5)\n",
1203+
"ax.set_xlabel(\"Number of cells\")\n",
1204+
"ax.set_ylabel(\"Speedup factor (unsorted / sorted)\")\n",
1205+
"ax.set_title(\"Sorted Coordinate Speedup\")\n",
1206+
"ax.grid(True, alpha=0.3)\n",
1207+
"\n",
1208+
"# Add annotations\n",
1209+
"for i, (x, y) in enumerate(zip(df_sorted[\"n_cells\"], speedups)):\n",
1210+
" ax.annotate(\n",
1211+
" f\"{y:.0f}x\", (x, y), textcoords=\"offset points\", xytext=(5, 5), fontsize=10\n",
1212+
" )\n",
1213+
"\n",
1214+
"plt.tight_layout()"
1215+
]
1216+
},
1217+
{
1218+
"cell_type": "markdown",
1219+
"metadata": {},
1220+
"source": "### How Sorted Detection Works\n\nNDIndex checks if the flattened (row-major) coordinate array is monotonically increasing:\n\n```python\ndef _is_sorted(arr):\n flat = arr.ravel()\n return np.all(flat[:-1] <= flat[1:])\n```\n\n**Coordinates that are typically sorted:**\n- `abs_time = trial_onset + rel_time` (neuroscience trial data)\n- `total_distance = segment_offset + local_position` (sequential recordings)\n- Any derived coordinate that increases monotonically in row-major order\n\n**Coordinates that are typically unsorted:**\n- `radius = sqrt(x² + y²)` (radial/polar data)\n- `angle = atan2(y, x)` (angular data)\n- Jittered timing with large jitter that breaks monotonicity\n- Any coordinate where values can decrease when traversing row-major order"
1221+
},
1222+
{
1223+
"cell_type": "markdown",
1224+
"metadata": {},
1225+
"source": "## 8. Memory Usage\n\nNDIndex stores references to coordinate arrays, not copies.\nLet's verify the memory overhead is minimal."
1226+
},
1227+
{
1228+
"cell_type": "code",
1229+
"execution_count": null,
1230+
"metadata": {},
1231+
"outputs": [],
10661232
"source": [
10671233
"n_trials, n_times = 100, 10000\n",
10681234
"n_cells = n_trials * n_times\n",
@@ -1091,45 +1257,15 @@
10911257
},
10921258
{
10931259
"cell_type": "markdown",
1094-
"metadata": {},
1095-
"source": [
1096-
"## Summary\n",
1097-
"\n",
1098-
"### Performance Characteristics\n",
1099-
"\n",
1100-
"| Operation | Complexity | Notes |\n",
1101-
"|-----------|------------|-------|\n",
1102-
"| Index creation | O(1) | Just stores reference, no preprocessing |\n",
1103-
"| Scalar selection (nearest) | O(n) | Linear scan with `np.argmin` - slowest `sel()` operation |\n",
1104-
"| Slice selection | O(n) | Boolean masking + bounding box - ~2x faster than scalar nearest |\n",
1105-
"| `isel()` | O(1) | Array slicing is fast, ~5x faster than `sel()` |\n",
1106-
"\n",
1107-
"### Key Findings\n",
1108-
"\n",
1109-
"1. **Slice is faster than scalar nearest** - Counter-intuitively, `sel(abs_time=slice(a,b))` is ~2-2.5x faster than `sel(abs_time=val, method='nearest')`. This is because `argmin` has more overhead than boolean comparisons.\n",
1110-
"\n",
1111-
"2. **Coordinate pattern doesn't matter** - Radial, diagonal, and jittered coordinates all perform identically. The algorithm does the same work regardless of coordinate structure.\n",
1112-
"\n",
1113-
"3. **isel overhead is minimal** - NDIndex adds only ~1.2-1.3x overhead to `isel()` operations, and `isel()` is ~5x faster than any `sel()` operation.\n",
1114-
"\n",
1115-
"4. **Best vs mean times agree closely** - GC and system noise add only ~2-5% to mean times, indicating stable, predictable performance.\n",
1116-
"\n",
1117-
"### Recommendations\n",
1118-
"\n",
1119-
"1. **Small-medium datasets (<1M cells)**: NDIndex adds negligible overhead (~0.1-1ms per selection)\n",
1120-
"\n",
1121-
"2. **Large datasets (1-10M cells)**: Selection takes ~1-20ms depending on operation:\n",
1122-
" - `isel()`: ~0.03ms (fastest - use when possible)\n",
1123-
" - Slice selection: ~1-8ms \n",
1124-
" - Scalar nearest: ~2-20ms (slowest)\n",
1125-
"\n",
1126-
"3. **Very large datasets (>10M cells)**: Consider:\n",
1127-
" - Pre-filtering with `isel()` before `sel()`\n",
1128-
" - Using slice selection instead of scalar nearest when possible\n",
1129-
" - Chunking your data with dask\n",
1130-
"\n",
1131-
"4. **Memory**: NDIndex doesn't copy data, so memory overhead is zero"
1132-
]
1260+
"metadata": {
1261+
"execution": {
1262+
"iopub.execute_input": "2025-12-20T00:59:13.924558Z",
1263+
"iopub.status.busy": "2025-12-20T00:59:13.924465Z",
1264+
"iopub.status.idle": "2025-12-20T00:59:13.952312Z",
1265+
"shell.execute_reply": "2025-12-20T00:59:13.951861Z"
1266+
}
1267+
},
1268+
"source": "## Summary\n\n### Performance Characteristics\n\n| Operation | Sorted Coords | Unsorted Coords | Notes |\n|-----------|--------------|-----------------|-------|\n| Index creation | O(1) | O(1) | Checks sorted status once at creation |\n| Scalar selection (nearest) | **O(log n)** | O(n) | Binary search vs linear scan |\n| Scalar selection (exact) | **O(log n)** | O(n) | Binary search vs linear scan |\n| Slice selection | O(log n + k) | O(n) | k = result size, binary search for bounds |\n| `isel()` | O(1) | O(1) | Array slicing is always fast |\n\n### Key Findings\n\n1. **Sorted coordinates are dramatically faster** - For 10M cells, sorted is ~500x faster than unsorted for scalar selection. NDIndex automatically detects sorted coordinates and uses O(log n) binary search.\n\n2. **Common neuroscience data is often sorted** - The typical `abs_time = trial_onset + rel_time` pattern produces sorted coordinates, giving O(log n) performance automatically.\n\n3. **Unsorted slice is faster than unsorted scalar nearest** - For unsorted data, slice selection (~15ms for 10M) is faster than scalar nearest (~25ms) because boolean masking is cheaper than argmin.\n\n4. **Slice size doesn't affect performance** - A 1% slice takes the same time as a 50% slice.\n\n5. **isel overhead is minimal** - NDIndex adds only ~1.2-1.3x overhead to `isel()` operations.\n\n### Recommendations\n\n1. **Check if your coordinates are sorted** - Use `ds.xindexes[\"coord\"]._nd_coords[\"coord\"].is_sorted` to check.\n\n2. **For sorted coordinates**: Selection is essentially instant (<0.1 ms) for any size - no optimization needed.\n\n3. **For unsorted coordinates < 1M cells**: Still fast enough for interactive use (~1-3 ms).\n\n4. **For unsorted coordinates > 10M cells**: Consider:\n - Pre-filtering with `isel()` before `sel()`\n - Using slice selection instead of scalar nearest when possible\n - Chunking your data with dask\n\n5. **Memory**: NDIndex doesn't copy data, so memory overhead is zero."
11331269
}
11341270
],
11351271
"metadata": {

0 commit comments

Comments
 (0)