Commit 8d5bd26

doc improvements
1 parent 42d0a35 commit 8d5bd26

11 files changed

+2010
-573
lines changed

docs/abs_rel_time_example.ipynb

Lines changed: 1559 additions & 396 deletions
Large diffs are not rendered by default.

docs/alt-multiindex.ipynb

Lines changed: 55 additions & 76 deletions
Large diffs are not rendered by default.

docs/index.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,27 @@ Custom xarray indexes for keeping multiple coordinates in sync across shared dim
1313
This library provides custom [xarray Index](https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html) implementations that automatically constrain related dimensions when you select on any one of them.
1414

1515

16+
### DimensionInterval
17+
18+
`DimensionInterval` efficiently stores arbitrary intervals over a continuous coordinate. It works like a MultiIndex but is more general. See the [comparison with MultiIndex](alt-multiindex.ipynb) for details.
19+
20+
![diagram of possible sel calls for DimensionInterval](images/generic-intervals.png.excalidraw.png)
21+
22+
See the [Multi-Interval Example](multi_interval_example.ipynb) for a detailed walkthrough.
23+
24+
### AbsoluteRelative Index
25+
26+
Provides the ability to work with both absolute and relative coordinates (e.g. time) for trial-based data.
27+
![diagram of two possible abs-rel indexes](images/abs-rel.png.excalidraw.png)
28+
29+
See the [Absolute vs Relative Time Example](abs_rel_time_example.ipynb) for a detailed walkthrough.
30+
31+
This enables more advanced use cases, such as building multiple time reference frames without shuffling the underlying data, which makes a task such as time-locking a low-cost operation:
32+
33+
![diagram of timelocking](images/event-locking.png.excalidraw.png)
34+
35+
See the [Time-Locking Example](time-locking.ipynb) for a demonstration of event-locked analysis.
36+
1637
### Use Cases
1738

1839
- **Speech/audio data** with hierarchical annotations (words, phonemes, time)

docs/multi_interval_example.ipynb

Lines changed: 17 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -4,23 +4,7 @@
44
"cell_type": "markdown",
55
"id": "cell-0",
66
"metadata": {},
7-
"source": [
8-
"# Multi-Interval Index Example\n",
9-
"\n",
10-
"This notebook demonstrates how `DimensionInterval` enables automatic cross-slicing between multiple interval types over a shared continuous dimension.\n",
11-
"\n",
12-
"## Use Case: Speech Data\n",
13-
"\n",
14-
"Imagine you have speech data with:\n",
15-
"- A **continuous time dimension** (e.g., audio samples at regular intervals)\n",
16-
"- **Word intervals** - each word spans a range of time\n",
17-
"- **Phoneme intervals** - each phoneme spans a smaller range of time within words\n",
18-
"\n",
19-
"TODO: add an explanatory image\n",
20-
"\n",
21-
"\n",
22-
"When you select a specific word, you want the time and phoneme dimensions to automatically constrain to only the overlapping values. This is exactly what `DimensionInterval` provides."
23-
]
7+
"source": "# Multi-Interval Index Example\n\nThis notebook demonstrates how `DimensionInterval` enables automatic cross-slicing between multiple interval types over a shared continuous dimension.\n\n![Diagram of possible sel calls for DimensionInterval](images/generic-intervals.png.excalidraw.png)\n\nWhen you select a specific word, you want the time and phoneme dimensions to automatically constrain to only the overlapping values. This is exactly what `DimensionInterval` provides.\n\n::::{note}\nThere are two ways to encode intervals with `DimensionInterval`:\n1. **Pandas IntervalIndex** - Used in this notebook, intervals are encoded directly as `pd.IntervalIndex` objects\n2. **Onset/Duration format** - Intervals are specified as separate onset and duration coordinates, see the [Onset/Duration Example](onset_duration_example.ipynb)\n::::\n\n::::{seealso}\nFor a comparison of `DimensionInterval` with xarray's built-in `MultiIndex`, see the [MultiIndex Comparison](alt-multiindex.ipynb) notebook.\n::::"
248
},
259
{
2610
"cell_type": "code",
@@ -43,6 +27,20 @@
4327
"from linked_indices import DimensionInterval"
4428
]
4529
},
30+
{
31+
"cell_type": "markdown",
32+
"id": "271be299-c765-488f-804d-5322552e41e0",
33+
"metadata": {},
34+
"source": [
35+
"\n",
36+
"## Use Case: Speech Data\n",
37+
"\n",
38+
"Imagine you have speech data with:\n",
39+
"- A **continuous time dimension** (e.g., audio samples at regular intervals)\n",
40+
"- **Word intervals** - each word spans a range of time\n",
41+
"- **Phoneme intervals** - each phoneme spans a smaller range of time within words\n"
42+
]
43+
},
4644
{
4745
"cell_type": "markdown",
4846
"id": "cell-2",
@@ -5444,7 +5442,7 @@
54445442
],
54455443
"metadata": {
54465444
"kernelspec": {
5447-
"display_name": "Python 3",
5445+
"display_name": "Python 3 (ipykernel)",
54485446
"language": "python",
54495447
"name": "python3"
54505448
},
@@ -5463,4 +5461,4 @@
54635461
},
54645462
"nbformat": 4,
54655463
"nbformat_minor": 5
5466-
}
5464+
}
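
The note added to this notebook names two interval encodings, `pd.IntervalIndex` and onset/duration. As a rough sketch of how the two relate, the snippet below converts between them using plain pandas. It does not use `DimensionInterval` itself; the word timings are invented for illustration, and the `closed="left"` choice is an assumption, not something the diff specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical word annotations (values invented for illustration).
onsets = np.array([0.0, 1.5, 3.0])
durations = np.array([1.5, 1.5, 2.0])

# Onset/duration -> IntervalIndex. closed="left" is an assumption here:
# each interval includes its onset and excludes its end.
word_intervals = pd.IntervalIndex.from_arrays(
    onsets, onsets + durations, closed="left"
)

# IntervalIndex -> onset/duration: the two encodings carry the same information.
recovered_onsets = word_intervals.left.to_numpy()
recovered_durations = word_intervals.length.to_numpy()

# Elementwise test of which interval contains t = 2.0 seconds.
hits = word_intervals.contains(2.0)
```

Because each encoding can be rebuilt from the other, choosing between them is mostly a question of which form the source annotations already come in.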

docs/onset_duration_example.ipynb

Lines changed: 73 additions & 46 deletions
Large diffs are not rendered by default.

pyproject.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ dev = [
2424
"isort>=7.0.0",
2525
"jupyterlab>=4.5.0",
2626
"jupyterlab-code-formatter>=3.0.2",
27+
"jupyterlab-git>=0.51.3",
2728
"jupyterlab-myst>=2.4.2",
2829
"matplotlib>=3.10.7",
2930
"netcdf4>=1.7.3",
@@ -32,6 +33,7 @@ dev = [
3233
"pytest-cov>=7.0.0",
3334
"ruff>=0.14.8",
3435
"scipy>=1.16.3",
36+
"xarray-fancy-repr>=0.0.2",
3537
"zarr>=3.1.5",
3638
]
3739
docs = [

src/linked_indices/example_data.py

Lines changed: 99 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -512,39 +512,51 @@ def onset_duration_dataset() -> "xr.Dataset":
512512

513513

514514
def trial_based_dataset(
515-
n_trials: int = 5,
516-
trial_length: float = 10.0,
515+
n_trials: int = 3,
516+
trial_length: float = 5.0,
517517
sample_rate: int = 100,
518518
trial_labels: list[str] | None = None,
519519
seed: int | None = 42,
520+
mode: str = "stacked",
520521
) -> "xr.Dataset":
521522
"""
522523
Create a dataset with trial-based data and both absolute and relative time.
523524
524525
This is useful for testing AbsoluteRelativeIndex.
525526
526-
The dataset has dimensions (trial, rel_time) with:
527-
- rel_time: relative time within each trial (0 to trial_length)
528-
- trial: trial labels
529-
- abs_time: 2D coordinate (trial, rel_time) mapping to absolute time
527+
Supports two modes:
528+
- "stacked" (default): 2D array with dimensions (trial, rel_time). Each trial
529+
has the same relative time coordinates but different absolute time ranges.
530+
- "linear": 1D array with dimension (abs_time). All trials concatenated into
531+
a single continuous stream indexed by absolute time, with trial as a
532+
1D coordinate indicating which trial each timepoint belongs to.
533+
534+
By default, creates 3 trials with distinct waveforms:
535+
- Trial 1 ("cosine"): cosine wave
536+
- Trial 2 ("square"): square wave
537+
- Trial 3 ("sawtooth"): sawtooth wave
530538
531539
Parameters
532540
----------
533541
n_trials : int
534-
Number of trials. Default: 5
542+
Number of trials. Default: 3
535543
trial_length : float
536-
Duration of each trial in seconds. Default: 10.0
544+
Duration of each trial in seconds. Default: 5.0
537545
sample_rate : int
538546
Samples per second within each trial. Default: 100
539547
trial_labels : list[str] | None
540-
Labels for each trial. If None, uses ["trial_0", "trial_1", ...].
548+
Labels for each trial. If None, uses ["cosine", "square", "sawtooth"]
549+
for 3 trials, or ["trial_0", "trial_1", ...] for other counts.
541550
seed : int | None
542551
Random seed for reproducibility. None for random.
552+
mode : str
553+
Either "stacked" (2D with trial × rel_time) or "linear" (1D with abs_time).
554+
Default: "stacked"
543555
544556
Returns
545557
-------
546558
xr.Dataset
547-
Dataset with structure:
559+
For mode="stacked":
548560
Dimensions: (trial: n_trials, rel_time: trial_length * sample_rate)
549561
Coordinates:
550562
* trial (trial) str - trial labels
@@ -554,6 +566,16 @@ def trial_based_dataset(
554566
Data variables:
555567
data (trial, rel_time) float64 - simulated signal
556568
569+
For mode="linear":
570+
Dimensions: (abs_time: n_trials * trial_length * sample_rate)
571+
Coordinates:
572+
* abs_time (abs_time) float64 - absolute time
573+
rel_time (abs_time) float64 - relative time within each trial
574+
trial (abs_time) str - trial label for each timepoint
575+
trial_onset (abs_time) float64 - onset time of each trial
576+
Data variables:
577+
data (abs_time) float64 - simulated signal
578+
557579
Examples
558580
--------
559581
>>> from linked_indices.example_data import trial_based_dataset
@@ -566,8 +588,16 @@ def trial_based_dataset(
566588
0.0
567589
>>> float(ds.abs_time[1, 0]) # Second trial starts at t=5
568590
5.0
591+
592+
>>> ds_linear = trial_based_dataset(mode="linear")
593+
>>> dict(ds_linear.dims)
594+
{'abs_time': 1500}
569595
"""
570596
import xarray as xr
597+
from scipy import signal
598+
599+
if mode not in ("stacked", "linear"):
600+
raise ValueError(f"mode must be 'stacked' or 'linear', got '{mode}'")
571601

572602
if seed is not None:
573603
np.random.seed(seed)
@@ -578,7 +608,10 @@ def trial_based_dataset(
578608

579609
# Trial labels
580610
if trial_labels is None:
581-
trial_labels = [f"trial_{i}" for i in range(n_trials)]
611+
if n_trials == 3:
612+
trial_labels = ["cosine", "square", "sawtooth"]
613+
else:
614+
trial_labels = [f"trial_{i}" for i in range(n_trials)]
582615
elif len(trial_labels) != n_trials:
583616
raise ValueError(
584617
f"trial_labels length ({len(trial_labels)}) must match n_trials ({n_trials})"
@@ -587,25 +620,63 @@ def trial_based_dataset(
587620
# Trial onsets (absolute time when each trial starts)
588621
trial_onsets = np.arange(n_trials) * trial_length
589622

590-
# Absolute time is a 2D array: abs_time[trial, rel_time_idx] = trial_onset + rel_time
591-
abs_time_2d = trial_onsets[:, np.newaxis] + rel_times[np.newaxis, :]
623+
# Generate distinct waveforms for each trial
624+
freq = 0.5 # Base frequency in Hz
625+
data_2d = np.zeros((n_trials, n_samples))
592626

593-
# Generate different signal for each trial (sine, square, sawtooth, etc.)
594-
data = np.zeros((n_trials, n_samples))
595627
for i in range(n_trials):
596-
freq = 1.0 + i * 0.5 # Different frequency per trial
597-
phase = i * np.pi / 4 # Different phase
598-
data[i] = np.sin(2 * np.pi * freq * rel_times + phase)
599-
data[i] += 0.1 * np.random.randn(n_samples) # Add noise
628+
waveform_type = i % 3 # Cycle through cosine, square, sawtooth
629+
if waveform_type == 0:
630+
# Cosine wave
631+
data_2d[i] = np.cos(2 * np.pi * freq * rel_times)
632+
elif waveform_type == 1:
633+
# Square wave
634+
data_2d[i] = signal.square(2 * np.pi * freq * rel_times)
635+
else:
636+
# Sawtooth wave
637+
data_2d[i] = signal.sawtooth(2 * np.pi * freq * rel_times)
638+
639+
if mode == "stacked":
640+
# 2D mode: (trial, rel_time)
641+
# Absolute time is a 2D array: abs_time[trial, rel_time_idx] = trial_onset + rel_time
642+
abs_time_2d = trial_onsets[:, np.newaxis] + rel_times[np.newaxis, :]
643+
644+
ds = xr.Dataset(
645+
{"data": (("trial", "rel_time"), data_2d)},
646+
coords={
647+
"trial": trial_labels,
648+
"rel_time": rel_times,
649+
"abs_time": (("trial", "rel_time"), abs_time_2d),
650+
"trial_onset": ("trial", trial_onsets),
651+
},
652+
)
653+
else:
654+
# Linear mode: (abs_time,)
655+
# Concatenate all trials into a single 1D array
656+
data_1d = data_2d.flatten()
657+
658+
# Absolute time is continuous across all trials
659+
abs_time_1d = np.concatenate(
660+
[trial_onsets[i] + rel_times for i in range(n_trials)]
661+
)
600662

601-
ds = xr.Dataset(
602-
{"data": (("trial", "rel_time"), data)},
603-
coords={
604-
"trial": trial_labels,
605-
"rel_time": rel_times,
606-
"abs_time": (("trial", "rel_time"), abs_time_2d),
607-
"trial_onset": ("trial", trial_onsets),
608-
},
609-
)
663+
# Relative time repeats for each trial
664+
rel_time_1d = np.tile(rel_times, n_trials)
665+
666+
# Trial label for each timepoint
667+
trial_1d = np.repeat(trial_labels, n_samples)
668+
669+
# Trial onset for each timepoint
670+
trial_onset_1d = np.repeat(trial_onsets, n_samples)
671+
672+
ds = xr.Dataset(
673+
{"data": (("abs_time",), data_1d)},
674+
coords={
675+
"abs_time": abs_time_1d,
676+
"rel_time": ("abs_time", rel_time_1d),
677+
"trial": ("abs_time", trial_1d),
678+
"trial_onset": ("abs_time", trial_onset_1d),
679+
},
680+
)
610681

611682
return ds
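
The relationship between the `"stacked"` and `"linear"` layouts can be sketched with NumPy alone. This is a minimal illustration, not the library code: the sizes are shrunk for readability, and the definition of `rel_times` is an assumption, since the diff does not show how the function computes it.

```python
import numpy as np

# Hypothetical small sizes, chosen only to keep the arrays readable.
n_trials, trial_length, sample_rate = 3, 5.0, 2
n_samples = int(trial_length * sample_rate)

# Assumed definition; the corresponding line is outside the rendered hunks.
rel_times = np.arange(n_samples) / sample_rate
trial_onsets = np.arange(n_trials) * trial_length

# Stacked layout: abs_time[trial, rel_time_idx] = trial_onset + rel_time
abs_time_2d = trial_onsets[:, np.newaxis] + rel_times[np.newaxis, :]

# Linear layout: one continuous stream with per-sample coordinates
abs_time_1d = np.concatenate(
    [trial_onsets[i] + rel_times for i in range(n_trials)]
)
rel_time_1d = np.tile(rel_times, n_trials)           # repeats per trial
trial_onset_1d = np.repeat(trial_onsets, n_samples)  # constant within a trial

# Flattening the stacked array in row-major order gives the linear stream,
# and each absolute time is its trial onset plus its relative time.
assert np.array_equal(abs_time_2d.flatten(), abs_time_1d)
assert np.array_equal(trial_onset_1d + rel_time_1d, abs_time_1d)
```

This is why `data_2d.flatten()` in the linear branch stays aligned with `abs_time_1d`: both follow the same trial-major ordering.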
