diff --git a/.claude/skills/memray/SKILL.md b/.claude/skills/memray/SKILL.md
Lines changed: 37 additions & 0 deletions
diff --git a/.claude/skills/profimp/SKILL.md b/.claude/skills/profimp/SKILL.md
Lines changed: 30 additions & 0 deletions
diff --git a/.claude/skills/pyspy/SKILL.md b/.claude/skills/pyspy/SKILL.md
Lines changed: 45 additions & 0 deletions
diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml
Lines changed: 2 additions & 2 deletions
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
Lines changed: 17 additions & 19 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 11 additions & 2 deletions
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 3 additions & 3 deletions
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
Lines changed: 15 additions & 12 deletions
diff --git a/benchmarks/README.md b/benchmarks/README.md
Lines changed: 49 additions & 11 deletions
@@ -0,0 +1,37 @@
+---
+name: memray
+description: Profile the memory usage of a Python script using memray and visualize a temporal flamegraph in the browser. Use when the user wants to investigate memory consumption, find leaks, or understand allocation patterns.
+compatibility: Requires the pixi profiling environment (pixi run -e profiling). Supports Linux and macOS.
+allowed-tools: Bash(pixi run -e profiling memray-run:*) Bash(pixi run -e profiling memray-flame:*) Bash(open:*) Bash(xdg-open:*) Bash(python -m webbrowser:*)
+---
+
+## Steps
+
+1. Ask the user which script to profile (full or relative path).
+
+2. Run the script under memray:
+
+    ```bash
+    pixi run -e profiling memray-run script.py
+    ```
+
+    This produces a binary file named `memray-script.py.<pid>.bin` in the current directory.
+
+3. Generate the flamegraph HTML report from the `.bin` file:
+
+    ```bash
+    pixi run -e profiling memray-flame memray-script.py.<pid>.bin
+    ```
+
+    Replace `<pid>` with the actual PID shown in the filename. This writes `memray-flamegraph-script.py.<pid>.html`.
+
+4. Open the report in the browser:
+    - macOS: `open memray-flamegraph-script.py.<pid>.html`
+    - Linux: `xdg-open memray-flamegraph-script.py.<pid>.html`
+    - Either: `python -m webbrowser memray-flamegraph-script.py.<pid>.html`
+
+## Notes
+
+- The `--temporal` flag (included in `memray-flame`) shows memory over time, not just peak — use this to spot leaks and allocation bursts.
+- To find the `.bin` file if unsure of the name: `ls memray-*.bin`
+- To compare runs, save the previous report: `cp memray-flamegraph-script.py.<pid>.html memray-flamegraph-before.html`
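The naming convention in the steps above (`memray-script.py.<pid>.bin` in, `memray-flamegraph-script.py.<pid>.html` out) could be scripted so the PID never has to be typed by hand. This is a minimal sketch that assumes memray's default output names as described in the skill; `latest_capture` and `flamegraph_name` are hypothetical helpers, not part of memray or of this repository.

```python
from pathlib import Path


def latest_capture(directory="."):
    """Return the most recently modified memray-*.bin file, or None if there is none."""
    captures = sorted(
        Path(directory).glob("memray-*.bin"),
        key=lambda path: path.stat().st_mtime,
    )
    return captures[-1] if captures else None


def flamegraph_name(capture):
    """Map memray-<script>.<pid>.bin to memray-flamegraph-<script>.<pid>.html."""
    name = Path(capture).name
    if not (name.startswith("memray-") and name.endswith(".bin")):
        raise ValueError(f"not a memray capture file: {name}")
    # Strip the "memray-" prefix and the ".bin" suffix, keep "<script>.<pid>".
    stem = name[len("memray-"):-len(".bin")]
    return f"memray-flamegraph-{stem}.html"
```

Step 4 could then open `flamegraph_name(latest_capture())` instead of a hand-typed filename.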
@@ -0,0 +1,30 @@
+---
+name: profimp
+description: Profile Python import time using profimp and open a waterfall HTML report. Use when investigating slow startup or wanting to identify which imports are most expensive.
+compatibility: Requires profimp (available as a pixi dependency). macOS or Linux.
+allowed-tools: Bash(profimp:*) Bash(open:*) Bash(xdg-open:*) Bash(python -m webbrowser:*)
+---
+
+## Steps
+
+1. Ask what to profile. Suggest common patterns for this repo:
+    - `import spatialdata`
+    - `from spatialdata import SpatialData`
+    - `from spatialdata_io import xenium`
+
+2. Run:
+
+    ```bash
+    profimp --html "<import_stmt>" > /tmp/profimp.html
+    ```
+
+3. Open the report:
+    - macOS: `open /tmp/profimp.html`
+    - Linux: `xdg-open /tmp/profimp.html`
+    - Either: `python -m webbrowser /tmp/profimp.html`
+
+## Notes
+
+- The report is a waterfall chart showing every sub-import and its timing.
+- To compare before/after: `cp /tmp/profimp.html /tmp/profimp-before.html` before re-running.
+- `pixi run python -m profimp` also works if `profimp` is not on PATH.
@@ -0,0 +1,45 @@
+---
+name: pyspy
+description: Profile the execution time of a Python script using py-spy and visualize the result with speedscope. Use when the user wants to benchmark performance, find slow code paths, or profile CPU time.
+compatibility: Requires the pixi profiling environment (pixi run -e profiling). Speedscope must be installed separately (npm install -g speedscope). sudo is required on macOS.
+allowed-tools: Bash(pixi run -e profiling pyspy:*) Bash(pixi run -e profiling speedscope:*) Bash(sudo pixi run -e profiling pyspy:*)
+---
+
+## Steps
+
+1. Ask the user which script to profile (full or relative path).
+
+2. Run py-spy to record the profile. The output is always written to `profile.speedscope.json` in the current directory.
+
+    **Linux** (no sudo needed):
+
+    ```bash
+    pixi run -e profiling pyspy script.py
+    ```
+
+    **macOS** (sudo required — py-spy needs to attach to the process):
+
+    ```bash
+    sudo pixi run -e profiling pyspy script.py
+    ```
+
+    If sudo fails to find the pixi environment, use absolute paths:
+
+    ```bash
+    sudo /path/to/.pixi/envs/profiling/bin/py-spy record --gil \
+      -o profile.speedscope.json --format speedscope \
+      -- /path/to/.pixi/envs/profiling/bin/python script.py
+    ```
+
+3. Open the result in speedscope:
+    ```bash
+    pixi run -e profiling speedscope
+    ```
+    This opens `profile.speedscope.json` in the browser via the local speedscope CLI.
+
+## Notes
+
+- If the speedscope view is blank, switch threads using the thread selector in the top-right corner.
+- To save a profile before overwriting: `cp profile.speedscope.json profile-before.speedscope.json`
+- `--gil` records only samples taken while a thread holds the GIL (Python-level CPU time). Drop it to also include active threads that have released the GIL, such as native code in C extensions.
+- speedscope must be installed globally: `npm install -g speedscope`
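For the blank-view note above, it can help to check which thread actually collected samples before opening speedscope. The sketch below assumes the speedscope file format, with a top-level `profiles` list whose entries carry `name` and `samples`, and assumes py-spy writes one profile per thread. `profile_sample_counts` is a hypothetical helper, not part of py-spy or speedscope.

```python
import json


def profile_sample_counts(path="profile.speedscope.json"):
    """Return (profile name, number of samples) pairs, busiest profile first."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    counts = [
        # Missing keys are tolerated so a partial file still yields a summary.
        (profile.get("name", "<unnamed>"), len(profile.get("samples", [])))
        for profile in data.get("profiles", [])
    ]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

The first entry of the result names the thread to pick in the speedscope thread selector.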
@@ -9,9 +9,9 @@ jobs:
         runs-on: ubuntu-latest
         if: startsWith(github.ref, 'refs/tags/v')
         steps:
-            - uses: actions/checkout@v3
+            - uses: actions/checkout@v6
             - name: Set up Python 3.12
-              uses: actions/setup-python@v4
+              uses: actions/setup-python@v6
               with:
                   python-version: "3.12"
                   cache: pip
 
@@ -13,57 +13,55 @@ jobs:
         runs-on: ${{ matrix.os }}
         defaults:
             run:
-                shell: bash -e {0}
+                shell: bash  # bash also on windows
 
         strategy:
             fail-fast: false
             matrix:
                 include:
-                    - {os: windows-latest, python: "3.11", dask-version: "2025.2.0", name: "Dask 2025.2.0"}
-                    - {os: windows-latest, python: "3.13", dask-version: "latest", name: "Dask latest"}
-                    - {os: ubuntu-latest, python: "3.11", dask-version: "latest", name: "Dask latest"}
-                    - {os: ubuntu-latest, python: "3.13", dask-version: "latest", name: "Dask latest"}
-                    - {os: macos-latest, python: "3.11", dask-version: "latest", name: "Dask latest"}
-                    - {os: macos-latest, python: "3.13", prerelease: "allow", name: "Python 3.13 (pre-release)"}
+                    - {os: windows-latest, python: "3.11", dask-version: "2025.12.0", name: "min dask"}
+                    - {os: windows-latest, python: "3.14", dask-version: "latest"}
+                    - {os: ubuntu-latest, python: "3.11", dask-version: "latest"}
+                    - {os: ubuntu-latest, python: "3.14", dask-version: "latest"}
+                    - {os: macos-latest, python: "3.11", dask-version: "latest"}
+                    - {os: macos-latest, python: "3.14", prerelease: "allow", name: "prerelease"}
         env:
             OS: ${{ matrix.os }}
             PYTHON: ${{ matrix.python }}
             DASK_VERSION: ${{ matrix.dask-version }}
             PRERELEASE: ${{ matrix.prerelease }}
 
         steps:
-            - uses: actions/checkout@v2
-            - uses: astral-sh/setup-uv@v5
+            - uses: actions/checkout@v6
+            - uses: astral-sh/setup-uv@v7
               id: setup-uv
               with:
                   version: "latest"
                   python-version: ${{ matrix.python }}
             - name: Install dependencies
               run: |
                   if [[ "${PRERELEASE}" == "allow" ]]; then
-                    uv sync --extra test
-                    : # uv sync --extra test --prerelease ${PRERELEASE}
-                    uv pip install git+https://github.com/scverse/anndata.git
-                    uv pip install --prerelease allow pandas
-                  else
-                    uv sync --extra test
+                    sed -i '' 's/requires-python.*//' pyproject.toml # otherwise uv complains that anndata requires python>=3.12 and we only do >=3.11 😱
+                    uv add git+https://github.com/scverse/anndata.git
+                    uv add "pandas>=3.dev0" # quoted so bash does not treat > as an output redirection
                   fi
                   if [[ -n "${DASK_VERSION}" ]]; then
                     if [[ "${DASK_VERSION}" == "latest" ]]; then
-                      uv pip install --upgrade dask
+                      uv add dask
                     else
-                      uv pip install dask==${DASK_VERSION}
+                      uv add dask==${DASK_VERSION}
                     fi
                   fi
+                  uv sync --group=test
             - name: Test
               env:
                   MPLBACKEND: agg
                   PLATFORM: ${{ matrix.os }}
                   DISPLAY: :42
               run: |
-                  uv run pytest --cov --color=yes --cov-report=xml
+                  uv run pytest --cov --color=yes --cov-report=xml -n auto --dist worksteal
             - name: Upload coverage to Codecov
-              uses: codecov/codecov-action@v4
+              uses: codecov/codecov-action@v5
               with:
                   name: coverage
                   verbose: true
 
@@ -20,6 +20,7 @@ __pycache__/
 docs/_build
 !docs/api/.md
 docs/**/generated
+docs/_static/datasets_data.js
 
 # IDEs
 /.idea/
@@ -45,10 +46,18 @@ spatialdata-sandbox
 # version file
 _version.py
 
-# other
-node_modules/
+# agents configurations
+.claude/settings.local.json
 
+# benchmarking and profiling
 .asv/
+profile.speedscope.json
+
+# other
+node_modules/
 
 .mypy_cache
 .ruff_cache
+uv.lock
+pixi.lock
+
@@ -9,18 +9,18 @@ ci:
     skip: []
 repos:
     - repo: https://github.com/rbubley/mirrors-prettier
-      rev: v3.7.4
+      rev: v3.8.3
       hooks:
           - id: prettier
             exclude: ^.github/workflows/test.yaml
     - repo: https://github.com/pre-commit/mirrors-mypy
-      rev: v1.19.1
+      rev: v2.0.0
       hooks:
           - id: mypy
             additional_dependencies: [numpy, types-requests]
             exclude: tests/|docs/
     - repo: https://github.com/astral-sh/ruff-pre-commit
-      rev: v0.14.10
+      rev: v0.15.12
       hooks:
           - id: ruff
             args: [--fix, --exit-non-zero-on-fix]
 
@@ -1,19 +1,22 @@
 # https://docs.readthedocs.io/en/stable/config-file/v2.html
 version: 2
 build:
-    os: ubuntu-20.04
+    os: ubuntu-24.04
     tools:
-        python: "3.11"
-sphinx:
-    configuration: docs/conf.py
-    fail_on_warning: true
-python:
-    install:
-        - method: pip
-          path: .
-          extra_requirements:
-              - docs
-              - torch
+        python: "3.13"
+    jobs:
+        post_checkout:
+            # unshallow so version can be derived from tag
+            - git fetch --unshallow || true
+        create_environment:
+            - asdf plugin add uv
+            - asdf install uv latest
+            - asdf global uv latest
+        build:
+            html:
+                - uv sync --group=docs --extra=torch
+                - uv run make --directory=docs html
+                - mv docs/_build $READTHEDOCS_OUTPUT
 submodules:
     include:
         - "docs/tutorials/notebooks"
 
@@ -14,25 +14,63 @@ pip install -e '.[docs,test,benchmark]'
 
 ## Usage
 
-Running all the benchmarks is usually not needed. You run the benchmark using `asv run`. See the [asv documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-run) for interesting arguments, like selecting the benchmarks you're interested in by providing a regex pattern `-b` or `--bench` that links to a function or class method e.g. the option `-b timeraw_import_inspect` selects the function `timeraw_import_inspect` in `benchmarks/spatialdata_benchmark.py`. You can run the benchmark in your current environment with `--python=same`. Some example benchmarks:
+Running all the benchmarks is usually not needed. Run benchmarks with `asv run`. See the [asv documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-run) for useful arguments, such as `-b` or `--bench`, which takes a regex pattern and selects the matching functions or class methods. To run the benchmarks in your current environment, pass `--python=same`. Some examples follow.
 
-Importing the SpatialData library can take around 4 seconds:
+### Import time benchmarks
+
+Import benchmarks live in `benchmarks/benchmark_imports.py`. Each `timeraw_*` function returns a Python code snippet that asv runs in a fresh interpreter (cold import, empty module cache).
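The contents of `benchmarks/benchmark_imports.py` are not shown in this diff, so the following is only an illustration of the shape of such a benchmark. The name `timeraw_import_spatialdata` comes from the command further down, but the body is an assumption. asv's raw timing benchmarks return the statement to time as a string.

```python
def timeraw_import_spatialdata():
    """Return the statement that asv times in a fresh interpreter."""
    # asv runs the returned code in a separate process, so nothing imported
    # in this benchmark module ends up in the measured module cache.
    return "import spatialdata"
```

Because the function only returns source text, importing the benchmark module does not import `spatialdata`.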
+
+Run all import benchmarks in your current environment:
 
 ```
-PYTHONWARNINGS="ignore" asv run --python=same --show-stderr -b timeraw_import_inspect
-Couldn't load asv.plugins._mamba_helpers because
-No module named 'conda'
-· Discovering benchmarks
-· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
-[ 0.00%] ·· Benchmarking existing-py_opt_homebrew_Caskroom_mambaforge_base_envs_spatialdata2_bin_python3.12
-[50.00%] ··· Running (spatialdata_benchmark.timeraw_import_inspect--).
-[100.00%] ··· spatialdata_benchmark.timeraw_import_inspect                                                                            3.65±0.2s
+asv run --python=same --show-stderr -b timeraw
+```
+
+Or a single one:
+
+```
+asv run --python=same --show-stderr -b timeraw_import_spatialdata
+```
+
+### Comparing the current branch against `main`
+
+The simplest way is `asv continuous`, which builds both commits, runs the benchmarks, and prints the comparison in one shot:
+
+```bash
+asv continuous --show-stderr -v -b timeraw main faster-import
 ```
 
+Replace `faster-import` with any branch name or commit hash. The `-v` flag prints per-sample timings; drop it for a shorter summary.
+
+Alternatively, collect results separately and compare afterwards:
+
+```bash
+# 1. Collect results for the tip of main and the tip of your branch
+asv run --show-stderr -b timeraw main
+asv run --show-stderr -b timeraw HEAD
+
+# 2. Print a side-by-side comparison
+asv compare main HEAD
+```
+
+Both approaches build isolated environments from scratch. If you prefer to skip the rebuild and reuse your current environment (faster, less accurate):
+
+```bash
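+# --set-commit-hash makes asv save results even though an existing environment is used
+asv run --python=same --set-commit-hash "$(git rev-parse HEAD)" --show-stderr -b timeraw
+
+git stash && git checkout main
+asv run --python=same --set-commit-hash "$(git rev-parse HEAD)" --show-stderr -b timeraw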
+git checkout - && git stash pop
+
+asv compare main HEAD
+```
+
+### Querying benchmarks
+
 Querying using a bounding box without a spatial index is highly impacted by large amounts of points (transcripts), more than table rows (cells).
 
 ```
-$ PYTHONWARNINGS="ignore" asv run --python=same --show-stderr -b time_query_bounding_box
+$ asv run --python=same --show-stderr -b time_query_bounding_box
 
 [100.00%] ··· ======== ============ ============= ============= ==============
               --                filter_table / n_transcripts_per_cell