diff --git a/docs/index.md b/docs/index.md
index dd61acc..aad5984 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -117,6 +117,5 @@ api.md
 changelog.md
 contributing.md
 references.md
-
-notebooks/example
+notebooks/index
 ```
diff --git a/docs/notebooks/example.ipynb b/docs/notebooks/example.ipynb
index 24eab28..6ffa0ce 100644
--- a/docs/notebooks/example.ipynb
+++ b/docs/notebooks/example.ipynb
@@ -4,13 +4,338 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Example notebook"
+    "# Quickstart `annbatch`"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook will walk you through the following steps:\n",
+    "1. How to convert an existing collection of `anndata` files into a shuffled, zarr-based collection of `anndata` datasets\n",
+    "2. How to load the converted collection using `annbatch`\n",
+    "3. How to extend an existing collection with new `anndata` datasets\n",
+    "\n",
+    "To use this notebook, install the extras:\n",
+    "\n",
+    "```\n",
+    "pip install \"annbatch[zarrs,torch]\"\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "--2025-10-09 09:43:19--  https://datasets.cellxgene.cziscience.com/866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\n",
+      "Resolving datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)... 18.64.79.73, 18.64.79.80, 18.64.79.109, ...\n",
+      "Connecting to datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)|18.64.79.73|:443... connected.\n",
+      "HTTP request sent, awaiting response... 200 OK\n",
+      "Length: 773247972 (737M) [binary/octet-stream]\n",
+      "Saving to: ‘866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad’\n",
+      "\n",
+      "866d7d5e-436b-4dbd- 100%[===================>] 737.43M  398MB/s    in 1.9s    \n",
+      "\n",
+      "2025-10-09 09:43:21 (398 MB/s) - ‘866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad’ saved [773247972/773247972]\n",
+      "\n",
+      "--2025-10-09 09:43:22--  https://datasets.cellxgene.cziscience.com/f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad\n",
+      "Resolving datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)... 18.64.79.73, 18.64.79.80, 18.64.79.72, ...\n",
+      "Connecting to datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)|18.64.79.73|:443... connected.\n",
+      "HTTP request sent, awaiting response... 200 OK\n",
+      "Length: 1631759823 (1.5G) [binary/octet-stream]\n",
+      "Saving to: ‘f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad’\n",
+      "\n",
+      "f81463b8-4986-4904- 100%[===================>]   1.52G  425MB/s    in 3.9s    \n",
+      "\n",
+      "2025-10-09 09:43:26 (403 MB/s) - ‘f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad’ saved [1631759823/1631759823]\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Download two example datasets from CELLxGENE\n",
+    "!wget https://datasets.cellxgene.cziscience.com/866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\n",
+    "!wget https://datasets.cellxgene.cziscience.com/f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**IMPORTANT**: Configure `zarrs`\n",
+    "\n",
+    "This step is required both for converting existing `anndata` files into a performant, shuffled collection of datasets and for mini-batch loading."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       ""
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import zarr\n",
+    "import zarrs  # noqa\n",
+    "\n",
+    "zarr.config.set({\"codec_pipeline.path\": \"zarrs.ZarrsCodecPipeline\"})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "\n",
+    "# Suppress zarr vlen-utf8 codec warnings\n",
+    "warnings.filterwarnings(\n",
+    "    \"ignore\",\n",
+    "    message=\"The codec `vlen-utf8` is currently not part in the Zarr format 3 specification.*\",\n",
+    "    category=UserWarning,\n",
+    "    module=\"zarr.codecs.vlen_utf8\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Converting existing `anndata` files into a shuffled collection"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The conversion code will take care of the following things:\n",
+    "* Align (outer join) the gene spaces across all datasets listed in `adata_paths`.\n",
+    "  * The gene spaces are outer-joined based on the gene names provided in the `var_names` field of the individual `AnnData` objects.\n",
+    "  * If you want to subset to a specific gene space, you can provide a list of gene names via the `var_subset` parameter (an illustrative call is sketched below the conversion example).\n",
+    "* Shuffle the cells across all datasets (this works on larger-than-memory datasets as well).\n",
+    "  * This is important for block-wise shuffling during data loading.\n",
+    "* Split the shuffled data across multiple output datasets:\n",
+    "  * The size of each individual output dataset can be controlled via the `n_obs_per_dataset` parameter.\n",
+    "  * We recommend choosing a dataset size that comfortably fits into system memory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/mnt/volume/arrayloaders/src/annbatch/io.py:228: UserWarning: Some anndatas have layers keys not present in others' layers, consider stopping and using the `transform_input_adata` argument to alter layers accordingly.\n",
+      "  adata_concat = _lazy_load_with_obs_var_in_memory(adata_paths)\n",
+      "  0%|          | 0/1 [00:00<?, ?it/s]"
+     ]
+    }
+   ],
+   "source": [
+    "from anndata import AnnData\n",
+    "\n",
+    "from annbatch import create_anndata_collection\n",
+    "\n",
+    "\n",
+    "def del_layers(adata: AnnData) -> AnnData:\n",
+    "    del adata.layers  # soupX is present in one of the datasets' layers but not the other\n",
+    "    return adata\n",
+    "\n",
+    "\n",
+    "create_anndata_collection(\n",
+    "    # List all the h5ad files you want to include in the collection\n",
+    "    adata_paths=[\"866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\", \"f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad\"],\n",
+    "    # Path to store the output collection\n",
+    "    output_path=\"annbatch_collection\",\n",
+    "    shuffle=True,  # Whether to pre-shuffle the cells of the collection\n",
+    "    n_obs_per_dataset=2_097_152,  # Number of cells per dataset shard\n",
+    "    var_subset=None,  # Optionally subset the collection to a specific gene space\n",
+    "    should_denseify=False,\n",
+    "    transform_input_adata=del_layers,\n",
+    ")"
+   ]
+  },
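+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The call above keeps the full outer-joined gene space (`var_subset=None`). As an illustration of `var_subset`, a conversion restricted to a small gene panel could look like the following sketch; the gene identifiers and the output path are hypothetical placeholders, not values from the datasets above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Hypothetical sketch of `var_subset` usage - the gene IDs below are\n",
+    "# placeholders; use identifiers matching the `var_names` of your datasets.\n",
+    "gene_panel = [\"ENSG00000121410\", \"ENSG00000148584\", \"ENSG00000175899\"]\n",
+    "\n",
+    "create_anndata_collection(\n",
+    "    adata_paths=[\"866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\", \"f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad\"],\n",
+    "    output_path=\"annbatch_collection_panel\",  # hypothetical separate output path\n",
+    "    shuffle=True,\n",
+    "    var_subset=gene_panel,  # keep only these genes in the output collection\n",
+    "    transform_input_adata=del_layers,\n",
+    ")"
+   ]
+  },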
200 OK\n", + "Length: 1631759823 (1.5G) [binary/octet-stream]\n", + "Saving to: ‘f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad’\n", + "\n", + "f81463b8-4986-4904- 100%[===================>] 1.52G 425MB/s in 3.9s \n", + "\n", + "2025-10-09 09:43:26 (403 MB/s) - ‘f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad’ saved [1631759823/1631759823]\n", + "\n" + ] + } + ], + "source": [ + "# Download two example datasets from CELLxGENE\n", + "!wget https://datasets.cellxgene.cziscience.com/866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\n", + "!wget https://datasets.cellxgene.cziscience.com/f81463b8-4986-4904-a0ea-20ff02cbb317.h5ad" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**IMPORTANT**: Configure zarrs\n", + "\n", + "This step is both required for converting existing `anndata` files into a performant, shuffled collection of datasets for mini batch loading" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import zarr\n", + "import zarrs # noqa\n", + "\n", + "zarr.config.set({\"codec_pipeline.path\": \"zarrs.ZarrsCodecPipeline\"})" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "# Suppress zarr vlen-utf8 codec warnings\n", + "warnings.filterwarnings(\n", + " \"ignore\",\n", + " message=\"The codec `vlen-utf8` is currently not part in the Zarr format 3 specification.*\",\n", + " category=UserWarning,\n", + " module=\"zarr.codecs.vlen_utf8\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Converting existing `anndata` files into a shuffled collection" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The conversion code will take care of the following things:\n", + "* Align (outer join) the gene spaces across all datasets listed in `adata_paths`\n", + " * The gene spaces are outer-joined based on the gene names provided in the `var_names` field of the individual `AnnData` objects.\n", + " * If you want to subset to specific gene space, you can provide a list of gene names via the `var_subset` parameter.\n", + "* Shuffle the cells across all datasets (this works on larger than memory datasets as well).\n", + " * This is important for block-wise shuffling during data loading.\n", + "* Shuffle the input files across multiple output datasets:\n", + " * The size of each individual output dataset can be controlled via the `n_obs_per_dataset` parameter.\n", + " * We recommend to choose a dataset size that comfortably fits into system memory." 
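+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a minimal sketch of such a consumption loop (an illustration, not the original training cell): it assumes that iterating `ds` yields `(x, cell_type)` tuples, since `obs_keys=\"cell_type\"` and `to_torch=True` were set above, and that a CUDA device is available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal consumption sketch - assumes `ds` yields (x, cell_type) tuples\n",
+    "# (because `obs_keys=\"cell_type\"` was set) and that a GPU is available.\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "for x, cell_type in tqdm(ds):\n",
+    "    x = x.cuda().to_dense()  # densify on the GPU, as recommended above\n",
+    "    # ... feed the dense batch `x` into your model here ...\n"
+   ]
+  },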
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "  0%|          | 0/171792 [00:00