czbiohub-sf
diff --git a/docs/source/tutorials/notebooks/DC_tutorial.ipynb b/docs/source/tutorials/notebooks/DC_tutorial.ipynb
Lines changed: 20 additions & 14 deletions
@@ -5,16 +5,16 @@
    "id": "0022d8fe",
    "metadata": {},
    "source": [
-    "# Tutorial: Differential Centrifugation Workflow "
+    "# Tutorial: Differential Centrifugation (DC) Workflow "
    ]
   },
   {
    "cell_type": "markdown",
    "id": "867ed1e7",
    "metadata": {},
    "source": [
-    "*grassp* is a python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses).  \n",
-    "In this tutorial we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n"
+    "*grassp* is a Python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses).  \n",
+    "In this tutorial, we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n"
    ]
   },
   {
@@ -83,7 +83,7 @@
    "id": "e9da1946",
    "metadata": {},
    "source": [
-    "The centrifugation data in this tutorial comes from Leonetti and Elias labs at CZ Biohub SF (unpublished).  \n",
+    "The centrifugation data in this tutorial comes from the Leonetti and Elias labs at CZ Biohub SF (unpublished).  \n",
     "Centrifugation-based subcellular fractionation experiments separate cellular components by spinning samples at increasing speeds (1K, 3K, 5K, 12K, 24K, 80K × g) to partition organelles and subcellular structures based on their density and size, with the cytoplasmic fraction (Cyt) representing the final supernatant. \n",
     "\n",
     "This approach differs from immunoprecipitation (IP) pull-downs, which use antibodies to specifically capture target proteins and their interacting partners."
@@ -95,7 +95,7 @@
    "metadata": {},
    "source": [
     "Although we are loading in the data from our online data repository, grassp comes with several example datasets.  \n",
-    "The commented-out lines below shows how to load in these data, including arguments specifying raw or enriched data. "
+    "The commented-out lines below show how to load in these data, including arguments specifying raw or enriched data. "
    ]
   },
   {
@@ -144,13 +144,13 @@
     "\n",
     "`n_obs` is the number of observations (i.e. proteins in this tutorial)  \n",
     "`n_var` is the number of variables (i.e. spin-fractions/pulldowns in this tutorial)  \n",
-    "please note in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts)\n",
+    "Please note that in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts).\n",
     "> AnnData object with n_obs × n_vars = 10224 × 42\n",
     "\n",
     "Under `obs` we find the metadata for the proteins. Each entry is a column in a pandas DataFrame.\n",
     ">    obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'Only identified by site', 'Reverse', 'Potential contaminant', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions'\n",
     "\n",
-    "Under `var` we find the metadata for the pulldowns/Fractions.\n",
+    "Under `var` we find the metadata for the pulldowns/fractions.\n",
     ">   var: 'subcellular_enrichment', 'biological_replicate'"
    ]
   },
@@ -162,7 +162,7 @@
     "## Preprocessing\n",
     "\n",
     "### Adding Compartment Annotations\n",
-    "We add annotations that specify the ground truth subcellular compartments for each protein."
+    "We add ground-truth subcellular compartment annotations for each protein."
    ]
   },
   {
@@ -687,7 +687,7 @@
    "metadata": {},
    "source": [
     "### Adding QC metrics to the metadata\n",
-    "Before performing filtering and transformations, let's compute quality control metrics of the raw data to the metadata, which we can plot later on. "
+    "Before performing filtering and transformations, let's compute quality control metrics for the raw data and add them to the metadata, which we can plot later on. "
    ]
   },
   {
@@ -708,6 +708,12 @@
     "### Filtering"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "e8aeddd9",
+   "metadata": {},
+   "source": []
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -801,7 +807,7 @@
       " [3.4632e+07 0.0000e+00 1.4594e+07 1.4703e+08 3.2308e+08]\n",
       " [2.2813e+08 1.8855e+08 2.8589e+08 5.1585e+08 8.2965e+08]\n",
       " [0.0000e+00 2.0237e+07 5.8801e+07 5.3140e+07 2.3226e+07]]\n",
-      "DC data before imputating [[18.12537  20.254702 19.400434 16.14503   0.      ]\n",
+      "DC data before imputing [[18.12537  20.254702 19.400434 16.14503   0.      ]\n",
       " [22.624493 24.182165 23.328693 21.195704 18.694214]\n",
       " [18.336037 20.590878 19.623503 17.52703   0.      ]\n",
       " [18.289284 18.643744 18.04338  19.179241 20.107265]\n",
@@ -819,7 +825,7 @@
     "print(f\"DC data before log transforming {dc_filtered.X[:10, :5]}\")\n",
     "\n",
     "dc_filtered.X = np.log1p(dc_filtered.X)\n",
-    "print(f\"DC data before imputating {dc_filtered.X[:10, :5]}\")\n",
+    "print(f\"DC data before imputing {dc_filtered.X[:10, :5]}\")\n",
     "dc_filtered.layers[\"log_intensities\"] = dc_filtered.X.copy()"
    ]
   },
@@ -867,7 +873,7 @@
    "id": "89f55f4e",
    "metadata": {},
    "source": [
-    "Plotting histogram of data distribution before vs after imputation"
+    "Plot a histogram of the data distribution before vs. after imputation."
    ]
   },
   {
@@ -879,7 +885,7 @@
     {
      "data": {
       "text/plain": [
-       "<matplotlib.legend.Legend at 0x168f9a2a0>"
+       "<matplotlib.legend.Legend at 0x33ad53aa0>"
       ]
      },
      "execution_count": 11,
@@ -1233,7 +1239,7 @@
    "source": [
     "After these steps, you will see that new analysis results are stored in various Anndata compartments: PCA components and UMAP coordinates are saved in .obsm, while metadata like search engine parameters and visualization settings are stored in .uns, and protein-protein relationships are captured in .obsp as distance and connectivity matrices. \n",
     "\n",
-    ">   uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap'\n",
+    ">   uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap', 'knn_annotation_colors'\n",
     "\n",
     ">   obsm: 'X_pca', 'X_umap'\n",
     "\n",
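For reference, here is a minimal NumPy sketch of why zeros survive the `np.log1p` step in the preprocessing cell above. Because log(1 + x) maps 0 to 0, undetected intensities remain exactly 0 in the "before imputing" printout and can be masked for imputation. The matrix uses hypothetical toy values, not the tutorial data.

```python
import numpy as np

# Hypothetical toy intensities: rows are proteins (obs), columns are fractions (var).
# Zeros stand for undetected (missing) intensities.
raw = np.array([
    [3.0e7, 0.0, 1.5e7],
    [2.0e8, 1.9e8, 0.0],
])

# log(1 + x) is defined at 0 and maps 0 -> 0, so missing values stay at 0.
log_int = np.log1p(raw)

# Boolean mask of entries that would be imputed later.
missing = log_int == 0
print(missing)
```

Using `log1p` instead of `log` avoids `-inf` at zero intensities. This means the missing-value pattern is preserved unchanged through the transform.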