Commit 892068f

Refined the OrgIP tutorial
also fixed typo for both DC and OrgIP tutorials
1 parent 2d1da65 commit 892068f

2 files changed: +799 -87 lines changed

docs/source/tutorials/notebooks/DC_tutorial.ipynb

Lines changed: 20 additions & 14 deletions
@@ -5,16 +5,16 @@
 "id": "0022d8fe",
 "metadata": {},
 "source": [
-"# Tutorial: Differential Centrifugation Workflow "
+"# Tutorial: Differential Centrifugation (DC) Workflow "
 ]
 },
 {
 "cell_type": "markdown",
 "id": "867ed1e7",
 "metadata": {},
 "source": [
-"*grassp* is a python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses). \n",
-"In this tutorial we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n"
+"*grassp* is a Python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses). \n",
+"In this tutorial, we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n"
 ]
 },
 {
@@ -83,7 +83,7 @@
 "id": "e9da1946",
 "metadata": {},
 "source": [
-"The centrifugation data in this tutorial comes from Leonetti and Elias labs at CZ Biohub SF (unpublished). \n",
+"The centrifugation data in this tutorial comes from the Leonetti and Elias labs at CZ Biohub SF (unpublished). \n",
 "Centrifugation-based subcellular fractionation experiments separate cellular components by spinning samples at increasing speeds (1K, 3K, 5K, 12K, 24K, 80K × g) to partition organelles and subcellular structures based on their density and size, with the cytoplasmic fraction (Cyt) representing the final supernatant. \n",
 "\n",
 "This approach differs from immunoprecipitation (IP) pull-downs, which use antibodies to specifically capture target proteins and their interacting partners."
@@ -95,7 +95,7 @@
 "metadata": {},
 "source": [
 "Although we are loading in the data from our online data repository, grassp comes with several example datasets. \n",
-"The commented-out lines below shows how to load in these data, including arguments specifying raw or enriched data. "
+"The commented-out lines below show how to load in these data, including arguments specifying raw or enriched data. "
 ]
 },
 {
@@ -144,13 +144,13 @@
 "\n",
 "`n_obs` is the number of observations (i.e. proteins in this tutorial) \n",
 "`n_var` is the number of variables (i.e. spin-fractions/pulldowns in this tutorial) \n",
-"please note in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts)\n",
+"Please note that in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts)\n",
 "> AnnData object with n_obs × n_vars = 10224 × 42\n",
 "\n",
 "Under `obs` we find the metadata for the proteins. Each entry is a column in a pandas DataFrame.\n",
 "> obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'Only identified by site', 'Reverse', 'Potential contaminant', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions'\n",
 "\n",
-"Under `var` we find the metadata for the pulldowns/Fractions.\n",
+"Under `var` we find the metadata for the pulldowns/fractions.\n",
 "> var: 'subcellular_enrichment', 'biological_replicate'"
 ]
 },
@@ -162,7 +162,7 @@
 "## Preprocessing\n",
 "\n",
 "### Adding Compartment Annotations\n",
-"We add annotations that specify the ground truth subcellular compartments for each protein."
+"We add ground-truth subcellular compartment annotations."
 ]
 },
 {
@@ -687,7 +687,7 @@
 "metadata": {},
 "source": [
 "### Adding QC metrics to the metadata\n",
-"Before performing filtering and transformations, let's compute quality control metrics of the raw data to the metadata, which we can plot later on. "
+"Before performing filtering and transformations, let's compute quality control metrics for the raw data and add them to the metadata, which we can plot later on. "
 ]
 },
 {
@@ -708,6 +708,12 @@
 "### Filtering"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "e8aeddd9",
+"metadata": {},
+"source": []
+},
 {
 "cell_type": "code",
 "execution_count": 7,
@@ -801,7 +807,7 @@
 " [3.4632e+07 0.0000e+00 1.4594e+07 1.4703e+08 3.2308e+08]\n",
 " [2.2813e+08 1.8855e+08 2.8589e+08 5.1585e+08 8.2965e+08]\n",
 " [0.0000e+00 2.0237e+07 5.8801e+07 5.3140e+07 2.3226e+07]]\n",
-"DC data before imputating [[18.12537 20.254702 19.400434 16.14503 0. ]\n",
+"DC data before imputing [[18.12537 20.254702 19.400434 16.14503 0. ]\n",
 " [22.624493 24.182165 23.328693 21.195704 18.694214]\n",
 " [18.336037 20.590878 19.623503 17.52703 0. ]\n",
 " [18.289284 18.643744 18.04338 19.179241 20.107265]\n",
@@ -819,7 +825,7 @@
 "print(f\"DC data before log transforming {dc_filtered.X[:10, :5]}\")\n",
 "\n",
 "dc_filtered.X = np.log1p(dc_filtered.X)\n",
-"print(f\"DC data before imputating {dc_filtered.X[:10, :5]}\")\n",
+"print(f\"DC data before imputing {dc_filtered.X[:10, :5]}\")\n",
 "dc_filtered.layers[\"log_intensities\"] = dc_filtered.X.copy()"
 ]
 },
@@ -867,7 +873,7 @@
 "id": "89f55f4e",
 "metadata": {},
 "source": [
-"Plotting histogram of data distribution before vs after imputation"
+"Plotting histogram of data distribution before vs. after imputation"
 ]
 },
 {
@@ -879,7 +885,7 @@
 {
 "data": {
 "text/plain": [
-"<matplotlib.legend.Legend at 0x168f9a2a0>"
+"<matplotlib.legend.Legend at 0x33ad53aa0>"
 ]
 },
 "execution_count": 11,
@@ -1233,7 +1239,7 @@
 "source": [
 "After these steps, you will see that new analysis results are stored in various Anndata compartments: PCA components and UMAP coordinates are saved in .obsm, while metadata like search engine parameters and visualization settings are stored in .uns, and protein-protein relationships are captured in .obsp as distance and connectivity matrices. \n",
 "\n",
-"> uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap'\n",
+"> uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap', 'knn_annotation_colors'\n",
 "\n",
 "> obsm: 'X_pca', 'X_umap'\n",
 "\n",
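
Note on the log-transform step touched in the hunk at new line 825: the notebook applies `np.log1p` to the intensity matrix and then prints the result under the label "before imputing". The sketch below is not part of the commit. It reproduces the same element-wise transform with the standard library only, on two of the rows printed in the notebook output above (assumed here to be raw intensities), to show why zeros in the raw data are still zeros after the transform and are therefore left for the imputation step. The helper name `log1p_matrix` is hypothetical.

```python
import math

# Two rows copied from the notebook output shown in the diff
# (assumed to be raw intensities printed before the log transform).
raw = [
    [3.4632e07, 0.0, 1.4594e07, 1.4703e08, 3.2308e08],
    [0.0, 2.0237e07, 5.8801e07, 5.3140e07, 2.3226e07],
]


def log1p_matrix(rows):
    """Apply log(1 + x) to every entry, mirroring np.log1p on an array."""
    return [[math.log1p(value) for value in row] for row in rows]


logged = log1p_matrix(raw)

# log1p(0) is exactly 0, so entries with no measured intensity remain 0
# after the transform; these are the positions imputation would fill.
missing = [
    (i, j)
    for i, row in enumerate(logged)
    for j, value in enumerate(row)
    if value == 0.0
]

for row in logged:
    print([round(value, 3) for value in row])
print("zero entries after log1p:", missing)
```

This matches the pattern visible in the printed output, where the log-transformed rows still contain `0.` entries.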
