|
5 | 5 | "id": "0022d8fe", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# Tutorial: Differential Centrifugation Workflow " |
| 8 | + "# Tutorial: Differential Centrifugation (DC) Workflow " |
9 | 9 | ] |
10 | 10 | }, |
11 | 11 | { |
12 | 12 | "cell_type": "markdown", |
13 | 13 | "id": "867ed1e7", |
14 | 14 | "metadata": {}, |
15 | 15 | "source": [ |
16 | | - "*grassp* is a python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses). \n", |
17 | | - "In this tutorial we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n" |
| 16 | + "*grassp* is a Python package that facilitates the analysis of subcellular proteomics data (with an emphasis on graph-based analyses). \n", |
| 17 | + "In this tutorial, we analyze subcellular proteomics data produced by differential ultracentrifugation (DC). \n" |
18 | 18 | ] |
19 | 19 | }, |
20 | 20 | { |
|
83 | 83 | "id": "e9da1946", |
84 | 84 | "metadata": {}, |
85 | 85 | "source": [ |
86 | | - "The centrifugation data in this tutorial comes from Leonetti and Elias labs at CZ Biohub SF (unpublished). \n", |
| 86 | + "The centrifugation data in this tutorial comes from the Leonetti and Elias labs at CZ Biohub SF (unpublished). \n", |
87 | 87 | "Centrifugation-based subcellular fractionation experiments separate cellular components by spinning samples at increasing speeds (1K, 3K, 5K, 12K, 24K, 80K × g) to partition organelles and subcellular structures based on their density and size, with the cytoplasmic fraction (Cyt) representing the final supernatant. \n", |
88 | 88 | "\n", |
89 | 89 | "This approach differs from immunoprecipitation (IP) pull-downs, which use antibodies to specifically capture target proteins and their interacting partners." |
|
95 | 95 | "metadata": {}, |
96 | 96 | "source": [ |
97 | 97 | "Although we are loading in the data from our online data repository, grassp comes with several example datasets. \n", |
98 | | - "The commented-out lines below shows how to load in these data, including arguments specifying raw or enriched data. " |
| 98 | + "The commented-out lines below show how to load in these data, including arguments specifying raw or enriched data. " |
99 | 99 | ] |
100 | 100 | }, |
101 | 101 | { |
|
144 | 144 | "\n", |
145 | 145 | "`n_obs` is the number of observations (i.e. proteins in this tutorial) \n", |
146 | 146 | "`n_var` is the number of variables (i.e. spin-fractions/pulldowns in this tutorial) \n", |
147 | | - "please note in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts)\n", |
| 147 | + "Please note that in single-cell RNA-seq datasets, observations correspond to individual cells, and variables correspond to genes (or transcripts)\n", |
148 | 148 | "> AnnData object with n_obs × n_vars = 10224 × 42\n", |
149 | 149 | "\n", |
150 | 150 | "Under `obs` we find the metadata for the proteins. Each entry is a column in a pandas DataFrame.\n", |
151 | 151 | "> obs: 'Protein IDs', 'Majority protein IDs', 'Peptide counts (all)', 'Peptide counts (razor+unique)', 'Peptide counts (unique)', 'Protein names', 'Gene names', 'Fasta headers', 'Number of proteins', 'Peptides', 'Razor + unique peptides', 'Unique peptides', 'Sequence coverage [%]', 'Unique + razor sequence coverage [%]', 'Unique sequence coverage [%]', 'Mol. weight [kDa]', 'Sequence length', 'Sequence lengths', 'Fraction average', 'Fraction 1', 'Fraction 2', 'Fraction 3', 'Q-value', 'Score', 'Intensity', 'iBAQ', 'MS/MS count', 'Only identified by site', 'Reverse', 'Potential contaminant', 'id', 'Peptide IDs', 'Peptide is razor', 'Mod. peptide IDs', 'Evidence IDs', 'MS/MS IDs', 'Best MS/MS', 'Oxidation (M) site IDs', 'Oxidation (M) site positions'\n", |
152 | 152 | "\n", |
153 | | - "Under `var` we find the metadata for the pulldowns/Fractions.\n", |
| 153 | + "Under `var` we find the metadata for the pulldowns/fractions.\n", |
154 | 154 | "> var: 'subcellular_enrichment', 'biological_replicate'" |
155 | 155 | ] |
156 | 156 | }, |
|
162 | 162 | "## Preprocessing\n", |
163 | 163 | "\n", |
164 | 164 | "### Adding Compartment Annotations\n", |
165 | | - "We add annotations that specify the ground truth subcellular compartments for each protein." |
| 165 | + "We add ground-truth subcellular compartment annotations." |
166 | 166 | ] |
167 | 167 | }, |
168 | 168 | { |
|
687 | 687 | "metadata": {}, |
688 | 688 | "source": [ |
689 | 689 | "### Adding QC metrics to the metadata\n", |
690 | | - "Before performing filtering and transformations, let's compute quality control metrics of the raw data to the metadata, which we can plot later on. " |
| 690 | + "Before performing filtering and transformations, let's compute quality control metrics for the raw data and add them to the metadata, which we can plot later on. " |
691 | 691 | ] |
692 | 692 | }, |
693 | 693 | { |
|
708 | 708 | "### Filtering" |
709 | 709 | ] |
710 | 710 | }, |
| 711 | + { |
| 712 | + "cell_type": "markdown", |
| 713 | + "id": "e8aeddd9", |
| 714 | + "metadata": {}, |
| 715 | + "source": [] |
| 716 | + }, |
711 | 717 | { |
712 | 718 | "cell_type": "code", |
713 | 719 | "execution_count": 7, |
|
801 | 807 | " [3.4632e+07 0.0000e+00 1.4594e+07 1.4703e+08 3.2308e+08]\n", |
802 | 808 | " [2.2813e+08 1.8855e+08 2.8589e+08 5.1585e+08 8.2965e+08]\n", |
803 | 809 | " [0.0000e+00 2.0237e+07 5.8801e+07 5.3140e+07 2.3226e+07]]\n", |
804 | | - "DC data before imputating [[18.12537 20.254702 19.400434 16.14503 0. ]\n", |
| 810 | + "DC data before imputing [[18.12537 20.254702 19.400434 16.14503 0. ]\n", |
805 | 811 | " [22.624493 24.182165 23.328693 21.195704 18.694214]\n", |
806 | 812 | " [18.336037 20.590878 19.623503 17.52703 0. ]\n", |
807 | 813 | " [18.289284 18.643744 18.04338 19.179241 20.107265]\n", |
|
819 | 825 | "print(f\"DC data before log transforming {dc_filtered.X[:10, :5]}\")\n", |
820 | 826 | "\n", |
821 | 827 | "dc_filtered.X = np.log1p(dc_filtered.X)\n", |
822 | | - "print(f\"DC data before imputating {dc_filtered.X[:10, :5]}\")\n", |
| 828 | + "print(f\"DC data before imputing {dc_filtered.X[:10, :5]}\")\n", |
823 | 829 | "dc_filtered.layers[\"log_intensities\"] = dc_filtered.X.copy()" |
824 | 830 | ] |
825 | 831 | }, |
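
The `np.log1p` step above computes the natural logarithm of `1 + x`. A minimal NumPy sketch, using the five values from one printed row of the raw matrix, shows why the log-transformed output still contains `0.` entries. An undetected fraction (intensity 0) maps to exactly 0, so missing values survive the transform and still have to be imputed afterwards. The variable names here are illustrative, not the tutorial's.

```python
import numpy as np

# Five raw intensities copied from one printed row of the raw matrix
# (proteins x fractions); 0.0 marks a fraction where nothing was detected.
raw = np.array([[3.4632e07, 0.0, 1.4594e07, 1.4703e08, 3.2308e08]])

# log1p(x) = ln(1 + x), so log1p(0) is exactly 0 and zeros are preserved.
log_intensities = np.log1p(raw)

# The zeros are still present after the transform and need imputation later.
missing_mask = log_intensities == 0
print(log_intensities.round(2))
print(missing_mask)
```

Because the base is *e*, the transformed values are in natural-log units, which is why intensities around 10^7 to 10^8 land near 17 to 20 in the printed output.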
|
867 | 873 | "id": "89f55f4e", |
868 | 874 | "metadata": {}, |
869 | 875 | "source": [ |
870 | | - "Plotting histogram of data distribution before vs after imputation" |
| 876 | + "Plotting histogram of data distribution before vs. after imputation" |
871 | 877 | ] |
872 | 878 | }, |
873 | 879 | { |
|
879 | 885 | { |
880 | 886 | "data": { |
881 | 887 | "text/plain": [ |
882 | | - "<matplotlib.legend.Legend at 0x168f9a2a0>" |
| 888 | + "<matplotlib.legend.Legend at 0x33ad53aa0>" |
883 | 889 | ] |
884 | 890 | }, |
885 | 891 | "execution_count": 11, |
|
1233 | 1239 | "source": [ |
1234 | 1240 | "After these steps, you will see that new analysis results are stored in various Anndata compartments: PCA components and UMAP coordinates are saved in .obsm, while metadata like search engine parameters and visualization settings are stored in .uns, and protein-protein relationships are captured in .obsp as distance and connectivity matrices. \n", |
1235 | 1241 | "\n", |
1236 | | - "> uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap'\n", |
| 1242 | + "> uns: 'Search_Engine', 'pca', 'hein2024_gt_component_colors', 'hein2024_component_colors', 'neighbors', 'umap', 'knn_annotation_colors'\n", |
1237 | 1243 | "\n", |
1238 | 1244 | "> obsm: 'X_pca', 'X_umap'\n", |
1239 | 1245 | "\n", |
|