|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 | 4 | "cell_type": "markdown",
|
5 |
| - "id": "ff46b205", |
6 | 5 | "metadata": {},
|
7 | 6 | "source": [
|
8 | 7 | "# Decomposing gene co-expression networks with COBRA (Python version)\n",
|
|
13 | 12 | },
|
14 | 13 | {
|
15 | 14 | "cell_type": "markdown",
|
16 |
| - "id": "71d4f5e4", |
17 | 15 | "metadata": {},
|
18 | 16 | "source": [
|
19 | 17 | "## 1. Introduction\n",
|
|
23 | 21 | "\n",
|
24 | 22 | "COBRA is now part of the [netZooPy package](https://github.com/netZoo/netZooPy). Please follow the installation guidelines in the [README](https://github.com/netZoo/netZooPy/blob/master/README.md). If you need help or have any questions about netZoo, feel free to start with [discussions](https://github.com/netZoo/netZooPy/discussions). To report a bug, please open a new [issue](https://github.com/netZoo/netZooPy/issues). \n",
|
25 | 23 | "\n",
|
26 |
| - "To illustrate how to use COBRA for different tasks, we import thyroid carcinoma (THCA) data from the TCGA project <sup>1</sup>. " |
| 24 | + "To illustrate how to use COBRA for different tasks, we import thyroid carcinoma (THCA) data from the TCGA project <sup>1</sup>. \n", |
| 25 | + "\n", |
| 26 | + "This vignette can be run on the netbooks server or locally, depending on the `runserver` parameter (set to 1 to run on the server)." |
| 27 | + ] |
| 28 | + }, |
| 29 | + { |
| 30 | + "cell_type": "code", |
| 31 | + "execution_count": null, |
| 32 | + "metadata": {}, |
| 33 | + "outputs": [], |
| 34 | + "source": [ |
| 35 | + "runserver=1" |
| 36 | + ] |
| 37 | + }, |
| 38 | + { |
| 39 | + "cell_type": "markdown", |
| 40 | + "metadata": {}, |
| 41 | + "source": [ |
| 42 | + "On the server, we need to change the working directory to the `data` folder of the current user." |
27 | 43 | ]
|
28 | 44 | },
|
29 | 45 | {
|
30 | 46 | "cell_type": "code",
|
31 |
| - "execution_count": 1, |
32 |
| - "id": "69f50403", |
| 47 | + "execution_count": null, |
33 | 48 | "metadata": {},
|
34 |
| - "outputs": [ |
35 |
| - { |
36 |
| - "name": "stdout", |
37 |
| - "output_type": "stream", |
38 |
| - "text": [ |
39 |
| - "/home/soel/Desktop/netbooks/netbooks\n" |
40 |
| - ] |
41 |
| - } |
42 |
| - ], |
| 49 | + "outputs": [], |
43 | 50 | "source": [
|
44 |
| - "cd .." |
| 51 | + "if runserver==1:\n", |
| 52 | + " ppath='/opt/data/netZooPy/cobra/'" |
45 | 53 | ]
|
46 | 54 | },
|
47 | 55 | {
|
48 | 56 | "cell_type": "code",
|
49 |
| - "execution_count": 2, |
50 |
| - "id": "4d8dd9bf", |
| 57 | + "execution_count": null, |
51 | 58 | "metadata": {},
|
52 | 59 | "outputs": [],
|
53 | 60 | "source": [
|
|
58 | 65 | },
|
59 | 66 | {
|
60 | 67 | "cell_type": "code",
|
61 |
| - "execution_count": 3, |
62 |
| - "id": "27ba906e", |
| 68 | + "execution_count": null, |
63 | 69 | "metadata": {},
|
64 | 70 | "outputs": [],
|
65 | 71 | "source": [
|
66 |
| - "gene_expression = pd.read_csv(\"data/gene_expression_thca.csv\", index_col = 0).to_numpy()\n", |
67 |
| - "metadata = pd.read_csv(\"data/thca_metadata.csv\", index_col = 0)\n", |
| 72 | + "gene_expression = pd.read_csv(ppath+\"gene_expression_thca.csv\", index_col = 0).to_numpy()\n", |
| 73 | + "metadata = pd.read_csv(ppath+\"thca_metadata.csv\", index_col = 0)\n", |
68 | 74 | "batch = metadata['batch'].to_numpy()\n",
|
69 | 75 | "cancer = metadata['status'].to_numpy()\n",
|
70 | 76 | "sex = metadata['sex'].to_numpy()"
|
71 | 77 | ]
|
72 | 78 | },
|
73 | 79 | {
|
74 | 80 | "cell_type": "markdown",
|
75 |
| - "id": "1b97bf47", |
76 | 81 | "metadata": {},
|
77 | 82 | "source": [
|
78 | 83 | "Here gene_expression is a gene expression matrix for 19711 genes and 572 samples. Batch, cancer, and sex are sample-specific metadata as vectors of length 572."
|
79 | 84 | ]
|
80 | 85 | },
|
81 | 86 | {
|
82 | 87 | "cell_type": "code",
|
83 |
| - "execution_count": 4, |
84 |
| - "id": "62cfa651", |
| 88 | + "execution_count": null, |
85 | 89 | "metadata": {},
|
86 |
| - "outputs": [ |
87 |
| - { |
88 |
| - "name": "stdout", |
89 |
| - "output_type": "stream", |
90 |
| - "text": [ |
91 |
| - "Gene expression shape = (19711, 572)\n", |
92 |
| - "Batch vector length = 572\n", |
93 |
| - "Cancer vector length = 572\n", |
94 |
| - "Sex vector length = 572\n" |
95 |
| - ] |
96 |
| - } |
97 |
| - ], |
| 90 | + "outputs": [], |
98 | 91 | "source": [
|
99 | 92 | "print(\"Gene expression shape = \" + str(gene_expression.shape))\n",
|
100 | 93 | "print(\"Batch vector length = \" + str(len(batch)))\n",
|
|
104 | 97 | },
|
105 | 98 | {
|
106 | 99 | "cell_type": "markdown",
|
107 |
| - "id": "71d4f89a", |
108 | 100 | "metadata": {},
|
109 | 101 | "source": [
|
110 | 102 | "## 2. Applications of COBRA\n",
|
|
121 | 113 | },
|
122 | 114 | {
|
123 | 115 | "cell_type": "code",
|
124 |
| - "execution_count": 5, |
125 |
| - "id": "d9bcc699", |
| 116 | + "execution_count": null, |
126 | 117 | "metadata": {},
|
127 |
| - "outputs": [ |
128 |
| - { |
129 |
| - "data": { |
130 |
| - "text/plain": [ |
131 |
| - "17" |
132 |
| - ] |
133 |
| - }, |
134 |
| - "execution_count": 5, |
135 |
| - "metadata": {}, |
136 |
| - "output_type": "execute_result" |
137 |
| - } |
138 |
| - ], |
| 118 | + "outputs": [], |
139 | 119 | "source": [
|
140 | 120 | "len(np.unique(batch))"
|
141 | 121 | ]
|
142 | 122 | },
|
143 | 123 | {
|
144 | 124 | "cell_type": "markdown",
|
145 |
| - "id": "d9dd368e", |
146 | 125 | "metadata": {},
|
147 | 126 | "source": [
|
148 | 127 | "For batch correction, the design matrix must contain an intercept in the first column, and the batches (encoded using dummy coding for identifiability) in the remaining columns. "
|
149 | 128 | ]
|
150 | 129 | },
|
151 | 130 | {
|
152 | 131 | "cell_type": "code",
|
153 |
| - "execution_count": 6, |
154 |
| - "id": "9b1c0f8b", |
| 132 | + "execution_count": null, |
155 | 133 | "metadata": {},
|
156 | 134 | "outputs": [],
|
157 | 135 | "source": [
|
|
162 | 140 | },
|
163 | 141 | {
|
164 | 142 | "cell_type": "markdown",
|
165 |
| - "id": "32c6fb4f", |
166 | 143 | "metadata": {},
|
167 | 144 | "source": [
|
168 | 145 | "We get a design matrix with 17 covariates (an intercept and 16 for the dummy coding) for the 572 samples in our study. "
|
169 | 146 | ]
|
170 | 147 | },
|
171 | 148 | {
|
172 | 149 | "cell_type": "code",
|
173 |
| - "execution_count": 7, |
174 |
| - "id": "672634c4", |
| 150 | + "execution_count": null, |
175 | 151 | "metadata": {},
|
176 |
| - "outputs": [ |
177 |
| - { |
178 |
| - "data": { |
179 |
| - "text/plain": [ |
180 |
| - "(572, 17)" |
181 |
| - ] |
182 |
| - }, |
183 |
| - "execution_count": 7, |
184 |
| - "metadata": {}, |
185 |
| - "output_type": "execute_result" |
186 |
| - } |
187 |
| - ], |
| 152 | + "outputs": [], |
188 | 153 | "source": [
|
189 | 154 | "X.shape"
|
190 | 155 | ]
|
191 | 156 | },
|
192 | 157 | {
|
193 | 158 | "cell_type": "markdown",
|
194 |
| - "id": "711ee90f", |
195 | 159 | "metadata": {},
|
196 | 160 | "source": [
|
197 | 161 | "We are now ready to fit COBRA."
|
198 | 162 | ]
|
199 | 163 | },
|
200 | 164 | {
|
201 | 165 | "cell_type": "code",
|
202 |
| - "execution_count": 8, |
203 |
| - "id": "9d6ebeb7", |
| 166 | + "execution_count": null, |
204 | 167 | "metadata": {},
|
205 | 168 | "outputs": [],
|
206 | 169 | "source": [
|
|
209 | 172 | },
|
210 | 173 | {
|
211 | 174 | "cell_type": "markdown",
|
212 |
| - "id": "fafd4006", |
213 | 175 | "metadata": {},
|
214 | 176 | "source": [
|
215 | 177 | "The batch-corrected network considers only the mean effect after removing the contribution of the batch variables. It is computed as follows. "
|
216 | 178 | ]
|
217 | 179 | },
|
218 | 180 | {
|
219 | 181 | "cell_type": "code",
|
220 |
| - "execution_count": 9, |
221 |
| - "id": "d9db88a2", |
| 182 | + "execution_count": null, |
222 | 183 | "metadata": {},
|
223 | 184 | "outputs": [],
|
224 | 185 | "source": [
|
|
227 | 188 | },
|
228 | 189 | {
|
229 | 190 | "cell_type": "markdown",
|
230 |
| - "id": "7d007b0f", |
231 | 191 | "metadata": {},
|
232 | 192 | "source": [
|
233 | 193 | "### 3.2 Differential co-expression analysis\n",
|
|
236 | 196 | },
|
237 | 197 | {
|
238 | 198 | "cell_type": "code",
|
239 |
| - "execution_count": 10, |
240 |
| - "id": "cc2e923c", |
| 199 | + "execution_count": null, |
241 | 200 | "metadata": {},
|
242 | 201 | "outputs": [],
|
243 | 202 | "source": [
|
|
246 | 205 | },
|
247 | 206 | {
|
248 | 207 | "cell_type": "markdown",
|
249 |
| - "id": "88330e19", |
250 | 208 | "metadata": {},
|
251 | 209 | "source": [
|
252 | 210 | "In this case, the design matrix contains an intercept and a second column with an indicator for cancer/healthy status. The additional columns are for the variables we want to adjust for. As before, we adjust for the batch variable. "
|
253 | 211 | ]
|
254 | 212 | },
|
255 | 213 | {
|
256 | 214 | "cell_type": "code",
|
257 |
| - "execution_count": 11, |
258 |
| - "id": "2659f530", |
| 215 | + "execution_count": null, |
259 | 216 | "metadata": {},
|
260 | 217 | "outputs": [],
|
261 | 218 | "source": [
|
|
267 | 224 | },
|
268 | 225 | {
|
269 | 226 | "cell_type": "markdown",
|
270 |
| - "id": "1a9b9bd4", |
271 | 227 | "metadata": {},
|
272 | 228 | "source": [
|
273 | 229 | "We are now ready to fit COBRA and extract the component corresponding to the differential co-expression. Since the indicator variable for cancer is the second column in our design matrix, the COBRA-adjusted differential co-expression network corresponds to the second component of COBRA's decomposition. "
|
274 | 230 | ]
|
275 | 231 | },
|
276 | 232 | {
|
277 | 233 | "cell_type": "code",
|
278 |
| - "execution_count": 12, |
279 |
| - "id": "a03f1d0e", |
| 234 | + "execution_count": null, |
280 | 235 | "metadata": {},
|
281 | 236 | "outputs": [],
|
282 | 237 | "source": [
|
|
286 | 241 | },
|
287 | 242 | {
|
288 | 243 | "cell_type": "markdown",
|
289 |
| - "id": "ea5f0359", |
290 | 244 | "metadata": {},
|
291 | 245 | "source": [
|
292 | 246 | "### 3.3 Identifying the component for a covariate of interest\n",
|
|
296 | 250 | },
|
297 | 251 | {
|
298 | 252 | "cell_type": "code",
|
299 |
| - "execution_count": 13, |
300 |
| - "id": "63764747", |
| 253 | + "execution_count": null, |
301 | 254 | "metadata": {},
|
302 | 255 | "outputs": [],
|
303 | 256 | "source": [
|
|
308 | 261 | },
|
309 | 262 | {
|
310 | 263 | "cell_type": "markdown",
|
311 |
| - "id": "55d09ce6", |
312 | 264 | "metadata": {},
|
313 | 265 | "source": [
|
314 | 266 | "With this design, the last component of COBRA's decomposition describes the sex differences in cancer between males and females. "
|
315 | 267 | ]
|
316 | 268 | },
|
317 | 269 | {
|
318 | 270 | "cell_type": "code",
|
319 |
| - "execution_count": 14, |
320 |
| - "id": "e46405d0", |
| 271 | + "execution_count": null, |
321 | 272 | "metadata": {},
|
322 | 273 | "outputs": [],
|
323 | 274 | "source": [
|
|
327 | 278 | },
|
328 | 279 | {
|
329 | 280 | "cell_type": "markdown",
|
330 |
| - "id": "afdf8a4d", |
331 | 281 | "metadata": {},
|
332 | 282 | "source": [
|
333 | 283 | "## Reference\n",
|
|
338 | 288 | ],
|
339 | 289 | "metadata": {
|
340 | 290 | "kernelspec": {
|
341 |
| - "display_name": "Python 3 (ipykernel)", |
| 291 | + "display_name": "Python 3", |
342 | 292 | "language": "python",
|
343 | 293 | "name": "python3"
|
344 | 294 | },
|
|
352 | 302 | "name": "python",
|
353 | 303 | "nbconvert_exporter": "python",
|
354 | 304 | "pygments_lexer": "ipython3",
|
355 |
| - "version": "3.10.12" |
| 305 | + "version": "3.9.7" |
356 | 306 | }
|
357 | 307 | },
|
358 | 308 | "nbformat": 4,
|
|