Commit 7763a55

add cobra python
1 parent c24cfc1 commit 7763a55

2 files changed: +47 -97 lines changed

netbooks/Welcome_to_netBooks.ipynb

+4-4
Original file line number | Diff line number | Diff line change
@@ -102,10 +102,8 @@
102102
" \n",
103103
" - [Uncovering Associations among Genes and Phenotypes with SEAHORSE](netZooR/seahorse.ipynb)\n",
104104
" \n",
105-
" - [Decomposing gene co-expression networks with COBRA (R version)](netZooR/COBRA.ipynb)\n",
105+
" - [Decomposing gene co-expression networks with COBRA](netZooR/COBRA.ipynb)\n",
106106
" \n",
107-
" - [Decomposing gene co-expression networks with COBRA (Python version)](netZooPy/cobra.ipynb)\n",
108-
" \n",
109107
" - Case studies\n",
110108
" \n",
111109
" - [Building PANDA and LIONESS Regulatory Networks from GTEx Gene Expression Data in R](netZooR/ApplicationinGTExData.ipynb)\n",
@@ -152,6 +150,8 @@
152150
" - [Identifying mutation networks using SAMBAR](netZooPy/sambar_tutorial.ipynb)\n",
153151
" \n",
154152
" - [DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks](netZooPy/dragon_tutorial.ipynb)\n",
153+
" \n",
154+
" - [Decomposing gene co-expression networks with COBRA](netZooPy/cobra.ipynb)\n",
155155
"\n",
156156
" - Case studies\n",
157157
"\n",
@@ -192,7 +192,7 @@
192192
"mimetype": "text/x-r-source",
193193
"name": "R",
194194
"pygments_lexer": "r",
195-
"version": "4.3.0"
195+
"version": "4.2.2"
196196
}
197197
},
198198
"nbformat": 4,

netbooks/netZooPy/cobra.ipynb

+43-93
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,6 @@
22
"cells": [
33
{
44
"cell_type": "markdown",
5-
"id": "ff46b205",
65
"metadata": {},
76
"source": [
87
"# Decomposing gene co-expression networks with COBRA (Python version)\n",
@@ -13,7 +12,6 @@
1312
},
1413
{
1514
"cell_type": "markdown",
16-
"id": "71d4f5e4",
1715
"metadata": {},
1816
"source": [
1917
"## 1. Introduction\n",
@@ -23,31 +21,40 @@
2321
"\n",
2422
"COBRA is now part of the [netZooPy package](https://github.com/netZoo/netZooPy). Please follow the installation guidelines on the [README](https://github.com/netZoo/netZooPy/blob/master/README.md). If you need help or if you have any question about netZoo, feel free to start with [discussions](https://github.com/netZoo/netZooPy/discussions). To report a bug, please open a new [issue](https://github.com/netZoo/netZooPy/issues). \n",
2523
"\n",
26-
"To illustrate how to use COBRA for different tasks, we import thyroid carcinoma (THCA) data from the TCGA project <sup>1</sup>. "
24+
"To illustrate how to use COBRA for different tasks, we import thyroid carcinoma (THCA) data from the TCGA project <sup>1</sup>. \n",
25+
"\n",
26+
"This vignette can be ran on netbooks server or locally by setting the `runserver` parameter"
27+
]
28+
},
29+
{
30+
"cell_type": "code",
31+
"execution_count": null,
32+
"metadata": {},
33+
"outputs": [],
34+
"source": [
35+
"runserver=1"
36+
]
37+
},
38+
{
39+
"cell_type": "markdown",
40+
"metadata": {},
41+
"source": [
42+
"On the server, we need to change the working directory to the `data` folder of the current useer."
2743
]
2844
},
2945
{
3046
"cell_type": "code",
31-
"execution_count": 1,
32-
"id": "69f50403",
47+
"execution_count": null,
3348
"metadata": {},
34-
"outputs": [
35-
{
36-
"name": "stdout",
37-
"output_type": "stream",
38-
"text": [
39-
"/home/soel/Desktop/netbooks/netbooks\n"
40-
]
41-
}
42-
],
49+
"outputs": [],
4350
"source": [
44-
"cd .."
51+
"if runserver==1:\n",
52+
" ppath='/opt/data/netZooPy/cobra/'"
4553
]
4654
},
4755
{
4856
"cell_type": "code",
49-
"execution_count": 2,
50-
"id": "4d8dd9bf",
57+
"execution_count": null,
5158
"metadata": {},
5259
"outputs": [],
5360
"source": [
@@ -58,43 +65,29 @@
5865
},
5966
{
6067
"cell_type": "code",
61-
"execution_count": 3,
62-
"id": "27ba906e",
68+
"execution_count": null,
6369
"metadata": {},
6470
"outputs": [],
6571
"source": [
66-
"gene_expression = pd.read_csv(\"data/gene_expression_thca.csv\", index_col = 0).to_numpy()\n",
67-
"metadata = pd.read_csv(\"data/thca_metadata.csv\", index_col = 0)\n",
72+
"gene_expression = pd.read_csv(ppath+\"gene_expression_thca.csv\", index_col = 0).to_numpy()\n",
73+
"metadata = pd.read_csv(ppath+\"data/thca_metadata.csv\", index_col = 0)\n",
6874
"batch = metadata['batch'].to_numpy()\n",
6975
"cancer = metadata['status'].to_numpy()\n",
7076
"sex = metadata['sex'].to_numpy()"
7177
]
7278
},
7379
{
7480
"cell_type": "markdown",
75-
"id": "1b97bf47",
7681
"metadata": {},
7782
"source": [
7883
"Here gene_expression is a gene expression matrix for 19711 genes and 572 samples. Batch, cancer, and sex are sample-specific metadata as vectors of length 572."
7984
]
8085
},
8186
{
8287
"cell_type": "code",
83-
"execution_count": 4,
84-
"id": "62cfa651",
88+
"execution_count": null,
8589
"metadata": {},
86-
"outputs": [
87-
{
88-
"name": "stdout",
89-
"output_type": "stream",
90-
"text": [
91-
"Gene expression shape = (19711, 572)\n",
92-
"Batch vector length = 572\n",
93-
"Cancer vector length = 572\n",
94-
"Sex vector length = 572\n"
95-
]
96-
}
97-
],
90+
"outputs": [],
9891
"source": [
9992
"print(\"Gene expression shape = \" + str(gene_expression.shape))\n",
10093
"print(\"Batch vector length = \" + str(len(batch)))\n",
@@ -104,7 +97,6 @@
10497
},
10598
{
10699
"cell_type": "markdown",
107-
"id": "71d4f89a",
108100
"metadata": {},
109101
"source": [
110102
"## 2. Applications of COBRA\n",
@@ -121,37 +113,23 @@
121113
},
122114
{
123115
"cell_type": "code",
124-
"execution_count": 5,
125-
"id": "d9bcc699",
116+
"execution_count": null,
126117
"metadata": {},
127-
"outputs": [
128-
{
129-
"data": {
130-
"text/plain": [
131-
"17"
132-
]
133-
},
134-
"execution_count": 5,
135-
"metadata": {},
136-
"output_type": "execute_result"
137-
}
138-
],
118+
"outputs": [],
139119
"source": [
140120
"len(np.unique(batch))"
141121
]
142122
},
143123
{
144124
"cell_type": "markdown",
145-
"id": "d9dd368e",
146125
"metadata": {},
147126
"source": [
148127
"For batch correction, the design matrix must contain an intercept in the first column, and the batches (encoded usy dummy coding for identifiability) in the remaining columns. "
149128
]
150129
},
151130
{
152131
"cell_type": "code",
153-
"execution_count": 6,
154-
"id": "9b1c0f8b",
132+
"execution_count": null,
155133
"metadata": {},
156134
"outputs": [],
157135
"source": [
@@ -162,45 +140,30 @@
162140
},
163141
{
164142
"cell_type": "markdown",
165-
"id": "32c6fb4f",
166143
"metadata": {},
167144
"source": [
168145
"We get a design matrix with 17 covariates (an intercept and 16 for the dummy coding) for the 572 samples in our study. "
169146
]
170147
},
171148
{
172149
"cell_type": "code",
173-
"execution_count": 7,
174-
"id": "672634c4",
150+
"execution_count": null,
175151
"metadata": {},
176-
"outputs": [
177-
{
178-
"data": {
179-
"text/plain": [
180-
"(572, 17)"
181-
]
182-
},
183-
"execution_count": 7,
184-
"metadata": {},
185-
"output_type": "execute_result"
186-
}
187-
],
152+
"outputs": [],
188153
"source": [
189154
"X.shape"
190155
]
191156
},
192157
{
193158
"cell_type": "markdown",
194-
"id": "711ee90f",
195159
"metadata": {},
196160
"source": [
197161
"We are now ready to fit COBRA"
198162
]
199163
},
200164
{
201165
"cell_type": "code",
202-
"execution_count": 8,
203-
"id": "9d6ebeb7",
166+
"execution_count": null,
204167
"metadata": {},
205168
"outputs": [],
206169
"source": [
@@ -209,16 +172,14 @@
209172
},
210173
{
211174
"cell_type": "markdown",
212-
"id": "fafd4006",
213175
"metadata": {},
214176
"source": [
215177
"The batch corrected network consider only the mean effect after removing the contribution of the batch variables. It is computed as follows. "
216178
]
217179
},
218180
{
219181
"cell_type": "code",
220-
"execution_count": 9,
221-
"id": "d9db88a2",
182+
"execution_count": null,
222183
"metadata": {},
223184
"outputs": [],
224185
"source": [
@@ -227,7 +188,6 @@
227188
},
228189
{
229190
"cell_type": "markdown",
230-
"id": "7d007b0f",
231191
"metadata": {},
232192
"source": [
233193
"### 3.2 Differential co-expression analysis\n",
@@ -236,8 +196,7 @@
236196
},
237197
{
238198
"cell_type": "code",
239-
"execution_count": 10,
240-
"id": "cc2e923c",
199+
"execution_count": null,
241200
"metadata": {},
242201
"outputs": [],
243202
"source": [
@@ -246,16 +205,14 @@
246205
},
247206
{
248207
"cell_type": "markdown",
249-
"id": "88330e19",
250208
"metadata": {},
251209
"source": [
252210
"In this case, the design matrix contains an intercept an a second column with an indicator for cancer/ healthy. The additional columns are for the variables we want to adjust for. Similarly as before, we consider the batch variable. "
253211
]
254212
},
255213
{
256214
"cell_type": "code",
257-
"execution_count": 11,
258-
"id": "2659f530",
215+
"execution_count": null,
259216
"metadata": {},
260217
"outputs": [],
261218
"source": [
@@ -267,16 +224,14 @@
267224
},
268225
{
269226
"cell_type": "markdown",
270-
"id": "1a9b9bd4",
271227
"metadata": {},
272228
"source": [
273229
"We are now ready to fit COBRA and extract the component corresponding to the differential co-expression. Since the indicator variable for cancer is the second column in our design matrix, the COBRA-adjusted differential co-expression network corresponds to the second component of COBRA's decomposition. "
274230
]
275231
},
276232
{
277233
"cell_type": "code",
278-
"execution_count": 12,
279-
"id": "a03f1d0e",
234+
"execution_count": null,
280235
"metadata": {},
281236
"outputs": [],
282237
"source": [
@@ -286,7 +241,6 @@
286241
},
287242
{
288243
"cell_type": "markdown",
289-
"id": "ea5f0359",
290244
"metadata": {},
291245
"source": [
292246
"### 3.3 Identifying the component for a covariate of interest\n",
@@ -296,8 +250,7 @@
296250
},
297251
{
298252
"cell_type": "code",
299-
"execution_count": 13,
300-
"id": "63764747",
253+
"execution_count": null,
301254
"metadata": {},
302255
"outputs": [],
303256
"source": [
@@ -308,16 +261,14 @@
308261
},
309262
{
310263
"cell_type": "markdown",
311-
"id": "55d09ce6",
312264
"metadata": {},
313265
"source": [
314266
"With this design, the last component of COBRA's decomposition describes the sex differes in cancer between male and females. "
315267
]
316268
},
317269
{
318270
"cell_type": "code",
319-
"execution_count": 14,
320-
"id": "e46405d0",
271+
"execution_count": null,
321272
"metadata": {},
322273
"outputs": [],
323274
"source": [
@@ -327,7 +278,6 @@
327278
},
328279
{
329280
"cell_type": "markdown",
330-
"id": "afdf8a4d",
331281
"metadata": {},
332282
"source": [
333283
"## Reference\n",
@@ -338,7 +288,7 @@
338288
],
339289
"metadata": {
340290
"kernelspec": {
341-
"display_name": "Python 3 (ipykernel)",
291+
"display_name": "Python 3",
342292
"language": "python",
343293
"name": "python3"
344294
},
@@ -352,7 +302,7 @@
352302
"name": "python",
353303
"nbconvert_exporter": "python",
354304
"pygments_lexer": "ipython3",
355-
"version": "3.10.12"
305+
"version": "3.9.7"
356306
}
357307
},
358308
"nbformat": 4,

0 commit comments

Comments
 (0)