|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 | 4 | "cell_type": "markdown",
|
| 5 | + "id": "7e26ed88", |
5 | 6 | "metadata": {},
|
6 | 7 | "source": [
|
7 | 8 | "# Decomposing gene co-expression networks with COBRA (Python version)\n",
|
|
12 | 13 | },
|
13 | 14 | {
|
14 | 15 | "cell_type": "markdown",
|
| 16 | + "id": "8520e757", |
15 | 17 | "metadata": {},
|
16 | 18 | "source": [
|
17 | 19 | "## 1. Introduction\n",
|
|
29 | 31 | {
|
30 | 32 | "cell_type": "code",
|
31 | 33 | "execution_count": null,
|
| 34 | + "id": "fe567bf2", |
32 | 35 | "metadata": {},
|
33 | 36 | "outputs": [],
|
34 | 37 | "source": [
|
|
37 | 40 | },
|
38 | 41 | {
|
39 | 42 | "cell_type": "markdown",
|
| 43 | + "id": "c27aff27", |
40 | 44 | "metadata": {},
|
41 | 45 | "source": [
|
42 | 46 | "On the server, we need to change the working directory to the `data` folder of the current user."
|
|
45 | 49 | {
|
46 | 50 | "cell_type": "code",
|
47 | 51 | "execution_count": null,
|
| 52 | + "id": "7e5dc5ad", |
48 | 53 | "metadata": {},
|
49 | 54 | "outputs": [],
|
50 | 55 | "source": [
|
|
55 | 60 | {
|
56 | 61 | "cell_type": "code",
|
57 | 62 | "execution_count": null,
|
| 63 | + "id": "402463e2", |
58 | 64 | "metadata": {},
|
59 | 65 | "outputs": [],
|
60 | 66 | "source": [
|
|
66 | 72 | {
|
67 | 73 | "cell_type": "code",
|
68 | 74 | "execution_count": null,
|
| 75 | + "id": "1f43ac8c", |
69 | 76 | "metadata": {},
|
70 | 77 | "outputs": [],
|
71 | 78 | "source": [
|
72 | 79 | "gene_expression = pd.read_csv(ppath+\"gene_expression_thca.csv\", index_col = 0).to_numpy()\n",
|
73 |
| - "metadata = pd.read_csv(ppath+\"data/thca_metadata.csv\", index_col = 0)\n", |
| 80 | + "metadata = pd.read_csv(ppath+\"thca_metadata.csv\", index_col = 0)\n", |
74 | 81 | "batch = metadata['batch'].to_numpy()\n",
|
75 | 82 | "cancer = metadata['status'].to_numpy()\n",
|
76 | 83 | "sex = metadata['sex'].to_numpy()"
|
77 | 84 | ]
|
78 | 85 | },
|
79 | 86 | {
|
80 | 87 | "cell_type": "markdown",
|
| 88 | + "id": "f01301b9", |
81 | 89 | "metadata": {},
|
82 | 90 | "source": [
|
83 | 91 | "Here `gene_expression` is a gene expression matrix for 19711 genes and 572 samples. `batch`, `cancer`, and `sex` are sample-specific metadata stored as vectors of length 572."
|
|
86 | 94 | {
|
87 | 95 | "cell_type": "code",
|
88 | 96 | "execution_count": null,
|
| 97 | + "id": "eefe741a", |
89 | 98 | "metadata": {},
|
90 | 99 | "outputs": [],
|
91 | 100 | "source": [
|
|
97 | 106 | },
|
98 | 107 | {
|
99 | 108 | "cell_type": "markdown",
|
| 109 | + "id": "e23e09b2", |
100 | 110 | "metadata": {},
|
101 | 111 | "source": [
|
102 | 112 | "## 2. Applications of COBRA\n",
|
|
114 | 124 | {
|
115 | 125 | "cell_type": "code",
|
116 | 126 | "execution_count": null,
|
| 127 | + "id": "56c0ce1f", |
117 | 128 | "metadata": {},
|
118 | 129 | "outputs": [],
|
119 | 130 | "source": [
|
|
122 | 133 | },
|
123 | 134 | {
|
124 | 135 | "cell_type": "markdown",
|
| 136 | + "id": "19e30957", |
125 | 137 | "metadata": {},
|
126 | 138 | "source": [
|
127 | 139 | "For batch correction, the design matrix must contain an intercept in the first column, and the batches (encoded using dummy coding for identifiability) in the remaining columns. "
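The notebook cell that builds this design matrix is not shown in the diff. As a rough sketch of the layout described above (intercept first, then dummy-coded batches with one level dropped), the following might help; `batch_design_matrix` and the toy batch labels are hypothetical and are not part of COBRA's API.

```python
import numpy as np
import pandas as pd

def batch_design_matrix(batch):
    """Return a design matrix: intercept column, then dummy-coded batches.

    Hypothetical helper for illustration. One batch level is dropped
    (reference coding) so the columns are not collinear with the intercept.
    """
    dummies = pd.get_dummies(pd.Series(batch), drop_first=True).to_numpy(dtype=float)
    intercept = np.ones((len(batch), 1))
    return np.hstack([intercept, dummies])

# Toy example with 3 batches over 5 samples (made-up labels).
batch = np.array(["a", "b", "c", "a", "b"])
X = batch_design_matrix(batch)
print(X.shape)
```

With k batches this gives k columns in total (an intercept plus k - 1 dummies), which matches the 17 covariates (1 + 16) reported for the 572 samples if the data contain 17 batches.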
|
|
130 | 142 | {
|
131 | 143 | "cell_type": "code",
|
132 | 144 | "execution_count": null,
|
| 145 | + "id": "008a3832", |
133 | 146 | "metadata": {},
|
134 | 147 | "outputs": [],
|
135 | 148 | "source": [
|
|
140 | 153 | },
|
141 | 154 | {
|
142 | 155 | "cell_type": "markdown",
|
| 156 | + "id": "db8d69f1", |
143 | 157 | "metadata": {},
|
144 | 158 | "source": [
|
145 | 159 | "We get a design matrix with 17 covariates (an intercept and 16 for the dummy coding) for the 572 samples in our study. "
|
|
148 | 162 | {
|
149 | 163 | "cell_type": "code",
|
150 | 164 | "execution_count": null,
|
| 165 | + "id": "45b0a40d", |
151 | 166 | "metadata": {},
|
152 | 167 | "outputs": [],
|
153 | 168 | "source": [
|
|
156 | 171 | },
|
157 | 172 | {
|
158 | 173 | "cell_type": "markdown",
|
| 174 | + "id": "9556f9b9", |
159 | 175 | "metadata": {},
|
160 | 176 | "source": [
|
161 | 177 | "We are now ready to fit COBRA"
|
|
164 | 180 | {
|
165 | 181 | "cell_type": "code",
|
166 | 182 | "execution_count": null,
|
| 183 | + "id": "c0b3776b", |
167 | 184 | "metadata": {},
|
168 | 185 | "outputs": [],
|
169 | 186 | "source": [
|
|
172 | 189 | },
|
173 | 190 | {
|
174 | 191 | "cell_type": "markdown",
|
| 192 | + "id": "f07cae8c", |
175 | 193 | "metadata": {},
|
176 | 194 | "source": [
|
177 | 195 | "The batch-corrected network considers only the mean effect after removing the contribution of the batch variables. It is computed as follows. "
|
|
180 | 198 | {
|
181 | 199 | "cell_type": "code",
|
182 | 200 | "execution_count": null,
|
| 201 | + "id": "feb8b5a5", |
183 | 202 | "metadata": {},
|
184 | 203 | "outputs": [],
|
185 | 204 | "source": [
|
|
188 | 207 | },
|
189 | 208 | {
|
190 | 209 | "cell_type": "markdown",
|
| 210 | + "id": "3c577a75", |
191 | 211 | "metadata": {},
|
192 | 212 | "source": [
|
193 | 213 | "### 3.2 Differential co-expression analysis\n",
|
|
197 | 217 | {
|
198 | 218 | "cell_type": "code",
|
199 | 219 | "execution_count": null,
|
| 220 | + "id": "fc0a5747", |
200 | 221 | "metadata": {},
|
201 | 222 | "outputs": [],
|
202 | 223 | "source": [
|
|
205 | 226 | },
|
206 | 227 | {
|
207 | 228 | "cell_type": "markdown",
|
| 229 | + "id": "21a412df", |
208 | 230 | "metadata": {},
|
209 | 231 | "source": [
|
210 | 232 | "In this case, the design matrix contains an intercept and a second column with an indicator for cancer/healthy. The additional columns are for the variables we want to adjust for. As before, we adjust for the batch variable. "
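The cell constructing this second design matrix is also elided from the diff. A minimal sketch of the column order described here (intercept, cancer indicator, then batch dummies) could look like the following; the helper name, the `case_label` argument, and the `"cancer"`/`"healthy"` labels are assumptions, since the actual encoding of `metadata['status']` is not shown.

```python
import numpy as np
import pandas as pd

def differential_design_matrix(status, batch, case_label="cancer"):
    """Intercept, case indicator (second column), then dummy-coded batches.

    Hypothetical helper; `case_label` is an assumed label for cancer samples.
    """
    intercept = np.ones(len(status))
    indicator = (np.asarray(status) == case_label).astype(float)
    batch_dummies = pd.get_dummies(pd.Series(batch), drop_first=True).to_numpy(dtype=float)
    return np.column_stack([intercept, indicator, batch_dummies])

# Toy data: 6 samples, 3 batches, alternating status (made-up labels).
status = np.array(["cancer", "healthy", "cancer", "healthy", "cancer", "healthy"])
batch = np.array(["a", "a", "b", "b", "c", "c"])
X = differential_design_matrix(status, batch)
print(X[:, 1])
```

Keeping the indicator in the second column is what lets the second component of the decomposition be read as the differential co-expression term.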
|
|
213 | 235 | {
|
214 | 236 | "cell_type": "code",
|
215 | 237 | "execution_count": null,
|
| 238 | + "id": "d26518b1", |
216 | 239 | "metadata": {},
|
217 | 240 | "outputs": [],
|
218 | 241 | "source": [
|
|
224 | 247 | },
|
225 | 248 | {
|
226 | 249 | "cell_type": "markdown",
|
| 250 | + "id": "0df93493", |
227 | 251 | "metadata": {},
|
228 | 252 | "source": [
|
229 | 253 | "We are now ready to fit COBRA and extract the component corresponding to the differential co-expression. Since the indicator variable for cancer is the second column in our design matrix, the COBRA-adjusted differential co-expression network corresponds to the second component of COBRA's decomposition. "
|
|
232 | 256 | {
|
233 | 257 | "cell_type": "code",
|
234 | 258 | "execution_count": null,
|
| 259 | + "id": "3887dba5", |
235 | 260 | "metadata": {},
|
236 | 261 | "outputs": [],
|
237 | 262 | "source": [
|
|
241 | 266 | },
|
242 | 267 | {
|
243 | 268 | "cell_type": "markdown",
|
| 269 | + "id": "15b8e757", |
244 | 270 | "metadata": {},
|
245 | 271 | "source": [
|
246 | 272 | "### 3.3 Identifying the component for a covariate of interest\n",
|
|
251 | 277 | {
|
252 | 278 | "cell_type": "code",
|
253 | 279 | "execution_count": null,
|
| 280 | + "id": "c3f136eb", |
254 | 281 | "metadata": {},
|
255 | 282 | "outputs": [],
|
256 | 283 | "source": [
|
|
261 | 288 | },
|
262 | 289 | {
|
263 | 290 | "cell_type": "markdown",
|
| 291 | + "id": "78f33e7b", |
264 | 292 | "metadata": {},
|
265 | 293 | "source": [
|
266 | 294 | "With this design, the last component of COBRA's decomposition describes the sex differences in cancer between males and females. "
|
|
269 | 297 | {
|
270 | 298 | "cell_type": "code",
|
271 | 299 | "execution_count": null,
|
| 300 | + "id": "a55644d4", |
272 | 301 | "metadata": {},
|
273 | 302 | "outputs": [],
|
274 | 303 | "source": [
|
|
278 | 307 | },
|
279 | 308 | {
|
280 | 309 | "cell_type": "markdown",
|
| 310 | + "id": "8286994c", |
281 | 311 | "metadata": {},
|
282 | 312 | "source": [
|
283 | 313 | "## Reference\n",
|
|
288 | 318 | ],
|
289 | 319 | "metadata": {
|
290 | 320 | "kernelspec": {
|
291 |
| - "display_name": "Python 3", |
| 321 | + "display_name": "Python 3 (ipykernel)", |
292 | 322 | "language": "python",
|
293 | 323 | "name": "python3"
|
294 | 324 | },
|
|
302 | 332 | "name": "python",
|
303 | 333 | "nbconvert_exporter": "python",
|
304 | 334 | "pygments_lexer": "ipython3",
|
305 |
| - "version": "3.9.7" |
| 335 | + "version": "3.10.12" |
306 | 336 | }
|
307 | 337 | },
|
308 | 338 | "nbformat": 4,
|
|