Skip to content

loci RNA association

Sergey Venev edited this page Feb 1, 2016 · 4 revisions

nothing to note so far. See read for brief description.

Remarks on GLPK integer optimization for RNA folding

In our study, we are trying to determine structure of viral RNA (vRNA) encapsidated by nucleoprotein (NP). This structure is believed to be local: bulk of vRNA is condensed on NP, while the most stable hairpins are protruding from RNA-protein complex (vRNP).

We are trying to unveil those stable hairpins using local RNA folding. RNALfold local RNA folder from ViennaRNA package yields a set of local RNA structures (hairpins) smaller than window size. These hairpins are overlapping with each other in general case. To mimic encapsidated vRNA structure we select the best subset of non-overlapping hairpins from RNALfold output.

In our case, the best, implies that these hairpins yield minimum free energy of the vRNP, RNA-protein complex. Free energy functional:

F= a_1*(dg_1 - dg_NP)*L_1 + a_2*(dg_2 - dg_NP)*L_2 + ...,

where dg_i is MFE of i hairpin per nucleotide, dg_NP - free energy for nucleotide in NP-condensed form, L_i - hairpin's size in nucleotides, and a_i are binary coefficien assuming 1 if i hairpin is included in global RNA structure, and 0 otherwise. We are minimizing such a functional with respect to the set of inclusion coefficients a_1,a_2,..., given the non-overlapping constraint.

It is a Mixed Integer Programming (MIP) problem, with the objective function F(a_1,a_2,...), where (dg_i - dg_NP)*L_i are objective coefficients, and coefficients a_1,a_2,... are bound by the non-overlapping constraint. The constraint can be formalized as a set of linear inequalities, derived from matrix OM of hairpin overlap: OM_ij = 1 if hairpins i and j are overlapping and OM_ij = 0 otherwise. OM is symmetrical by definition and diagonal elements can be set OM_ii = 1 as well, see OM example: OM example

Let us consider following example: hairpin 1 overlaps with 2, 2 overlaps with 3,4, and 5, while 3 and 4 overlaps with each other but not with 5:

1 ...((..))...........................
2 .......((((((....)))))).............
3 ...........((..))...................
4 ............(((..)))................
5 ......................(((...))).....

Given this set of hairpins (presumably RNALfold output), we can write non-overlap constraints as follows:

0<= a_1 + a_2 <=1
0<= a_2 + a_3 + a_4 <=1
0<= a_2 + a_5 <=1,

i.e., 1 and 2 are not allowed at the same time, 2,3,4 - as well, yet both 1,3 and 1,4 combinations are permitted, similarly combination 2,5 is prohibited, while 5 can combine with any of 1,3,4. Both this formalism and vocal description make sense, however it is hard to generate such a set of inequalities programmatically. Instead, we use an equivalent pair-wise overlap constraints. Considering the same example the set of constraint inequalities would be:

0<= a_1 + a_2 <=1 (1)
0<= a_2 + a_3 <=1 (2)
0<= a_2 + a_4 <=1 (3)
0<= a_2 + a_5 <=1 (4)
0<= a_3 + a_4 <=1 (5)

Similar inequalities split into pairs. Equivalency can be demonstrated by example: summing inequalities (2),(3), and a_3+a_4 = a_3+a_4 yields:

0 <= 2*a_2 + 2*a_3 + 2*a_4 <= 2 + (a_3+a_4),

using (5), this turns into

0 <= a_2 + a_3 + a_4 <= 3/2,

which is equivalent to 0 <= a_2 + a_3 + a_4 <= 1, because all variables a_i are binary, i.e. either 1 or 0. Thus we arrived at condensed form of constraint inequalities that is easier to argue about. Yet, demonstrated pair-wise form of constraints is easy to generate programmatically, so it'll be used in the corresponding section of the scripts.

It is also important to mention the difference between the hairpins' overlap matrix OM and the matrix of constraint coefficients derived from either condensed or pair-wise form of constraints. As it turned out, the two can be confused after a break in influenza research quite easily. Confusion can actually lead to reasonable results, affecting selection of hairpins. Example of the results:

 ...((..))........................... -2
 .......((((((....))))))............. -6
 ...........((..))................... -2
 ............(((..)))................ -3
 ......................(((...)))..... -3

 ...((..))...(((..)))..(((...)))..... -2-3-3=-8

if by a strange coincidence largest hairpins would have more favorable MFE, then

 ...((..))........................... -2
 .......((((((....))))))............. -11
 ...........((..))................... -2
 ............(((..)))................ -3
 ......................(((...)))..... -3

 .......((((((....))))))............. -11

In case 4 and 5 would be longer, and overlap, then for example: ble MFE, then

 ...((..))........................... -2
 .......((((((....))))))............. -6
 ...........((..))................... -2
 ............((((..)))).............. -4
 ....................((((...))))..... -4

 ...((..))..((..))...((((...))))..... -2-2-4=-8

For this particular example straightforward constrains would yield:

0<= a_1 + a_2 <=1
0<= a_2 + a_3 + a_4 <=1
0<= a_2 + a_4 + a_5 <=1,

whereas the pair-wise form would produce:

0<= a_1 + a_2 <=1 (1)
0<= a_2 + a_3 <=1 (2)
0<= a_2 + a_4 <=1 (3)
0<= a_2 + a_5 <=1 (4)
0<= a_3 + a_4 <=1 (5)
0<= a_4 + a_5 <=1 (6)

where (2),(4) and (5) can be again turned to 0<= a_2 + a_3 + a_4 <=1, and (3),(4) and (6) - into 0<= a_2 + a_4 + a_5 <=1. With this we can conclude out remark on the MIP problem for RNA folding and update flu_module accordingly.

Clone this wiki locally