I am having trouble understanding why, for term 1, the two computed entropy terms are added together:

    # E [ log Q(s|pi) - log Q(s|o,pi) ]
    term1 = - tf.reduce_sum(entropy_normal_from_logvar(ps1_logvar) + entropy_normal_from_logvar(qs1_logvar), axis=1)

I understand this gives the sum of two entropy terms (where ps1 $= p_\tau = s_\tau|\pi$ and qs1 $= q_\tau = s_\tau|o_\tau,\pi$ ):
$$- \sum_\tau ( \frac{1}{2} \log(2 e \pi \sigma^2_{p_\tau} ) + \frac{1}{2} \log (2 e \pi \sigma^2_{q_\tau} ) )
\xrightarrow{\text{simplify}} - \sum_\tau ( H_{p_\tau} + H_{q_\tau} ) $$
But in the paper, we see that term 1 is given by:
$$\sum_\tau E_{Q(\theta | \pi)}[E_{Q(o_\tau | \theta, \pi)}[H(s_\tau | o_\tau, \pi)] - H(s_\tau | \pi)]
\xrightarrow{\text{simplify}} + \sum_\tau ( H_{q_\tau} - H_{p_\tau} ) $$
Why is there a discrepancy between the "+" and the "-"? Where is my understanding breaking down? Am I simplifying the equations incorrectly? If so, can you explain how to correctly transform between the two?
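
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not the repository's code: `entropy_normal_from_logvar` is re-implemented under the assumption that it returns the per-dimension Gaussian entropy $\frac{1}{2}(\log(2 \pi e) + \log \sigma^2)$, and the log-variance values are made up for illustration.

```python
import math

LOG_2_PI_E = math.log(2.0 * math.pi * math.e)


def entropy_normal_from_logvar(logvar):
    # Assumed form: differential entropy of a 1-D Gaussian,
    # 0.5 * (log(2*pi*e) + log(sigma^2)).
    return 0.5 * (LOG_2_PI_E + logvar)


# Hypothetical log-variances for a 3-dimensional state (illustrative only).
ps1_logvar = [0.0, 0.5, -0.2]   # Q(s | pi)
qs1_logvar = [-1.0, 0.1, -0.7]  # Q(s | o, pi)

h_p = [entropy_normal_from_logvar(v) for v in ps1_logvar]
h_q = [entropy_normal_from_logvar(v) for v in qs1_logvar]

# What the quoted line computes: -(H_p + H_q), summed over the entries
# (mirroring the reduce_sum over axis=1).
term1_code = -sum(p + q for p, q in zip(h_p, h_q))

# What my simplification of the paper's expression gives: H_q - H_p.
term1_paper = sum(q - p for p, q in zip(h_p, h_q))

print("code :", term1_code)
print("paper:", term1_paper)
print("gap  :", term1_paper - term1_code)
```

Under these assumptions the two values differ by exactly $2 \sum H_q$: the prior entropy $H_p$ enters with the same sign in both expressions, and only the sign on the posterior entropy $H_q$ is flipped.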
For reference, the `term1` line quoted above is from deep-active-inference-mc/src/tfmodel.py, lines 343 to 344 in c40ef0d.