Calculation of Term 1 in G #4

@nickypro

I am having trouble understanding why, for term 1, the two computed entropy terms are added together rather than subtracted:

```python
# E [ log Q(s|pi) - log Q(s|o,pi) ]
term1 = - tf.reduce_sum(entropy_normal_from_logvar(ps1_logvar) + entropy_normal_from_logvar(qs1_logvar), axis=1)
```

As I understand it, this gives the negated sum of two entropy terms, where `ps1` is $p_\tau = Q(s_\tau|\pi)$ and `qs1` is $q_\tau = Q(s_\tau|o_\tau,\pi)$:

$$- \sum_\tau ( \frac{1}{2} \log(2 e \pi \sigma^2_{p_\tau} ) + \frac{1}{2} \log (2 e \pi \sigma^2_{q_\tau} ) ) \xrightarrow{\text{simplify}} - \sum_\tau ( H_{p_\tau} + H_{q_\tau} ) $$

But in the paper, we see that term 1 is given by:

$$\sum_\tau E_{Q(\theta | \pi)}[E_{Q(o_\tau | \theta, \pi)}[H(s_\tau | o_\tau, \pi)] - H(s_\tau | \pi)] \xrightarrow{\text{simplify}} + \sum_\tau ( H_{q_\tau} - H_{p_\tau} ) $$

Why is there a discrepancy between the "+" and the "-"? Where is my understanding breaking down? Am I simplifying the equations incorrectly? If so, can you explain how to correctly transform one expression into the other?
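To make the discrepancy concrete, here is a minimal sketch in plain Python that evaluates both expressions on made-up log-variances. It assumes that `entropy_normal_from_logvar` is the closed-form Gaussian entropy $\frac{1}{2}\log(2 \pi e \sigma^2)$ (I have not checked this against the repository's definition), and `term1_code`, `term1_paper`, and the sample values are hypothetical names and numbers of my own.

```python
import math


def entropy_normal_from_logvar(logvar):
    # Assumed definition: entropy of a 1-D Gaussian, 0.5 * log(2*pi*e*sigma^2),
    # written in terms of the log-variance.
    return 0.5 * (math.log(2.0 * math.pi * math.e) + logvar)


def term1_code(ps1_logvar, qs1_logvar):
    # Mirrors the snippet from the code: -(H_p + H_q), summed over dimensions.
    return -sum(
        entropy_normal_from_logvar(p) + entropy_normal_from_logvar(q)
        for p, q in zip(ps1_logvar, qs1_logvar)
    )


def term1_paper(ps1_logvar, qs1_logvar):
    # My simplification of the paper's expression: H_q - H_p, summed over dimensions.
    return sum(
        entropy_normal_from_logvar(q) - entropy_normal_from_logvar(p)
        for p, q in zip(ps1_logvar, qs1_logvar)
    )


# Made-up log-variances for three latent dimensions (illustrative only).
ps1_logvar = [0.0, -1.0, 0.5]
qs1_logvar = [-0.5, -2.0, 0.0]

code_value = term1_code(ps1_logvar, qs1_logvar)
paper_value = term1_paper(ps1_logvar, qs1_logvar)

# Total posterior entropy H_q, used to express the gap between the two forms.
h_q = sum(entropy_normal_from_logvar(q) for q in qs1_logvar)
gap = paper_value - code_value

print("code form :", code_value)
print("paper form:", paper_value)
print("gap       :", gap, "(2 * H_q =", 2.0 * h_q, ")")
```

Under these assumptions, both forms carry $-H_{p_\tau}$ and differ only in the sign of $H_{q_\tau}$, so the gap between them is exactly $2 \sum_\tau H_{q_\tau}$. That narrows my question to the sign of the posterior entropy term.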
