
[Question] Why is x_hat_gate computed with the un-normalized self.ae.decoder.weight.detach().T? #54


Description

@jasonrichdarmawan

Question

Suppose the ConstrainedAdam optimizer is not used.

Wouldn't this allow the model to cheat the L_sparse penalty on the next step?

For example:

  1. 1st step: f_gate shrinks slightly because of L_sparse.
  2. 2nd step: L_recon increases because f_gate shrank in the first step (affecting f), so the model compensates by increasing the decoder weights.
  3. The pattern then continues (see the numeric sketch below).

The line in question:

```python
x_hat_gate = f_gate @ self.ae.decoder.weight.detach().T + self.ae.decoder_bias.detach()
```
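For reference, here is a minimal sketch of what a ConstrainedAdam-style optimizer could look like. This is a reconstruction from context, not the repository's actual implementation; the class name and the dim=0 column-norm convention are assumptions:

```python
import torch


class ConstrainedAdam(torch.optim.Adam):
    """Sketch of an Adam variant that keeps selected parameters at unit
    column norm (assumed behavior, not the repository's actual code)."""

    def __init__(self, params, constrained_params, lr):
        super().__init__(params, lr=lr)
        self.constrained_params = list(constrained_params)

    def step(self, closure=None):
        with torch.no_grad():
            for p in self.constrained_params:
                normed = p / p.norm(dim=0, keepdim=True)
                # Drop the gradient component parallel to each column, so the
                # update is tangent to the unit-norm constraint surface.
                p.grad -= (p.grad * normed).sum(dim=0, keepdim=True) * normed
        super().step(closure=closure)
        with torch.no_grad():
            for p in self.constrained_params:
                # Renormalize each column back to unit norm after the step.
                p /= p.norm(dim=0, keepdim=True)
```

Under such a constraint, the rescaling in the sketch above is projected away and the decoder columns stay at unit norm, so the feedback loop described in steps 1-3 would be closed.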
