Question
Suppose the ConstrainedAdam optimizer is not used.
Wouldn't this allow the model to cheat L_sparse in the next step?
For example:
- 1st step:
f_gate shrinks slightly because of L_sparse
- 2nd step:
L_recon increases because f_gate shrank in the first step (affecting f), so the model compensates by increasing the decoder weights
- and the pattern continues.
```python
x_hat_gate = f_gate @ self.ae.decoder.weight.detach().T + self.ae.decoder_bias.detach()
```
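For context, my understanding of why ConstrainedAdam would block this pattern: it keeps each decoder column at unit norm around every update, so the decoder cannot absorb scale from the shrinking f_gate. Below is a minimal sketch of that kind of constrained optimizer; the class body, the gradient projection, and the usage line are my assumptions about a typical implementation, not code taken from this repo.

```python
import torch


class ConstrainedAdam(torch.optim.Adam):
    """Sketch of an Adam variant that keeps the columns of selected
    parameters (e.g. the decoder weight) at unit norm."""

    def __init__(self, params, constrained_params, lr):
        super().__init__(params, lr=lr)
        self.constrained_params = list(constrained_params)

    def step(self, closure=None):
        with torch.no_grad():
            for p in self.constrained_params:
                if p.grad is None:
                    continue
                normed = p / p.norm(dim=0, keepdim=True)
                # Project out the gradient component parallel to each column,
                # so the update direction cannot change the column norms.
                p.grad -= (p.grad * normed).sum(dim=0, keepdim=True) * normed
        super().step(closure=closure)
        with torch.no_grad():
            for p in self.constrained_params:
                # Renormalize the columns after the step to remove any
                # residual drift in their norms.
                p /= p.norm(dim=0, keepdim=True)


# Hypothetical usage: constrain only the decoder weight columns.
# opt = ConstrainedAdam(sae.parameters(), [sae.decoder.weight], lr=3e-4)
```

With the decoder columns pinned to unit norm like this, shrinking f_gate genuinely reduces the reconstruction scale and cannot be silently undone by growing decoder weights, which is why the cheating loop described above shouldn't occur.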