Hi @yangysc, sorry for the delay, I was very busy with deadlines and my thesis.

The hyper-network self.hyper is a MaskedMLP. The goal of this network is to make the parameters $\phi_i$ of the transformation $y_i = f(x_i; \phi_i)$ depend only on the preceding features $x_{<i}$ and the context $c$. This is enforced by a series of masks derived from the ordering of the variables.
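To make this concrete, here is a minimal NumPy sketch (not the actual zuko implementation) of MADE-style masks. Each input, hidden, and output unit is assigned a degree, and a connection is kept only if it cannot create a path from $x_j$ to $\phi_i$ with $j \geq i$. The degree assignment and layer sizes below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def made_masks(in_deg, hid_deg, out_deg):
    # Input-to-hidden: connect input d -> hidden k iff hid_deg[k] >= in_deg[d]
    m1 = (hid_deg[:, None] >= in_deg[None, :]).astype(float)
    # Hidden-to-output: connect hidden k -> output i iff out_deg[i] > hid_deg[k]
    # (strict inequality, so phi_i never sees x_i itself)
    m2 = (out_deg[:, None] > hid_deg[None, :]).astype(float)
    return m1, m2

features, hidden = 4, 16
in_deg = np.arange(1, features + 1)
hid_deg = rng.integers(1, features, size=hidden)  # degrees in [1, features - 1]
out_deg = in_deg  # one "parameter" phi_i per feature, for simplicity

m1, m2 = made_masks(in_deg, hid_deg, out_deg)
W1 = rng.normal(size=(hidden, features)) * m1
W2 = rng.normal(size=(features, hidden)) * m2

def hyper(x):
    # A two-layer masked MLP: phi = W2 tanh(W1 x)
    return W2 @ np.tanh(W1 @ x)

x = rng.normal(size=features)
phi = hyper(x)

# Autoregressive check: perturbing x_j must leave phi_i unchanged for all i <= j.
for j in range(features):
    x2 = x.copy()
    x2[j] += 1.0
    assert np.allclose(hyper(x2)[: j + 1], phi[: j + 1])
```

A path from input $d$ to output $i$ exists only if some hidden degree satisfies $\text{out\_deg}[i] > \text{hid\_deg}[k] \geq \text{in\_deg}[d]$, which with this degree assignment reduces to $d < i$.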

If the hyper-network were a single MaskedLinear layer, then what you propose would have (almost) worked (it would be MaskedLinear(features) + Linear(context)). However, we want $\phi_i$ to be a non-linear combination of $x_{<i}$ and $c$, so after the first layer all subsequent layers must be MaskedLinear layers.
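One way to see the equivalence is to fold the context into the masked network itself: if the context units are assigned degree 0, the first-layer mask is all-ones on the context columns (i.e. an unmasked Linear(context) path), while the feature columns stay masked, and the context is then mixed non-linearly with $x_{<i}$ by the deeper masked layers. A hedged NumPy sketch of this degree-0 trick, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
features, context, hidden = 3, 2, 8

# Context units get degree 0, so every hidden unit may see c in full;
# feature units keep degrees 1..features as usual.
in_deg = np.concatenate([np.zeros(context, dtype=int), np.arange(1, features + 1)])
hid_deg = rng.integers(0, features, size=hidden)   # degrees in [0, features - 1]
out_deg = np.arange(1, features + 1)

m1 = (hid_deg[:, None] >= in_deg[None, :]).astype(float)  # dense on context columns
m2 = (out_deg[:, None] > hid_deg[None, :]).astype(float)

W1 = rng.normal(size=(hidden, context + features)) * m1
W2 = rng.normal(size=(features, hidden)) * m2

def hyper(c, x):
    # phi_i is a non-linear function of c and x_{<i}
    return W2 @ np.tanh(W1 @ np.concatenate([c, x]))

c = rng.normal(size=context)
x = rng.normal(size=features)
phi = hyper(c, x)

# Perturbing x_j leaves phi_i unchanged for i <= j, but c reaches every phi_i.
for j in range(features):
    x2 = x.copy()
    x2[j] += 1.0
    assert np.allclose(hyper(c, x2)[: j + 1], phi[: j + 1])
```

With a single layer this collapses to MaskedLinear(features) + Linear(context); with depth, the strict output mask still guarantees $\phi_i$ never sees $x_{\geq i}$.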

P…

Answer selected by francois-rozet
This discussion was converted from issue #67 on October 03, 2025 15:43.