🚀 Feature Request
Transformers can serve as a flexible embedding network for general data modalities. We currently have permutation-invariant networks, whereas plain transformers are permutation-equivariant (which allows supporting data that are exchangeable but not independent). With suitable positional embeddings, a transformer can also serve as a general-purpose embedding network.
Describe the solution you'd like
To do so, the following steps have to be completed:
- Add a PyTorch transformer class here (a rough sketch is given below this list)
- Currently, all flows need a statically sized input, so the output sequence of the transformer needs to be pooled into a single vector of fixed dimension. There are multiple ways to do this; some testing/literature research is needed to decide on a sensible default (but multiple methods can be implemented).
- Add tests
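
For orientation, here is a minimal sketch of what such an embedding network could look like. The class name `TransformerEmbedding`, its arguments, and mean pooling as the default aggregation are assumptions for illustration, not existing API:

```python
import torch
import torch.nn as nn


class TransformerEmbedding(nn.Module):
    """Transformer encoder mapping a sequence of trials to a fixed-size vector.

    Hypothetical sketch: class name, argument names, and mean pooling as the
    default aggregation are assumptions, not existing library API.
    """

    def __init__(
        self,
        input_dim: int,
        embed_dim: int = 64,
        num_heads: int = 4,
        num_layers: int = 2,
        output_dim: int = 32,
        pooling: str = "mean",  # alternatives: "cls", "max"
    ):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.pooling = pooling
        # Learnable [CLS]-style token, only used if pooling == "cls".
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.output_proj = nn.Linear(embed_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, input_dim).
        h = self.input_proj(x)
        if self.pooling == "cls":
            cls = self.cls_token.expand(h.shape[0], -1, -1)
            h = torch.cat([cls, h], dim=1)
        h = self.encoder(h)
        # Pool the (batch, seq_len, embed_dim) output into (batch, embed_dim).
        if self.pooling == "mean":
            pooled = h.mean(dim=1)
        elif self.pooling == "cls":
            pooled = h[:, 0]
        elif self.pooling == "max":
            pooled = h.max(dim=1).values
        else:
            raise ValueError(f"Unknown pooling: {self.pooling}")
        return self.output_proj(pooled)
```

For example, `TransformerEmbedding(input_dim=3)(torch.randn(10, 20, 3))` would return a tensor of shape `(10, 32)`, independent of the sequence length 20.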
📌 Additional Context
Currently, other "sequence" models such as the permutation-invariant networks support learning on sequences of different sizes in parallel via "nan"-padding. This support could be added here as well (if not, please open a separate issue).
Issues #1324 / #218 currently soft-block variable sequence lengths, but they should not affect this feature request.
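
If nan-padding support is added, one possible approach (a sketch, assuming that fully-nan trials mark padding; not necessarily how the existing permutation-invariant nets handle it) is to derive a key-padding mask from the nan entries and pass it to the encoder:

```python
import torch


def nan_padding_to_mask(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: split nan-padded input into clean data and a mask.

    x: (batch, seq_len, input_dim), where fully-nan rows mark padding.
    Returns the input with nans replaced by zeros and a boolean mask of shape
    (batch, seq_len) that is True at padded positions, as expected by the
    `src_key_padding_mask` argument of `nn.TransformerEncoder.forward`.
    """
    padding_mask = torch.isnan(x).all(dim=-1)
    x_clean = torch.nan_to_num(x, nan=0.0)
    return x_clean, padding_mask
```

The same mask would also need to be respected during pooling, e.g. by excluding padded positions when taking the mean over the sequence.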