Our RNNs are functional and expressive right now, but we should think a bit about what optimisations we need.
CUDNN exposes both chains of RNNs (e.g. 3 stacked LSTMs) and input of sequences via concatenated arrays (as opposed to calling the forward pass multiple times in sequence), and most frameworks follow this model. These are easy to expose similarly in Flux -- perhaps by having gpu convert to some CuLSTM type when possible -- but ideally our AD and compiler are good enough to make them unnecessary (they are not custom kernels, just hand-coded C++ backprop).
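To make the concatenated-array point concrete, here's a toy sketch in plain Julia -- no Flux, and a bare tanh RNN cell standing in for an LSTM -- showing the same recurrence run step-by-step versus with all input projections hoisted into one matmul, which is the kind of restructuring the CUDNN-style interface exploits:

```julia
# Toy RNN cell; W, U, b stand in for an LSTM's parameters.
# Per-timestep interface: T small matrix-vector products.
function run_stepwise(W, U, b, xs)
    h = zeros(size(W, 1))
    for x in xs
        h = tanh.(W * x .+ U * h .+ b)
    end
    return h
end

# Concatenated interface: the input projection for every timestep
# is done in one big matmul up front; only the recurrence stays serial.
function run_concatenated(W, U, b, X)
    WX = W * X                      # one gemm instead of T gemvs
    h = zeros(size(W, 1))
    for t in 1:size(X, 2)
        h = tanh.(WX[:, t] .+ U * h .+ b)
    end
    return h
end
```

Both give the same hidden state; the second form just exposes more work to the BLAS/kernel level, which is most of what the hand-coded CUDNN path buys.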
Then there's batching; ideally this is transparent to the user via Hydra, but perhaps we still expose some padding/masking primitives and utilities.
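As a sketch of what such a primitive might look like -- pad_sequences is a hypothetical name, not an existing Flux function -- one could pad variable-length sequences to a common length and return a mask marking the real timesteps:

```julia
# Hypothetical padding/masking utility (not part of Flux): pads each
# sequence to the longest length and returns (padded, mask), with
# timesteps as rows and sequences as columns.
function pad_sequences(seqs::Vector{<:Vector}, padvalue = 0)
    T = maximum(length, seqs)
    B = length(seqs)
    out = fill!(similar(first(seqs), T, B), padvalue)
    mask = falses(T, B)
    for (j, s) in enumerate(seqs)
        out[1:length(s), j] .= s
        mask[1:length(s), j] .= true
    end
    return out, mask
end
```

A loss function could then multiply by the mask so padded steps contribute nothing to the gradient, which keeps the batching orthogonal to the model definition.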
Our current model for RNNs is pretty nice in that it's very close to the intuitive mental model; the real question is whether any of these future optimisations might be incompatible with that design. So far, though, it's been fairly effective to ignore CUDNN's programming model entirely and figure it out later.
I think this is covered by the more well-developed list at #1678. The 3D interface is already implemented in Flux 0.12 (though not completely optimized for accelerators).