RNNs, batching and sequences #705

Closed · MikeInnes opened this issue Mar 26, 2019 · 2 comments

@MikeInnes (Member)

Our RNNs are functional and expressive right now, but we should think a bit about what optimisations we need.

CUDNN exposes both chains of RNNs (e.g. 3 stacked LSTMs) and sequence input via concatenated arrays (as opposed to calling the forward pass once per time step), and most frameworks follow this model. These are easy to expose similarly in Flux -- perhaps by having gpu convert to some CuLSTM type when possible -- but ideally our AD and compiler are good enough to make them unnecessary (they are not custom kernels, just hand-coded C++ backprop).
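For concreteness, the kind of stacked chain CUDNN fuses into one call is just a Chain of recurrent layers in the current API; the layer widths below are arbitrary, purely for illustration:

using Flux

# Three stacked LSTMs -- the shape of model a fused CUDNN call would cover.
# Widths (10 → 32 → 32 → 32) are illustrative only.
m = Chain(LSTM(10, 32), LSTM(32, 32), LSTM(32, 32)) |> gpu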

Then there's batching; ideally this is transparent to the user via Hydra, but perhaps we should still expose some padding/masking primitives and utilities, along the lines sketched below.
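As a rough sketch of what such a utility might do (pad_sequences is hypothetical, not an existing Flux function): sequences of unequal length are zero-padded up to the longest one, with a Boolean mask recording which entries are real so that a loss can ignore the padding.

# Hypothetical padding utility: `seqs` is a vector of sequences, each a
# vector of feature vectors. Pads with zeros to the longest length and
# returns a (batch × time) mask marking which steps are real.
function pad_sequences(seqs)
    T = maximum(length, seqs)
    padded = [t <= length(s) ? s[t] : zero(first(s)) for s in seqs, t in 1:T]
    mask   = [t <= length(s) for s in seqs, t in 1:T]
    return padded, mask
end

seqs = [[rand(Float32, 3) for _ in 1:n] for n in (2, 4, 3)]
padded, mask = pad_sequences(seqs)   # both 3×4: (batch × time)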

Our current model for RNNs is pretty nice in that it's very close to the intuitive mental model; the question is really whether any of these future optimisations might be incompatible with that design. So far, though, it has been fairly effective to ignore CUDNN's programming model entirely and figure it out later.

@tbenst commented Nov 15, 2020

Is it currently possible to train an RNN with (parallel) batches on the GPU with Flux?

If I understand correctly, the following runs in parallel on a GPU for a feedforward model (please correct me if it does not!):

using Flux

model2 = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax) |> gpu

batch_size = 5
model2.([rand(Float32, 10) for i in 1:batch_size] |> gpu)  # => 5-element vector of 2-element vectors
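For comparison, a minimal sketch of the form that maps onto a single batched GPU kernel, using the fact that Dense (and softmax) operate column-wise on a matrix of samples:

X = rand(Float32, 10, batch_size) |> gpu  # columns are individual samples
model2(X)                                 # => 2×5 matrix, one column per sample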

But this by necessity must be run sequentially:

seq = [rand(Float32, 1) for i in 1:32] |> gpu
m = LSTM(1, 2) |> gpu
out = m.(seq)                             # one forward call per time step, carrying hidden state
size(out), size(out[1]), size(out[1][1])  # => ((32,), (2,), ())

# The hidden state matters: replaying a step out of order gives a different answer.
Flux.reset!(m)
@assert all(m(seq[2]) .!= out[2])
Flux.reset!(m)
@assert all(m(seq[1]) .== out[1])
@assert all(m(seq[2]) .== out[2])

Is it currently possible to run an RNN in batched mode on GPU / with CUDA?

Edit: I think the proper way to do this is described here: https://discourse.julialang.org/t/simple-flux-lstm-for-time-series/35494/17

using Flux

m = Chain(LSTM(3, 2))
data = rand(Float32, 3, 10, 4)         # (features, batch, time)
inputs = [data[:, :, t] for t in 1:4]  # one 3×10 matrix per time step
output = m.(inputs)
size(output), size(output[1])          # => ((4,), (2, 10))
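Note that in this layout each step processes the whole batch at once: every time step is a single matrix operation over the 10 columns, so only the 4 steps of the time dimension run sequentially.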

@ToucheSir (Member)

I think this is covered by the more fully developed list at #1678. The 3D interface is already implemented in Flux 0.12 (though not completely optimized for accelerators).
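For anyone finding this later, a minimal sketch of that 3D interface, assuming the (features, batch, time) layout used by Flux 0.12's recurrent layers:

using Flux

m = LSTM(3, 2)
x = rand(Float32, 3, 10, 4)  # (features, batch, time)
y = m(x)                     # one call over the whole sequence
size(y)                      # => (2, 10, 4)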
