RNNs, batching and sequences #705

Closed
@MikeInnes

Description

Our RNNs are functional and expressive right now, but we should think a bit about what optimisations we need.

CUDNN exposes both chains of RNNs (e.g. 3 stacked LSTMs) and sequence input via concatenated arrays (as opposed to calling the forward pass once per timestep), and most frameworks follow this model. These would be easy to expose similarly in Flux -- perhaps by having `gpu` convert to some CuLSTM type when possible -- but ideally our AD and compiler are good enough to make them unnecessary (they are not custom kernels, just hand-coded C++ backprop).
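
For contrast, here's a minimal sketch of the unfused style, assuming the `Chain`/`LSTM` API as it stands; the `CuLSTM` swap in the comment is the hypothetical floated above, not an existing type:

```julia
using Flux

# A stack of LSTMs applied to a sequence one timestep at a time -- the
# two things CUDNN would fuse (the stack, and the whole sequence) into
# a single call.
m = Chain(LSTM(10, 32), LSTM(32, 32), LSTM(32, 32))

seq = [rand(Float32, 10) for _ in 1:20]   # a 20-step sequence
ys  = [m(x) for x in seq]                 # one forward call per step

# The idea above: `gpu` could one day swap in a fused CuLSTM here.
# (CuLSTM is hypothetical; today `gpu` just moves parameters to the GPU.)
m = gpu(m)
```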

Then there's batching; ideally this is transparent to the user via Hydra, but perhaps we should still expose some padding/masking primitives and utilities.
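
For concreteness, a sketch of what such a primitive could look like; `pad_batch` and the mask layout are illustrative assumptions, not an existing Flux (or Hydra) API:

```julia
# Hypothetical padding/masking helper. Takes a vector of sequences
# (each a vector of d-dimensional Float32 vectors, possibly of
# different lengths) and returns per-timestep batch matrices plus a
# mask marking which entries are real data.
function pad_batch(seqs)
    T = maximum(length, seqs)              # longest sequence
    d = length(first(first(seqs)))         # feature dimension
    n = length(seqs)                       # batch size
    batch = [zeros(Float32, d, n) for _ in 1:T]   # one d×n matrix per step
    mask  = falses(T, n)                   # true where data is real
    for (j, seq) in enumerate(seqs), t in 1:length(seq)
        batch[t][:, j] = seq[t]
        mask[t, j] = true
    end
    return batch, mask
end

seqs = [[rand(Float32, 10) for _ in 1:len] for len in (3, 5, 2)]
batch, mask = pad_batch(seqs)   # 5 steps of 10×3 matrices, 5×3 mask
```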

Our current model for RNNs is pretty nice in that it's very close to the intuitive mental model; the real question is whether any of these future optimisations might be incompatible with that design. So far, though, it's been fairly effective to ignore CUDNN's programming model entirely and figure it out later.
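
That mental model in code, roughly: a recurrent layer is a stateful function called once per timestep, and array-level batching (columns as batch items) already falls out of it:

```julia
using Flux

# The current design: an RNN layer carries its hidden state and is
# simply called on each input in order, with an explicit reset
# between sequences.
rnn = LSTM(10, 32)
Flux.reset!(rnn)
y = rnn(rand(Float32, 10))         # single sample: 32-vector out

# Batching needs no new concepts: columns of a matrix are batch items.
Flux.reset!(rnn)
yb = rnn(rand(Float32, 10, 16))    # 10×16 in, 32×16 out
```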
