RNNs, batching and sequences #705

Closed · MikeInnes opened this issue Mar 26, 2019 · 2 comments

@MikeInnes (Member)

Our RNNs are functional and expressive right now, but we should think a bit about what optimisations we need.

CUDNN exposes both chains of RNNs (e.g. 3 stacked LSTMs) and sequence input via concatenated arrays (as opposed to calling the forward pass once per time step), and most frameworks follow this model. These are easy to expose similarly in Flux -- perhaps by having gpu convert to some CuLSTM type when possible -- but ideally our AD and compiler are good enough to make them unnecessary (they are not custom kernels, just hand-coded C++ backprop).
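For concreteness, the kind of stacked chain CUDNN fuses into one call is just a Chain of recurrent layers in the current API; the layer widths below are arbitrary, purely for illustration:

using Flux

# Three stacked LSTMs -- the shape of model a fused CUDNN call would cover.
# Widths (10 → 32 → 32 → 32) are illustrative only.
m = Chain(LSTM(10, 32), LSTM(32, 32), LSTM(32, 32)) |> gpu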

Then there's batching; ideally this is transparent to the user via Hydra, but perhaps we should still expose some padding/masking primitives and utilities, along the lines sketched below.
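As a rough sketch of what such a utility might do (pad_sequences is hypothetical, not an existing Flux function): sequences of unequal length are zero-padded up to the longest one, with a Boolean mask recording which entries are real so that a loss can ignore the padding.

# Hypothetical padding utility: `seqs` is a vector of sequences, each a
# vector of feature vectors. Pads with zeros to the longest length and
# returns a (batch × time) mask marking which steps are real.
function pad_sequences(seqs)
    T = maximum(length, seqs)
    padded = [t <= length(s) ? s[t] : zero(first(s)) for s in seqs, t in 1:T]
    mask   = [t <= length(s) for s in seqs, t in 1:T]
    return padded, mask
end

seqs = [[rand(Float32, 3) for _ in 1:n] for n in (2, 4, 3)]
padded, mask = pad_sequences(seqs)   # both 3×4: (batch × time)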

Our current model for RNNs is pretty nice in that it's very close to the intuitive mental model; the question is really whether any of these future optimisations might be incompatible with that design. So far, though, it has been fairly effective to ignore CUDNN's programming model entirely and figure it out later.

@tbenst commented Nov 15, 2020

Is it currently possible to train an RNN with (parallel) batches on the GPU with Flux?

If I understand correctly, the following runs in parallel on a GPU for a feedforward model (please correct me if it does not!):

using Flux

model2 = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax) |> gpu

batch_size = 5
model2.([rand(Float32, 10) for i in 1:batch_size] |> gpu)  # => 5-element vector of 2-element vectors
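For comparison, a minimal sketch of the form that maps onto a single batched GPU kernel, using the fact that Dense (and softmax) operate column-wise on a matrix of samples:

X = rand(Float32, 10, batch_size) |> gpu  # columns are individual samples
model2(X)                                 # => 2×5 matrix, one column per sample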

But this by necessity must be run sequentially:

seq = [rand(Float32, 1) for i in 1:32] |> gpu
m = LSTM(1, 2) |> gpu
out = m.(seq)                             # one forward call per time step, carrying hidden state
size(out), size(out[1]), size(out[1][1])  # => ((32,), (2,), ())

# The hidden state matters: replaying a step out of order gives a different answer.
Flux.reset!(m)
@assert all(m(seq[2]) .!= out[2])
Flux.reset!(m)
@assert all(m(seq[1]) .== out[1])
@assert all(m(seq[2]) .== out[2])

Is it currently possible to run an RNN in batched mode on GPU / with CUDA?

Edit: I think the proper way to do this is described here: https://discourse.julialang.org/t/simple-flux-lstm-for-time-series/35494/17

using Flux

m = Chain(LSTM(3, 2))
data = rand(Float32, 3, 10, 4)         # (features, batch, time)
inputs = [data[:, :, t] for t in 1:4]  # one 3×10 matrix per time step
output = m.(inputs)
size(output), size(output[1])          # => ((4,), (2, 10))
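Note that in this layout each step processes the whole batch at once: every time step is a single matrix operation over the 10 columns, so only the 4 steps of the time dimension run sequentially.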

@ToucheSir (Member)

I think this is covered by the more fully developed list at #1678. The 3D interface is already implemented in Flux 0.12 (though not completely optimized for accelerators).
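For anyone finding this later, a minimal sketch of that 3D interface, assuming the (features, batch, time) layout used by Flux 0.12's recurrent layers:

using Flux

m = LSTM(3, 2)
x = rand(Float32, 3, 10, 4)  # (features, batch, time)
y = m(x)                     # one call over the whole sequence
size(y)                      # => (2, 10, 4)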
