
Added the wrapper Bidirectional for RNN layers #1790

Open · wants to merge 9 commits into base: master
Conversation

dcecchini (Author) opened this pull request:

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • API changes require approval from a committer (different from the author, if applicable)

Description

The implementation is based on the BiLSTM_CNN_CRF_Model present in TextModels.jl, but has a few differences:

  • It keeps the Bidirectional usage closer to Keras, as it allows the creation of, for example, Bidirectional(Flux.LSTM, in, out) inside a Chain declaration.
  • It explicitly adds the dims=1 parameter to the reverse function, which was needed in my experiments, but a more generic implementation may be possible (I was not able to use flip, probably for the same reason).

I also updated the Base.show and _big_show functions so that the layer is printed in the same style as Flux, showing the number of trainable parameters. The implementation added for _big_show could be moved to src/show.jl later.

This is my first PR here; I read the information in CONTRIBUTING.md, but please let me know if it does not follow the correct standards. I really enjoy Julia even though I am still learning.

I hope I can contribute to this great project.
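For illustration, a minimal usage sketch of the Chain usage mentioned in the first bullet above; the layer sizes and the vocab_size/num_tags values below are made up for this example:

using Flux

vocab_size, num_tags = 100, 9                   # hypothetical sizes

model = Chain(
    Bidirectional(Flux.LSTM, vocab_size, 32),   # forward + backward LSTM, outputs concatenated
    Dense(64, num_tags),                        # 2 × 32 features after concatenation
    softmax,
)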


# Concatenate the forward and reversed backward weights
function (m::Bidirectional)(x::Union{AbstractVecOrMat{T},OneHotArray}) where {T}
    return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=1)); dims=1))
ToucheSir (Member) commented on Nov 28, 2021:
Suggested change:
- return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=1)); dims=1))
+ return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=3)); dims=3))

Sorry, just found this. When applying Flux RNNs to dense sequences, the temporal dim is actually the last one. See `function (m::Recur)(x::AbstractArray{T, 3}) where T` and "Folding over a 3d Array of dimensions `(features, batch, time)` is also supported".

dcecchini (Author) replied:

I'm quite confused now because I was expecting the variable x to be the result of a OneHotMatrix operation on a sentence, and therefore a matrix. In my experiments, I was using an array of one-hot encoded sentences (padded to a fixed size) like this:

X_onehot = [OneHotMatrix(x, vocab_size) for x in X]
y_onehot = [OneHotMatrix(x, num_tags) for x in y]

where X is an array of padded sentences and y contains the corresponding labels of each word in the sentence (I am experimenting with Named Entity Recognition models). So the input data would be (batch, (time, features)) and not in the (features, batch, time) format.

I am not sure how to proceed.

ToucheSir (Member) replied:

https://fluxml.ai/Flux.jl/stable/models/recurrence/#Working-with-sequences has most of the nitty-gritty details about working with batched sequences for RNNs. In short, the only supported formats are ((features, batch), time) and (features, batch, time). Unlike Python frameworks, Flux puts the batch dim last for all layers because of column major layout. RNNs just add another time dim after that.

Since this definition of Bidirectional takes a contiguous array instead of a vector of arrays, m.forward() and m.backward() dispatch to (m::Recur)(x::AbstractArray{T, 3}). To support both, you'd need something like the following (note: untested!):

  1. (m::Bidirectional)(x::AbstractArray{T, 3}) where T for (features, batch, time)
  2. (m::Bidirectional)(x::Vector{<:AbstractVecOrMat{T}}) where T for ((features, batch), time)
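For reference, a hedged sketch of what method 1 could look like, combining the dims=3 suggestion above with this 3-d signature (untested, assuming Bidirectional stores forward and backward recurrent layers):

# Method 1: dense (features, batch, time) input; reverse along the time (3rd) dim
function (m::Bidirectional)(x::AbstractArray{T, 3}) where {T}
    return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=3)); dims=3))
end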

mcabbott (Member) commented on Nov 29, 2021:
But the RNN layers do not literally accept a vector of matrices; at the link, this is iterated through by hand. Should method 2 of Bidirectional(fwd, rev)(x) handle that?

julia> LSTM(3,5)(randn(Float32, 3,)) |> size
(5,)

julia> LSTM(3,5)(randn(Float32, 3,7)) |> size
(5, 7)

julia> LSTM(3,5)(randn(Float32, 3,7,11)) |> size
(5, 7, 11)

julia> LSTM(3,5)([randn(Float32, 3,7) for _ in 1:11]) |> size
ERROR: MethodError

julia> function (b::Bidirectional)(xs::AbstractVector{<:AbstractVecOrMat})
         top = [b.forward(x) for x in xs]
         bot = reverse([b.backward(x) for x in reverse(xs)])
         vcat.(top, bot)
       end

ToucheSir (Member) replied:

Should method 2 of Bidirectional(fwd, rev)(x) handle that?

Yes, exactly.

mcabbott (Member) replied:

And applied to randn(Float32, 3,7), should it fail, or still reverse in the 3rd dimension?

ToucheSir (Member) replied:

The signatures specifically prohibit that, so it should fail. Did you have a specific edge case in mind that isn't caught by the above?

mcabbott (Member) replied:

No, I just wonder about the mismatch between what LSTM accepts and what this accepts. Prohibiting things which don't make sense is good; you want to know soon.

ToucheSir (Member) replied:

Right, the RNN layers can work with single timesteps but Bidirectional can't, as it needs to see the whole sequence up-front in order to reverse it. If anything the former should be brought in line with the latter, but that's a conversation for another issue (#1678 probably).

dcecchini (Author) replied:

Thank you for the great conversation and the suggestions. I just tested, and Bidirectional, as it is, is able to process randn(Float32, 3, 7):

Bidirectional(LSTM, 3, 5)(randn(Float32, 3, 7)) |> size
(10, 7)

But about the format, I am still confused about how to preprocess the text data so that it ends up in the (seq_length, (features, samples)) format. It seems counterintuitive to me.

I usually follow: read the data -> split sentences -> split words -> pad -> one-hot. So my data would be an array with N sentences, where every sentence is described as a one-hot matrix of its words. In this way, I end up with the (samples, (features, seq_length)) format. How should I preprocess the text data so that I end up with (seq_length, (features, samples))?

Also, by checking these formats I discovered that I should probably use dims=2 in the reverse call in my formulation (not the default that reverses all dimensions: it should reverse only the time dimension, which in my case is the second dimension of the one-hot matrix).
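As a quick illustration of that point (using a plain matrix in place of the one-hot matrix): reversing a (features, time) matrix along dims=2 flips the time steps, while dims=1 would flip the features:

x = Float32[1 2 3;
            4 5 6]     # (features, time) = (2, 3)

reverse(x; dims=2)     # gives [3 2 1; 6 5 4]: time steps reversed
reverse(x; dims=1)     # gives [4 5 6; 1 2 3]: features reversed instead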

ToucheSir (Member) commented on Nov 28, 2021:

@dcecchini do you have local (behavioural) tests that weren't pushed? I only see changes in src/.

dcecchini (Author) replied:

@dcecchini do you have local (behavioural) tests that weren't pushed? I only see changes in src/.

I don't think so; I made all the changes in src/ only and pushed all of them. I will make a few modifications as mentioned in the comments here and push them shortly.

Commit: …w method to the `show.jl` file.
Modified the constructor to accept `a...` and `ka...` instead of only the `in` and `out`.
ToucheSir (Member) commented:

Testing-wise, having something in https://github.com/FluxML/Flux.jl/blob/6168692ddb2680ebe02f461bf0c59c2b6cb0ff1e/test/layers/recurrent.jl that checks the logic (it need not be complex) would be good. You can see how this is done for other composite layers such as SkipConnection or Parallel.
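For reference, a minimal behavioural test sketch in that spirit, assuming the Bidirectional(LSTM, in, out) constructor from this PR and the concatenated (2 × out, batch) output shape shown earlier in this thread:

using Flux, Test

@testset "Bidirectional" begin
    m = Bidirectional(LSTM, 3, 5)
    x = rand(Float32, 3, 7)
    # forward and backward outputs are concatenated along the feature dimension
    @test size(m(x)) == (10, 7)
end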



"""
Bidirectional{A,B}
A reviewer (Member) commented:

Once we settle on the constructor, this line should show the various constructors, not the type definition.

ToucheSir (Member) commented on Nov 29, 2021 via email.

dcecchini (Author) commented on Nov 29, 2021:

To be clear, accepting that input is a bug, not a feature :P Workflow-wise, the key is to put the batch dim in between the time and feature ones. There are a few ways to do this, but one of the easiest would be Flux.onehotbatch(transpose(reduce(hcat, batch_of_padded_sentences)), vocab). See the Flux.onehotbatch https://fluxml.ai/Flux.jl/stable/data/onehot/#Flux.onehotbatch docs for more details. We're actively thinking about how to make working with RNNs more ergonomic (the old interface is not terribly so), but in the meantime getting your data into (samples, seq_length) or (samples, (seq_length)) first and then doing the one-hot transform is the easiest way to go about things.

Yes! Adding the batch dimension between time and features is not difficult; the problem is having the batch as the last dimension! I was able to transform the data into the (time, (samples, features)) format (an array of arrays of arrays rather than an array of matrices), but could not keep the samples at the end. I used:

[[X_onehot[i][:, j] for i in 1:length(X_onehot)] for j in 1:size(X_onehot[1])[2]]

but after checking your example I was able to do this:

aux = reduce(hcat, X_pad); # size(aux) = (time, samples)
Xs = [Flux.onehotbatch(aux[i, :], 1:vocab_size) for i in 1:size(aux)[1]];
size(Xs) # (time, )
size(Xs[1]) # (features, samples)

So, in the end, I had to use the one-hot encoding in a different way: instead of applying it to a sentence, apply it to arrays representing the first words, the second words, etc.

Edit:

After experimenting with this, I was unable to make it work with the Embedding layer. I would like to use (time, (features, samples)) as input to the Embedding layer, but it was not working. I also tried to pass it to LSTM cells, and that didn't work either.

So I really would prefer the "bugged" version, as it allows sending the whole sentence as a batch, and it works fine passing the data as (samples, (features, time)) to the embedding layer and onwards...

Does anyone have another solution?

ToucheSir (Member) replied:

(time, (features, samples)) as input to the Embedding layer, but it was not working

You'd have to pass each timestep in separately to the embedding layer, because it takes an input of size (samples,) or (features, samples). This holds true for the alternative format you posted as well, because Embedding doesn't know how to work with the outer vector otherwise.

Also, I tried to pass it to LSTM cells and it didn't work as well.

Same story here. LSTM takes an input of shape (features, samples) (one timestep; you'll have to pass each timestep separately if you're using ((features, samples), time)) or (features, samples, time) (multiple timesteps).

Alternatively, you could use dense arrays all the way through. Given a set of inputs of shape (samples, time), pass it to Embedding to get an output of (features, samples, time). As you can see, this is already in the right shape to be handled by LSTM or any following RNN layer.

...it allows to send the whole sentence in a batch

Just to clarify, using the time dimension as the batch dim means that each token in the sentence will be processed in parallel...which is probably not what you want with an RNN :P
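A sketch of the dense-array pipeline suggested above, with hypothetical sizes, assuming the Recur-based recurrent interface discussed in this thread and the `in => out` constructor syntax of later Flux versions:

using Flux

vocab_size, embed_dim, hidden = 100, 8, 5    # hypothetical sizes
batch, seq_len = 4, 10

tokens = rand(1:vocab_size, batch, seq_len)  # (samples, time) integer indices

embed = Flux.Embedding(vocab_size => embed_dim)
lstm  = LSTM(embed_dim => hidden)

e = embed(tokens)   # (features, samples, time) = (8, 4, 10)
h = lstm(e)         # (hidden, samples, time)   = (5, 4, 10)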

NeroBlackstone commented:

Any updates for this PR?
This is a very useful feature.
Are there any other examples of building a bi-LSTM in Flux.jl 0.14?

NeroBlackstone commented on Jun 3, 2024:

Questions about this implementation of Bidirectional RNN:

My question is: since a bidirectional RNN requires the full sequence of time steps (forward hidden-state calculation ➡️ plus backward hidden-state calculation ⬅️), for a single time step (1/seq_length) of input we cannot calculate the hidden state of the bidirectional RNN, right?

But if we initialize the model from this PR:

julia> model = Bidirectional(LSTM, 3, 5) 
Bidirectional(LSTMCell(3 => 5), LSTMCell(3 => 5))

julia> x = rand(Float32, 3)
3-element Vector{Float32}:
 0.40451288
 0.40546536
 0.7272219

julia> model(x)
10-element Vector{Float32}:
 -0.15061164
 -0.105684996
 -0.08399847
  0.113812014
 -0.03911737
  0.061232302
 -0.011352453
 -0.0018385417
 -0.06642
  0.006111855

It outputs a single-step result; is this the expected behavior?

I think the correct behavior is (for batch size 1):

# input size(all_time_step) =  (sequence_len * features) 
julia> model(all_time_step)
# output size = (sequence_len * outsize*2)

NeroBlackstone commented:
It seems that the original author has given up on implementing it, and I want to try to continue his work (this is related to my graduation,

    return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=1)); dims=1))
end

@functor Bidirectional


Flux.@layer here?

NeroBlackstone commented on Jun 4, 2024:

I think a BRNN should take all the time steps, so we can't use it like the RNN we have already implemented, which means:

This usage is valid for RNN:

# 2 => feature, 3 => sequence_len, 4 => batch_size
x = [[1 2 1 2; 3 4 3 4] for i = 1:3]
rnn = RNN(2=>5)
# out_dim = 3(sequence_len) * 5(hidden) * 4(batch_size)
rnn_out = [rnn(xi) for xi in x]

But the same usage is invalid for BRNN:

# 2 => feature, 3 => sequence_len, 4 => batch_size
x = [[1 2 1 2; 3 4 3 4] for i = 1:3]
brnn = Bidirectional(RNN,2,5)
# out_dim = 3(sequence_len) * 10(hidden) * 4(batch_size)
brnn_out = [brnn(xi) for xi in x]

Because in (m::Bidirectional)(x::Union{AbstractVecOrMat{T},Flux.OneHotArray}) where {T}:

function (m::Bidirectional)(x::Union{AbstractVecOrMat{T},Flux.OneHotArray}) where {T}
    #size(x) = (2,4)(feature*batch_size)
    return vcat(m.forward(x), reverse(m.backward(reverse(x; dims=1)); dims=1))
end

the size of x is (feature * batch_size), so we actually "reverse the features", not "reverse the time steps", in this implementation.
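A quick illustration of that point, using the matrix from the example above:

x = [1 2 1 2; 3 4 3 4]   # (feature, batch_size) = (2, 4), a single time step

reverse(x; dims=1)       # gives [3 4 3 4; 1 2 1 2]: the feature rows are flipped;
                         # there is no time dimension in this array to reverse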

NeroBlackstone commented on Jun 4, 2024:

So for BRNN input, I think the correct dimensions are:

For 2D-matrix input or vector-of-vectors input, which means seq_length * feature with a batch size of one:

# seq_len = 3
# feature = 2

# vector of vector
x = [rand(Float32,2) for i = 1:3]
# or
x = rand(Float32,3,2)

# output all hidden states, with doubled size
# size = sequence_len * 2hidden_size
brnn(x)

For vector-of-2D-matrices input, which means seq_length * feature * batch_size:

# seq_len = 3
# feature = 2
# batch_size = 4

x = [[1 2 1 2; 3 4 3 4] for i = 1:3]

# output all hidden states, with doubled size
# size = sequence_len * 2hidden_size * batch_size
brnn_out = brnn(x)

NeroBlackstone commented on Jun 4, 2024:

I think the correct implementation is:

function (m::Bidirectional)(x::Union{AbstractVecOrMat{T},Flux.OneHotArray}) where {T}
    vec_for = [m.forward(xi) for xi in x]
    vec_back = reverse([m.backward(xi) for xi in reverse(x)])
    return vcat.(vec_for,vec_back)
end

For input without batches:

x = [rand(Float32,2) for i = 1:3]
# size = 3(seq_len) * 10(double hidden_size)
brnn(x)
# size = 3(seq_len) * 5(hidden_size)
[rnn(xi) for xi in x]

For input with batches:

x = [[1 2 1 2; 3 4 3 4] for i = 1:3]
# size = 3(seq_len) * 10(double hidden_size) * 4(batch_size)
brnn(x)
# size = 3(seq_len) * 5(hidden_size) * 4(batch_size)
[rnn(xi) for xi in x]

The output sizes of RNN and BRNN are the same.

dcecchini (Author) replied:

Hi @NeroBlackstone, at the time I could not finish the implementation due to the difficulty of getting something to work with the designed structure for the RNN cells, as you can see in the history above. I would be glad to work on this, but I have little time to spare.
