Fix #62 by jondeuce · Pull Request #70 · FluxML/Optimisers.jl

jondeuce · 2022-04-22T18:26:05Z

This adds a couple of small changes on top of draft PR #63 in order to fix #62:

- Wrap offset indices in a dummy struct Offset to fix the array-of-arrays issue mentioned in #63 (Attempt to fix #62). For example, the offset structure for x = [[1.0, 2.0]] is now something like o = [Offset(4)], which is not leaflike, compared to o = [4] previously. This also opens the door to storing more information in this wrapper struct (original array size? eltype?), but that doesn't seem necessary at this time.
- y = backing(re(y)) allows functor(x) to return children which aren't its own fields: y is first restructured to match the structure of x, and then the NamedTuple backing for re(y) is extracted and passed to Tangent. This has the added benefit of some symmetry with _trainable_biwalk, which naturally restructures the output of _trainmap, whereas _Tangent_biwalk previously did not.
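
The first point can be illustrated with a minimal, self-contained sketch. The `isleaflike` function below is a hypothetical stand-in for the package's leaf test (assumed here to treat any array of numbers as a leaf), and this `Offset` is an illustrative copy, not necessarily the definition used in the PR:

```julia
# Illustrative wrapper around an integer offset (assumed layout).
struct Offset
    i::Int
end

# Hypothetical leaf test: an array of plain numbers counts as a leaf,
# anything else does not.
isleaflike(x::AbstractArray{<:Number}) = true
isleaflike(x) = false

bare = [4]             # offsets stored as raw integers
wrapped = [Offset(4)]  # offsets stored inside the wrapper

println(isleaflike(bare))     # the raw Int array is mistaken for a parameter array
println(isleaflike(wrapped))  # the wrapped array is not
```

Under this assumption, a walk that stops at leaflike arrays would stop at `bare` but keep recursing into `wrapped`, which is the distinction the wrapper is meant to provide.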

Closes #63 (replaces).

src/destructure.jl

mcabbott · 2022-04-22T20:20:15Z

  if p isa ProjectTo  # e.g. Array, NamedTuple
    p(y)
  else  # p === identity for unknown structs
+    y = backing(re(y)) # extract NamedTuple backing from re(y); required if x has children which aren't its own fields


Note to self, this I need to think about. Some of this complication was working around things that are now fixed in CRC.jl, if I remember right.

jondeuce · Apr 22, 2022

Yeah, admittedly this line took some trial and error and is a little bit above my pay-grade. I managed to convince myself, but perhaps there's something cleaner.

mcabbott · Apr 30, 2022

Ok, I think I finally understand what's going on. Sorry it took a while.

re constructs another Skip containing the gradient, and backing turns that into a NamedTuple with the same field names, which is what Tangent wants.

The only way I can see this failing is this: If the primal type's constructor is fussy about what types it can accept, then it may not be happy to accept something which is valid as its gradient. E.g. if there is only Skip(::AbstractLayer), and re tries to make one with a Tangent.
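
A minimal sketch of that failure mode, using hypothetical types (`ToyLayer`, `FussySkip`, `LooseSkip` are invented for illustration and are not from Optimisers.jl or Flux): a constructor restricted to layers rejects a gradient-shaped value, while an unrestricted one accepts it.

```julia
abstract type AbstractLayer end

struct ToyLayer <: AbstractLayer
    w::Vector{Float64}
end

# Hypothetical wrapper whose only constructor insists on an AbstractLayer.
struct FussySkip{L<:AbstractLayer}
    layer::L
end

# Permissive variant that stores any value.
struct LooseSkip{T}
    layer::T
end

grad = (w = [0.0, 4.0],)  # gradient-shaped value; not an AbstractLayer

loose = LooseSkip(grad)   # accepted: the field type is unconstrained

# The fussy constructor has no method for a NamedTuple, so this throws.
fussy_ok = try
    FussySkip(grad)
    true
catch err
    err isa MethodError || rethrow()
    false
end

println(fussy_ok)
```

Any approach that rebuilds the primal type around gradient values would hit the `MethodError` branch for a type like `FussySkip`.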

jondeuce · Apr 30, 2022

No worries! Yes, I struggled with that edge case too. Unfortunately I think it's quite tricky to work around. For example, suppose you have a user-defined functor(m::MyModel) = (m.w,), w -> .... Then:

- In general there's no way to reconstruct MyModel (or even a NamedTuple of fields/values) without re, as you do not know the corresponding field name given only (m.w,), but
- As you say, if the primal constructor isn't sufficiently generic then it won't be able to store Tangent/Nothing/etc. values in its fields and will error before backing can unpack it again.

Avoiding re would be ideal, but I think that would require functor to always return NamedTuples on custom structs. I noticed that this is the default in @functor, though, so maybe it's not such a painful requirement? In the meantime I can at least add a branch that would avoid re for structs that are functored to NamedTuples.

mcabbott · May 1, 2022

In fact there's another problem I didn't spot before, what a mess:

    julia> ac = TwoThirds([1.0, 2.0], [3.0], [4.0, 5.0]);  # from tests: a,c are functor-ed, and only a is trainable

    julia> v2, re2 = destructure(ac)
    ([1.0, 2.0], Restructure(TwoThirds, ..., 2))

    julia> gradient(ac) do x  # with Tangent{typeof(x), typeof(y)}(y)
             w2, _ = destructure(x)
             w2[2]^2
           end
    ((a = [0.0, 4.0], b = nothing, c = [4.0, 5.0]),)

    # Same, with z = backing(re(y)) :

    julia> gradient(ac) do x
             w2, _ = destructure(x)
             w2[2]^2
           end
    ┌ Info: last case
    │   x = TwoThirds([1.0, 2.0], [3.0], [4.0, 5.0])
    │   y = (a = [0.0, 4.0], c = [4.0, 5.0])
    └   z = NamedTuple{(:a, :b, :c), Tuple{Any, Any, Any}}(([0.0, 4.0], [3.0], [4.0, 5.0]))
    ((a = [0.0, 4.0], b = [3.0], c = [4.0, 5.0]),)

jondeuce · May 1, 2022

Oh yikes. That's a good example, hits all the pain points at once. If I'm understanding correctly, the gradient should be ((a = [0.0, 4.0], b = nothing, c = nothing),), right?

I think the problem is the _trainmap above; it populates the nothing values from _trainable (non-trainable fields) with the primal values, when they should be NoT. That's how the b and/or c values get back in there.

mcabbott · May 1, 2022

Yes, I think _trainmap needs to do something like isnothing(t) ? NoT : f(t, a) here. That's where c = [4.0, 5.0] is coming from.
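
As a rough sketch of that suggestion, assuming the trainable children and the auxiliary values arrive as NamedTuples with the same keys: `trainmap_sketch` and `NoTStandIn` below are hypothetical names, not the package's `_trainmap` or its `NoT`.

```julia
# Hypothetical placeholder for the "no tangent" marker.
struct NoTStandIn end
const NOT = NoTStandIn()

# ts:  children as reported by a trainable-like function, with `nothing`
#      marking children that are not trainable
# aux: the values paired with each child (for example, incoming gradients)
trainmap_sketch(f, ts::NamedTuple, aux::NamedTuple) =
    map((t, a) -> isnothing(t) ? NOT : f(t, a), ts, aux)

ts  = (a = [1.0, 2.0], c = nothing)
aux = (a = [0.0, 4.0], c = [4.0, 5.0])

# Pass the auxiliary value through for trainable children only.
out = trainmap_sketch((t, a) -> a, ts, aux)
println(out)
```

With this guard, the non-trainable child `c` maps to the marker rather than carrying its paired value through.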

But b = [3.0] is coming from this PR's trick of calling the reconstructor made by @functor:

    julia> ch, re = Functors.functor(ac)
    ((a = [1.0, 2.0], c = [4.0, 5.0]), var"#1#2"{TwoThirds}(TwoThirds([1.0, 2.0], [3.0], [4.0, 5.0])))

    julia> re((a = [10, 20], c = nothing))
    TwoThirds([10, 20], [3.0], nothing)

jondeuce · May 1, 2022 (edited)

Gotcha. So on top of the modified _trainmap to fix c, one would still have to filter backing(re(y)) to replace repopulated primal values which aren't functor-ed with NoT in order to fix b.

EDIT: But, based on the output of Tangent{typeof(x), typeof(y)}(y), maybe the modified _trainmap alone would be enough and backing(re(y)) isn't needed after all, as Tangent will assign NoT to omitted fields in y automatically.

EDIT 2: Never mind, that would still fail for children which aren't fields, like Skip.

jondeuce · May 1, 2022

Alright, pushed something that works for both Skip and your TwoThirds example (modified _trainmap + filtering backing(re(y))). But since it uses re, it would still fail for fussy constructors.
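
The filtering step can be sketched without any packages. Everything here is a stand-in: `fields_of` plays the role of backing for a plain struct, `re` is a hand-written closure imitating a reconstructor that only knows about `a` and `c`, and `filter_children` and `NoTStandIn` are hypothetical names rather than the code pushed in this PR.

```julia
struct NoTStandIn end          # hypothetical "no tangent" marker
const NOT = NoTStandIn()

struct TwoThirds               # illustrative copy of the test struct's layout
    a
    b
    c
end

x = TwoThirds([1.0, 2.0], [3.0], [4.0, 5.0])

# Stand-in reconstructor: takes children a and c, refills b from the primal x.
re = y -> TwoThirds(y.a, x.b, y.c)

# Stand-in for backing: all fields of a struct as a NamedTuple.
fields_of(s) = NamedTuple{fieldnames(typeof(s))}(
    map(n -> getfield(s, n), fieldnames(typeof(s))))

# Keep entries whose key is one of the functor children; mark the rest.
filter_children(z::NamedTuple, childkeys) =
    NamedTuple{keys(z)}(map(k -> k in childkeys ? z[k] : NOT, keys(z)))

y = (a = [0.0, 4.0], c = NOT)  # gradient for the children, c already marked
z = fields_of(re(y))           # b has been refilled with the primal [3.0]
g = filter_children(z, keys(y))
println(g)
```

In this sketch `z.b` holds the primal value, and `g.b` is replaced by the marker because `b` is not among the children in `y`.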

mcabbott and others added 5 commits March 25, 2022 21:18

attempt 62

57a3fa2

next idea

81e41fe

Merge branch 'master' of github.com:jondeuce/Optimisers.jl into offset

aa97f7f

Offset wrapper to avoid confusing isleaf with `AbstractArray{Int}`s of offsets (also simplifying `_aux_children`); fix broken test for issue FluxML#62

6a0ba9f

test no longer fails; offset structure is not leaflike using Offset wrapper

927e095

mcabbott reviewed Apr 22, 2022


jondeuce added 3 commits April 22, 2022 14:05

delete error message for gradient of destructure, which is working now

ba909d2

remove _aux_children

abf8738

modified _trainmap which returns NoT for functor-ed values which are not `trainable`; filter primal values from `backing(re(y))`

415b597

mcabbott added the gradients label Jul 25, 2022

jondeuce wants to merge 8 commits into FluxML:master from jondeuce:offset
