Unify lower level API for EfficientNet and MobileNet model families #200

darsnack merged 8 commits into FluxML:master from theabhirath:midlevel
Conversation
1. Expose the `pretrain` option for all models
2. Make it easier to initialise models with config options at the mid level by providing an additional dispatch
3. Some cleanup + documentation
1. Unify the MobileNet and EfficientNet APIs into a single lower level builder function (sketched below)
2. Further unification and consolidation of the mid level API
3. Some cleanup
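For context, a rough sketch of what the unified lower level call could look like, going by the `irmodelbuilder` name in the PR description; the signature and keyword names here are assumptions for illustration, not the merged code:

```julia
# Hypothetical invocation of the unified builder: a (width, depth) scaling
# tuple plus a per-stage config list identify most models in both families.
layers = irmodelbuilder((1.0, 1.0), MOBILENETV2_CONFIGS;
                        inplanes = 32, activation = relu6,
                        inchannels = 3, nclasses = 1000)
```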
```julia
conv_norm((1, 1), inplanes, squeeze_planes, activation;
          norm_layer)...,
conv_norm((1, 1), squeeze_planes, inplanes,
          gate_activation; norm_layer)...), .*)
```
TIL that (.*) returns a Base.Broadcast.BroadcastFunction. Maybe we should create an alias for it in NNlib...
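An aside illustrating that observation; the `SkipConnection` line below is a minimal sketch of the gating pattern, not the PR's code:

```julia
using Flux

# `(.*)` is sugar for Base.Broadcast.BroadcastFunction(*): a callable that
# broadcasts `*` over its arguments.
bmul = (.*)
@assert bmul isa Base.Broadcast.BroadcastFunction
@assert bmul([1, 2], [3, 4]) == [3, 8]

# As the connection in a SkipConnection, it multiplies the branch output
# elementwise with the input; this is the gating pattern in squeeze-excite:
layer = SkipConnection(Dense(4 => 4, sigmoid), .*)
x = rand(Float32, 4, 2)
y = layer(x)   # == sigmoid.(W * x .+ b) .* x
```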
ToucheSir left a comment:

Fewer comments than I thought for a first pass :)
1. Rename `DropPath` to `StochasticDepth` to better reflect the paper
2. `_prob` is more accurate than `_rate`
3. Add stochastic depth to the `mbconv` builders (see the sketch below)
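For reference, a minimal sketch of the stochastic depth idea from Huang et al. (2016); this illustrates the concept only and is not Metalhead's `StochasticDepth` implementation:

```julia
# During training, drop the residual branch wholesale with probability
# `prob`; survivors are rescaled (inverted-dropout style) so that the
# inference pass can be a plain identity.
function stochastic_depth(branch_out::AbstractArray, prob::Real; training::Bool = true)
    training || return branch_out
    return rand() < prob ? zero(branch_out) : branch_out ./ (1 - prob)
end
```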
This PR now also adds an additional …
```julia
# TODO look into "row" mode for stochastic depth
"""
    DropPath(p; [rng = rng_from_array(x)])
```
To make this non/less breaking, adding a deprecation for `DropPath` would work.
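Something along these lines would work, assuming `DropPath(p)` was the old entry point; a sketch, not the actual shim:

```julia
# Forward old DropPath calls to the new name, emitting a deprecation warning.
Base.@deprecate DropPath(p) StochasticDepth(p)
```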
But the Layers API is unexported, so does this still count as a breaking change?
It really should be exported in 0.8, but for now it isn't, so I suppose it might work out in our favour 😄
I assumed that was why you added the label, but that is true. The built docs list it as "public", but I think that's a false positive, and JuliaHub + GH search turn up no results.
Oh no, the breaking change is that EfficientNet literally returns a different model 😅
This PR is complete, I think; any further docs improvements would be best handled in a separate PR after landing this one, and then #199.
darsnack left a comment:

Just a few final naming things. Also, we can tackle more thorough docs, including docstrings, separately.
src/convnets/builders/mbconv.jl (outdated diff)

```julia
stochastic_depth_prob = nothing, norm_layer = BatchNorm,
divisor::Integer = 8, kwargs...)
```
Do we need to have unused keywords? As far as I can tell, right now they are silently ignored, but not including extraneous keywords would throw a MethodError (as it should).
These are being used, though, to pass stuff like the `se_round_fn` in for MobileNetv3.
I specifically mean stuff like `stochastic_depth_prob`.
> I specifically mean stuff like `stochastic_depth_prob`

This particular one needs to be there because it will cause a MethodError for `dwsep_conv_norm` if passed through (and the default model builder passes it through). See also #200 (comment).
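A toy illustration of the trade-off under discussion, with hypothetical function names standing in for the real builders:

```julia
# A downstream function with a fixed keyword list rejects unknown keywords:
dwsep_like(x; se_round_fn = round) = se_round_fn(x)

# Naming `stochastic_depth_prob` in the builder signature (instead of letting
# it fall into `kwargs...`) keeps it from reaching the downstream call:
builder(x; stochastic_depth_prob = nothing, kwargs...) = dwsep_like(x; kwargs...)

builder(1.5; stochastic_depth_prob = 0.2)    # works: the keyword is absorbed
dwsep_like(1.5; stochastic_depth_prob = 0.2) # MethodError: unsupported keyword
```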
Bump?
```julia
stochastic_depth_prob = nothing, norm_layer = BatchNorm,
divisor::Integer = 8, kwargs...)
width_mult, depth_mult = scalings
block_repeats = [ceil(Int, block_configs[idx][end - 1] * depth_mult)
```
For MobileNet the config list is

```julia
const MOBILENETV2_CONFIGS = [
    (mbconv, 3, 16, 1, 1, 1, nothing, relu6),
    (mbconv, 3, 24, 6, 2, 2, nothing, relu6),
    (mbconv, 3, 32, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 64, 6, 2, 4, nothing, relu6),
    (mbconv, 3, 96, 6, 1, 3, nothing, relu6),
    (mbconv, 3, 160, 6, 2, 3, nothing, relu6),
    (mbconv, 3, 320, 6, 1, 1, nothing, relu6),
]
```

so won't `block_repeats` just be a list of `nothing`s? And, as the name suggests, doesn't it tell how many times a particular block of layers is repeated?
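A small sketch of the indexing behind that question (Symbols stand in for the block function and activation):

```julia
# One MobileNetv2 config tuple, in the layout shown above:
config = (:mbconv, 3, 16, 1, 1, 1, nothing, :relu6)
config[end - 1]   # nothing: the reduction slot the comment points at
config[6]         # 1: the per-stage repeat count in this layout
```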
This PR unifies the lower level API for the EfficientNet and MobileNet model families into a single `irmodelbuilder` function. This followed as a natural consequence of #198: a considerable amount of the calculations, model structure, and configurations have now been abstracted out into pieces that compose, meaning that most of the models in these two families can be almost completely identified by their configuration dict and a couple of other arguments. Amongst other things, what this meant was …

Documentation TODO
Miscellaneous Notes
In a step closer to a uniform API, the higher level model APIs now all accept the `pretrain` keyword argument.
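A usage sketch, assuming Metalhead's exported constructors and configuration symbols:

```julia
using Metalhead

# `pretrain = true` would load pretrained weights where available; the
# keyword is now accepted uniformly across the higher level constructors.
m1 = MobileNetv3(:small; pretrain = false)
m2 = EfficientNet(:b0; pretrain = false)
```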