
Commit 3c842e0
more tweaks & typos
1 parent d8faea6


docs/src/guide/models/basics.md

Lines changed: 10 additions & 6 deletions
@@ -157,7 +157,7 @@ The first entry is `∂f/∂x` as before, but the second entry is more interesting.
 For `poly2`, we get `∂f/∂θ` as `grad2[2]` directly.
 It is a vector, because `θ` is a vector, and has elements `[∂f/∂θ[1], ∂f/∂θ[2], ∂f/∂θ[3]]`.
 
-For `poly3`, however, we get a `NamedTuple` whose fields correspond to those of the struct `Poly3`.
+For `poly3s`, however, we get a `NamedTuple` whose fields correspond to those of the struct `Poly3`.
 This is called a *structural gradient*. And the nice thing about them is that they work for
 arbitrarily complicated structures, for instance:
 
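For context on what such a structural gradient looks like, here is a minimal sketch. The struct is re-declared because the guide's real `Poly3` lies outside this hunk; the field name `θ` and the polynomial's form are assumptions.

```julia
using Zygote

# Hypothetical stand-in for the guide's `Poly3`; field name assumed:
struct Poly3{T}
    θ::T
end
(m::Poly3)(x) = m.θ[1] + m.θ[2]*x + m.θ[3]*x^2

poly3s = Poly3([1.0, 2.0, 3.0])
grad3 = Zygote.gradient((m, x) -> m(x), poly3s, 0.5)

grad3[1]  # (θ = [1.0, 0.5, 0.25],) -- a NamedTuple matching the struct's fields
grad3[2]  # ∂f/∂x = 2.0 + 2 * 3.0 * 0.5 = 5.0
```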
@@ -286,6 +286,8 @@ This is because we anticipate composing several instances of this thing,
 with independent parameter arrays, of different sizes and different
 random initial parameters.
 
+Let's try this out, and look at its gradient:
+
 ```jldoctest poly; output = false, filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 x = Float32[0.1, 0.2, 0.3] # input
 
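To make "its gradient" concrete, here is a sketch under assumed definitions: this `Layer` mimics the struct the guide describes, with fields `W`, `b` and `act`, though the real constructor and initialisation may differ.

```julia
using Zygote

# Assumed re-declaration of the guide's `struct Layer`:
struct Layer
    W::Matrix{Float32}
    b::Vector{Float32}
    act::Function
end
Layer(in::Int, out::Int, act::Function=tanh) =
    Layer(randn(Float32, out, in), randn(Float32, out), act)
(l::Layer)(x) = l.act.(l.W * x .+ l.b)

x = Float32[0.1, 0.2, 0.3]
layer = Layer(3, 2)
grad = Zygote.gradient((m, v) -> sum(m(v)), layer, x)[1]
grad.W    # a 2×3 matrix of derivatives
grad.act  # nothing -- the activation has no trainable parameters
```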
@@ -303,8 +305,8 @@ Within it, the gradient with respect to `W` is a matrix of seemingly random numbers.
 Notice that there is also an entry for `act`, which is `nothing`,
 as this field of the struct is not a smoothly adjustable parameter.
 
-We can compose these layers just as we did the polynomials above.
-Here's a composition of 3, in which the last step is the function `only`
+We can compose these layers just as we did the polynomials above, in `poly4`.
+Here's a composition of 3 functions, in which the last step is the function `only`
 which takes a 1-element vector and gives us the number inside:
 
 ```jldoctest poly; output = false, filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
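Continuing the sketch above, such a composition of three functions and its nested structural gradient might look as follows (layer sizes are assumptions):

```julia
# Reuses `Layer` and `x` from the previous sketch:
model1 = only ∘ Layer(4, 1) ∘ Layer(3, 4)  # applied right-to-left
model1(x)  # one Float32 number

grad1 = Zygote.gradient(m -> m(x), model1)[1]
grad1.outer.inner.W  # mirrors model1.outer.inner.W, the Layer(4, 1) weight
```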
@@ -323,7 +325,8 @@ This gradient is starting to be a complicated nested structure.
 But it works just like before: `grad.outer.inner.W` corresponds to `model1.outer.inner.W`.
 
 We don't have to use `∘` (which makes a `ComposedFunction` struct) to combine layers.
-Instead, we could define our own container struct, or use a closure:
+Instead, we could define our own container struct, or use a closure.
+This `model2` will work the same way (although its fields have different names):
 
 ```jldoctest poly; output = false, filter = r"[+-]?([0-9]*[.])?[0-9]+(f[+-]*[0-9])?"
 model2 = let
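The `let` block is truncated in this hunk; purely for illustration, a hypothetical closure along these lines (again reusing the sketch's `Layer` and `x`) behaves like `model1`, with the gradient's field names taken from the captured variables:

```julia
# Hypothetical completion, NOT the guide's actual code:
model2c = let
    lay1 = Layer(3, 4)  # independent parameters, of different sizes
    lay2 = Layer(4, 1)
    v -> only(lay2(lay1(v)))
end
model2c(x)  # same kind of number-out behaviour as model1

grad2c = Zygote.gradient(m -> m(x), model2c)[1]
grad2c.lay1.W  # fields named after the closure's captured variables
```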
@@ -367,7 +370,7 @@ How does this `model3` differ from the `model1` we had before?
 
 * Flux's [`Chain`](@ref Flux.Chain) works left-to-right, the reverse of Base's `∘`.
   Its contents are stored in a tuple, thus `model3.layers[1].weight` is an array.
-* Flux's layer [`Dense`](@ref Flux.Dense) has only minor differences:
+* Flux's layer [`Dense`](@ref Flux.Dense) has only minor differences from our `struct Layer`:
   - Like `struct Poly3{T}` above, it has type parameters for its fields; without them, the compiler does not know exactly what type `layer3s.W` will be, which costs speed.
   - Its initialisation uses not `randn` (normal distribution) but [`glorot_uniform`](@ref) by default.
   - It reshapes some inputs (to allow several batch dimensions), and produces more friendly errors on wrong-size input.
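For comparison with these bullets, a Flux version along the same lines might be (sizes assumed to match the sketches above):

```julia
using Flux

x = Float32[0.1, 0.2, 0.3]
model3 = Chain(Dense(3 => 4, tanh), Dense(4 => 1, tanh), only)
model3(x)  # again a single number

model3.layers[1].weight  # a 4×3 array: Chain stores its layers in a tuple, left-to-right
```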
@@ -376,7 +379,8 @@ How does this `model3` differ from the `model1` we had before?
   and has a rule telling Zygote how to differentiate it efficiently.
 * Flux overloads `Base.show` to give pretty printing at the REPL prompt.
   Calling [`Flux.@layer Layer`](@ref Flux.@layer) will add this, and some other niceties.
-* All Flux layers accept a batch of samples: Instead of mapping one sample `x::Vector` to one output `y::Vector`, they map columns of a matrix `xs::Matrix` to columns of the output. This looks like `f(xs) ≈ stack(f(x) for x in eachcol(xs))` but is done more efficiently.
+
+All Flux layers accept a batch of samples: Instead of mapping one sample `x::Vector` to one output `y::Vector`, they map columns of a matrix `xs::Matrix` to columns of the output. This looks like `f(xs) ≈ stack(f(x) for x in eachcol(xs))` but is done more efficiently.
 
 If what you need isn't covered by Flux's built-in layers, it's easy to write your own.
 There are more details [later](@ref man-advanced), but the steps are invariably those shown for `struct Layer` above:
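The batch behaviour described in the paragraph above can be checked directly; a quick sketch, with sizes assumed:

```julia
using Flux

layer = Dense(3 => 2, tanh)
xs = rand(Float32, 3, 5)  # a batch of 5 samples, one per column
layer(xs) ≈ stack(layer(x) for x in eachcol(xs))  # true, but computed in one call
```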
