@@ -79,36 +79,27 @@ $\frac{\partial J}{\partial \theta_1}$,
 $\frac{\partial J}{\partial \theta_2}$, etc. Each of these gradients can
 be calculated via the chain rule. Here is the chain rule written out for
 the gradients for $\theta_1$ and $\theta_2$:
-
 \[
+ \newcommand{\sharedterm}{%
+ \colorbox{shared_term_color}{%
+ $\displaystyle
+ \frac{\partial J}{\partial \mathbf{x}_ L}
+ \frac{\partial \mathbf{x}_ L}{\partial \mathbf{x}_ {L-1}}
+ \cdots
+ \frac{\partial \mathbf{x}_ 3}{\partial \mathbf{x}_ 2}
+ $%
+ }%
+ }
 \begin{aligned}
- \frac{\partial J}{\partial \theta_1} &=
- \mathchoice%
- {\colorbox{shared_term_color}{\ensuremath{\displaystyle
- \frac{\partial J}{\partial \mathbf{x}_ L}
- \frac{\partial \mathbf{x}_ L}{\partial \mathbf{x}_ {L-1}}
- \cdots
- \frac{\partial \mathbf{x}_ 3}{\partial \mathbf{x}_ 2}}}}%
- {\colorbox{shared_term_color}{\ensuremath{\textstyle
- \frac{\partial J}{\partial \mathbf{x}_ L}
- \frac{\partial \mathbf{x}_ L}{\partial \mathbf{x}_ {L-1}}
- \cdots
- \frac{\partial \mathbf{x}_ 3}{\partial \mathbf{x}_ 2}}}}%
+ \frac{\partial J}{\partial \theta_1}
+ &=
+ \sharedterm\,
 \frac{\partial \mathbf{x}_ 2}{\partial \mathbf{x}_ 1}
 \frac{\partial \mathbf{x}_ 1}{\partial \theta_1}
 \\
- \frac{\partial J}{\partial \theta_2} &=
- \mathchoice%
- {\colorbox{shared_term_color}{\ensuremath{\displaystyle
- \frac{\partial J}{\partial \mathbf{x}_ L}
- \frac{\partial \mathbf{x}_ L}{\partial \mathbf{x}_ {L-1}}
- \cdots
- \frac{\partial \mathbf{x}_ 3}{\partial \mathbf{x}_ 2}}}}%
- {\colorbox{shared_term_color}{\ensuremath{\textstyle
- \frac{\partial J}{\partial \mathbf{x}_ L}
- \frac{\partial \mathbf{x}_ L}{\partial \mathbf{x}_ {L-1}}
- \cdots
- \frac{\partial \mathbf{x}_ 3}{\partial \mathbf{x}_ 2}}}}%
+ \frac{\partial J}{\partial \theta_2}
+ &=
+ \sharedterm\,
 \frac{\partial \mathbf{x}_ 2}{\partial \theta_2}
 \end{aligned}
 \]
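
Note on the equations in this hunk: the highlighted factor is the same in both gradients, so it only needs to be computed once. The sketch below illustrates that numerically. It assumes a hypothetical three-layer scalar network ($L = 3$, with $x_1 = \theta_1 x_0$, $x_2 = \theta_2 x_1$, $x_3 = \tanh(x_2)$, $J = \tfrac{1}{2} x_3^2$); none of these layer definitions or function names come from the changed file, they are chosen only to keep the example small.

```python
import math


def forward(theta1, theta2, x0):
    """Hypothetical 3-layer scalar network (an assumption, not from the source)."""
    x1 = theta1 * x0        # layer 1
    x2 = theta2 * x1        # layer 2
    x3 = math.tanh(x2)      # layer 3 (L = 3), no parameters
    J = 0.5 * x3 ** 2       # cost
    return x1, x2, x3, J


def gradients(theta1, theta2, x0):
    """Chain-rule gradients that reuse the shared term dJ/dx3 * dx3/dx2."""
    x1, x2, x3, _ = forward(theta1, theta2, x0)

    # Shared term: computed once, used by both gradients.
    dJ_dx3 = x3
    dx3_dx2 = 1.0 - x3 ** 2
    shared = dJ_dx3 * dx3_dx2

    # dJ/dtheta1 = shared * dx2/dx1 * dx1/dtheta1
    dJ_dtheta1 = shared * theta2 * x0
    # dJ/dtheta2 = shared * dx2/dtheta2
    dJ_dtheta2 = shared * x1
    return dJ_dtheta1, dJ_dtheta2


def finite_difference(f, v, eps=1e-6):
    """Central-difference estimate of df/dv, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


if __name__ == "__main__":
    t1, t2, x0 = 0.7, -1.3, 0.5
    g1, g2 = gradients(t1, t2, x0)
    fd1 = finite_difference(lambda t: forward(t, t2, x0)[3], t1)
    fd2 = finite_difference(lambda t: forward(t1, t, x0)[3], t2)
    print(g1, fd1)
    print(g2, fd2)
```

In this toy setting the two analytic gradients should agree with the finite-difference estimates up to numerical error, which is a quick way to check that the factorization into a shared term and a per-parameter tail is consistent.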