Commit e83ae8f

Merge pull request #881 from stan-dev/update-simplex-to-docs-invilr
Update simplex to docs invilr
2 parents 1c85126 + 56ebbc8 commit e83ae8f

1 file changed

+57
-126
lines changed

src/reference-manual/transforms.qmd

Lines changed: 57 additions & 126 deletions
@@ -662,171 +662,102 @@ $$
662662
663663
### Unit simplex inverse transform {.unnumbered}
664664
665-
Stan's unit simplex inverse transform may be understood using the following
666-
stick-breaking metaphor.[^transforms-1]
667665
668-
[^transforms-1]: For an alternative derivation of the same transform using
669-
hyperspherical coordinates, see [@Betancourt:2010].
670666
671-
1. Take a stick of unit length (i.e., length 1).
672-
2. Break a piece off and label it as $x_1$, and set it aside, keeping what's
673-
left.
674-
3. Next, break a piece off what's left, label it $x_2$, and set it aside,
675-
keeping what's left.
676-
4. Continue breaking off pieces of what's left, labeling them, and setting them
677-
aside for pieces $x_3,\ldots,x_{K-1}$.
678-
5. Label what's left $x_K$.
679-
680-
The resulting vector $x = [x_1,\ldots,x_{K}]^{\top}$ is a unit simplex because
681-
each piece has non-negative length and the sum of the stick lengths is one by
682-
construction.
683-
684-
This full inverse mapping requires the breaks to be represented as the fraction
685-
in $(0,1)$ of the original stick that is broken off. These break ratios are
686-
themselves derived from unconstrained values in $(-\infty,\infty)$ using the
687-
inverse logit transform as described above for unidimensional variables with
688-
lower and upper bounds.
689-
690-
More formally, an intermediate vector $z \in \mathbb{R}^{K-1}$, whose
691-
coordinates $z_k$ represent the proportion of the stick broken off in step $k$,
692-
is defined elementwise for $1 \leq k < K$ by
667+
The length-$K$ unit simplex inverse transform is given by the softmax of a sum-to-zero vector of length $K$.
693668
669+
Let $y \in \mathbb{R}^{K-1}$ denote the unconstrained values. The intermediate sum-to-zero vector $z = \text{sum\_to\_zero\_transform}(y)$ has length $K$. The unit simplex is then given by
694670
$$
695-
z_k = \mathrm{logit}^{-1} \left( y_k
696-
+ \log \left( \frac{1}{K - k}
697-
\right)
698-
\right).
671+
x_i = \text{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{k = 1}^K \exp(z_k)}
699672
$$
700673
701-
The logit term
702-
$\log\left(\frac{1}{K-k}\right) (i.e., \mathrm{logit}\left(\frac{1}{K-k+1}\right)$)
703-
in the above definition adjusts the transform so that a zero vector $y$ is
704-
mapped to the simplex $x = (1/K,\ldots,1/K)$. For instance, if $y_1 = 0$, then
705-
$z_1 = 1/K$; if $y_2 = 0$, then $z_2 = 1/(K-1)$; and if $y_{K-1} = 0$, then
706-
$z_{K-1} = 1/2$.
674+
The sum-to-zero vector transform is described in further detail in the [sum-to-zero vector section of the *Reference Manual*](#sum-to-zero-vector-transform).
707675
708-
The break proportions $z$ are applied to determine the stick sizes and resulting
709-
value of $x_k$ for $1 \leq k < K$ by
676+
::: {.callout-note}
677+
Versions of Stan before 2.37 used the stick-breaking transform.
678+
This is documented at [Stan 2.36 *Reference Manual: Simplex Transform*](https://mc-stan.org/docs/2_36/reference-manual/transforms.html#simplex-transform.section).
679+
:::
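The softmax construction described above can be sketched in a few lines of Python. This is only an illustration: the helper `sum_to_zero_from_free` is a hypothetical stand-in that appends the negated sum of `y`, and it is not Stan's `sum_to_zero_transform`, which uses a different (orthonormal) basis. Any vector that sums to zero works as input to the softmax step.

```python
import math


def sum_to_zero_from_free(y):
    """Hypothetical stand-in, NOT Stan's transform: append -sum(y)
    so that the resulting length-K vector sums to zero."""
    return list(y) + [-sum(y)]


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


y = [0.3, -1.2, 0.5]          # K - 1 = 3 unconstrained values
z = sum_to_zero_from_free(y)  # length K = 4, sums to zero
x = softmax(z)                # point on the unit simplex

uniform = softmax([0.0, 0.0, 0.0, 0.0])  # a zero vector maps to (1/K, ..., 1/K)
```

Every entry of `x` is strictly positive and the entries sum to one, which is what makes softmax a natural map onto the simplex.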
710680
711-
$$
712-
x_k =
713-
\left( 1 - \sum_{k'=1}^{k-1} x_{k'} \right) z_k.
714-
$$
681+
#### Absolute Jacobian determinant of the unit-simplex inverse transform {-}
715682
716-
The summation term represents the length of the original stick left at stage
717-
$k$. This is multiplied by the break proportion $z_k$ to yield $x_k$. Only $K-1$
718-
unconstrained parameters are required, with the last dimension's value $x_K$ set
719-
to the length of the remaining piece of the original stick,
683+
The Jacobian $J$ of the inverse unit-simplex transform is found by
684+
restricting $J$ to the subspace spanned by the sum-to-zero vector $z$.
685+
The Jacobian is given as the $(K - 1) \times (K - 1)$ matrix $J$ where
720686
721687
$$
722-
x_K = 1 - \sum_{k=1}^{K-1} x_k.
688+
J_{ij} = \frac{\partial x_i}{\partial z_j} =
689+
\frac{\partial}{\partial z_j} \left( \frac{\exp(z_i)}{\sum_{k = 1}^K \exp(z_k)} \right)
723690
$$
691+
for $i, j \in \{1, \ldots, K - 1\}$.
724692
725-
### Absolute Jacobian determinant of the unit-simplex inverse transform {.unnumbered}
726-
727-
The Jacobian $J$ of the inverse transform $f^{-1}$ is lower-triangular, with
728-
diagonal entries
693+
The diagonal and off-diagonal derivatives are found using the derivative
694+
quotient rule and algebraic simplification
729695
730696
$$
731-
J_{k,k}
732-
=
733-
\frac{\partial x_k}{\partial y_k}
734-
=
735-
\frac{\partial x_k}{\partial z_k} \,
736-
\frac{\partial z_k}{\partial y_k},
697+
J_{ij} =
698+
\begin{cases}
699+
x_i (1 - x_i), & \text{if } i = j, \\
700+
-x_i x_j, & \text{if } i \neq j.
701+
\end{cases}
737702
$$
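The closed-form entries $x_i (1 - x_i)$ and $-x_i x_j$ can be checked numerically against central finite differences of the softmax. The sketch below perturbs each coordinate of `z` independently, so it checks the full $K \times K$ softmax Jacobian; the $(K - 1) \times (K - 1)$ matrix $J$ in the text is its leading block. The function names, the test point, and the step size `h` are illustrative choices, not part of the Stan documentation.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def analytic_jacobian(x):
    """Entries x_i (1 - x_i) on the diagonal and -x_i x_j off the diagonal."""
    n = len(x)
    return [[x[i] * (1.0 - x[i]) if i == j else -x[i] * x[j]
             for j in range(n)] for i in range(n)]


def finite_difference_jacobian(z, h=1e-6):
    """Central-difference approximation of d softmax(z)_i / d z_j."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_plus = list(z)
        z_plus[j] += h
        z_minus = list(z)
        z_minus[j] -= h
        x_plus = softmax(z_plus)
        x_minus = softmax(z_minus)
        for i in range(n):
            jac[i][j] = (x_plus[i] - x_minus[i]) / (2.0 * h)
    return jac


z = [0.4, -1.0, 0.1, 0.5]
x = softmax(z)
exact = analytic_jacobian(x)
approx = finite_difference_jacobian(z)
max_error = max(abs(a - b)
                for row_a, row_b in zip(exact, approx)
                for a, b in zip(row_a, row_b))
```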
738703
739-
where
704+
In matrix form this can be expressed as
740705
741706
$$
742-
\frac{\partial z_k}{\partial y_k}
743-
= \frac{\partial}{\partial y_k}
744-
\mathrm{logit}^{-1} \left(
745-
y_k + \log \left( \frac{1}{K-k}
746-
\right)
747-
\right)
748-
= z_k (1 - z_k),
707+
J = \text{diag}(x_{1:K-1}) - x_{1:K-1} \, x_{1:K-1}^\top
749708
$$
750709
751-
and
710+
The determinant of this matrix can be found using the Matrix Determinant Lemma:
752711
753712
$$
754-
\frac{\partial x_k}{\partial z_k}
713+
\det\bigl(A + u v^{\top}\bigr)
755714
=
756-
\left(
757-
1 - \sum_{k' = 1}^{k-1} x_{k'}
758-
\right)
759-
.
715+
\det(A)\,\bigl(1 + v^{\top}A^{-1}u\bigr).
760716
$$
761717
762-
This definition is recursive, defining $x_k$ in terms of $x_{1},\ldots,x_{k-1}$.
763-
764-
Because the Jacobian $J$ of $f^{-1}$ is lower triangular and positive, its
765-
absolute determinant reduces to
718+
Here,
766719
767720
$$
768-
\left| \, \det J \, \right|
769-
\ = \
770-
\prod_{k=1}^{K-1} J_{k,k}
771-
\ = \
772-
\prod_{k=1}^{K-1}
773-
z_k
774-
\,
775-
(1 - z_k)
776-
\
777-
\left(
778-
1 - \sum_{k'=1}^{k-1} x_{k'}
779-
\right)
780-
.
721+
A \;=\; \operatorname{diag}(x_{1},\ldots, x_{K-1}),
722+
\quad
723+
u \;=\; -\bigl(x_1,\ldots, x_{K-1}\bigr)^{\!\top},
724+
\quad
725+
v \;=\; \bigl(x_{1}, \ldots, x_{K-1}\bigr)^{\!\top}.
781726
$$
782-
783-
Thus the transformed variable $Y = f(X)$ has a density given by
727+
Therefore,
784728
785729
$$
786-
p_Y(y)
787-
= p_X(f^{-1}(y))
788-
\,
789-
\prod_{k=1}^{K-1}
790-
z_k
791-
\,
792-
(1 - z_k)
793-
\
794-
\left(
795-
1 - \sum_{k'=1}^{k-1} x_{k'}
796-
\right)
797-
.
730+
\begin{aligned}
731+
\det(J)
732+
&=
733+
\bigg(\prod_{i=1}^{K-1} x_i \bigg)
734+
\bigg(1 + (x_{1},\ldots, x_{K-1})\,\mathrm{diag}\bigl(x_{1}^{-1},\ldots,x_{K-1}^{-1}\bigr)\,
735+
\big(-x_{1},\ldots,-x_{K-1}\big)^{\top}
736+
\bigg) \\
737+
&=
738+
\bigg(\prod_{i=1}^{K-1} x_{i}\bigg)
739+
\bigg(1 - \sum_{i=1}^{K-1} x_{i}\bigg)
740+
=
741+
\bigg(\prod_{i=1}^{K-1} x_{i}\bigg) x_{K} \\
742+
&=
743+
\prod_{i=1}^{K} x_{i}.
744+
\end{aligned}
798745
$$
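As a sanity check on the algebra, the identity $\det\bigl(\operatorname{diag}(x_{1:K-1}) - x_{1:K-1} x_{1:K-1}^{\top}\bigr) = \prod_{i=1}^{K} x_i$ can be verified numerically. The sketch below uses a small hand-written Gaussian-elimination determinant to stay dependency-free; the simplex point is an arbitrary example, and the code checks only this matrix identity, not Stan's implementation.

```python
import math


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


x = [0.1, 0.2, 0.3, 0.4]  # example point on the simplex, K = 4
K = len(x)

# (K - 1) x (K - 1) matrix diag(x_{1:K-1}) - x_{1:K-1} x_{1:K-1}^T
J = [[x[i] * (1.0 - x[i]) if i == j else -x[i] * x[j]
      for j in range(K - 1)] for i in range(K - 1)]

lhs = det(J)        # determinant computed directly
rhs = math.prod(x)  # product of all K simplex entries
```

For this example both sides equal $0.1 \cdot 0.2 \cdot 0.3 \cdot 0.4 = 0.0024$ up to floating-point error.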
799746
800-
Even though it is expressed in terms of intermediate values $z_k$, this
801-
expression still looks more complex than it is. The exponential function need
802-
only be evaluated once for each unconstrained parameter $y_k$; everything else
803-
is just basic arithmetic that can be computed incrementally along with the
804-
transform.
747+
### Unit simplex transform {-}
805748
806-
### Unit simplex transform {.unnumbered}
807-
808-
The transform $Y = f(X)$ can be derived by reversing the stages of the inverse
809-
transform. Working backwards, given the break proportions $z$, $y$ is defined
810-
elementwise by
749+
The transform $Y = f(X)$ can be derived by reversing the stages of the
750+
inverse transform,
811751
812752
$$
813753
y_k
814-
= \mathrm{logit}(z_k)
815-
- \mbox{log}\left(
816-
\frac{1}{K-k}
817-
\right)
754+
= \bigg[ H^\top \bigg(\log(x)
755+
- \frac{1}{K}\sum_{i=1}^K\log(x_i) \bigg) \bigg]_k
818756
.
819757
$$
820758
821-
The break proportions $z_k$ are defined to be the ratio of $x_k$ to the length
822-
of stick left after the first $k-1$ pieces have been broken off,
823-
824-
$$
825-
z_k
826-
= \frac{x_k}
827-
{1 - \sum_{k' = 1}^{k-1} x_{k'}}
828-
.
829-
$$
759+
The matrix $H$ is the orthonormal basis matrix used by the sum-to-zero vector transform.
760+
Because the columns of $H$ are orthonormal, its transpose acts as its inverse on sum-to-zero vectors.
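The first stage of this reversal, recovering the sum-to-zero vector from a simplex, can be sketched without reference to $H$. Subtracting the mean of $\log(x)$ gives a vector that sums to zero, and applying softmax to it returns the original $x$. The function name `centered_log` is an illustrative choice; the final multiplication by $H^\top$, which depends on Stan's particular basis, is deliberately omitted.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def centered_log(x):
    """log(x) minus its mean: the sum-to-zero vector whose softmax is x."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


x = [0.1, 0.2, 0.3, 0.4]  # example point on the simplex
z = centered_log(x)       # sums to zero by construction
x_back = softmax(z)       # round trip recovers x
```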
830761
831762
## Stochastic Matrix {#stochastic-matrix-transform.section}
832763
@@ -1520,4 +1451,4 @@ $$
15201451
\sum_{i > j}
15211452
\log \left( 1 - \sum_{j' < j} x_{i,j'}^2 \right)
15221453
.
1523-
$$
1454+
$$
