Model architecture has no H*W*Q tensor for the (a,b) probability distribution

As mentioned in the [paper](https://arxiv.org/pdf/1603.08511.pdf), the loss function uses **Z** and **Z^**, both of shape `H*W*Q`. However, the model architecture computes the `(a,b)` probability distribution as a tensor of shape `H/4 * W/4 * Q`. How is **Z^** computed, then? Isn't it supposed to be the model's prediction, which is then used to calculate the loss?
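
To make the mismatch concrete, here is a minimal sketch of the two shapes being compared. The values `H = W = 224` and `Q = 313` are assumptions chosen for illustration, and the helper names `loss_shape` and `model_output_shape` are hypothetical, not functions from the repository.

```python
# Illustrative shape arithmetic only. H, W, and Q are assumed values,
# and both helpers are hypothetical names, not part of the repository.
H, W, Q = 224, 224, 313  # assumed input size and number of quantized (a,b) bins


def loss_shape(h, w, q):
    """Shape of Z and Z^ as written in the paper's loss: H x W x Q."""
    return (h, w, q)


def model_output_shape(h, w, q, stride=4):
    """Shape of the (a,b) distribution the architecture outputs: H/4 x W/4 x Q."""
    return (h // stride, w // stride, q)


expected = loss_shape(H, W, Q)
predicted = model_output_shape(H, W, Q)

print("loss expects:  ", expected)
print("model predicts:", predicted)
print("shapes match:  ", expected == predicted)
```

Under these assumed values the spatial dimensions differ by a factor of 4 on each axis, which is exactly the gap the question is about.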