<h2 data-number="8.2" class="anchored" data-anchor-id="the-data"><span class="header-section-number">8.2</span> The data</h2>
<p>The data used to fit the models are the results of all matches of the 2022-2023 season and, for the second model only, the budget of each team. Our data therefore consists of two tables: one with one row per match, containing the home and away teams and the goals scored by each, and another with one row per team, containing the team and its budget.</p>
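<p>As a minimal sketch of how the match table can be turned into the integer indices a model like this uses, the snippet below encodes team names so that the home and away columns share a single index. The team names, goals and array names (<code>home_team</code>, <code>home_idx</code>, etc.) are hypothetical placeholders, not the real 2022-2023 results, and plain NumPy stands in for whatever data-loading code the original notebook uses.</p>

```python
import numpy as np

# Hypothetical toy match table (NOT the real 2022-2023 results): one row per match.
home_team = np.array(["Team A", "Team B", "Team C", "Team A"])
away_team = np.array(["Team B", "Team C", "Team A", "Team C"])
home_goals = np.array([2, 0, 1, 3])
away_goals = np.array([1, 0, 2, 1])

# Encode team names as integer indices shared by the home and away columns.
teams, codes = np.unique(np.concatenate([home_team, away_team]), return_inverse=True)
n_matches = len(home_team)
home_idx = codes[:n_matches]  # team index of the home side of each match
away_idx = codes[n_matches:]  # team index of the away side of each match

print(teams)     # ['Team A' 'Team B' 'Team C']
print(home_idx)  # [0 1 2 0]
print(away_idx)  # [1 2 0 2]
```

<p>Using one shared encoding guarantees that a team gets the same index whether it plays at home or away; the budget table, with one row per team, can then be ordered to match <code>teams</code>.</p>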
<li>Field. The field identifier. Two teams play in each game, one as the home team and the other as the away team. We use <span class="math inline">\(f\)</span> as the index indicating the field, which can take only two values: <span class="math inline">\(h\)</span> or <span class="math inline">\(a\)</span>.</li>
<li>Arbitrary index. For theoretical concepts, we use <span class="math inline">\(i\)</span> to indicate an arbitrary index.</li>
<p>There are even more examples of predictive tasks where this particular model can be useful. However, keep in mind that this model predicts the number of goals scored. Its results can be used to estimate probabilities of victory and other derived quantities, but calculating the likelihood of these derived quantities may not be straightforward. And as the examples above show, there isn’t <em>one</em> unique predictive task: it all depends on the specific question you’re interested in. As is often the case in statistics, the answer to these questions lies <em>outside</em> the model: <em>you</em> must tell the model what to do, not the other way around.</p>
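<p>Estimating a derived quantity such as the probability of a home win is simple once posterior predictive draws of the goals are available. The sketch below assumes a Poisson likelihood for the goals and uses hypothetical posterior draws of the two scoring rates of a single match (<code>theta_home</code> and <code>theta_away</code> are illustrative names). What it does not provide is a pointwise likelihood for the win/draw/loss outcome itself, which is the part that may not be straightforward.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of the scoring rates for one match
# (in practice these would come from the fitted model's posterior).
n_draws = 4000
theta_home = rng.gamma(shape=20.0, scale=0.08, size=n_draws)  # mean 1.6 goals
theta_away = rng.gamma(shape=20.0, scale=0.05, size=n_draws)  # mean 1.0 goals

# Assuming a Poisson likelihood for goals, simulate one posterior predictive
# score per draw, then summarise the derived quantities.
goals_home = rng.poisson(theta_home)
goals_away = rng.poisson(theta_away)

p_home_win = np.mean(goals_home > goals_away)
p_draw = np.mean(goals_home == goals_away)
p_away_win = np.mean(goals_home < goals_away)
print(p_home_win, p_draw, p_away_win)
```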
<p>Even though we know that the predictive task is ambiguous, we will start by trying to calculate <code>az.loo</code> with <code>idata_base</code>, and then work through the examples above and a couple more to show how this kind of task can be performed with ArviZ. But before that, let’s see what ArviZ says when you naively ask it for the LOO of a multi-likelihood model:</p>
<p>with <span class="math inline">\(i\)</span> being both the match indicator (<span class="math inline">\(m\)</span>, which varies with <span class="math inline">\(i\)</span>) and the field indicator (<span class="math inline">\(f\)</span>, here always fixed at <span class="math inline">\(h\)</span>). These are precisely the values stored in the <code>home_goals</code> variable of the <code>log_likelihood</code> group of <code>idata_base</code>.</p>
<p>We can tell ArviZ to use these values with the <code>var_name</code> argument.</p>
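<p>In ArviZ this corresponds to <code>az.loo(idata_base, var_name="home_goals")</code>. To make the underlying computation concrete, here is a NumPy sketch of <em>plain</em> importance-sampling LOO applied to a pointwise log likelihood array with dims (chain, draw, match). ArviZ uses Pareto smoothed importance sampling (PSIS), so this helper (<code>is_loo</code>, a hypothetical name) is a simplified stand-in rather than a reimplementation of <code>az.loo</code>, and the log likelihood values are random placeholders.</p>

```python
import numpy as np

def is_loo(log_lik):
    """Plain importance-sampling LOO (no Pareto smoothing).

    log_lik: array of shape (chain, draw, n_obs) with pointwise log likelihoods.
    Returns (elpd_loo, pointwise elpd), using
    log p(y_i | y_-i) ~= log S - logsumexp_s(-log_lik[s, i]).
    """
    ll = log_lik.reshape(-1, log_lik.shape[-1])  # stack chains and draws -> (S, n_obs)
    n_samples = ll.shape[0]
    neg = -ll
    m = neg.max(axis=0)  # stabilise the log-sum-exp per observation
    lse = m + np.log(np.exp(neg - m).sum(axis=0))
    pointwise = np.log(n_samples) - lse
    return pointwise.sum(), pointwise

rng = np.random.default_rng(1)
# Hypothetical stand-in for idata_base.log_likelihood["home_goals"]:
# 4 chains, 500 draws, 10 matches.
home_ll = rng.normal(loc=-1.5, scale=0.2, size=(4, 500, 10))

elpd, elpd_i = is_loo(home_ll)
print(elpd_i.shape)  # (10,)
```

<p>Each match contributes exactly one pointwise term here, which is why selecting a single likelihood variable is enough for ArviZ to know what "one observation" means.</p>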
<p>with <span class="math inline">\(i\)</span> being equal to the match indicator <span class="math inline">\(m\)</span>. Therefore, we have <span class="math inline">\(M\)</span> observations like in the previous example, but each observation has two components.</p>
<p>We can calculate the product as a sum of logarithms and store the result in a new variable inside the <code>log_likelihood</code> group.</p>
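<p>Since home and away goals enter the model as two separate likelihood terms, the log likelihood of a whole match is the sum of its two pointwise log likelihoods. A NumPy sketch with placeholder arrays of dims (chain, draw, match):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical pointwise log likelihoods with dims (chain, draw, match),
# standing in for the home_goals and away_goals variables of log_likelihood.
home_ll = rng.normal(-1.5, 0.2, size=(4, 500, 10))
away_ll = rng.normal(-1.3, 0.2, size=(4, 500, 10))

# log[p(home_m | theta) * p(away_m | theta)] = log p(home_m | theta) + log p(away_m | theta):
# the product of the two likelihoods becomes a sum of logarithms.
match_ll = home_ll + away_ll
print(match_ll.shape)  # (4, 500, 10): still M observations, one per match
```

<p>Assuming both variables share the same dimensions in the real <code>InferenceData</code>, the same sum can be taken on the xarray variables and assigned as a new variable, e.g. <code>idata_base.log_likelihood["matches"] = idata_base.log_likelihood["home_goals"] + idata_base.log_likelihood["away_goals"]</code>, followed by <code>az.loo(idata_base, var_name="matches")</code>; the name <code>matches</code> is only an illustrative choice.</p>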
<p>Therefore, unlike in previous cases, we have <span class="math inline">\(2M\)</span> observations.</p>
<p>We can obtain the pointwise log likelihood corresponding to this case by concatenating the pointwise log likelihoods of <code>home_goals</code> and <code>away_goals</code>. Then, as in the previous case, we store the result in a new variable inside the <code>log_likelihood</code> group.</p>
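<p>A NumPy sketch of the concatenation, again on placeholder arrays with dims (chain, draw, match): joining the two variables along the observation axis doubles its length from M to 2M.</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n_matches = 10
# Hypothetical stand-ins for the home_goals and away_goals log likelihoods,
# each with dims (chain, draw, match).
home_ll = rng.normal(-1.5, 0.2, size=(4, 500, n_matches))
away_ll = rng.normal(-1.3, 0.2, size=(4, 500, n_matches))

# Stack along the last (observation) axis: the first M entries are home goals,
# the last M are away goals, giving 2M observations per posterior sample.
goals_ll = np.concatenate([home_ll, away_ll], axis=-1)
print(goals_ll.shape)  # (4, 500, 20)
```

<p>Because the result can no longer be indexed by match alone, the new variable in the <code>log_likelihood</code> group needs its own observation dimension of length 2M.</p>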
<p>In this situation, we could describe the cross-validation as excluding a team. When we exclude a team, we exclude all the matches played by that team: not only the goals it scored, but the whole match. Here is the illustration:</p>
<p>In the first column, we exclude “Levante U.D.”, which appears only once in the rows shown. In the second one, we exclude “Athletic Club”, which appears twice. This continues following the order of appearance in the away team column.</p>
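<p>The grouping behind this leave-one-team-out scheme can be sketched in NumPy: for each team, sum the per-match log likelihoods (home plus away goals) of every match it played, home or away, and then apply an importance-sampling estimate to the resulting (sample, team) array. The schedule, the log likelihood values and all array names are hypothetical placeholders, and plain importance sampling is used instead of the PSIS that <code>az.loo</code> applies.</p>

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

rng = np.random.default_rng(4)
n_teams, n_samples = 4, 2000
# Hypothetical schedule: every ordered pair of distinct teams plays once.
pairs = [(h, a) for h in range(n_teams) for a in range(n_teams) if h != a]
home_idx = np.array([h for h, _ in pairs])
away_idx = np.array([a for _, a in pairs])

# Hypothetical per-match log likelihoods (home + away), stacked to (sample, match).
match_ll = rng.normal(-2.8, 0.3, size=(n_samples, len(pairs)))

# Leaving a team out removes every match it played, home or away, so the
# log likelihood of the held-out block is the sum over those matches.
team_ll = np.stack(
    [match_ll[:, (home_idx == t) | (away_idx == t)].sum(axis=1) for t in range(n_teams)],
    axis=1,
)  # shape (sample, team)

# Plain importance-sampling estimate of log p(y_team | y_-team) for each team.
elpd_team = np.log(n_samples) - logsumexp(-team_ll, axis=0)
print(team_ll.shape, elpd_team.shape)  # (2000, 4) (4,)
```

<p>Because every match involves two teams, each match appears in two groups, so the team-wise terms overlap. Leaving out a whole block of matches also moves the LOO posterior further from the full posterior, which makes importance sampling more fragile and can trigger Pareto <span class="math inline">\(k\)</span> warnings like the ones below.</p>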
<pre><code>/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
<pre><code>/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.