Modifying ranking components #40
Open
Aim
The VRS has now run through a full VRS-regulated period, with VRS-mandated invites and rules, and has completed a full major cycle and qualification period. Teams have wised up and now hold the VRS in higher regard, which shows both in more correct decision making and in attempts to game the system. With higher-quality output data and a clearer picture of the future VRS ecosystem, this PR proposes slight adjustments to the component balance, with the aim of providing an improved model that matches the current context. It's important to note that the current VRS model has been shown to work effectively and has done an exceptional job of allocating invites for the major. This PR is therefore more a philosophical reinterpretation of which components make up the distribution, and how they are balanced, than a major change.
Current VRS situation
Figure 1 - Current winrate fit using matchdata up to date as of 06/10/25 and current model
The current fit, using matchdata_sample_20250510.json, is shown in Figure 1 above. This fit is worse than the original graph uploaded in the README, and performance is lower than in previously evaluated months, but that isn't necessarily indicative of an issue. When calculating Brier scores at the event level, a clear pattern emerges: the events with the highest Brier scores are predominantly open LANs, notably Birch Cup (Brier = 0.4083). Birch Cup likely has the highest Brier score because of the large number of BO1s throughout its format, as well as new teams with limited ranking information. Additionally, during the recent LAN rush there was an absence of the usual teams in online events due to clashing schedules, which led to a further trickle-down in online invites and a greater variety of teams. Over the same period, online events also showed noticeably higher Brier scores, likely for the same reasons.
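An event-level Brier score is the mean squared difference between the model's pre-match win probability and the 0/1 result, averaged over each event's matches. The sketch below is minimal. It assumes each match has already been reduced to the hypothetical fields `event`, `pExpected` (the model's probability that team 1 wins) and `team1Won`. These are not the VRS matchdata field names. For comparison, a constant 0.5 forecast scores 0.25 under this definition. If the event-level scores above use the same standard definition, a value of 0.4083 is worse than always predicting even odds.

```javascript
// Event-level Brier score: mean of (p - outcome)^2 over each event's matches.
// Field names (event, pExpected, team1Won) are hypothetical, not the VRS schema.
function brierByEvent(matches) {
  const totals = new Map(); // event name -> { sum, count }
  for (const m of matches) {
    const outcome = m.team1Won ? 1 : 0;
    const t = totals.get(m.event) ?? { sum: 0, count: 0 };
    t.sum += (m.pExpected - outcome) ** 2;
    t.count += 1;
    totals.set(m.event, t);
  }
  // Return [event, brier] pairs, worst-predicted events first.
  return [...totals]
    .map(([event, { sum, count }]) => [event, sum / count])
    .sort((a, b) => b[1] - a[1]);
}

// Usage with made-up matches.
const ranked = brierByEvent([
  { event: "Open LAN A", pExpected: 0.8, team1Won: false },
  { event: "Open LAN A", pExpected: 0.6, team1Won: true },
  { event: "Online Cup B", pExpected: 0.7, team1Won: true },
]);
console.log(ranked);
```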
Due to the recent major rush, the recent match period has been far more varied than usual, with opponents who don't typically meet, and this has led to unpredictable results for both the VRS and other prediction metrics. This fluidity has enabled more extreme matchups than usual, both between teams of differing ranks and between brand-new teams and established ones. A somewhat worse fit on volatile data isn't necessarily indicative of a model issue.
While the difference in fit isn't particularly indicative, the balance of the model has shifted considerably, particularly around the LAN Wins component and its weight relative to the other components.
Figure 2 - LAN Wins vs Global Rank (2023 Data using current model)
Figure 3 - LAN Wins vs Global Rank (2025 April Rankings)
Figure 4 - LAN Wins vs Global Rank (October Data with current model)
As shown in Figures 2, 3 and 4 above, there has been a considerable shift in the saturation of LAN wins. LAN wins have gone from being a relatively low-impact, low-saturation component in 2023 to being the most saturated component and the dominant contributor to the team average, both at the Tier 1 level and below.
Comparing the ranking makeup of the Austin Invite period with that of the Budapest Invite period shows how significant this change has been in just a few months, as shown in Figure 5 below.
Figure 5 - Ranking modifier factor makeup comparing Austin to Budapest
LAN wins have increased very significantly in saturation, and nothing suggests that this saturation will ever return to its pre-Budapest-cycle levels. Teams have realised the strength of LAN Wins and look to use local LANs to climb the rankings. This PR rebalances the components so that maxing out on LAN wins is not a necessity, while keeping the balance between dependent and independent factors.
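A makeup comparison like Figure 5 can be reproduced by summing each factor over a group of teams and dividing by the total. The sketch below is minimal. It assumes each team is a plain object keyed by the factor abbreviations used in this PR (BCOL, BOFF, OPPN, LANW). The caller chooses the group of teams, for example those in the Austin or Budapest invite period.

```javascript
// Share of each seed modifier factor in the summed factor values of a group of teams.
// Teams are plain objects keyed by factor abbreviation (assumed layout).
const FACTORS = ["BCOL", "BOFF", "OPPN", "LANW"];

function factorMakeup(teams, factors = FACTORS) {
  const sums = Object.fromEntries(factors.map((f) => [f, 0]));
  for (const team of teams) {
    for (const f of factors) sums[f] += team[f] ?? 0;
  }
  const total = factors.reduce((acc, f) => acc + sums[f], 0);
  // Guard against an empty group or all-zero factors.
  return Object.fromEntries(
    factors.map((f) => [f, total > 0 ? sums[f] / total : 0])
  );
}

// Usage with made-up factor values for a single team.
const makeup = factorMakeup([{ BCOL: 0.5, BOFF: 0.5, OPPN: 0.25, LANW: 0.75 }]);
console.log(makeup);
```

Running this once per period and comparing the LANW share gives the kind of shift shown in Figure 5.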
Minor Changes
Language
Across the repo, some of the language describing the model is no longer accurate. This is mostly leftover wording from how the model previously worked. As teams look to perform more optimally within the VRS model, I think certain areas could do with updated language that accurately represents the current implementation.
Hidden seed factors
While this PR primarily looks at adding a new seed modifier factor, it also introduces the presentation of zero-value factors: ownNetwork (existing) and ownPerformance (new). Although these do not contribute to a team's final rank, the information is useful for teams using the /details/ of a ranking to help ascertain the worth of another team, or potentially of a tournament. Independently of this PR, showing ownNetwork in general would assist teams.
Match data sample
The appended matchdata_sample_20250510.json is not an exact mirror, as it does not reflect the full format used in the true VRS matchdata JSON. The data was collected via LiquipediaDB in line with the Liquipedia API usage guidelines and then tweaked to mirror the stage and event prizing present in the VRS. The JSON body still contains remnants of the Liquipedia API. It is not 100% accurate: some matches are missing and some start times differ. However, I think it's important that there is usable matchdata that enables tweaking and testing of the VRS model. The currently supplied matchdata sample no longer runs, and it does not reflect the matchdata seen in the VRS-regulated world with VRS-mandated invites. As focus on the VRS grows, with much more discussion and critique online, I think it would be beneficial to make the model easier to test.
Performance as an additional seed modifier
As there has not been a public breakdown of the methodology or application of the VRS, this relies on my own interpretation and opinion. As it appears to me, the VRS model's components are incredibly balanced and can be split in a multitude of ways. There are two dependent factors (BCOL & OPPN) vs two independent factors (BOFF & LANW), and bounty-related vs non-bounty factors. Historically, there have also been two smaller-impact factors (OPPN & LANW) compared to the larger, quantitatively tangible components.
Figure 6 - Current Opponent Network values against Global Rank
As shown in Figure 6, Opponent Network values are relatively low, with most values concentrated at the low end compared to Bounty Collected in Figure 7 below.
Figure 7 - Current Bounty Collected values against Global Rank
Traditionally, LANW (LAN wins) has been similar to Opponent Network, acting as a final swing factor compared to the two dominant Bounty components. In Figure 2, the concentration of high-value LAN teams is far lower than it is now. Whilst the component does stretch to greater values than Opponent Network, its mass concentration is pretty similar. As LANW has become far more prominent, it has shifted the balance away from dependent components, which rely on the quality of the opponent, and from components that have stakes applied.
LANW is an important component. While testing the model, I tried an implementation in which LAN wins at Tier 1 events kept the current 1.0 and LAN wins at Tier 2 events received only 0.5. This produced essentially no improvement in fit, while also reducing the ability of teams in the bubble to catch and overtake those above them. LAN wins are a crucial aspect in ensuring there isn't significant stagnation. However, the component may be too dominant and has caused an over-pivot away from the quality of the opponent in a match.
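The tier-scaled variant described above can be sketched as follows. The sketch assumes each match record carries hypothetical `isLan`, `won` and `tier` fields, and treats any tier other than Tier 1 as Tier 2. It ignores any age weighting or other processing that the real LAN Wins component applies.

```javascript
// Tested variant: Tier 1 LAN wins keep the flat 1.0, Tier 2 LAN wins are worth 0.5.
// isLan, won and tier are hypothetical field names; non-Tier-1 events fall back to 0.5.
const LAN_WIN_CREDIT_BY_TIER = { 1: 1.0, 2: 0.5 };

function lanWinCredit(match) {
  if (!match.isLan || !match.won) return 0;
  return LAN_WIN_CREDIT_BY_TIER[match.tier] ?? 0.5;
}

function totalLanWinCredit(matches) {
  return matches.reduce((sum, m) => sum + lanWinCredit(m), 0);
}

// Usage: one Tier 1 LAN win, one Tier 2 LAN win, an online win and a LAN loss.
const credit = totalLanWinCredit([
  { isLan: true, won: true, tier: 1 },
  { isLan: true, won: true, tier: 2 },
  { isLan: false, won: true, tier: 1 },
  { isLan: true, won: false, tier: 1 },
]);
console.log(credit); // 1.5
```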
Instead, this PR introduces another component, using the bucketed approach, that depends on the opponent's achievements and has event stakes applied, as described below.
Changes
In Phase 1, counters are introduced for a team's total rounds played and total rounds won across all of its matches, scaled by the recency of the match the rounds were played in.
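Phase 1 can be sketched as below. The linear recency weight over a 180-day window is an assumption for illustration, and so are the match fields (`team1`, `team2`, `team1Rounds`, `team2Rounds`, `ageDays`). The PR reuses the model's existing recency scaling, whose exact form isn't restated here. Rounds are assumed to be summed over all maps of a series.

```javascript
// Hypothetical recency weight: 1 for a match played today, falling linearly to 0 at WINDOW_DAYS.
const WINDOW_DAYS = 180;

function recencyWeight(ageDays) {
  return Math.max(0, 1 - ageDays / WINDOW_DAYS);
}

// Phase 1: recency-weighted rounds played and rounds won per team.
function accumulateRounds(matches) {
  const counters = new Map(); // teamId -> { roundsPlayed, roundsWon }
  const add = (teamId, roundsWon, roundsPlayed, weight) => {
    const c = counters.get(teamId) ?? { roundsPlayed: 0, roundsWon: 0 };
    c.roundsPlayed += roundsPlayed * weight;
    c.roundsWon += roundsWon * weight;
    counters.set(teamId, c);
  };
  for (const m of matches) {
    const weight = recencyWeight(m.ageDays);
    const played = m.team1Rounds + m.team2Rounds;
    add(m.team1, m.team1Rounds, played, weight);
    add(m.team2, m.team2Rounds, played, weight);
  }
  return counters;
}

// Usage: a fresh 13-7 win and a 10-13 loss from 90 days ago (weight 0.5).
const counters = accumulateRounds([
  { team1: "A", team2: "B", team1Rounds: 13, team2Rounds: 7, ageDays: 0 },
  { team1: "A", team2: "C", team1Rounds: 10, team2Rounds: 13, ageDays: 90 },
]);
console.log(counters.get("A"));
```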
In Phase 2, this is converted into a scaled own performance relative to the reference rounds won and rounds played, with a curve function applied so that recently formed teams with strong recent performances aren't adversely affected by their recency.
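Phase 2 admits more than one reading, and the sketch below shows one possible interpretation rather than the PR's exact code. It compares a team's round win share with a reference win share. It then multiplies by a `roundParticipation` term (the name comes from this PR), which compares rounds played with a reference and passes the result through a curve. The curve used here, `1 / (1 + |log10 x|)`, is a stand-in chosen for illustration, and the repo's own curveFunction may be defined differently. How the reference values are chosen is also left as an assumption.

```javascript
// Stand-in concave curve (assumption): 1 at x = 1, about 0.5 at x = 0.1, 0 for x <= 0.
function curveFunction(x) {
  if (x <= 0) return 0;
  return 1 / (1 + Math.abs(Math.log10(Math.min(x, 1))));
}

// Phase 2 (one reading): relative round win share times curved round participation.
// `reference` holds hypothetical reference values { winShare, roundsPlayed }.
function ownPerformance({ roundsPlayed, roundsWon }, reference) {
  if (roundsPlayed <= 0) return 0;
  const winShare = roundsWon / roundsPlayed;
  const relativeWinShare = Math.min(1, winShare / reference.winShare);
  const roundParticipation = curveFunction(roundsPlayed / reference.roundsPlayed);
  return relativeWinShare * roundParticipation;
}

const reference = { winShare: 0.6, roundsPlayed: 400 };
// An established team and a newly formed team with the same win share.
const established = ownPerformance({ roundsPlayed: 400, roundsWon: 240 }, reference);
const newTeam = ownPerformance({ roundsPlayed: 40, roundsWon: 24 }, reference);
console.log(established, newTeam);
```

With a linear participation term, the new team would score 0.1 instead of about 0.5. That gap is the recency penalty the curve is meant to soften.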
In Phase 3, this is handled with a bucket approach similar to Opponent Network and Bounty Collected, under the same decay and event stakes modification.
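Phase 3 reuses the bucketed pattern. The sketch below is minimal and rests on two assumptions. First, each of a team's wins is recorded with the opponent's id plus the age weight and event stakes multipliers the model already computes; the field names are hypothetical. Second, the best entries fill a fixed-size bucket that is averaged over the full bucket size. The bucket size of 10 is illustrative.

```javascript
const BUCKET_SIZE = 10; // illustrative bucket size

// Phase 3: Opponent Performance from the opponents' ownPerformance values.
// Each win scores opponentOwnPerformance * ageWeight * eventStakes; the best
// BUCKET_SIZE scores are summed and divided by BUCKET_SIZE, so empty slots count as 0.
function opponentPerformance(wins, ownPerformanceById) {
  const scores = wins
    .map((w) => (ownPerformanceById.get(w.opponent) ?? 0) * w.ageWeight * w.eventStakes)
    .sort((a, b) => b - a)
    .slice(0, BUCKET_SIZE);
  return scores.reduce((sum, s) => sum + s, 0) / BUCKET_SIZE;
}

// Usage: a recent win over a strong opponent and an older win over a weaker one.
const ownPerf = new Map([["X", 0.8], ["Y", 0.5]]);
const value = opponentPerformance(
  [
    { opponent: "X", ageWeight: 1, eventStakes: 1 },
    { opponent: "Y", ageWeight: 0.5, eventStakes: 1 },
  ],
  ownPerf
);
console.log(value);
```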
Results
This produces another component that is relative to opponents' recent performance. It matches the aim outlined in team.js, which "rates each team highly if it can regularly win against other prestigious teams.", and helps to further evaluate the prestige of an opponent.
Figure 8 - Expected Winrate against Observed Winrate using the proposed PR model with current data
Figure 8 is the model_fit using the PR's proposed changes, showing improvements in both the line fit and Spearman's rho.
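Spearman's rho is the Pearson correlation of the ranks of the two series. It therefore rewards a monotone relationship between expected and observed win rate without requiring it to be linear. The sketch below is minimal and assumes the fit is summarised as paired arrays of expected and observed win rates, for example one pair per probability bin. Tied values receive their average rank.

```javascript
// Rank values (1-based), giving tied values the average of their ranks.
function ranks(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const r = new Array(values.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) r[order[k][1]] = avg;
    i = j + 1;
  }
  return r;
}

// Pearson correlation of two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx, dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

function spearmanRho(x, y) {
  return pearson(ranks(x), ranks(y));
}

// Usage: expected vs observed win rates for four bins, with one pair swapped.
console.log(spearmanRho([0.1, 0.2, 0.3, 0.4], [0.15, 0.3, 0.25, 0.5])); // 0.8
```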
The Opponent Performance factor has the expected relationship to global rank, as shown in Figure 9 below.
Figure 9 - Opponent Performance against Global Rank using the proposed PR model with current data
Additionally, assume that a factor should fit the final_rank_value well, agreeing both with the average of the other components and with head-to-head (H2H) results. Under that assumption, the relationship is pretty good, as shown in Figure 10 below:
Figure 10 - Opponent Performance against Final Rank Value using the proposed PR model with current data
This is similar to the behaviour observed for Bounty Offered, as shown in Figure 11 below:
Figure 11 - Bounty Offered against Final Rank Value using the proposed PR model with current data
However, Opponent Performance has a noticeably higher weighting, with greater concentration at the upper bound of the factor.
Power vs Curve
Again, as there has not been a public breakdown of the methodology, whether this value should have the powerFunction or the curveFunction applied is my own interpretation. ownPerformance is an attained component, whereas ownNetwork is more an after-effect of playing. It could therefore make more sense to align ownPerformance with the bounty components, since, like cash won, it is a quantitatively attained value. However, ownPerformance is still somewhat an after-effect of playing further matches, and it is seemingly top-heavy.
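To make the trade-off concrete, the sketch below compares the shapes of two stand-in functions on the [0, 1] range. The first is a log-based curve, `1 / (1 + |log10 x|)`, and the second is a power function, `x^2`. Both definitions and the exponent are assumptions for illustration, not the repo's own powerFunction and curveFunction. The curve lifts low and mid values towards 1, which concentrates a component in its upper range. The power function pushes those values down, which spreads teams out and keeps only genuinely high inputs near the top.

```javascript
// Stand-in definitions (assumptions), both mapping [0, 1] onto [0, 1].
function curveFunction(x) {
  if (x <= 0) return 0;
  return 1 / (1 + Math.abs(Math.log10(Math.min(x, 1))));
}

function powerFunction(x, exponent = 2) {
  const clamped = Math.min(Math.max(x, 0), 1);
  return clamped ** exponent;
}

// Compare how each maps the same raw values.
const inputs = [0.05, 0.1, 0.25, 0.5, 1];
for (const x of inputs) {
  console.log(x, curveFunction(x).toFixed(3), powerFunction(x).toFixed(3));
}
```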
From that, it would make more sense to use the powerFunction, with the results as shown below.
Figure 12 - Expected Win Rate against Observed Win Rate for the PR, using powerFunction for opponentPerformance
As shown in Figure 12, using the powerFunction produces a worse line than the curveFunction. However, it is still an improvement on the current model and arguably makes more logical sense in the global context.
When using the powerFunction, the weighting and saturation of the component are much more in line with the rest of the model, as shown in Figures 13 and 14 below.
Figure 13 - Opponent Performance against Global Rank using the proposed PR model (using powerFunction) with current data
Figure 14 - Opponent Performance against Final Rank Value using the proposed PR model (using powerFunction) with current data
Effect
Whether the power or the curve function is used, the components are still rebalanced. With the additional component, the overall impact of LANW drops to 20%. When running the model on the Major matchdata, there is very little movement at the top level and minimal effect on the standings.
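The 25% to 20% figure follows from the component count. This rests on an assumption about the aggregation, stated here for clarity: the final seed value is an unweighted mean of the $n$ components $c_k$, each bounded to $[0, 1]$.

```latex
\text{seed} = \frac{1}{n}\sum_{k=1}^{n} c_k,\quad c_k \in [0,1]
\;\Rightarrow\;
\max\left(\frac{c_{\text{LANW}}}{n}\right) = \frac{1}{n} =
\begin{cases}
1/4 = 25\% & n = 4 \text{ (current model)}\\
1/5 = 20\% & n = 5 \text{ (with Opponent Performance)}
\end{cases}
```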
As shown in Figures 15 and 16 below, the EU Major rankings are very similar, differing only in the qualified stage for teams on the border.
Figure 15 - EU Major Rankings using the proposed model with the powerFunction applied to opponent performance
Figure 16 - EU Major Rankings using the proposed model with the curveFunction applied to opponent performance
The improvements in fit are also not due to a concentration of probability to equal odds.
Figure 17 - Probability distribution for expected winrate for current model
Figure 18 - Probability distribution for expected winrate for the PR model (using curveFunction)
As shown in Figures 17 and 18, the PR has a similar distribution to the existing model and does not concentrate probabilities in the middle of the range.
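A quick way to check that a better fit isn't coming from probabilities piling up around 0.5 is to bin the expected win probabilities and measure the share that falls near even odds. The sketch below is minimal and works on a plain array of expected probabilities. The 10 bins and the ±0.1 band around 0.5 are illustrative choices.

```javascript
// Count expected probabilities into equal-width bins over [0, 1].
function histogram(probabilities, binCount = 10) {
  const counts = new Array(binCount).fill(0);
  for (const p of probabilities) {
    // p = 1 would map to index binCount, so clamp it into the last bin.
    const idx = Math.min(binCount - 1, Math.floor(p * binCount));
    counts[idx] += 1;
  }
  return counts;
}

// Fraction of probabilities within halfWidth of even odds.
function middleMass(probabilities, halfWidth = 0.1) {
  const near = probabilities.filter((p) => Math.abs(p - 0.5) <= halfWidth).length;
  return probabilities.length ? near / probabilities.length : 0;
}

const expected = [0.05, 0.45, 0.5, 0.55, 0.95];
console.log(histogram(expected));
console.log(middleMass(expected)); // 0.6
```

Comparing `middleMass` for the current and PR models gives a single number to back up the visual comparison of the two distributions.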
Overall, this PR increases the point reward for playing against teams of prestige, to better balance the ratio of points available via independent and dependent methods. LAN wins still bring a significant benefit and remain an attractive achievement for teams. However, reducing LANW's weight from 25% to 20% means that they are no longer a necessity to compete. roundParticipation already has a curveFunction applied so that new teams have some worth, akin to ownNetwork. Because of this, it may be better for the component itself to use the powerFunction, which also fits more closely with other applications in the model.
Further considerations
With the LAN component being high, there are potentially other ways to reduce its impact without adjusting the whole model's balance. Personally, I don't think scaling it by Tier or directly by opponent quality is the correct approach, because it leads to ranking stagnation. When I tested a stakes modification of LAN Wins, the predominant outcome was that the top ranks became very hard to penetrate and top-level teams fell off more slowly. Tier 1 teams that have been on a bad streak this cycle and ended up missing the major would have qualified if LANWs had stakes applied, due to the protective buffer this would create.
However, there could be a softer stakes application. Currently, Open LANs only need to be announced with two weeks of notice. It's incredibly hard to plan around this margin, yet almost necessary given how lucrative these events are. Lengthening the required announcement period could harm local TOs, for whom the VRS ranking is more a benefit to the domestic scene than something that determines major qualification. Instead, the flat 1.0 LAN win credit could be scaled by the announcement period. Many local LANs in the past have managed to announce their dates well in advance, so LANWs could have step-level scaling by notice period, e.g. 0.5 for 2 weeks, 0.75 for 2 months and 1.0 for 6 months, or any variation of this. Suppose only the announcement date and the event location (needed for visas) are required. That is pretty reasonable for smaller-scale TOs, and it also gives teams more planning time for these key events.
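The notice-based scaling could be a simple step function. The sketch below uses the example tiers from this proposal. Converting "2 months" and "6 months" to 60 and 180 days is an approximation. Returning 0 below the two-week minimum is an assumption that mirrors the current announcement requirement.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Example notice tiers from the proposal, checked from longest to shortest.
// 60 and 180 days approximate "2 months" and "6 months".
const NOTICE_TIERS = [
  { minDays: 180, credit: 1.0 },
  { minDays: 60, credit: 0.75 },
  { minDays: 14, credit: 0.5 },
];

// LAN win credit for an event, from its announcement and start timestamps (ms since epoch).
function lanWinCreditForNotice(announcedAt, eventStart) {
  const noticeDays = (eventStart - announcedAt) / MS_PER_DAY;
  for (const tier of NOTICE_TIERS) {
    if (noticeDays >= tier.minDays) return tier.credit;
  }
  return 0; // assumption: below the two-week minimum the event is not eligible
}

const announced = Date.UTC(2025, 0, 1); // 1 Jan 2025 (months are 0-indexed)
console.log(lanWinCreditForNotice(announced, Date.UTC(2025, 0, 20))); // 0.5 (19 days)
console.log(lanWinCreditForNotice(announced, Date.UTC(2025, 2, 15))); // 0.75 (73 days)
console.log(lanWinCreditForNotice(announced, Date.UTC(2025, 6, 1))); // 1 (181 days)
```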
Additionally, for the sake of clarity for both TOs and teams, it would be beneficial to clarify the wildcard invite situation for event champions. Assuming rosters rather than orgs hold the invites, this could be solved in code. Whenever the rankings are updated, a list of wildcard-eligible teams could be updated alongside them, using built-in logic for the required conditions. Currently it's a grey area as to who is eligible for wildcard invites, and none of the data is publicly available in one place. Liquipedia is the only place that lists ranking tiers, but Liquipedia holds no mandate, and its VRS ranking categorization does not match what is used officially, as shown in Figure 19 below. HLTV is the official source, but it does not include event Tier. There is also no indication of whether the first, final or most-played roster would hold the invite.
Figure 19 - The only publicly accessible list of Ranked Tier 2 events on Liquipedia, but highlighted in red are events listed as ranked due to meeting the Liquipedia classification but weren't actually VRS ranked events
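As an illustration of the "list updated alongside the rankings" idea, the sketch below recomputes wildcard-eligible rosters from a list of event champions. Every rule in it is a placeholder, because the real conditions aren't available in one place. The sketch assumes the final roster at the event is the champion roster. A current roster inherits the invite if it shares at least three players with that roster, and the win must be at a VRS-ranked event within the last 365 days. All field names are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Number of players two rosters have in common.
function sharedPlayers(a, b) {
  const inB = new Set(b);
  return a.filter((p) => inB.has(p)).length;
}

// Placeholder rules: VRS-ranked win, finished within windowDays before asOf,
// and at least minShared players from the event's final roster.
function wildcardEligibleTeams(teams, champions, { asOf, windowDays = 365, minShared = 3 }) {
  const eligible = [];
  for (const team of teams) {
    const title = champions.find((c) => {
      const ageDays = (asOf - c.finishedAt) / MS_PER_DAY;
      return (
        c.vrsRanked &&
        ageDays >= 0 &&
        ageDays <= windowDays &&
        sharedPlayers(team.players, c.finalRoster) >= minShared
      );
    });
    if (title) eligible.push({ teamId: team.id, event: title.event });
  }
  return eligible;
}

// Usage with made-up rosters; timestamps are ms since epoch.
const teams = [
  { id: "T1", players: ["a", "b", "c", "d", "e"] },
  { id: "T2", players: ["f", "g", "h", "i", "j"] },
];
const champions = [
  { event: "Cup X", vrsRanked: true, finishedAt: Date.UTC(2025, 5, 1), finalRoster: ["a", "b", "c", "x", "y"] },
];
const list = wildcardEligibleTeams(teams, champions, { asOf: Date.UTC(2025, 8, 1) });
console.log(list);
```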
Conclusion
Overall, the current ranking model and its methodology are incredibly good. As raised in the Minor Changes section, there are some small issues with outdated explanations and a lack of accessibility for using, testing and tweaking the model, but these are niche cases that only affect a small number of people. In my personal opinion, the current LAN component is too strong compared to the other components. It somewhat diminishes the benefit of beating strong teams at or above a team's own level, since a team can generate a similar benefit against low-level opponents. This depends on coverage, but occurrences have been observed throughout this major cycle.
However, it's important to note that the current LANW application is somewhat necessary. LANWs are essentially movers and chance creators in this ranking system. When their worth is reduced or scaled, breaking into the top bubble becomes incredibly difficult. Strings of events use the same ranking invite, which already makes penetration very difficult. If the impact of LANWs were significantly curtailed, it would be very hard for upcoming teams to break in and grind to the threshold, and top-level teams would get too great a cushion. Roughly 35% of future events between now and the end of 2026 incorporate non-wildcard, non-LAN open qualifiers, and breaking through one of these open qualifiers and then going on to earn points is a tall order. This is particularly true during periods when Tier 1 events all use the same ranking invite, which can considerably diminish the benefit of an open qualifier run.
While I do think that the LANW component is too strong, it is an important equalizer that provides opportunity, and I recognise that if it were over-punished, the rankings could become rather rigid.
Thanks for your time. The current ranking has shown that it does an incredible job of getting the best teams to the major.