### Timing and value discrepancies found in LSS files (>=1.6.0) 

Some necessary context first. 

I occasionally dabble in speedrunning, albeit poorly and under a different alias, and wanted more insight into my progress. Long story short, this got out of hand and I ended up mapping out the entire LSS file format as part of a Python library called [`saltysplits`](https://github.com/jaspersiebring/saltysplits).

One of its features is that it makes it very easy to access and validate each element and attribute of a given LSS file, both individually and in relation to each other. And although I'm finding it difficult not to make this sound like self-promotion (it's really not), I do believe I stumbled upon three possible discrepancies that are worth sharing with the LiveSplit team:

1. It is currently possible to have `AttemptHistory.Attempt` entries without `SegmentHistory.Time` entries *and* vice versa
2. `attempt_count` does not always match the number of actual attempts
3. Time totals from a run's `AttemptHistory.Attempt` and `SegmentHistory.Time` entries don't add up to the same timedelta

I'll be providing some code examples to illustrate these findings below. They assume you've installed [`saltysplits`](https://github.com/jaspersiebring/saltysplits?tab=readme-ov-file#installation) and cloned the [`livesplit-core`](https://github.com/LiveSplit/livesplit-core) repository (we'll be using some of the LSS files included in it). If you want, you can also run the same code examples on your own LSS files (>=1.6.0). If yours use an older format, just drag them into LiveSplit and export them again with [`Save Splits As...`](https://github.com/jaspersiebring/saltysplits?tab=readme-ov-file#exporting-your-lss-file-from-livesplit).
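If you're not sure whether one of your own LSS files is recent enough, a quick check with Python's standard library is sketched below. It assumes the format version is stored in the `version` attribute of the root `<Run>` element (as in the `<Run version="1.8.0">` excerpt further down) and treats a missing attribute as an older file, which is an assumption on my part; `lss_version` and `is_recent_enough` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

MIN_SUPPORTED = (1, 6, 0)  # lower bound used throughout this post


def lss_version(xml_bytes: bytes) -> Optional[Tuple[int, ...]]:
    """Return the root <Run> element's `version` attribute as a tuple of ints.

    Returns None if the attribute is missing (assumed here to mean an older file).
    """
    root = ET.fromstring(xml_bytes)
    version = root.get("version")
    if version is None:
        return None
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (3 - len(parts))  # pad e.g. "1.6" to (1, 6, 0)


def is_recent_enough(xml_bytes: bytes) -> bool:
    """True if the file declares a format version of at least MIN_SUPPORTED."""
    version = lss_version(xml_bytes)
    return version is not None and version >= MIN_SUPPORTED


# usage (hypothetical file name):
# with open("my_splits.lss", "rb") as f:
#     print(is_recent_enough(f.read()))
```

Reading the raw bytes (rather than a decoded string) lets the XML parser honour the file's own encoding declaration.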


```python
import saltysplits as ss
import pandas as pd
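from saltysplits import TimeType
from IPython.display import display  # built into Jupyter; this import also makes it available in plain scripts (requires IPython)

# change these paths to wherever you cloned the livesplit-core repo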
CELESTE_PATH = "YOUR_REPOS/livesplit-core/tests/run_files/Celeste - Any% (1.2.1.5).lss"
SM64_PATH = "YOUR_REPOS/livesplit-core/tests/run_files/clean_sum_of_best.lss"
```

#### 1. It is currently possible to have `AttemptHistory.Attempt` entries without `SegmentHistory.Time` entries *and* vice versa


```python
# after passing validation, we can access all LiveSplit elements and attributes with dot notation (see https://github.com/jaspersiebring/saltysplits/blob/main/src/saltysplits/models.py)
splits = ss.read_lss(lss_path=SM64_PATH)

# gather run IDs across all `AttemptHistory.Attempt` and `SegmentHistory.Time` entries
run_ids_from_attempts = {attempt.id for attempt in splits.attempt_history}
run_ids_from_segments = {time_entry.id for segment in splits.segments for time_entry in segment.segment_history}

# find all `AttemptHistory.Attempt` entries without `SegmentHistory.Time` entries *and* vice versa
attempts_without_times = run_ids_from_attempts - run_ids_from_segments
times_without_attempts = run_ids_from_segments - run_ids_from_attempts

if attempts_without_times:
    print(f"{len(attempts_without_times)} AttemptHistory.Attempt entries without SegmentHistory.Time entries found with the following run IDs:\n{sorted(attempts_without_times, key=int)}")
if times_without_attempts:
    print(f"{len(times_without_attempts)} SegmentHistory.Time entries without AttemptHistory.Attempt entries found with the following run IDs:\n{sorted(times_without_attempts, key=int)}")

```

    109 AttemptHistory.Attempt entries without SegmentHistory.Time entries found with the following run IDs:
    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '106', '108', '110', '112', '113', '114', '117', '118', '123']
    2 SegmentHistory.Time entries without AttemptHistory.Attempt entries found with the following run IDs:
    ['-1', '0']
    

This shows that `clean_sum_of_best.lss` has run attempts without splits **and** splits without run attempts. The former makes intuitive sense: it simply means that a run attempt was started and then reset before completing the first segment (i.e. an early reset). The latter, however, does not. How can you have a split that isn't part of any attempt?

Looking at the printed IDs, one can imagine that the `SegmentHistory.Time` entries associated with run `-1` just belong to run `1` and that the sign has somehow been written erroneously (this would already constitute a bug of sorts but not a major one). This however does not explain `SegmentHistory.Time` entries associated with run `0`. Where do these come from?

You can verify these findings by inspecting the LSS file itself in any text editor (you might want to pretty-print it as XML first, which makes it much easier to read). Notice that the `AttemptHistory` element begins with run ID `1` and contains no `Attempt` elements with run IDs `-1` or `0`:

    <?xml version="1.0" encoding="UTF-8"?>
    <Run version="1.8.0">
        <GameIcon/>
        <GameName>SM64: 120 Star</GameName>
        <CategoryName>100%</CategoryName>
        <Metadata>
            <Run id=""/>
            <Platform usesEmulator="False"/>
            <Region/>
            <SpeedrunComVariables/>
            <CustomVariables/>
        </Metadata>
        <LayoutPath/>
        <Offset>00:00:00.000000000</Offset>
        <AttemptCount>102</AttemptCount>
        <AttemptHistory>
            <Attempt id="1"/>
            <Attempt id="2">
                <RealTime>02:31:30.375976600</RealTime>
            </Attempt>

And here is one example of a `Segments.Segment` element with `SegmentHistory.Time` elements for runs `-1` and `0`, even though no `AttemptHistory.Attempt` elements exist for these runs:

    <Segment>
        <Name>HMC (52)</Name>
        <Icon/>
        <SplitTimes>
            <SplitTime name="Personal Best">
                <RealTime>01:12:40.402200400</RealTime>
                <GameTime>01:12:40.402200400</GameTime>
            </SplitTime>
        </SplitTimes>
        <BestSegmentTime>
            <RealTime>00:11:01.725424300</RealTime>
            <GameTime>00:09:11.204527300</GameTime>
        </BestSegmentTime>
        <SegmentHistory>
            <Time id="-1">
                <GameTime>00:20:35.827574200</GameTime>
            </Time>
            <Time id="0">
                <GameTime>00:09:11.204527300</GameTime>
            </Time>
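
If you'd rather not depend on `saltysplits` for this, the same ID comparison can be sketched with Python's standard library alone. This is a minimal sketch, assuming the element layout shown in the excerpts above (`Run/AttemptHistory/Attempt` and `Run/Segments/Segment/SegmentHistory/Time`); `orphaned_run_ids` is a hypothetical helper, and unlike `saltysplits` it performs no validation at all.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple


def orphaned_run_ids(xml_bytes: bytes) -> Tuple[Set[str], Set[str]]:
    """Return (attempt IDs without segment times, segment-time IDs without attempts)."""
    root = ET.fromstring(xml_bytes)  # the root element is <Run>
    attempt_ids = {attempt.get("id") for attempt in root.findall("./AttemptHistory/Attempt")}
    time_ids = {entry.get("id") for entry in root.findall("./Segments/Segment/SegmentHistory/Time")}
    return attempt_ids - time_ids, time_ids - attempt_ids


# usage with the SM64 file from the setup cell:
# with open(SM64_PATH, "rb") as f:
#     attempts_only, times_only = orphaned_run_ids(f.read())
# print(sorted(times_only, key=int))
```

Since IDs are compared as strings, `-1` and `1` are treated as distinct runs, which matches how the `saltysplits` output above reports them.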

#### 2. `attempt_count` does not always match the number of actual attempts

Accessing data through dot notation (as shown above) is convenient if you're already familiar with the LSS structure, but a more spreadsheet-like representation is also available through `saltysplits`'s `to_df` method.


```python
splits = ss.read_lss(lss_path=CELESTE_PATH)

# example of dot notation access:
splits.attempt_count  # returns the value stored in the LSS file's AttemptCount element; it is *not* (re)computed

# example of dataframe access:
splits_dataframe = splits.to_df()
display(splits_dataframe.iloc[:5, :5])  # display the first 5 "segments" for the first 5 "runs"

# the dataframe's index holds the segment names and its columns hold the run IDs
print(f"The first five segment names: {splits_dataframe.index.tolist()[:5]}")
print(f"The first five run IDs: {splits_dataframe.columns.tolist()[:5]}")
```


<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>1</th>
      <th>3</th>
      <th>6</th>
      <th>7</th>
      <th>10</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Prologue</th>
      <td>0 days 00:00:50.685000</td>
      <td>0 days 00:00:51.323000</td>
      <td>0 days 00:00:48.631000</td>
      <td>0 days 00:00:49.326000</td>
      <td>0 days 00:00:48.903000</td>
    </tr>
    <tr>
      <th>-Crossing</th>
      <td>0 days 00:00:48.657000</td>
      <td>0 days 00:00:39.141000</td>
      <td>0 days 00:00:41.118000</td>
      <td>0 days 00:00:37.362000</td>
      <td>0 days 00:00:35.691000</td>
    </tr>
    <tr>
      <th>-Chasm</th>
      <td>0 days 00:00:50.744000</td>
      <td>0 days 00:00:36.429000</td>
      <td>0 days 00:00:37.781000</td>
      <td>0 days 00:00:33.885000</td>
      <td>0 days 00:00:34.434000</td>
    </tr>
    <tr>
      <th>Forsaken City</th>
      <td>0 days 00:00:40.815000</td>
      <td>0 days 00:00:40.650000</td>
      <td>0 days 00:00:30.692000</td>
      <td>0 days 00:00:48.892000</td>
      <td>0 days 00:00:37.156000</td>
    </tr>
    <tr>
      <th>-Intervention</th>
      <td>0 days 00:01:17.270000</td>
      <td>0 days 00:01:10.910000</td>
      <td>0 days 00:01:06.674000</td>
      <td>0 days 00:01:08.429000</td>
      <td>0 days 00:01:05.673000</td>
    </tr>
  </tbody>
</table>
</div>


    The first five segment names: ['Prologue', '-Crossing', '-Chasm', 'Forsaken City', '-Intervention']
    The first five run IDs: ['1', '3', '6', '7', '10']
    

As shown before, "runs" don't necessarily have both time *and* attempt entries associated with them (run IDs can come from either one). Because of this, what constitutes an "attempt" is somewhat ambiguous. That's ultimately why `to_df` has the `allow_empty` and `allow_partial` flags: they let people control what they consider a "run" (and an "attempt" at one).
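To make the three definitions concrete, here's a simplified, library-independent sketch of how runs could be counted under each. It is an illustration of the idea rather than the actual `to_df` code; `count_runs` and its dict-of-lists input are hypothetical, and the input is assumed to contain every run ID from either source (runs known only from `AttemptHistory` get `None` for every segment).

```python
from typing import Dict, List, Optional

# Hypothetical input: run ID -> one entry per segment (None where that run has no time for the segment)
SegmentTimes = Dict[str, List[Optional[float]]]


def count_runs(runs: SegmentTimes, allow_empty: bool, allow_partial: bool) -> int:
    """Count runs, optionally skipping runs with no segment times or with missing segment times."""
    count = 0
    for times in runs.values():
        recorded = sum(t is not None for t in times)
        if recorded == 0 and not allow_empty:
            continue  # run with no segment times at all (e.g. an early reset)
        if recorded < len(times) and not allow_partial:
            continue  # run that didn't record a time for every segment
        count += 1
    return count


toy_runs: SegmentTimes = {
    "1": [None, None, None],  # attempt without any segment times
    "2": [50.7, None, None],  # partial run
    "3": [50.7, 48.6, 50.7],  # completed run
    "0": [None, None, 9.2],  # segment time without a matching attempt
}
print(count_runs(toy_runs, allow_empty=True, allow_partial=True))
print(count_runs(toy_runs, allow_empty=False, allow_partial=True))
print(count_runs(toy_runs, allow_empty=False, allow_partial=False))
```

On this toy data the three calls count 4, 3 and 1 runs respectively, which is the same kind of spread between definitions seen in the real file.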


```python
# here's how you'd compute the attempt_count for runs with 0 or more time entries
attempt_count_with_0_or_more_time_entries = splits.to_df(allow_empty=True, allow_partial=True).columns.size

# here's how you'd compute the attempt_count for runs with 1 or more time entries
attempt_count_with_1_or_more_time_entries = splits.to_df(allow_empty=False, allow_partial=True).columns.size

# here's how you'd compute the total number of completed runs (which would then no longer be attempts)
# completed_run_count = splits.to_df(allow_empty=False, allow_partial=False).columns.size

print(f"The attempt_count attribute value in the LSS file: {splits.attempt_count}")
print(f"The attempt_count for \"attempts\" with 0 or more time entries: {attempt_count_with_0_or_more_time_entries}")
print(f"The attempt_count for \"attempts\" with 1 or more time entries: {attempt_count_with_1_or_more_time_entries}")
```

    The attempt_count attribute value in the LSS file: 32
    The attempt_count for "attempts" with 0 or more time entries: 31
    The attempt_count for "attempts" with 1 or more time entries: 25
    

As shown above for the `Celeste - Any% (1.2.1.5).lss` file, none of these "attempt" definitions reproduce the `AttemptCount` value that LiveSplit wrote to the LSS file (32, versus 31 and 25), so the stored value doesn't appear to reflect the actual attempts made.
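A related check that sidesteps the question of what counts as an "attempt" is to compare the stored `AttemptCount` directly with the number of `Attempt` elements in `AttemptHistory`. A minimal standard-library sketch, assuming the layout from the SM64 excerpt above; `attempt_count_vs_history` is a hypothetical helper, and a missing `AttemptCount` element is read as `0`.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def attempt_count_vs_history(xml_bytes: bytes) -> Tuple[int, int]:
    """Return (value of <AttemptCount>, number of <Attempt> elements in <AttemptHistory>)."""
    root = ET.fromstring(xml_bytes)  # the root element is <Run>
    stored = int(root.findtext("AttemptCount", default="0"))
    recorded = len(root.findall("./AttemptHistory/Attempt"))
    return stored, recorded


# usage with the Celeste file from the setup cell:
# with open(CELESTE_PATH, "rb") as f:
#     print(attempt_count_vs_history(f.read()))
```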

#### 3. Time totals from a run's `AttemptHistory.Attempt` and `SegmentHistory.Time` entries don't add up to the same timedelta

Given all LSS elements and attributes for a completed run, here are three ways to find its completion time:
- Take the run's `AttemptHistory.Attempt` entry and read the total time from its `RealTime` or `GameTime` element.
- Go through all `Segments.Segment` entries, collect every `SegmentHistory.Time` entry that shares this run's ID, and sum up their `RealTime` or `GameTime` elements.
- Take the run's `AttemptHistory.Attempt` entry and subtract its `started` attribute from its `ended` attribute (these timestamps don't include nanoseconds, so this method won't be used here).
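For completeness, the third method can be sketched as follows. This assumes the `started`/`ended` attributes use a `month/day/year hour:minute:second` layout, which is an assumption you should check against your own file; `wall_clock_duration` is a hypothetical helper. Keep in mind that a wall-clock difference would presumably also include any time the timer spent paused.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout of the `started`/`ended` attributes (24-hour clock, no sub-second part)
LSS_DATETIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def wall_clock_duration(started: str, ended: str) -> timedelta:
    """Difference between an attempt's `ended` and `started` attributes (1-second resolution)."""
    start = datetime.strptime(started, LSS_DATETIME_FORMAT)
    end = datetime.strptime(ended, LSS_DATETIME_FORMAT)
    return end - start


# usage with made-up timestamps:
print(wall_clock_duration("10/22/2017 19:40:17", "10/22/2017 20:44:48"))
```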

Ideally, you'd expect at least the first two methods to produce the same answer. As it turns out, that's rarely the case (at least for the handful of >=1.6.0 LSS files I sampled from [splits.io](https://splits.io/)).

Maybe some time passes between LiveSplit creating a run's last `SegmentHistory.Time` element and updating the relevant attributes/elements of its associated `AttemptHistory.Attempt` element?
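Several rows in the table below differ by only a few microseconds, which may simply be rounding noise from converting and summing the time strings rather than a genuine discrepancy. One way to rule that out is to do the arithmetic on exact integer nanoseconds. A minimal sketch, assuming LSS times follow a `[-][d.]hh:mm:ss[.fraction]` pattern with up to nine fractional digits (as in the excerpts above); `to_nanoseconds` and `sum_times` are hypothetical helpers.

```python
import re
from typing import Iterable

# Assumed LSS time layout: optional sign, optional "days.", hh:mm:ss, optional fraction (up to 9 digits)
TIME_PATTERN = re.compile(r"^(-)?(?:(\d+)\.)?(\d+):(\d{2}):(\d{2})(?:\.(\d{1,9}))?$")


def to_nanoseconds(text: str) -> int:
    """Parse an LSS time string into an exact integer number of nanoseconds."""
    match = TIME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized LSS time: {text!r}")
    sign, days, hours, minutes, seconds, fraction = match.groups()
    whole_seconds = int(days or 0) * 86_400 + int(hours) * 3_600 + int(minutes) * 60 + int(seconds)
    # right-pad the fraction to 9 digits so "4023" means 402_300_000 ns
    nanos = whole_seconds * 1_000_000_000 + int((fraction or "").ljust(9, "0"))
    return -nanos if sign else nanos


def sum_times(texts: Iterable[str]) -> int:
    """Sum LSS time strings without any floating-point rounding."""
    return sum(to_nanoseconds(t) for t in texts)


# usage with the two GameTime values from the HMC (52) excerpt above:
print(sum_times(["00:20:35.827574200", "00:09:11.204527300"]))
```

If a run's summed segment times still differ from its `AttemptHistory.Attempt` time when compared this way, the difference is in the file itself rather than in the arithmetic.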


```python
splits = ss.read_lss(lss_path=CELESTE_PATH)
splits_dataframe = splits.to_df(allow_partial=False, time_type=TimeType.REAL_TIME)
run_ids = splits_dataframe.columns.to_list()

# `to_df` already collects all `SegmentHistory.Time` entries per run ID, we only have to sum them up
times_from_segments = splits_dataframe.sum(axis=0)

# retrieve run times from each `AttemptHistory.Attempt` through dot notation and a dict comprehension
times_from_attempts = pd.Series(
    {attempt.id: attempt.real_time for attempt in splits.attempt_history if attempt.id in run_ids},
    index=run_ids,
)

# compute their absolute differences and combine everything into a single dataframe
time_differences = (times_from_attempts - times_from_segments).abs()
times_dataframe = pd.DataFrame(
    [times_from_attempts, times_from_segments, time_differences],
    index=["Time from Attempts", "Time from Segments", "Differences"],
).T

display(times_dataframe)

print(f"The absolute differences between each run's total time from AttemptHistory.Attempt and from its SegmentHistory.Time entries have a standard deviation of {time_differences.std().total_seconds()} seconds")
```


<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Time from Attempts</th>
      <th>Time from Segments</th>
      <th>Differences</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>0 days 01:04:31.170000</td>
      <td>0 days 01:04:14.118999</td>
      <td>0 days 00:00:17.051001</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0 days 00:59:28.965000</td>
      <td>0 days 00:59:12.280998</td>
      <td>0 days 00:00:16.684002</td>
    </tr>
    <tr>
      <th>6</th>
      <td>0 days 00:47:58.618000</td>
      <td>0 days 00:47:58.617995</td>
      <td>0 days 00:00:00.000005</td>
    </tr>
    <tr>
      <th>7</th>
      <td>0 days 00:50:12.448000</td>
      <td>0 days 00:50:12.447998</td>
      <td>0 days 00:00:00.000002</td>
    </tr>
    <tr>
      <th>10</th>
      <td>0 days 00:47:10.743000</td>
      <td>0 days 00:47:10.742998</td>
      <td>0 days 00:00:00.000002</td>
    </tr>
    <tr>
      <th>11</th>
      <td>0 days 00:44:14.143000</td>
      <td>0 days 00:44:14.142997</td>
      <td>0 days 00:00:00.000003</td>
    </tr>
    <tr>
      <th>14</th>
      <td>0 days 00:42:55.145000</td>
      <td>0 days 00:42:55.144998</td>
      <td>0 days 00:00:00.000002</td>
    </tr>
    <tr>
      <th>19</th>
      <td>0 days 00:41:28.546000</td>
      <td>0 days 00:41:28.545998</td>
      <td>0 days 00:00:00.000002</td>
    </tr>
    <tr>
      <th>28</th>
      <td>0 days 00:40:44.782000</td>
      <td>0 days 00:40:44.782000</td>
      <td>0 days 00:00:00</td>
    </tr>
    <tr>
      <th>31</th>
      <td>0 days 00:39:12.517000</td>
      <td>0 days 00:39:12.516999</td>
      <td>0 days 00:00:00.000001</td>
    </tr>
  </tbody>
</table>
</div>


    The absolute differences between each run's total time from AttemptHistory.Attempt and from its SegmentHistory.Time entries have a standard deviation of 7.112488 seconds
    
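The printed summary above is computed with `.std()`, so it is the sample standard deviation of the differences rather than their mean. Because two runs (1 and 3) account for almost all of the discrepancy, the mean and median are worth looking at as well. The sketch below recomputes them from the Differences column, with the values re-entered by hand from the table above so it runs without the LSS file.

```python
import pandas as pd

# "Differences" column copied from the table above (run ID -> absolute difference)
differences = pd.Series(
    pd.to_timedelta([
        "00:00:17.051001", "00:00:16.684002", "00:00:00.000005", "00:00:00.000002",
        "00:00:00.000002", "00:00:00.000003", "00:00:00.000002", "00:00:00.000002",
        "00:00:00", "00:00:00.000001",
    ]),
    index=["1", "3", "6", "7", "10", "11", "14", "19", "28", "31"],
)

print(f"mean:   {differences.mean().total_seconds():.6f} s")
print(f"median: {differences.median().total_seconds():.6f} s")
print(f"std:    {differences.std().total_seconds():.6f} s")

# runs whose two totals differ by more than a second
print(differences[differences > pd.Timedelta(seconds=1)])
```

With these values the mean comes out at roughly 3.37 seconds and the median at 2 microseconds, so the typical run agrees almost exactly while runs 1 and 3 are off by roughly 17 seconds each.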

Lastly, I want to emphasize that we don't compute any attributes or elements; we just map them out with appropriate [types](https://github.com/jaspersiebring/saltysplits/blob/main/src/saltysplits/models.py) and [annotations](https://github.com/jaspersiebring/saltysplits/blob/main/src/saltysplits/annotations.py). In fact, [we test](https://github.com/jaspersiebring/saltysplits/blob/main/tests/test_models.py) that the encoding and decoding of all (standardized) elements and attributes is lossless (i.e. if you were to dump any `saltysplits` model back to XML, it would be identical to its original XML representation).
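The core of such a round-trip check can be illustrated with the standard library: parse the original XML and the re-serialized XML, then compare them element by element. This is a generic sketch, not the actual test in `saltysplits`; `elements_equal` is a hypothetical helper, and it deliberately ignores whitespace-only differences (indentation, line breaks) and element tails, which LSS files don't use for content.

```python
import xml.etree.ElementTree as ET


def elements_equal(a: ET.Element, b: ET.Element) -> bool:
    """Recursively compare tag, attributes, stripped text and children of two elements."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


original = ET.fromstring("<Run><Offset>00:00:00</Offset><Metadata><Run id=''/></Metadata></Run>")
redumped = ET.fromstring("<Run>\n  <Offset>00:00:00</Offset>\n  <Metadata>\n    <Run id=''/>\n  </Metadata>\n</Run>")
print(elements_equal(original, redumped))
```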

All of this to say, the values and claims here accurately reflect what's currently possible content-wise in LSS files (>=1.6.0).


Timing and value discrepancies found in LSS files (>=1.6.0) #890