
feat: focus on safe, compliant, local-first, non-expiring service worker #840

@SgtPooki

In other words: remove serviceWorkerRegistrationTTL (the timebomb/expiry). This is a continuation of the original thought behind #724, with less verbal communication of the issue and more documentation of my reasoning.

We chatted about this at yesterday's Helia WG, so I wanted to come back with a more thorough argument (one I've voiced before, but never written down).

So I built some logic tables to map out how service worker updates actually reach the user. The tables below compare service worker update scenarios with and without the registration TTL/timebomb mechanism.

Service Worker Update Logic Tables

This was generated with the help of an LLM, and then cleaned up and verified. The summary section at the bottom is 99% human-generated.

Variables

  • userOnlineStatus: Online/Offline
  • isSwUpdateAvailable: Yes/No (new version exists)
  • userNavigationBehavior: Navigates/Stays on same page
  • timeSinceLastUpdateCheck: < 24h / ≥ 24h (time since the browser's last automatic update check; see FF and Chromium)
  • ttlExpired: Yes/No (TTL has expired, only relevant when TTL is used)
  • doesUserGetUpdatedSw: Yes/No (final outcome)

Table 1: WITHOUT TTL (Pure Browser Updates)

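| Scenario | Online | Update Available | Navigation | Time Since Check | Gets Updated SW | Notes |
|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Navigates | < 24h | Yes | Normal update on navigation |
| 2 | Yes | Yes | Navigates | ≥ 24h | Yes | Update check triggered after 24h |
| 3 | Yes | Yes | Stays | < 24h | No | No navigation = no update check |
| 4 | Yes | Yes | Stays | ≥ 24h | No | No navigation = no update check |
| 5 | Yes | No | Navigates | < 24h | n/a | No update needed, current SW works |
| 6 | Yes | No | Navigates | ≥ 24h | n/a | No update needed, current SW works |
| 7 | Yes | No | Stays | < 24h | n/a | No update needed, current SW works |
| 8 | Yes | No | Stays | ≥ 24h | n/a | No update needed, current SW works |
| 9 | No | Yes | Navigates | < 24h | No | Offline = no update check possible |
| 10 | No | Yes | Navigates | ≥ 24h | No | Offline = no update check possible |
| 11 | No | Yes | Stays | < 24h | No | Offline + no navigation = no update |
| 12 | No | Yes | Stays | ≥ 24h | No | Offline + no navigation = no update |
| 13 | No | No | Navigates | < 24h | n/a | No update needed, current SW works |
| 14 | No | No | Navigates | ≥ 24h | n/a | No update needed, current SW works |
| 15 | No | No | Stays | < 24h | n/a | No update needed, current SW works |
| 16 | No | No | Stays | ≥ 24h | n/a | No update needed, current SW works |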

Table 2: WITH TTL (Browser Updates + TTL Expiration)

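| Scenario | Online | Update Available | Navigation | Time Since Check | TTL Expired | Gets Updated SW | Notes |
|---|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Navigates | < 24h | No | Yes | Normal update on navigation |
| 2 | Yes | Yes | Navigates | ≥ 24h | No | Yes | Update check triggered after 24h |
| 3 | Yes | Yes | Navigates | < 24h | Yes | Yes | TTL expires → SW unregisters → new SW installs |
| 4 | Yes | Yes | Navigates | ≥ 24h | Yes | Yes | TTL expires → SW unregisters → new SW installs |
| 5 | Yes | Yes | Stays | < 24h | No | No | No navigation = no update check |
| 6 | Yes | Yes | Stays | ≥ 24h | No | No | No navigation = no update check |
| 7 | Yes | Yes | Stays | < 24h | Yes | No | TTL expires but no navigation to trigger re-registration |
| 8 | Yes | Yes | Stays | ≥ 24h | Yes | No | TTL expires but no navigation to trigger re-registration |
| 9 | Yes | No | Navigates | < 24h | No | n/a | No update needed, current SW works |
| 10 | Yes | No | Navigates | ≥ 24h | No | n/a | No update needed, current SW works |
| 11 | Yes | No | Navigates | < 24h | Yes | n/a | TTL expires → SW unregisters → same SW reinstalls |
| 12 | Yes | No | Navigates | ≥ 24h | Yes | n/a | TTL expires → SW unregisters → same SW reinstalls |
| 13 | Yes | No | Stays | < 24h | No | n/a | No update needed, current SW works |
| 14 | Yes | No | Stays | ≥ 24h | No | n/a | No update needed, current SW works |
| 15 | Yes | No | Stays | < 24h | Yes | n/a | TTL expires → SW unregisters → same SW reinstalls |
| 16 | Yes | No | Stays | ≥ 24h | Yes | n/a | TTL expires → SW unregisters → same SW reinstalls |
| 17 | No | Yes | Navigates | < 24h | No | No | Offline = no update check possible |
| 18 | No | Yes | Navigates | ≥ 24h | No | No | Offline = no update check possible |
| 19 | No | Yes | Navigates | < 24h | Yes | No | TTL disabled when offline |
| 20 | No | Yes | Navigates | ≥ 24h | Yes | No | TTL disabled when offline |
| 21 | No | Yes | Stays | < 24h | No | No | Offline + no navigation = no update |
| 22 | No | Yes | Stays | ≥ 24h | No | No | Offline + no navigation = no update |
| 23 | No | Yes | Stays | < 24h | Yes | No | TTL disabled when offline |
| 24 | No | Yes | Stays | ≥ 24h | Yes | No | TTL disabled when offline |
| 25 | No | No | Navigates | < 24h | No | n/a | No update needed, current SW works |
| 26 | No | No | Navigates | ≥ 24h | No | n/a | No update needed, current SW works |
| 27 | No | No | Navigates | < 24h | Yes | n/a | TTL disabled when offline, but no update needed |
| 28 | No | No | Navigates | ≥ 24h | Yes | n/a | TTL disabled when offline, but no update needed |
| 29 | No | No | Stays | < 24h | No | n/a | No update needed, current SW works |
| 30 | No | No | Stays | ≥ 24h | No | n/a | No update needed, current SW works |
| 31 | No | No | Stays | < 24h | Yes | n/a | TTL disabled when offline, but no update needed |
| 32 | No | No | Stays | ≥ 24h | Yes | n/a | TTL disabled when offline, but no update needed |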
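
As a rough cross-check of the two tables, the update outcome can be written as a pair of predicates and compared over every input combination. This is only a sketch of the logic as the notes describe it, not the gateway's code; the `Scenario` type, the function names, and the field names are made up for illustration. It covers update delivery only, not access to cached content.

```typescript
// Inputs from the "Variables" list; field names are illustrative.
interface Scenario {
  online: boolean
  updateAvailable: boolean
  navigates: boolean
  checkedWithin24h: boolean // listed in the tables, but it never changes the outcome
  ttlExpired: boolean
}

// Table 1: a new SW only arrives through the browser's own update flow.
function getsUpdateWithoutTtl (s: Scenario): boolean {
  return s.online && s.updateAvailable && s.navigates
}

// Table 2: the TTL is skipped while offline and needs a navigation to re-register.
function getsUpdateWithTtl (s: Scenario): boolean {
  const browserUpdate = s.online && s.updateAvailable && s.navigates
  const ttlReinstall = s.ttlExpired && s.online && s.navigates
  // A TTL-triggered reinstall only yields a *new* SW if one exists.
  return browserUpdate || (ttlReinstall && s.updateAvailable)
}

// Enumerate all 2^5 input combinations.
function allScenarios (): Scenario[] {
  const bools = [true, false]
  const out: Scenario[] = []
  for (const online of bools) {
    for (const updateAvailable of bools) {
      for (const navigates of bools) {
        for (const checkedWithin24h of bools) {
          for (const ttlExpired of bools) {
            out.push({ online, updateAvailable, navigates, checkedWithin24h, ttlExpired })
          }
        }
      }
    }
  }
  return out
}

const differing = allScenarios().filter(
  (s) => getsUpdateWithoutTtl(s) !== getsUpdateWithTtl(s)
)
console.log(differing.length) // 0: no scenario where TTL changes who gets the update
```

Under these assumptions the TTL never delivers an update that the browser's normal flow would not have delivered anyway, since both paths need the user to be online and to navigate.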

Key Differences: TTL vs No TTL

Scenarios Where TTL Actually Makes a Difference

Looking at the logic tables, the TTL mechanism only provides a benefit in one specific scenario:

| Scenario | Without TTL | With TTL | Actual Difference |
|---|---|---|---|
| User goes offline with old SW → update available while offline → user stays offline indefinitely | ✅ SW remains active, cached content accessible | ❌ SW unregisters when TTL expires, cached content becomes inaccessible | TTL removes access to cached content (this could be badbits, but probably not) |

Scenarios Where TTL Provides No Benefit

All other scenarios show identical outcomes between TTL and no-TTL:

| Scenario | Without TTL | With TTL | Difference |
|---|---|---|---|
| User offline → update available → comes back online | ✅ Gets update on navigation | ✅ Gets update on navigation | None |
| User online with update available | ✅ Gets update on navigation | ✅ Gets update on navigation | None |
| User stays on same page for extended period | ❌ No update (no navigation) | ❌ No update (no navigation) | None |
| No update available | ✅ Current SW works | ✅ Current SW works | None |

TTL Limitations

  • TTL check only runs during fetch events: If the service worker is idle, TTL expiration won't be detected until the next navigation
  • TTL disabled when offline: When navigator.onLine === false, TTL check returns true (no expiration)
  • Requires navigation: Even when TTL expires, user must navigate to trigger service worker re-registration
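
The three limitations above can be sketched as a small decision function. This is a hypothetical illustration rather than the actual implementation: `isRegistrationValid`, `onFetch`, and their parameters are assumed names, and the online flag stands in for `navigator.onLine`.

```typescript
type TtlDecision = 'serve' | 'unregister'

// Hypothetical TTL check: true means "not expired".
// Offline short-circuits to true, matching "TTL disabled when offline".
function isRegistrationValid (
  registeredAtMs: number,
  ttlMs: number,
  nowMs: number,
  online: boolean
): boolean {
  if (!online) {
    return true
  }
  return nowMs - registeredAtMs <= ttlMs
}

// The check is only evaluated when a fetch event arrives; an idle
// service worker never runs this code, so expiry goes unnoticed.
function onFetch (
  registeredAtMs: number,
  ttlMs: number,
  nowMs: number,
  online: boolean
): TtlDecision {
  return isRegistrationValid(registeredAtMs, ttlMs, nowMs, online)
    ? 'serve'
    : 'unregister'
}
```

For example, `onFetch(0, 1000, 5000, false)` yields `'serve'` because the user is offline, while the same call with `online` set to `true` yields `'unregister'`. A fresh registration still requires a later navigation.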

Analysis

Assuming we implement badbits blocking in the service worker gateway:

The Only Real Benefit of TTL

Content Takedown for Offline Users: The TTL mechanism can remove access to cached content (including potentially problematic content) for users who go offline and never come back online. Without TTL, these users would retain access to cached content indefinitely. Even with TTL, the check only fires when the user explicitly attempts to load the content again.

The Real Cost of TTL

Degraded Offline Experience: Users who rely on the service worker for offline functionality may lose access to cached content when the TTL expires, even if they're still offline and the content is still valid.

Key Insight

The serviceWorkerRegistrationTTL is a content takedown mechanism that works by unregistering the service worker after a time period, which removes access to all cached content. This is useful for:

  1. Legal compliance: Removing access to content that has been flagged as problematic

Summary

The "bad deployment recovery" argument is false

TTL doesn't help with nasty bugs because users still won't get updates unless they meet the same browser update requirements (online + navigation). The only way to deploy a non-updatable service worker is by changing the sw path in registration, which we've already solved with the namespaced ipfs-sw-sw.js path. (Don't ever change this; there is a test that confirms it exists in the output. We could probably also override the types for navigator.serviceWorker.register and, in that type override, link to this issue.)
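
One possible shape for that type-level guard is a thin wrapper that only accepts the fixed path as a literal type. This is a sketch under assumptions: `SW_PATH` with a leading slash, `SwRegistrar`, and `registerGatewaySw` are illustrative names, and `SwRegistrar` stands in for the browser's `navigator.serviceWorker` so the snippet does not depend on DOM typings.

```typescript
// The path must never change, or existing registrations can no longer update.
// See issue #840 for the reasoning.
const SW_PATH = '/ipfs-sw-sw.js' as const
type SwPath = typeof SW_PATH

// Minimal stand-in for the part of ServiceWorkerContainer used here.
interface SwRegistrar {
  register: (scriptUrl: string) => Promise<unknown>
}

// Passing any other string literal as `path` fails type-checking,
// so an accidental rename is caught at compile time.
async function registerGatewaySw (
  container: SwRegistrar,
  path: SwPath = SW_PATH
): Promise<unknown> {
  return container.register(path)
}
```

In a browser this would be called as `registerGatewaySw(navigator.serviceWorker)`. The existing output test remains the real safeguard; the type only makes the constraint visible at the call site.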

TTL degrades experience for legitimate users

Since we have no way to know if a user has accessed badbits content, TTL will break offline functionality for users accessing legitimate IPFS content.

More thoughts...

IANAL, but if we provide the updated code, the logic tables above should be sufficient to show that a user who keeps access to that content is doing so intentionally. If they do choose to do so, TTL will only break them if they explicitly reload the page, and we have little control over that. And by breaking the users who are explicitly choosing to keep bad content, we are also breaking the 99% of users who are not doing so.


The below is very related to #72.

I think that it's also worth noting that badbits are already filtered out with the default config (trustless-gateway), though we are still doing network requests to peers that may not be blocking badbits.

We should find some solution to block badbits in the service-worker gateway, and then remove TTL. Even with a badbits check in the service worker, users can still choose to block their service worker from ever updating, and so never receive future updates to what is blocked. I think this would be acceptable.

Some potential solutions for badbits in the service worker:

  • We could implement badbits on the hosting layer so that subdomain and path requests to blocked content always return an error. This allows us to quickly "hotfix" the badbits list before the public badlist gets updated, but users can still get around this if they explicitly want to.
  • We could host a badbits server that makes cacheable requests/responses for badbits checks.
  • We could implement badbits in the service worker directly.
    • This requires a smart xor filter + MPHF + fingerprint.
    • The list is ~30Mb now; we can't fit that in the service worker.
    • A rough estimate for a combined xor filter + MPHF + 32-bit fingerprint (for 500k entries in the badbits list) is about 2.5MiB, using S_mib(numKeys) = numKeys * (xorBitsPerKey/*9*/ + mphfBitsPerKey/*2.62*/ + fingerprintBitsPerKey/*32*/) / (8 * 2^20):
      • This setup would give us, based on 614M (see non-unique) requests to IPFS gateways daily, roughly an 18.4% probability of at least one false positive in a year (i.e. blocking an item that's not in the badbits list).
      • Bumping to a 40-bit fingerprint (SW size grows to 2.94MiB) would give us a 0.0796% probability of at least one FP in a year.
      • Bumping to a 64-bit fingerprint (SW size grows to 4.32MiB) would give us a 0.00000000475% probability (about 4.75e-11 as a raw probability) of at least one FP in a year.
    • Note that inbrowser. does not receive this much traffic, and that the badbits list will grow.
  • extend badbits check into a consensus service provided by kubo and other nodes... this would be a little overkill I think.
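
The size and false-positive estimates in the list above can be reproduced with a short calculation. This is a sketch under stated assumptions, not a filter implementation. It assumes the xor stage contributes an 8-bit fingerprint (a 2^-8 false-positive rate), that the two stages are independent, and that every gateway request is a lookup for an item not on the list. The function names and default parameters are illustrative.

```typescript
// Size formula from the issue: bits per key summed, converted to MiB.
function filterSizeMiB (
  numKeys: number,
  fingerprintBits: number,
  xorBitsPerKey = 9,
  mphfBitsPerKey = 2.62
): number {
  return numKeys * (xorBitsPerKey + mphfBitsPerKey + fingerprintBits) / (8 * 2 ** 20)
}

// Probability of at least one false positive in a year: 1 - (1 - p)^n.
// Assumption: per-lookup rate p = 2^-(xorFingerprintBits + fingerprintBits).
function yearlyFalsePositiveProbability (
  requestsPerDay: number,
  fingerprintBits: number,
  xorFingerprintBits = 8
): number {
  const perLookup = 2 ** -(xorFingerprintBits + fingerprintBits)
  const lookupsPerYear = requestsPerDay * 365
  // log1p/expm1 keep precision when perLookup is far below 1e-16.
  return -Math.expm1(lookupsPerYear * Math.log1p(-perLookup))
}

for (const bits of [32, 40, 64]) {
  const size = filterSizeMiB(500_000, bits).toFixed(2)
  const fp = yearlyFalsePositiveProbability(614e6, bits)
  console.log(`${bits}-bit fingerprint: ${size} MiB, P(>=1 FP/year) = ${fp.toExponential(3)}`)
}
```

With these assumptions the 32-bit and 40-bit cases come out at roughly 0.184 and 7.96e-4, matching the 18.4% and 0.0796% figures, and the 64-bit case is about 4.75e-11 as a raw probability. With exactly 500,000 keys the size formula gives about 2.60 MiB for the 32-bit case; the ~2.5 MiB figure corresponds to a slightly smaller key count of around 480k.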


Labels

  • P2: Medium: Good to have, but can wait until someone steps up
  • effort/weeks: Estimated to take multiple weeks
  • exp/intermediate: Prior experience is likely helpful
  • kind/discussion: Topical discussion; usually not changes to codebase
  • need/analysis: Needs further analysis before proceeding
