e.g. remove serviceWorkerRegistrationTTL/timebomb/expiry. This is a continuation of the original thought behind #724, relying less on verbal communication of the issue and more on written documentation of my reasoning.
We chatted about this at yesterday's Helia WG and so I wanted to come back to this with a more thorough argument (that I've voiced before, but never written down).
So I built some logic tables to map out how service worker updates actually reach the user. The tables below compare service worker update scenarios with and without the registration TTL/timebomb mechanism.
Service Worker Update Logic Tables
This was generated with the help of an LLM, and then cleaned up and verified. The summary section at the bottom is 99% human-generated.
Variables
- userOnlineStatus: Online/Offline
- isSwUpdateAvailable: Yes/No (new version exists)
- userNavigationBehavior: Navigates/Stays on same page
- timeSinceLastUpdateCheck: < 24h / ≥ 24h (since last update check that browsers do) see FF and Chromium
- ttlExpired: Yes/No (TTL has expired, only relevant when TTL is used)
- doesUserGetUpdatedSw: Yes/No (final outcome)
Table 1: WITHOUT TTL (Pure Browser Updates)
| Scenario | Online | Update Available | Navigation | Time Since Check | Gets Updated SW | Notes |
|---|---|---|---|---|---|---|
| 1 | ✅ | ✅ | ✅ | < 24h | ✅ | Normal update on navigation |
| 2 | ✅ | ✅ | ✅ | ≥ 24h | ✅ | Update check triggered after 24h |
| 3 | ✅ | ✅ | ❌ | < 24h | ❌ | No navigation = no update check |
| 4 | ✅ | ✅ | ❌ | ≥ 24h | ❌ | No navigation = no update check |
| 5 | ✅ | ❌ | ✅ | < 24h | ❌ | No update needed, current SW works |
| 6 | ✅ | ❌ | ✅ | ≥ 24h | ❌ | No update needed, current SW works |
| 7 | ✅ | ❌ | ❌ | < 24h | ❌ | No update needed, current SW works |
| 8 | ✅ | ❌ | ❌ | ≥ 24h | ❌ | No update needed, current SW works |
| 9 | ❌ | ✅ | ✅ | < 24h | ❌ | Offline = no update check possible |
| 10 | ❌ | ✅ | ✅ | ≥ 24h | ❌ | Offline = no update check possible |
| 11 | ❌ | ✅ | ❌ | < 24h | ❌ | Offline + no navigation = no update |
| 12 | ❌ | ✅ | ❌ | ≥ 24h | ❌ | Offline + no navigation = no update |
| 13 | ❌ | ❌ | ✅ | < 24h | ❌ | No update needed, current SW works |
| 14 | ❌ | ❌ | ✅ | ≥ 24h | ❌ | No update needed, current SW works |
| 15 | ❌ | ❌ | ❌ | < 24h | ❌ | No update needed, current SW works |
| 16 | ❌ | ❌ | ❌ | ≥ 24h | ❌ | No update needed, current SW works |
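Table 1 collapses to a single boolean rule. The sketch below is a hypothetical model, not code from the gateway: the `Scenario` interface and `getsUpdatedSwWithoutTtl` are illustrative names that mirror the Variables list above.

```typescript
// Hypothetical model of Table 1 (no TTL). Names are illustrative only.
interface Scenario {
  online: boolean
  updateAvailable: boolean
  navigates: boolean
  // Kept for completeness: in Table 1 this never changes the outcome.
  hoursSinceLastUpdateCheck: number
}

// A user ends up with the updated SW only when all three conditions hold.
function getsUpdatedSwWithoutTtl (s: Scenario): boolean {
  return s.online && s.updateAvailable && s.navigates
}
```

Only scenarios 1 and 2 satisfy the rule, which matches the two ✅ rows in the "Gets Updated SW" column.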
Table 2: WITH TTL (Browser Updates + TTL Expiration)
| Scenario | Online | Update Available | Navigation | Time Since Check | TTL Expired | Gets Updated SW | Notes |
|---|---|---|---|---|---|---|---|
| 1 | ✅ | ✅ | ✅ | < 24h | ❌ | ✅ | Normal update on navigation |
| 2 | ✅ | ✅ | ✅ | ≥ 24h | ❌ | ✅ | Update check triggered after 24h |
| 3 | ✅ | ✅ | ✅ | < 24h | ✅ | ✅ | TTL expires → SW unregisters → new SW installs |
| 4 | ✅ | ✅ | ✅ | ≥ 24h | ✅ | ✅ | TTL expires → SW unregisters → new SW installs |
| 5 | ✅ | ✅ | ❌ | < 24h | ❌ | ❌ | No navigation = no update check |
| 6 | ✅ | ✅ | ❌ | ≥ 24h | ❌ | ❌ | No navigation = no update check |
| 7 | ✅ | ✅ | ❌ | < 24h | ✅ | ❌ | TTL expires but no navigation to trigger re-registration |
| 8 | ✅ | ✅ | ❌ | ≥ 24h | ✅ | ❌ | TTL expires but no navigation to trigger re-registration |
| 9 | ✅ | ❌ | ✅ | < 24h | ❌ | ❌ | No update needed, current SW works |
| 10 | ✅ | ❌ | ✅ | ≥ 24h | ❌ | ❌ | No update needed, current SW works |
| 11 | ✅ | ❌ | ✅ | < 24h | ✅ | ❌ | TTL expires → SW unregisters → same SW reinstalls |
| 12 | ✅ | ❌ | ✅ | ≥ 24h | ✅ | ❌ | TTL expires → SW unregisters → same SW reinstalls |
| 13 | ✅ | ❌ | ❌ | < 24h | ❌ | ❌ | No update needed, current SW works |
| 14 | ✅ | ❌ | ❌ | ≥ 24h | ❌ | ❌ | No update needed, current SW works |
| 15 | ✅ | ❌ | ❌ | < 24h | ✅ | ❌ | TTL expires but no navigation to trigger re-registration; no update needed |
| 16 | ✅ | ❌ | ❌ | ≥ 24h | ✅ | ❌ | TTL expires but no navigation to trigger re-registration; no update needed |
| 17 | ❌ | ✅ | ✅ | < 24h | ❌ | ❌ | Offline = no update check possible |
| 18 | ❌ | ✅ | ✅ | ≥ 24h | ❌ | ❌ | Offline = no update check possible |
| 19 | ❌ | ✅ | ✅ | < 24h | ✅ | ❌ | TTL disabled when offline |
| 20 | ❌ | ✅ | ✅ | ≥ 24h | ✅ | ❌ | TTL disabled when offline |
| 21 | ❌ | ✅ | ❌ | < 24h | ❌ | ❌ | Offline + no navigation = no update |
| 22 | ❌ | ✅ | ❌ | ≥ 24h | ❌ | ❌ | Offline + no navigation = no update |
| 23 | ❌ | ✅ | ❌ | < 24h | ✅ | ❌ | TTL disabled when offline |
| 24 | ❌ | ✅ | ❌ | ≥ 24h | ✅ | ❌ | TTL disabled when offline |
| 25 | ❌ | ❌ | ✅ | < 24h | ❌ | ❌ | No update needed, current SW works |
| 26 | ❌ | ❌ | ✅ | ≥ 24h | ❌ | ❌ | No update needed, current SW works |
| 27 | ❌ | ❌ | ✅ | < 24h | ✅ | ❌ | TTL disabled when offline, but no update needed |
| 28 | ❌ | ❌ | ✅ | ≥ 24h | ✅ | ❌ | TTL disabled when offline, but no update needed |
| 29 | ❌ | ❌ | ❌ | < 24h | ❌ | ❌ | No update needed, current SW works |
| 30 | ❌ | ❌ | ❌ | ≥ 24h | ❌ | ❌ | No update needed, current SW works |
| 31 | ❌ | ❌ | ❌ | < 24h | ✅ | ❌ | TTL disabled when offline, but no update needed |
| 32 | ❌ | ❌ | ❌ | ≥ 24h | ✅ | ❌ | TTL disabled when offline, but no update needed |
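One way to check the two tables against each other is to enumerate every combination and compare outcomes. This is a minimal sketch under the assumptions stated in the table notes (TTL is ignored when offline, and re-registration needs a navigation). All names are hypothetical, and the time-since-check column is left out because it does not affect either table's outcome.

```typescript
// Hypothetical model comparing Table 1 and Table 2. Names are illustrative only.
interface TtlScenario {
  online: boolean
  updateAvailable: boolean
  navigates: boolean
  ttlExpired: boolean
}

// Table 1: pure browser updates.
function updatedWithoutTtl (s: TtlScenario): boolean {
  return s.online && s.navigates && s.updateAvailable
}

// Table 2: browser updates plus TTL expiration.
function updatedWithTtl (s: TtlScenario): boolean {
  const ttlEffective = s.online && s.ttlExpired // TTL is disabled when offline
  const reRegisters = ttlEffective && s.navigates // re-registration needs a navigation
  const browserUpdates = s.online && s.navigates
  // Either path only delivers a *new* SW if one actually exists.
  return (browserUpdates || reRegisters) && s.updateAvailable
}

// Counts the combinations where the two models disagree.
function countDifferingScenarios (): number {
  const flags = [true, false]
  let differing = 0
  for (const online of flags) {
    for (const updateAvailable of flags) {
      for (const navigates of flags) {
        for (const ttlExpired of flags) {
          const s: TtlScenario = { online, updateAvailable, navigates, ttlExpired }
          if (updatedWithTtl(s) !== updatedWithoutTtl(s)) differing++
        }
      }
    }
  }
  return differing
}
```

Under this model `countDifferingScenarios()` returns 0: the TTL never changes whether a user receives an updated SW.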
Key Differences: TTL vs No TTL
Scenarios Where TTL Actually Makes a Difference
Looking at the logic tables, the TTL mechanism only provides a benefit in one specific scenario:
| Scenario | Without TTL | With TTL | Actual Difference |
|---|---|---|---|
| User goes offline with old SW → update available while offline → user stays offline indefinitely | ✅ SW remains active, cached content accessible | ❌ SW unregisters when TTL expires, cached content becomes inaccessible | TTL removes access to cached content (this could be badbits content, but probably is not) |
Scenarios Where TTL Provides No Benefit
All other scenarios show identical outcomes between TTL and no-TTL:
| Scenario | Without TTL | With TTL | Difference |
|---|---|---|---|
| User offline → update available → comes back online | ✅ Gets update on navigation | ✅ Gets update on navigation | None |
| User online with update available | ✅ Gets update on navigation | ✅ Gets update on navigation | None |
| User stays on same page for extended period | ❌ No update (no navigation) | ❌ No update (no navigation) | None |
| No update available | ✅ Current SW works | ✅ Current SW works | None |
TTL Limitations
- TTL check only runs during fetch events: If service worker is idle, TTL expiration won't be detected until next navigation
- TTL disabled when offline: When `navigator.onLine === false`, TTL check returns `true` (no expiration)
- Requires navigation: Even when TTL expires, user must navigate to trigger service worker re-registration
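The limitations above can be sketched as a small pure function. This is a hypothetical illustration, not the gateway's actual implementation: `TtlCheckInput` and `isRegistrationStillValid` are made-up names, and the assumption is that something like this would be called from the service worker's fetch handler with `online` taken from `navigator.onLine`.

```typescript
// Hypothetical sketch of a registration TTL check. Names are illustrative only.
interface TtlCheckInput {
  registeredAtMs: number // when the SW was registered (epoch milliseconds)
  ttlMs: number // configured registration TTL in milliseconds
  nowMs: number // current time (epoch milliseconds)
  online: boolean // assumed to come from navigator.onLine
}

// Returns true when the registration is still considered valid (no expiration).
function isRegistrationStillValid (input: TtlCheckInput): boolean {
  if (!input.online) {
    // Offline: the TTL is effectively disabled, so the SW is never expired here.
    return true
  }
  return input.nowMs - input.registeredAtMs < input.ttlMs
}
```

Because nothing calls a check like this while the service worker is idle, an expired TTL has no effect until the next fetch event.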
Analysis
Assuming we implement badbits blocking in the service worker gateway:
The Only Real Benefit of TTL
Content Takedown for Offline Users: The TTL mechanism can remove access to cached content (including potentially problematic content) for users who go offline and never come back online. Without TTL, these users would retain access to cached content indefinitely. However, the TTL check only fires when they explicitly attempt to load the content again.
The Real Cost of TTL
Degraded Offline Experience: Users who rely on the service worker for offline functionality may lose access to cached content when the TTL expires, even if they're still offline and the content is still valid.
Key Insight
The serviceWorkerRegistrationTTL is a content takedown mechanism that works by unregistering the service worker after a time period, which removes access to all cached content. This is useful for:
- Legal compliance: Removing access to content that has been flagged as problematic
Summary
The "bad deployment recovery" argument is false
TTL doesn't help with nasty bugs because users still won't get updates unless they meet the same browser update requirements (online + navigation). The only way to deploy a non-updatable service worker is by changing the SW path in registration, which we've already solved with the namespaced ipfs-sw-sw.js path. (Don't ever change this path. There is a test that confirms it exists in the output. We could probably also override the types for navigator.serviceWorker.register and, in that type override, link to this issue.)
TTL degrades experience for legitimate users
Since we have no way to know if a user has accessed badbits content, TTL will break offline functionality for users accessing legitimate IPFS content.
More thoughts...
IANAL, but if we provide the updated code, the logic tables above should be sufficient to show that a user who keeps access to that content is doing so intentionally. If they do choose to do so, TTL will only break them if they explicitly reload the page, and we have little control over that. And by breaking users who are explicitly choosing to keep bad content, we are also breaking the 99% of users who are not doing so.
The below is very related to #72.
I think that it's also worth noting that badbits are already filtered out with the default config (trustless-gateway), though we are still doing network requests to peers that may not be blocking badbits.
We should find some solution to block badbits in the service-worker gateway, and then remove TTL. Even with a badbits check in the service worker, users can still choose to block their service worker from ever updating, and so deny any future updates to what is blocked. I think this would be acceptable.
Some potential solutions for badbits in the service worker:
- We could implement badbits on the hosting layer so that subdomain and path requests to blocked content always return an error. This allows us to quickly "hotfix" the badbits list before the public badlist gets updated, but users can still get around this if they explicitly want to.
- We could host a badbits server that makes cacheable requests/responses for badbits checks.
- We could implement badbits in the service worker directly.
  - This requires a smart Xor filter + MPHF + fingerprint.
  - The list is ~30MB now; we can't fit that in the service worker.
  - A rough estimate of a combined xor filter + MPHF + 32-bit fingerprint (for 500k entries in the badbits list) would be about 2.5MiB (`S_mib(numKeys) = numKeys * (xorBitsPerKey/*9*/ + mphfBitsPerKey/*2.62*/ + fingerprintBitsPerKey/*32*/) / (8 * 2^20)`). This setup would give us, based on 614M (see non-unique) requests to IPFS gateways daily, roughly an 18.4% probability of at least one false positive in a year (i.e. blocking an item that's not in the badbits list).
    - Bumping to a 40-bit fingerprint (SW size grows to 2.94MiB) would give us a 0.0796% probability of at least one FP in a year.
    - Bumping to a 64-bit fingerprint (SW size grows to 4.32MiB) would give us a roughly 0.00000000475% probability of at least one FP in a year.
  - Note that inbrowser. does not receive this much traffic, and that the badbits list will grow.
- Extend the badbits check into a consensus service provided by kubo and other nodes. This would be a little overkill, I think.
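The size and false-positive estimates in the list above can be reproduced with a short calculation. This is a sketch under loudly stated assumptions, not a verified derivation: it assumes each request is an independent lookup of an item that is not on the list, that the per-lookup false-positive rate is 2^-(8 + fingerprintBits) (an 8-bit contribution from the xor filter, chosen here because it reproduces the quoted figures), and that "at least one in a year" follows a Poisson approximation. The function names are hypothetical.

```typescript
// Hypothetical helpers for the estimates above. Names and defaults are illustrative only.

// Size in MiB, following S_mib(numKeys) from the estimate above.
function filterSizeMiB (
  numKeys: number,
  fingerprintBits: number,
  xorBitsPerKey: number = 9,
  mphfBitsPerKey: number = 2.62
): number {
  const bitsPerKey = xorBitsPerKey + mphfBitsPerKey + fingerprintBits
  return (numKeys * bitsPerKey) / (8 * 1024 * 1024)
}

// Probability (as a fraction, not a percentage) of at least one false positive in a year.
// ASSUMPTION: per-lookup FP rate is 2^-(xorFingerprintBits + fingerprintBits).
function yearlyFalsePositiveProbability (
  fingerprintBits: number,
  requestsPerDay: number = 614e6,
  xorFingerprintBits: number = 8
): number {
  const perLookup = Math.pow(2, -(xorFingerprintBits + fingerprintBits))
  const expectedFalsePositives = requestsPerDay * 365 * perLookup
  // Poisson approximation: P(at least one) = 1 - e^(-expected)
  return 1 - Math.exp(-expectedFalsePositives)
}
```

Under these assumptions, `yearlyFalsePositiveProbability(32)` comes out near 0.184 and `yearlyFalsePositiveProbability(40)` near 0.000796, in line with the 18.4% and 0.0796% figures, while `yearlyFalsePositiveProbability(64)` is about 4.7e-11 as a fraction. With exactly 500,000 keys, `filterSizeMiB(500000, 32)` evaluates to about 2.6 MiB, slightly above the rounded size quoted above, which suggests the quoted sizes were computed with a somewhat smaller key count.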