Why
When iOS playback fails, the AVPlayer numeric error code alone is rarely enough to identify the root cause. The same code is overloaded across domains (e.g. -12642 means kCMFormatDescriptionError_ValueNotAvailable in CoreMedia and "No matching mediaFile from playlist" when AVPlayer raises it from the HLS path), some codes are completely undocumented (-12860, -12888, -12174), and several of our worst incidents were cascades: one server-side defect produced a sequence of codes, with the first symptom carrying the most diagnostic signal but the loudest log line being a downstream stale-playlist complaint.

We test against legal/well-formed streams, but spec-violating streams exist in the wild. We should be better at recognizing the effect of broken/malformed manifests on the player from error-code patterns alone.

What we know: go-live (server) bugs that produced iOS error codes

| Go-live (server) bug | iOS symptom |
| --- | --- |
| […] EXT-X-DISCONTINUITY, ~620 B → ~4.8 KB) | -12860 decode error → poll stops → -12888 "playlist unchanged for 1.5× target duration" on audio → playback dies |
| […] | -12860 on 540p → ABR upshift to 2160p → repeated -12642 "No matching mediaFile" → -12888 cascade |
| […] EXT-X-DISCONTINUITY-SEQUENCE (RFC 8216 §4.3.3.3 violation) + MAX_LIVE_WINDOW_DURATION too tight (36 s vs 20 s buffer) | -12642 + AVPlayer "Cannot Open" on cross-discontinuity ABR upshift; oldest-edge fall-off |
| […] EXT-X-VERSION:7 playlist; or HOLD-BACK < 3 × TARGETDURATION; or EXT-X-START inserted between #EXTM3U and #EXT-X-VERSION | -12646 "playlist parse error": entire playlist rejected |
| HOLD-BACK not preserved across stall recovery (client-fix in iOS app) | […] |
| […] LLHLSGenerator per 200 ms tick, double-lock duplicate MPD generation) | -12888 / stalls |
| 206 Partial Content with Content-Length stripped | URLAsset err=-12174 flood on byte-range fetches |
| […] | -12174 after #137 fix; sim-only AVFoundation tightening |
| URLSessionDataTask not cancelled when AVPlayer abandoned client | ECANCELED (POSIX 89) flood, wasted upstream traffic |

Codes we've actually seen and what we currently know about them

| Code | Label / "errorComment" string | Cause we've identified |
| --- | --- | --- |
| -12174 | URLAsset os_log warning (no AVError mapping) | 206 Partial Content returned with Transfer-Encoding: chunked and no Content-Length after a Range: request (#137); newer-sim variant: HTTP/1.1 Connection: close (#145) |
| -12642 | "No matching mediaFile from playlist" (HLS context); overloaded with kCMFormatDescriptionError_ValueNotAvailable | Missing EXT-X-DISCONTINUITY-SEQUENCE, live window too narrow vs forward buffer, or wrong content served (#149, #110) |
| -12646 | "playlist parse error" | v7-vs-v9 mismatch, HOLD-BACK underflow, or bad tag insertion order (#151) |
| -12860 | CoreMedia decode error (no public symbol) | Oversized/malformed playlist (#109); also genuine bad codec config in master |
| -12888 | "playlist unchanged for 1.5× target duration" | Almost always downstream: player stopped polling after an earlier -12860/-12646 and the audio playlist now appears stale; or genuine go-live worker stall |

PlaybackDiagnostics.swift:1150-1262 has the full reverse-engineered switch covering -11800…-11865 (AVError) and -12640…-12900 (CoreMedia families). Note that the labels there are header-derived, not behavioral: -12642's real meaning in our incidents has nothing to do with "ValueNotAvailable".

The core problem
A bare numeric code is one signal. To classify cause we need three:
- Code: AVPlayerItem.error.code / errorLog().events.last.errorStatusCode.
- Comment: errorComment / localizedDescription. This is what disambiguates the overloaded codes (-12642 → "No matching mediaFile" vs CM-domain).
- Context: […] (Content-Length vs chunked TE), Content-Type, […] EXT-X-DISCONTINUITY-SEQUENCE value seen.

With these three, a rule classifier (code, comment_regex, predicates) → (probable_cause, confidence) turns -12642 into one of:
- MISSING_DISCONTINUITY_SEQUENCE (high): comment matches "No matching mediaFile" + a discontinuity wrap occurred in the last few seconds + variant switch was in flight
- WINDOW_FELL_OFF_OLDEST_EDGE (medium): comment matches "No matching mediaFile" + player position was within ~1× TARGETDURATION of MEDIA-SEQUENCE head
- CM_FORMAT_VALUE_NOT_AVAILABLE (low): error came from a non-HLS path
- UNKNOWN (fallback)

Proposed direction (going forwards)
Hybrid client + server classifier:
- Client-side: […] PlaybackDiagnostics for the ~5 codes we've actually triggered. Stamps probable_cause + confidence into the metric + on-device 911 HAR (#308, "Add '911' / user-marked button to all player apps + auto-HAR-snapshot on server") / freeze HAR (#273, "Auto-snapshot HAR on player freeze / auto-recovery"). Slow to update (ships with the app) but lets the dashboard label incidents in real time.
- Server-side (analytics sidecar, #336, "Analytics sidecar: ClickHouse + Grafana for cross-session event analysis & historical replay"): richer ruleset that runs over the HAR + SSE log retrospectively. Can use cross-session context (e.g. "all sessions on this content in the last 5 minutes hit -12860 → server-side regression"), update freely without app release, and backfill labels onto historical incidents.
- Disambiguation cheat sheet: checked-in doc (e.g. apple/InfiniteStreamPlayer/IOS_ERROR_CODES.md or PRD section) capturing the (code, comment, context) → cause mappings as we learn them, alongside the issue refs that proved each cause. Both classifiers source rules from the same catalogue.

Concrete first steps
1. Add the missing observed codes (-12174, -12888) and HLS-context labels to interpretCoreMediaErrorCode / interpretAVErrorCode in PlaybackDiagnostics.swift. Cheap, immediate dashboard win.
2. Capture the context fingerprint at error-emission time: extend LocalHTTPProxy's existing per-segment hook (#157, "Track per-segment identity via LocalHTTPProxy (iOS)") to keep a small ring buffer of the last N (~10) responses (URL, status, framing, size, byterange) so an errorLog event can attach the immediately-preceding HTTP context.
3. Define the rule schema (likely YAML/JSON: {code, comment_regex, predicates[], cause, confidence, refs[]}) and seed it with the known cases from this issue.
4. Wire the client-side classifier into the metric payload (player_metrics_probable_cause) and the dashboard error lane.
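The rule schema and the -12642 disambiguation described above can be sketched in a few lines. This is a minimal illustration in Go (as a server-side classifier might be written), not the project's implementation: the `Context` fields, the predicate names (`discontinuity_wrap_recent`, `variant_switch_in_flight`, `near_window_head`, `non_hls_path`), and the functions `LoadRules` and `Classify` are all hypothetical. Only the schema field names and the four causes come from this issue. Rules are evaluated in catalogue order and the first match wins, which is an assumption about how ties would be resolved.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Context holds per-error facts gathered at emission time.
// All field names are illustrative assumptions.
type Context struct {
	HLSPath                 bool    // error was raised from the HLS path
	DiscontinuityWrapRecent bool    // a discontinuity wrap occurred in the last few seconds
	VariantSwitchInFlight   bool    // an ABR variant switch was in progress
	SecondsFromWindowHead   float64 // player position relative to the MEDIA-SEQUENCE head
	TargetDurationSeconds   float64 // TARGETDURATION of the playlist
}

// Rule mirrors the proposed schema
// {code, comment_regex, predicates[], cause, confidence, refs[]}.
type Rule struct {
	Code         int      `json:"code"`
	CommentRegex string   `json:"comment_regex"`
	Predicates   []string `json:"predicates"`
	Cause        string   `json:"cause"`
	Confidence   string   `json:"confidence"`
	Refs         []string `json:"refs"`
}

// predicates maps hypothetical predicate names to checks over Context.
var predicates = map[string]func(Context) bool{
	"discontinuity_wrap_recent": func(c Context) bool { return c.DiscontinuityWrapRecent },
	"variant_switch_in_flight":  func(c Context) bool { return c.VariantSwitchInFlight },
	"near_window_head":          func(c Context) bool { return c.SecondsFromWindowHead <= c.TargetDurationSeconds },
	"non_hls_path":              func(c Context) bool { return !c.HLSPath },
}

// catalogue seeds the three -12642 rules listed in this issue.
const catalogue = `[
  {"code": -12642, "comment_regex": "No matching mediaFile",
   "predicates": ["discontinuity_wrap_recent", "variant_switch_in_flight"],
   "cause": "MISSING_DISCONTINUITY_SEQUENCE", "confidence": "high"},
  {"code": -12642, "comment_regex": "No matching mediaFile",
   "predicates": ["near_window_head"],
   "cause": "WINDOW_FELL_OFF_OLDEST_EDGE", "confidence": "medium"},
  {"code": -12642, "predicates": ["non_hls_path"],
   "cause": "CM_FORMAT_VALUE_NOT_AVAILABLE", "confidence": "low"}
]`

// LoadRules decodes a JSON rule catalogue.
func LoadRules(raw string) ([]Rule, error) {
	var rules []Rule
	err := json.Unmarshal([]byte(raw), &rules)
	return rules, err
}

// Classify returns the cause and confidence of the first matching rule,
// or "UNKNOWN" with an empty confidence when nothing matches.
func Classify(rules []Rule, code int, comment string, ctx Context) (string, string, error) {
	for _, r := range rules {
		if r.Code != code {
			continue
		}
		if r.CommentRegex != "" {
			ok, err := regexp.MatchString(r.CommentRegex, comment)
			if err != nil {
				return "", "", err
			}
			if !ok {
				continue
			}
		}
		if allHold(r.Predicates, ctx) {
			return r.Cause, r.Confidence, nil
		}
	}
	return "UNKNOWN", "", nil
}

// allHold reports whether every named predicate is known and true for ctx.
func allHold(names []string, ctx Context) bool {
	for _, n := range names {
		p, ok := predicates[n]
		if !ok || !p(ctx) {
			return false
		}
	}
	return true
}

func main() {
	rules, err := LoadRules(catalogue)
	if err != nil {
		panic(err)
	}
	ctx := Context{HLSPath: true, DiscontinuityWrapRecent: true, VariantSwitchInFlight: true}
	cause, confidence, _ := Classify(rules, -12642, "No matching mediaFile from playlist", ctx)
	fmt.Println(cause, confidence)
}
```

Keeping predicates as named strings lets the same catalogue file feed both the client and server classifiers, with each side supplying its own predicate implementations. An unknown predicate name makes the rule not fire rather than fail, so a client that ships with an older predicate set degrades to UNKNOWN.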
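The ring buffer from step 2 is a small fixed-size structure that overwrites its oldest entry. The sketch below shows the data structure only, in Go for consistency with the other sketch here; the on-device version would live in the Swift LocalHTTPProxy. The type and method names (`ResponseRecord`, `ResponseRing`, `Add`, `Snapshot`) are hypothetical, and the record fields follow the (URL, status, framing, size, byterange) tuple named in the step.

```go
package main

import (
	"fmt"
	"sync"
)

// ResponseRecord is one proxied HTTP response; fields follow the
// (URL, status, framing, size, byterange) tuple from the issue.
type ResponseRecord struct {
	URL       string
	Status    int
	Framing   string // e.g. "content-length" or "chunked" (illustrative values)
	Size      int64
	ByteRange string
}

// ResponseRing keeps the last N records, overwriting the oldest.
type ResponseRing struct {
	mu   sync.Mutex
	buf  []ResponseRecord
	next int  // index the next Add will write to
	full bool // true once the buffer has wrapped at least once
}

// NewResponseRing creates a ring holding at most n records (minimum 1).
func NewResponseRing(n int) *ResponseRing {
	if n < 1 {
		n = 1
	}
	return &ResponseRing{buf: make([]ResponseRecord, n)}
}

// Add stores rec, evicting the oldest record when the ring is full.
func (r *ResponseRing) Add(rec ResponseRecord) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = rec
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns a copy of the stored records, oldest first, suitable
// for attaching to an error event.
func (r *ResponseRing) Snapshot() []ResponseRecord {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]ResponseRecord(nil), r.buf[:r.next]...)
	}
	out := make([]ResponseRecord, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	out = append(out, r.buf[:r.next]...)
	return out
}

func main() {
	ring := NewResponseRing(10)
	ring.Add(ResponseRecord{URL: "https://example.invalid/seg1.m4s", Status: 206, Framing: "chunked"})
	ring.Add(ResponseRecord{URL: "https://example.invalid/seg2.m4s", Status: 200, Framing: "content-length", Size: 4096})
	fmt.Println(len(ring.Snapshot()))
}
```

Taking a copy in `Snapshot` means the error path can serialize the context without holding the lock while later responses continue to arrive.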
Open questions
- Should probable_cause be a free-form string or an enum? Enum is cleaner for analytics; free-form is more honest about partial knowledge.

References
- apple/InfiniteStreamPlayer/InfiniteStreamPlayer/PlaybackDiagnostics.swift:1150-1262
- go-live/pkg/generator/range_hls.go:183-200, go-live/pkg/generator/ll_hls.go:20,129, go-live/internal/api/handlers.go:2010, go-proxy/cmd/server/main.go:4765,4802