feat: harvest LID mappings from GetUserDevices usync response#1
Open
Prodigy90 wants to merge 43 commits into
Open
feat: harvest LID mappings from GetUserDevices usync response#1Prodigy90 wants to merge 43 commits into
Prodigy90 wants to merge 43 commits into
Conversation
Co-authored-by: Tulir Asokan <[email protected]>
Upstream changes pulled in: - proto: bumps to v1039406452 (also v1037076227, v1037753511, v1038187123, v1038709472, v1038839325) — keeps WA web compat current - client: cstoken support (tulir#1102), full tctoken lifecycle (tulir#1081) - retry: catch panics in retry receipt goroutine (415df31), optional semaphore for retry receipts (a5594bf) — supersedes our local panic-recovery commit - msgsecret: include params when failing to decrypt, more secret encrypted message types, vote encryption sender selection - pair: macOS pair client constant, QR code format update, PairClientType to string, fix platform ID type - store: new default client payload fields, deleted device protection, GetManyIdentities/PutManyIdentities batch ops also present upstream now - security: constant-time byte comparisons in appstate/handshake/pair, noise certificate validity check, ephemeral key length check - appstate: send sync error event more consistently - handshake: check noise certificate validity - send: edit attribute on pin-in-chat messages, on-demand history sync attrs - download: sticker pack fetching, upload: media delete method - notification: fix handling own device list changes - dependencies: update + Go 1.26.2 toolchain Conflict resolved in receipt.go: kept upstream's `tryHandleRetryReceipt` helper (panic recovery + optional semaphore — strict upgrade over our local cdabef9) while preserving our c542190 `handleGroupedReceipt` changes for WhatsApp's new status broadcast receipt format (message_id attr fallback + per-user receipt-type parsing). All other local features verified intact: Participants field for status recipients, ExcludeDevices + typed 429 fields (RejectedParticipant, BackoffSeconds, RawErrorResponse), parallel encryption with GOMAXPROCS semaphore + identity/session caches, HistorySyncNotification blob release. Build: go build ./... exit 0. Vet: clean.
Two findings from the post-sync audit, applying the same lens tulir's tryHandleRetryReceipt (415df31 + a5594bf) raised over our cdabef9: P1 (user.go) — OnNoDeviceContacts callback ran in an unwrapped goroutine. A panic in application code would crash the entire worker pod. Same failure mode tulir's wrapper just protected against for retry receipts. Fix: - Extract to tryOnNoDeviceContacts named helper - defer recover() with debug.Stack() so deep callback panics are diagnosable from a single log line - Recover handler references only the panic value and stack — no data that could itself be the panic cause (avoids second-fault hazard) - Add opt-in cli.noDeviceContactsSema (*semaphore.Weighted) + setter SetMaxParallelNoDeviceContactsCallbacks mirroring the existing SetMaxParallelRetryReceiptHandling pattern. Defaults nil (unlimited) for backward compat. P2 (send.go) — parallel encryption recover logged only the panic value (%v), discarding the stack. When libsignal panics deep (e.g. nil SenderKeyDistributionMessage.Serialize, issue tulir#927), the recovered error was "panic encrypting for X: runtime error: invalid memory address" with no call chain. Now appends debug.Stack() so we can localize the actual offending line. Build clean, vet clean. Both changes are surface-only safety — no behavior change on the success path.
When the usync device-query response carries `<lid val="...@lid">` children on `<user>` nodes, extract them and persist via Store.LIDs.PutManyLIDMappings. Mirrors what GetUserInfo already does at user.go:232-237. Closes the gap where send-path GetUserDevices calls would route status broadcasts to a recipient's LID-AD device but never persist the implied PN<->LID partnership. With the mapping uncaptured, downstream LID-form receipts couldn't be canonicalized to PN, inflating per-message delivered_count by ~1.5x (same human counted twice, once via LID-server ack, once via PN-server ack). Baileys does the equivalent in src/Socket/messages-send.ts (storeLIDPNMappings on the usync result filter for entries with `lid` set), confirming the response shape carries this data on the message-namespace usync. Info log when mappings>0 for in-prod verification — expect one log per non-trivial status send showing the harvest count.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
In the
GetUserDevicesusync parsing loop (user.go:481-492), extract<lid val="...@lid">children from each<user>response node and persist viaStore.LIDs.PutManyLIDMappingsafter the loop. Mirrors the existing pattern inGetUserInfo(user.go:232-237).Why
WA's usync device-query response on the
messagenamespace carries the requester's PN↔LID mapping inline for every user whose identity has migrated to LID. The data flows through this function on every status send (prepareMessageNode→GetUserDevices(participants)) but is currently discarded.The downstream consequence: when WA delivers a status to one of these recipients' devices and the device acks back in LID form, the application's
Store.LIDs.GetPNForLIDlookup misses, the receipt is attributed to the LID JID string, and the same human's subsequent PN-form acks get counted as a separate human. This inflates per-message delivered counts by roughly 1.5× on any account with a meaningful LID-migrated audience.Baileys does the equivalent in
src/Socket/messages-send.ts(storeLIDPNMappingsover the usync result entries withlidset), confirming the response shape carries this data on the message-namespace usync.How
lidMappingsslice pre-allocated before the loopuser.GetChildByTag("lid")and append to the slice whenvalis non-emptyPutManyLIDMappingscall. INFO log when the slice is non-empty so operators can verify the harvest works in prodVerification
After deploy, expect an INFO log like
GetUserDevices: harvested N LID mappings from M input JIDson every status send with non-trivial recipient overlap with the LID-migrated audience. Operators can alsoSELECT COUNT(*) FROM whatsmeow_lid_mapbefore/after a status send to confirm growth.Backend pair
Paired with backend PR (
Prodigy90/whatsmeow_bot) that bumps this submodule and adds a receipt-handler canonicalization to dedupe LID/PN form-dupes into a single Redis set entry. Together they restorestatus_send_history.delivered_countaccuracy.Test plan
go build ./...cleango test ./...(no regressions; package has no test files foruser.goso coverage is mainly indirect)whatsmeow_lid_maprow count before/after one full status send for a known instrumented user