
feat: robust WA version override + fallback chain with 405 retry #2487

Open
kaikybrofc wants to merge 5 commits into WhiskeySockets:master from kaikybrofc:feat/version-override-fallback-405

Conversation

@kaikybrofc

@kaikybrofc kaikybrofc commented Apr 19, 2026

Summary

This PR adds native version resilience for periods when the WhatsApp Web protocol version changes quickly, by introducing:

  • Manual version override (versionOverride)
  • Deterministic version resolution chain
  • lastKnownGoodVersion persistence (memory + disk)
  • Explicit version source logging
  • One-shot automatic retry with fallback on 405

Related issue: #2485

Why this helps

When WA rolls out a new version quickly, clients may temporarily hit 405 because the version they send is mismatched or rejected.
This change lets operators hotfix version selection immediately (without waiting for a new library release), while also giving Baileys a safer automatic fallback path.

What was implemented

1) New SocketConfig capabilities

  • versionOverride?: [number, number, number]
  • enableVersionFallbackRetry: boolean
  • versionCachePath?: string
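
As a rough illustration of how the three options listed above might be set together, here is a minimal sketch. The `VersionConfigSketch` interface is a hypothetical stand-in for the relevant slice of `SocketConfig`, not the actual type from `src/Types/Socket.ts`; the version tuple and cache file name are the ones reported later in this description.

```typescript
// Hypothetical stand-in for the version-related part of SocketConfig.
type WAVersion = [number, number, number]

interface VersionConfigSketch {
  /** Pins an exact version; takes priority over every automatic source. */
  versionOverride?: WAVersion
  /** Allows one automatic retry with a fallback version after a 405. */
  enableVersionFallbackRetry: boolean
  /** Where lastKnownGoodVersion is stored on disk. */
  versionCachePath?: string
}

const exampleConfig: VersionConfigSketch = {
  versionOverride: [2, 3000, 1035194821],
  enableVersionFallbackRetry: true,
  versionCachePath: './.baileys-last-known-good-version.json'
}

console.log(exampleConfig)
```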

2) Version resolution strategy

The connection version is resolved with strict priority:

  1. versionOverride (env/manual)
  2. fetchLatestWaWebVersion() (latest, primary)
  3. fetchLatestBaileysVersion() (latest, secondary fallback)
  4. lastKnownGoodVersion (lastKnownGood, memory/disk)
  5. DEFAULT_CONNECTION_CONFIG.version (default)
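
The priority order above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `src/Utils/versioning.ts`: `resolveVersionSketch` and `ResolveInput` are hypothetical names, the fetchers are passed in as plain functions, and the `{ version, isLatest }` result shape is inferred from the review discussion on this PR.

```typescript
type WAVersion = [number, number, number]
type VersionSource = 'env/manual' | 'latest' | 'lastKnownGood' | 'default'

interface LatestResult {
  version: WAVersion
  isLatest: boolean
}

// Hypothetical input bundle; the real resolver reads these from config and cache.
interface ResolveInput {
  versionOverride?: WAVersion
  fetchWaWeb: () => Promise<LatestResult>
  fetchBaileys: () => Promise<LatestResult>
  lastKnownGood?: WAVersion
  defaultVersion: WAVersion
}

async function resolveVersionSketch(
  input: ResolveInput
): Promise<{ version: WAVersion; source: VersionSource }> {
  // 1. A manual override always wins.
  if (input.versionOverride) {
    return { version: input.versionOverride, source: 'env/manual' }
  }
  // 2 and 3. Try the primary fetcher, then the secondary one.
  for (const fetcher of [input.fetchWaWeb, input.fetchBaileys]) {
    try {
      const result = await fetcher()
      if (result.isLatest) {
        return { version: result.version, source: 'latest' }
      }
    } catch {
      // A throwing fetcher should not abort the chain; fall through.
    }
  }
  // 4. Fall back to the last version that logged in successfully.
  if (input.lastKnownGood) {
    return { version: input.lastKnownGood, source: 'lastKnownGood' }
  }
  // 5. Last resort: the built-in default.
  return { version: input.defaultVersion, source: 'default' }
}
```

Each step returns as soon as it has a usable version, so a lower-priority source is consulted only when everything above it is missing or has failed.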

3) Last known good persistence

After successful login, the connected version is persisted as lastKnownGoodVersion and reused when latest resolution fails or is rejected.
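
A minimal sketch of the disk side of this persistence, assuming the cache file is a small JSON document of the form `{ "version": [a, b, c] }`. The on-disk format, the helper names `saveVersionSketch` and `loadVersionSketch`, and the temp-directory usage are illustrative assumptions; only Node's standard `fs`, `os`, and `path` modules are used.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

type WAVersion = [number, number, number]

// Accept only a tuple of three non-negative integers.
const isWAVersion = (value: unknown): value is WAVersion =>
  Array.isArray(value) &&
  value.length === 3 &&
  value.every(n => Number.isInteger(n) && n >= 0)

function saveVersionSketch(cachePath: string, version: WAVersion): void {
  writeFileSync(cachePath, JSON.stringify({ version }), 'utf8')
}

function loadVersionSketch(cachePath: string): WAVersion | undefined {
  if (!existsSync(cachePath)) return undefined
  try {
    const parsed: unknown = JSON.parse(readFileSync(cachePath, 'utf8'))
    const candidate = (parsed as { version?: unknown } | null)?.version
    // Ignore payloads that do not validate instead of trusting them.
    return isWAVersion(candidate) ? candidate : undefined
  } catch {
    // Unreadable or malformed cache: behave as if there were none.
    return undefined
  }
}

// Usage: write to a throwaway directory and read the value back.
const cacheDir = mkdtempSync(join(tmpdir(), 'wa-version-'))
const cachePath = join(cacheDir, 'last-known-good.json')
saveVersionSketch(cachePath, [2, 3000, 1035194821])
console.log(loadVersionSketch(cachePath))
```

Validating on read matters here because a corrupted or hand-edited cache file would otherwise feed a bad version straight into the fallback chain.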

4) Operational logging

Each connection attempt logs both resolved version and source:

  • env/manual
  • latest
  • lastKnownGood
  • default

5) Automatic retry on 405

If a handshake/stream failure returns 405, Baileys performs one controlled retry using the resolved fallback version (unless versionOverride is explicitly set).
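
The gating described here could look roughly like the sketch below. `shouldRetryWithFallback` and `RetryState` are hypothetical names used for illustration; the actual logic lives in `src/Socket/socket.ts` and may be structured differently.

```typescript
type VersionSource = 'env/manual' | 'latest' | 'lastKnownGood' | 'default'

// Hypothetical per-socket state: remembers whether the one retry was used.
interface RetryState {
  hasRetried: boolean
}

function shouldRetryWithFallback(
  statusCode: number,
  source: VersionSource,
  retryEnabled: boolean,
  state: RetryState
): boolean {
  if (!retryEnabled) return false // feature switched off in config
  if (statusCode !== 405) return false // only version rejections qualify
  if (source === 'env/manual') return false // respect an explicit pin
  if (state.hasRetried) return false // one-shot: never loop
  state.hasRetried = true
  return true
}

// Usage: the first 405 triggers a retry, the second does not.
const retryState: RetryState = { hasRetried: false }
console.log(shouldRetryWithFallback(405, 'latest', true, retryState))
console.log(shouldRetryWithFallback(405, 'latest', true, retryState))
```

Flipping `hasRetried` before returning is what keeps the retry one-shot even if the fallback version is rejected with another 405.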

6) Backward compatibility

If callers already set legacy config.version, it is promoted internally to versionOverride in the user-facing socket entrypoint.
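
A small sketch of that promotion step, assuming it only applies when `versionOverride` is not already set. `promoteLegacyVersion` is a hypothetical helper name, not a function exported by the library.

```typescript
type WAVersion = [number, number, number]

interface LegacyAwareConfig {
  version?: WAVersion
  versionOverride?: WAVersion
}

// Returns a copy with versionOverride filled from the legacy field when needed.
function promoteLegacyVersion<T extends LegacyAwareConfig>(config: T): T {
  if (config.version && !config.versionOverride) {
    return { ...config, versionOverride: config.version }
  }
  return config
}

// Usage: a caller that still passes `version` ends up with an override.
const promoted = promoteLegacyVersion({ version: [2, 3000, 1035194821] as WAVersion })
console.log(promoted.versionOverride)
```

An explicit `versionOverride` is left untouched, so callers who set both fields keep the newer one.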

Files changed

  • src/Types/Socket.ts
  • src/Defaults/index.ts
  • src/Socket/index.ts
  • src/Socket/socket.ts
  • src/Utils/versioning.ts (new)
  • src/__tests__/Utils/versioning.test.ts (new)

Test & validation report

Local library validation

  • npx tsc -p tsconfig.json: ✅ pass
  • jest src/__tests__/Utils/versioning.test.ts: ✅ pass (6/6)
  • npm run build: ✅ pass
  • npm run test: ✅ pass (17 suites, 180 tests)

Note on lint status:

  • npm run lint currently fails due to pre-existing repository issues unrelated to this PR:
    • src/Signal/libsignal.ts:459 (prefer-optional-chain)
    • src/WAUSync/USyncQuery.ts:49 (prefer-optional-chain)

Real integration report (consumer project)

Date: April 19, 2026

Scope executed:

  • Dependency switched to github:kaikybrofc/Baileys#feat/version-override-fallback-405
  • Installed and applied local patch (patch-package)
  • Build + automated tests
  • Real active WA connection via PM2
  • Controlled fallback-path testing (manual/latest/lastKnownGood/default)

Results:

  • Consumer npm run build: ✅ pass
  • Consumer npm run test: ✅ pass (19 files, 101 tests)
  • PM2 app (zyra) stayed online
  • Real WA connection opened successfully (no 405 in the observed production cycle)
  • Logs confirmed new resolver path:
    • versionSource: "env/manual"
    • version: [2,3000,1035194821]
  • Controlled fallback scenarios: ✅ 5/5 passed
    • manual override priority
    • invalid/latest failure -> lastKnownGood (memory/disk)
    • no latest + no valid cache -> default

Evidence captured in consumer app:

  • .baileys-last-known-good-version.json persisted
  • lockfile pinned to test branch dependency

Observed note:

  • A transient Redis reconnect error happened once and recovered; WA connection normalized afterward.

Risk / behavior notes

  • A real production 405 cannot be forced deterministically; the handling paths were validated through controlled tests.
  • Retry behavior is one-shot and guarded to avoid reconnect loops.

Outcome

This improves resilience during rapid protocol/version transitions and provides immediate operator control with clearer observability.


Summary by cubic

Adds robust WhatsApp Web version handling with a manual override, a clear fallback chain, and a one-shot retry on 405. Also adds per-path cache isolation and a safer reconnect flow to cut failed connections during fast WA rollouts.

  • New Features

    • Added versionOverride, enableVersionFallbackRetry (default true), and optional versionCachePath in SocketConfig.
    • Deterministic version resolution: versionOverride > latest WA Web > latest Baileys > lastKnownGood > default.
    • Persists lastKnownGoodVersion to memory and disk (isolated per versionCachePath); saved after successful login.
    • Logs resolved version and source; sanitizes fallback warning payloads.
    • On 405 during handshake/stream, performs one controlled reconnect with a different fallback version; resets noise, ignores transient WS error/close during retry, and skips if versionOverride is set.
    • Backward compatible: legacy config.version is auto-promoted to versionOverride.
    • Tests: added coverage validating disk reload by clearing memory cache.
  • Migration

    • Prefer versionOverride over version for manual pinning.
    • Optionally set versionCachePath (or BAILEYS_VERSION_CACHE_PATH) for lastKnownGoodVersion storage.
    • Keep enableVersionFallbackRetry on (default) to benefit from the 405 fallback reconnect.

Written for commit 55ac747. Summary will update on new commits.

Summary by CodeRabbit

  • New Features
    • One-shot automatic fallback retry when the server rejects a connection (HTTP 405), plus automatic version resolution and persistence to improve reconnections.
  • Configuration
    • New options to explicitly override the protocol version, enable/disable the fallback retry, and specify a path for storing the last-known-good version.
  • Bug Fixes
    • Improved reconnection flow to avoid duplicate shutdowns during retries.
  • Tests
    • Added tests for version resolution, persistence, and fallback logic.

Review Updates (April 19, 2026)

Addressed CodeRabbit actionable comments:

  • Added guard in global WebSocket error handler to ignore errors while version fallback retry is in progress (mirrors close guard behavior).
  • Reworked in-memory lastKnownGoodVersion cache to be isolated per cache path (Map<cachePath, version>), avoiding cross-path bleed.
  • Wrapped injected latest-version fetchers in try/catch so thrown rejections no longer break the fallback chain.
  • Added tests covering per-path memory cache isolation and thrown-fetcher fallback continuity.
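
To illustrate the per-path isolation mentioned above, here is a minimal sketch of a memory cache keyed by cache path. The names `rememberVersion` and `recallVersion` are hypothetical and the real module may differ; the point is only that two paths never share an entry.

```typescript
type WAVersion = [number, number, number]

// One entry per cache file path, instead of a single module-level variable.
const memoryCache = new Map<string, WAVersion>()

// Copy on the way in and out so callers cannot mutate the cached tuple.
const cloneVersion = (v: WAVersion): WAVersion => [v[0], v[1], v[2]]

function rememberVersion(cachePath: string, version: WAVersion): void {
  memoryCache.set(cachePath, cloneVersion(version))
}

function recallVersion(cachePath: string): WAVersion | undefined {
  const hit = memoryCache.get(cachePath)
  return hit ? cloneVersion(hit) : undefined
}

// Usage: two sockets with different cache paths do not see each other's value.
rememberVersion('/tmp/session-a.json', [2, 3000, 1])
rememberVersion('/tmp/session-b.json', [2, 3000, 2])
console.log(recallVersion('/tmp/session-a.json'), recallVersion('/tmp/session-b.json'))
```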

Validation rerun after fixes:

  • npx tsc -p tsconfig.json
  • jest src/__tests__/Utils/versioning.test.ts ✅ (8/8)

@whiskeysockets-bot
Contributor

whiskeysockets-bot commented Apr 19, 2026

Thanks for opening this pull request and contributing to the project!

The next step is for the maintainers to review your changes. If everything looks good, it will be approved and merged into the main branch.

In the meantime, anyone in the community is encouraged to test this pull request and provide feedback.

✅ How to confirm it works

If you’ve tested this PR, please comment below with:

Tested and working ✅

This helps us speed up the review and merge process.

📦 To test this PR locally:

# NPM
npm install @whiskeysockets/baileys@kaikybrofc/Baileys#feat/version-override-fallback-405

# Yarn (v2+)
yarn add @whiskeysockets/baileys@kaikybrofc/Baileys#feat/version-override-fallback-405

# PNPM
pnpm add @whiskeysockets/baileys@kaikybrofc/Baileys#feat/version-override-fallback-405

If you encounter any issues or have feedback, feel free to comment as well.

@kaikybrofc kaikybrofc mentioned this pull request Apr 19, 2026
@coderabbitai

coderabbitai bot commented Apr 19, 2026

📝 Walkthrough

Introduce WA Web version management: new SocketConfig fields (versionOverride, enableVersionFallbackRetry, versionCachePath); add disk-backed and in-memory version resolver/persistence; map legacy config.version; socket changes to resolve/save versions, reset handshake state, and perform a one-shot fallback reconnect on HTTP 405.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Types: `src/Defaults/index.ts`, `src/Types/Socket.ts` | Added versionOverride?: WAVersion, enableVersionFallbackRetry: boolean, and versionCachePath?: string to defaults and SocketConfig. Default enableVersionFallbackRetry is true. |
| Socket Entry: `src/Socket/index.ts` | Detects legacy config.version and maps it to versionOverride when versionOverride is not set. |
| Socket Core: `src/Socket/socket.ts` | Resolve/persist WA version during lifecycle, make ephemeral/noise state resettable, add resetNoiseState(), implement one-shot fallback retry on HTTP 405 (gated by config and version source), avoid double shutdown during retries, and save last-known-good version after login. |
| Versioning Utilities: `src/Utils/versioning.ts` | New module implementing resolveWaVersion, getLastKnownGoodVersion, saveLastKnownGoodVersion, clearLastKnownGoodVersionMemoryCache, and VersionSource. Adds in-memory + disk caching, validation, latest-fetch fallbacks, and injectable fetchers. |
| Tests: `src/__tests__/Utils/versioning.test.ts` | New Jest tests covering override behavior, latest-fetch fallback chain, disk persistence, in-memory cache isolation per cache path, and error/failure handling of fetchers. |

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Socket as Socket Handler
    participant Resolver as Version Resolver
    participant Cache as Version Cache
    participant WA as WA Server

    Client->>Socket: connect(config)
    Socket->>Resolver: resolveWaVersion(allowLatestFetch=true)
    Resolver->>Cache: read cached last-known-good (if any)
    Resolver->>WA: fetch latest WA Web version
    WA-->>Resolver: latest version / error
    Resolver-->>Socket: resolved version + source
    Socket->>WA: connect with resolved version

    alt WA responds 405
        WA-->>Socket: 405
        Socket->>Socket: scheduleVersionFallbackRetry()
        Socket->>Socket: resetNoiseState(), clear timers, close socket
        Socket->>Resolver: resolveWaVersion(allowLatestFetch=maybeFalse)
        Resolver->>Cache: read fallback last-known-good
        Resolver-->>Socket: fallback version + source
        Socket->>WA: reconnect with fallback version
        WA-->>Socket: connection success
    else connection success
        WA-->>Socket: connected
        Socket->>Resolver: saveLastKnownGoodVersion()
        Resolver->>Cache: persist version to disk & memory
    end

    Socket-->>Client: ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 I hopped through versions, old and new,
I sniffed the cache and fetched the view,
When 405 slammed a gloomy door,
I reset my keys and tried once more,
Then saved the good version and leapt to the dew.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding robust WA version override capability plus a fallback chain with 405 retry logic, which directly aligns with the primary objectives of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Socket/socket.ts`:
- Around line 751-755: The global WebSocket error handler currently calls end()
unconditionally which can preempt per-validation listeners and prevent the 405
retry in validateConnection/awaitNextMessage; modify the global "error" event
handler (the same place the global "close" handler checks
isPerformingVersionRetry) to check isPerformingVersionRetry and skip calling
end() when true, mirroring the guard used in the global "close" handler so
per-validation error handling and the retry logic in validateConnection can run.

In `@src/Utils/versioning.ts`:
- Around line 29-30: The in-memory cache memoryLastKnownGoodVersion is global
and can leak between different cache paths; change it to a map keyed by the
cache path (e.g., use a Map<string, WAVersion>) and update all uses (including
DEFAULT_VERSION_CACHE_FILENAME, getLastKnownGoodVersion, and any read/write
helpers around lines ~29 and ~64-87, 108-110) so that getLastKnownGoodVersion
first checks the map for the specific cachePath key, and setters store per-path
entries instead of a single global variable to isolate caches per socket/path.
- Around line 131-153: resolveWaVersion currently calls the injected fetchers
fetchLatestWaWebVersionFn and fetchLatestBaileysVersionFn directly so a thrown
rejection aborts the whole fallback chain; wrap each call in a try/catch and
convert thrown errors into a safe result object like { isLatest: false, error }
so the existing conditional (isLatest && isWAVersion(...)) fails and execution
continues to the next fallback; update the blocks around allowLatestFetch to
catch errors from fetchLatestWaWebVersionFn and fetchLatestBaileysVersionFn, log
the error along with latestWaWeb/latestBaileys objects, and preserve returning
cloneVersion(...) when isLatest is true as before (symbols: resolveWaVersion,
fetchLatestWaWebVersionFn, fetchLatestBaileysVersionFn, isWAVersion,
cloneVersion, cachedLastKnownGoodVersion).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 948496f8-9200-465a-a676-0e7f5d71f237

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5093c and e4ba5b7.

📒 Files selected for processing (6)
  • src/Defaults/index.ts
  • src/Socket/index.ts
  • src/Socket/socket.ts
  • src/Types/Socket.ts
  • src/Utils/versioning.ts
  • src/__tests__/Utils/versioning.test.ts

Comment thread src/Socket/socket.ts
Comment thread src/Utils/versioning.ts Outdated
Comment thread src/Utils/versioning.ts

@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/__tests__/Utils/versioning.test.ts">

<violation number="1" location="src/__tests__/Utils/versioning.test.ts:162">
P3: Clear the in-memory cache before the reload assertion, otherwise this test never exercises the disk path and can pass even if persistence is broken.</violation>
</file>

<file name="src/Utils/versioning.ts">

<violation number="1" location="src/Utils/versioning.ts:68">
P2: The global memory cache ignores `versionCachePath`, so different cache files can leak versions into each other within the same process.</violation>
</file>

<file name="src/Socket/socket.ts">

<violation number="1" location="src/Socket/socket.ts:979">
P1: The global `ws.on('error', ...)` handler calls `end()` unconditionally, which will set `closed = true` and tear down the connection before the 405 retry logic in the `ws.on('open')` catch block gets a chance to execute. The `close` handler was updated with an `isPerformingVersionRetry` guard (and could also use an `isValidatingConnection` guard), but the `error` handler lacks equivalent protection. Since Node.js EventEmitter invokes listeners synchronously in registration order, the global error handler fires before the per-validation error handler in `awaitNextMessage`, preempting the retry path.</violation>
</file>
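
The listener-ordering point in the last violation can be reproduced with a plain Node.js `EventEmitter`. The sketch below is not the socket code: the emitter stands in for the WebSocket, and `state.isPerformingVersionRetry` and `state.closed` are simplified stand-ins for the flags named in the review. It shows that a first-registered "global" handler runs before a later "per-validation" handler, and that a guard lets the later handler still see the error with the connection intact.

```typescript
import { EventEmitter } from 'node:events'

const ws = new EventEmitter() // stand-in for the WebSocket client
const calls: string[] = []
const state = { isPerformingVersionRetry: true, closed: false }

// Registered first, like the global handler that would normally call end().
ws.on('error', () => {
  if (state.isPerformingVersionRetry) {
    calls.push('global: skipped end()')
    return
  }
  state.closed = true
  calls.push('global: end()')
})

// Registered later, like the per-validation listener that drives the 405 retry.
ws.on('error', () => {
  calls.push(state.closed ? 'validation: too late' : 'validation: can retry')
})

// Listeners are invoked synchronously, in registration order.
ws.emit('error', new Error('405'))
console.log(calls)
```

Setting `state.isPerformingVersionRetry` to `false` before the `emit` call shows the failure mode the review describes: the global handler closes first and the validation listener arrives too late.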


Comment thread src/Socket/socket.ts
Comment thread src/Utils/versioning.ts Outdated
Comment thread src/__tests__/Utils/versioning.test.ts
@kaikybrofc
Author

Addressed the CodeRabbit actionable review points in commit 34b601f:

  1. Added a guard in the global WebSocket error handler to skip end() while version fallback retry is in progress (mirrors close-handler guard).
  2. Reworked in-memory last-known-good cache to be isolated per cache path using a map.
  3. Wrapped injected latest-version fetchers in try/catch so thrown errors do not break the fallback chain.

Also added test coverage for:

  • per-path memory cache isolation
  • fallback continuity when fetchers throw

Validation rerun:

  • npx tsc -p tsconfig.json: pass
  • jest src/__tests__/Utils/versioning.test.ts: pass (8/8)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Utils/versioning.ts`:
- Around line 53-57: The logger calls in src/Utils/versioning.ts currently emit
untrusted objects (parsed and error/fetch envelopes); update the logger.warn
invocations that reference parsed and err (the call that logs "ignoring invalid
lastKnownGoodVersion cache payload" and the one that logs "failed reading
lastKnownGoodVersion cache from disk") to sanitize outputs by removing or
redacting full payloads and nested metadata: replace logging of parsed with a
safe summary (e.g., length, checksum, or a small set of allowed fields) and
replace logging of err with a normalized error summary (message, code) rather
than the entire object; apply the same sanitization pattern to the analogous
logger usages in the 154-171 block so no arbitrary JSON or nested request
metadata is emitted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e017583-8741-4237-a45b-ad08fc2706a7

📥 Commits

Reviewing files that changed from the base of the PR and between e4ba5b7 and 34b601f.

📒 Files selected for processing (3)
  • src/Socket/socket.ts
  • src/Utils/versioning.ts
  • src/__tests__/Utils/versioning.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/__tests__/Utils/versioning.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Socket/socket.ts

Comment thread src/Utils/versioning.ts Outdated

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/Utils/versioning.ts">

<violation number="1" location="src/Utils/versioning.ts:143">
P2: The raw caught `error` is embedded into the result object (`{ version: defaultVersion, isLatest: false, error }`) and then logged in its entirety via `logger.warn({ latestWaWeb }, ...)` a few lines below. Fetch errors can contain request URLs, headers, and internal metadata that shouldn't appear in logs unsanitized. Either log the error separately with a sanitized key (e.g., `{ err: error }` following the pino convention already used elsewhere in this file) or omit `error` from the result object entirely.</violation>

<violation number="2" location="src/Utils/versioning.ts:160">
P2: Same issue as the WA Web fetcher above: the raw `error` is embedded into the result and logged unsanitized via `logger.warn({ latestBaileys }, ...)`. Use the `{ err: error }` pino convention for structured error logging and omit `error` from the result object.</violation>
</file>


Comment thread src/Utils/versioning.ts Outdated
Comment thread src/Utils/versioning.ts Outdated
@kaikybrofc
Author

Addressed the remaining bot review items around log sanitization in commit ee20d65.

What changed in src/Utils/versioning.ts:

  • Removed logging of raw parsed cache payloads.
  • Replaced raw error object logging with normalized error summaries (code, name, message).
  • Stopped embedding caught fetch errors into the latest result envelopes.
  • Logged only sanitized latest-fetch summaries (isLatest, validated version, hasError) instead of full objects.
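
For readers who want a concrete picture of the normalized error summaries described above, here is a minimal sketch. `summarizeError` is a hypothetical helper written for illustration and may not match the implementation in commit ee20d65; it keeps only `code`, `name`, and `message` and drops everything else attached to the error.

```typescript
interface ErrorSummary {
  code: string | undefined
  name: string | undefined
  message: string | undefined
}

// Reduce an arbitrary thrown value to three safe, flat fields.
function summarizeError(error: unknown): ErrorSummary {
  if (typeof error !== 'object' || error === null) {
    return { code: undefined, name: undefined, message: String(error) }
  }
  const e = error as { code?: unknown; name?: unknown; message?: unknown }
  return {
    code:
      typeof e.code === 'string' || typeof e.code === 'number'
        ? String(e.code)
        : undefined,
    name: typeof e.name === 'string' ? e.name : undefined,
    message: typeof e.message === 'string' ? e.message : undefined
  }
}

// Usage: request metadata attached to a fetch error does not reach the log.
const noisyError = Object.assign(new Error('boom'), {
  code: 'ECONNRESET',
  config: { headers: { authorization: 'secret-token' } }
})
console.log(summarizeError(noisyError))
```

Building the summary field by field, rather than deleting known-bad keys, means unknown nested metadata is excluded by default.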

Validation rerun:

  • npx tsc -p tsconfig.json: pass
  • jest src/__tests__/Utils/versioning.test.ts: pass (8/8)
