
fix(virgool): re-enable virgool.io with profile-page message check (handle JS-cookie wall) #2579

Open
CasperCodes-CPU wants to merge 2 commits into soxoj:main from
CasperCodes-CPU:fix/virgool-js-cookie-message-check

CasperCodes-CPU commented Apr 30, 2026

Re-enable Virgool with a JS-cookie-aware site check

Context

Virgool (virgool.io, a Persian blogging platform) is currently marked
disabled: true on main and uses a status_code check. The site returns HTTP 200
with the same JS-challenge HTML for every URL: both the profile page
(GET /@{username}) and the previously documented JSON API
(POST /api/v1.4/auth/user-existence). The challenge body contains a <noscript>
block telling the user to enable JavaScript so the site can set its
authenticity cookies. Without those cookies every lookup returns the same
response whether or not the user exists, which is why the prior status-code
check was unreliable and the site was disabled.

Why not the POST API (PR #2311)

PR #2311 (closed) attempted to switch the entry to a POST against
/api/v1.4/auth/user-existence with presenseStrs: ['"user_exist":true']. That
endpoint is on the same origin and is gated by the same JS-cookie wall, so it
fails in exactly the same way as the GET profile URL. The change only adds
urlProbe, requestMethod, requestPayload, and a Content-Type header to the
entry without fixing the underlying problem. Once the wall is passed (real
browser cookies, residential or Iran egress), the human-facing GET profile URL
is just as good as the API.

What this PR changes

This PR restores the Virgool entry to a clean profile-page message check
that:

  • Returns the right answer when the JS-cookie wall is passed (e.g. real
    browser cookies, networks that can reach Virgool from inside Iran),
  • Surfaces a clear UNKNOWN (instead of mis-classifying as found / not found)
    when the wall fires, so users on geo-blocked or bare-container egress see an
    honest result.

Concretely:

  • checkType: "message"
  • presenseStrs: ["\"bio\""] — the bio JSON key only appears in the real
    SSR profile payload, not in the challenge HTML
  • absenceStrs: ["۴۰۴"] — Persian digits for 404, shown on the
    "user not found" page
  • errors: {"<noscript>": "JS-generated cookies required"} — when the JS
    challenge fires, Maigret reports a clear UNKNOWN instead of guessing
  • Drop the old urlProbe, requestMethod: POST, requestPayload, and
    Content-Type header from the prior attempt — the GET URL is sufficient,
    and the POST API is behind the same wall.
  • Drop disabled: true.
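
As a rough illustration of how these fields interact, the following Python sketch classifies a response body the way a message check is expected to behave: error markers first, then presence and absence strings. It is a simplification written for this description, not Maigret's actual code. The classify function and Status enum are hypothetical names; only the marker values come from the entry above.

```python
from enum import Enum


class Status(Enum):
    CLAIMED = "CLAIMED"
    AVAILABLE = "AVAILABLE"
    UNKNOWN = "UNKNOWN"


# Marker values copied from the proposed Virgool entry.
VIRGOOL = {
    "presenseStrs": ['"bio"'],
    "absenceStrs": ["\u06f4\u06f0\u06f4"],  # Persian digits for 404
    "errors": {"<noscript>": "JS-generated cookies required"},
}


def classify(body, site):
    """Hypothetical helper: return (Status, reason) for a response body."""
    # Error markers are checked first, so a challenge page never reaches
    # the presence/absence logic.
    for marker, reason in site["errors"].items():
        if marker in body:
            return Status.UNKNOWN, reason

    has_presence = any(s in body for s in site["presenseStrs"])
    has_absence = any(s in body for s in site["absenceStrs"])

    # Assumption: a profile counts as found only when a presence marker
    # matches and no absence marker does.
    if has_presence and not has_absence:
        return Status.CLAIMED, None
    return Status.AVAILABLE, None
```

With this ordering, a challenge page yields UNKNOWN even if it happens to contain one of the other markers.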

Diff (sites.Virgool)

 "Virgool": {
-    "disabled": true,
     "tags": [
         "blog",
         "ir"
     ],
-    "checkType": "status_code",
+    "checkType": "message",
+    "presenseStrs": [
+        "\"bio\""
+    ],
     "absenceStrs": [
         "۴۰۴"
     ],
+    "errors": {
+        "<noscript>": "JS-generated cookies required"
+    },
     "alexaRank": 37097,
     "urlMain": "https://virgool.io/",
     "url": "https://virgool.io/@{username}",
     "usernameClaimed": "blue",
     "usernameUnclaimed": "noonewouldeverusethis7"
 }

(alexaRank is unchanged from upstream and shown only as context.)

Verification

  • The entry matches the shape proposed in PR #2311 (fix(virgool): enable
    virgool.io via POST user-existence API), commit 2def9a2, without the
    unnecessary POST API plumbing. It was developed and tested against a
    local Virgool challenge response and a real blue profile payload.
  • --self-check from a network that cannot pass the JS-cookie wall (most
    cloud egresses, including standard CI runners and many non-IR networks) is
    expected to report UNKNOWN for both the claimed and unclaimed names — that
    is the correct behaviour driven by the errors: {"<noscript>": ...} mapping,
    not a regression. From a network that can pass the wall, the claimed
    user blue is found via "bio" and the unclaimed name is rejected via
    ۴۰۴.

Refs: closes/supersedes #2311 (POST API approach abandoned for the reason
above).

Switch Virgool from disabled status_code to a message check using
presenseStrs=["\"bio\""] and absenceStrs=["\u06f4\u06f0\u06f4"], with
errors={"<noscript>": "JS-generated cookies required"} so the JS-cookie
challenge surfaces as UNKNOWN instead of mis-classifying. Drops the
POST urlProbe attempted in soxoj#2311 - that endpoint is on the same origin
and gated by the same JS-cookie wall. Preserves upstream alexaRank.
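
The escaped form \u06f4\u06f0\u06f4 in the commit message and the Persian digits in the description are meant to be the same absence marker. A small Python check, offered only as a sanity aid, shows which code points the escape denotes and why the visually similar Arabic-Indic digits would not match. The variable names are illustrative.

```python
import unicodedata

# Absence marker as written in the commit message.
ABSENCE_MARKER = "\u06f4\u06f0\u06f4"

# Extended Arabic-Indic (Persian) digits are decimal digits, so int()
# accepts them directly.
marker_value = int(ABSENCE_MARKER)
marker_names = [unicodedata.name(ch) for ch in ABSENCE_MARKER]

# The plain Arabic-Indic digits look similar but are different code
# points, so a substring match against them would fail.
ARABIC_INDIC_404 = "\u0664\u0660\u0664"
same_number = int(ARABIC_INDIC_404) == marker_value
same_string = ARABIC_INDIC_404 == ABSENCE_MARKER

print(marker_value, marker_names)
print(same_number, same_string)
```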
CasperCodes-CPU force-pushed the fix/virgool-js-cookie-message-check branch from 25a80c0 to 3bc8d66 on April 30, 2026 09:52

  Modern virgool.io ships profile data via Next.js 13/14 React Server
  Components streaming, embedded in JS string literals via
  self.__next_f.push([1,"...escaped JSON..."]). The on-the-wire bytes for
  the bio field are therefore \"bio\": (backslashes included), not the
  literal "bio" that the existing presenseStrs entry looks for, so with
  the previous marker alone a real Virgool profile is classified
  AVAILABLE (a silent false negative). The 404 body for a missing user
  also happens to contain exactly one literal "bio", from a global
  "complete your profile" form template, so the original literal marker
  can also produce false positives on some 404 responses.

  Add \"followersCount\" (escape-quoted) as a second presenseStrs
  entry alongside the existing "bio". \"followersCount\" appears 12x on
  real profiles and 0x on every 404 / non-profile body inspected
  (verified against Wayback Machine snapshots: real Next.js profile
  @00397, missing-user 404 @a.karafaren9631200 x2, 404 deep article
  @_hbi/q_75). Considered and dropped broader keys like \"hash\" that
  could plausibly appear in non-profile JSON contexts (build/asset
  hashes) on the same host.

  End-to-end test using MaigretDatabase().load_from_path(...) and the
  matcher logic from maigret/checking.py lines 333-382:
    - real profile @00397                  -> CLAIMED   (matched: \"followersCount\")
    - missing user @a.karafaren9631200 x2  -> AVAILABLE (matched: none)
    - 404 deep article @_hbi/q_75          -> AVAILABLE (matched: none)
    - synthetic <noscript> body            -> UNKNOWN   (errors mapping fired)

  absenceStrs and alexaRank are unchanged; this commit only adds the
  second presenseStrs entry.
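
The escaping issue described in the second commit message can be reproduced with a short Python sketch. It assumes a simplified payload: the profile dictionary and the exact chunk layout are made up for illustration, and only the self.__next_f.push wrapper and the key names come from the commit message.

```python
import json

# Hypothetical profile data; the real payload is much larger.
profile = {"bio": "hello", "followersCount": 12}

# Step 1: the data is serialized to compact JSON.
inner = json.dumps(profile, separators=(",", ":"))

# Step 2: that JSON is embedded in a JS string literal, which escapes
# every double quote with a backslash.
chunk = json.dumps(inner)
wire = f"self.__next_f.push([1,{chunk}])"

LITERAL_BIO = '"bio"'                   # what a plain marker looks for
ESCAPED_BIO = '\\"bio\\"'               # what is actually on the wire
ESCAPED_FOLLOWERS = '\\"followersCount\\"'

literal_found = LITERAL_BIO in wire
escaped_found = ESCAPED_BIO in wire
followers_found = ESCAPED_FOLLOWERS in wire

print(wire)
print(literal_found, escaped_found, followers_found)
```

In this simplified payload the literal marker does not match while both escape-quoted markers do, which is the behaviour the commit message attributes to real profile pages.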