fix(virgool): re-enable virgool.io with profile-page message check (handle JS-cookie wall) #2579
Open
CasperCodes-CPU wants to merge 2 commits into soxoj:main from
Switch Virgool from disabled status_code to a message check using
presenseStrs=["\"bio\""] and absenceStrs=["\u06f4\u06f0\u06f4"], with
errors={"<noscript>": "JS-generated cookies required"} so the JS-cookie
challenge surfaces as UNKNOWN instead of mis-classifying. Drops the
POST urlProbe attempted in soxoj#2311 - that endpoint is on the same origin
and gated by the same JS-cookie wall. Preserves upstream alexaRank.
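The commit message writes the absence marker as the escape sequence \u06f4\u06f0\u06f4, while the diff further down shows it as ۴۰۴. A quick standard-library check suggests these are the same three code points (Extended Arabic-Indic digits, the forms used in Persian) and that they spell 404. ABSENCE_MARKER is just an illustrative variable name.

```python
import json
import unicodedata

# Absence marker as written in the commit message (escape form).
ABSENCE_MARKER = "\u06f4\u06f0\u06f4"

# Decoding the same escapes from JSON text yields an identical string,
# so the escaped and rendered spellings refer to one marker.
assert json.loads('"\\u06f4\\u06f0\\u06f4"') == ABSENCE_MARKER

# Each character is an Extended Arabic-Indic digit.
names = [unicodedata.name(ch) for ch in ABSENCE_MARKER]
print(names)

# int() accepts any Unicode decimal digits, so the marker parses as 404.
print(int(ABSENCE_MARKER))
```

In a JSON data file, either spelling therefore decodes to the same string.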
Branch updated from 25a80c0 to 3bc8d66.
Modern virgool.io ships profile data via Next.js 13/14 React Server
Components streaming, embedded in JS string literals via
self.__next_f.push([1,"...escaped JSON..."]). The on-the-wire bytes for
the bio field are therefore \"bio\":, not the literal "bio" the
existing presenseStrs entry looks for, so a real Virgool profile is
classified AVAILABLE (silent false negative) by the previous marker
alone. The 404 body for a missing user happens to contain exactly one
literal "bio" from a global "complete your profile" form template, so
the original literal marker also false-positives in some 404 responses.
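The escaping problem can be reproduced without Virgool. The sketch below builds a simplified stand-in for a self.__next_f.push([1,"..."]) chunk by JSON-encoding a profile object and then embedding that JSON text inside a JS string literal. The payload shape and field values are assumptions for illustration, not a captured Virgool response.

```python
import json

# Hypothetical profile fields; a real RSC payload is far larger.
profile = {"bio": "hello", "followersCount": 42}

# Step 1: the server serialises the object to JSON text.
inner = json.dumps(profile, separators=(",", ":"))
# Step 2: that text is embedded in a JS string literal, escaping its quotes.
chunk = "self.__next_f.push([1," + json.dumps(inner) + "])"
print(chunk)

literal_marker = '"bio"'        # what the original presenseStrs entry searches for
escaped_marker = '\\"bio\\"'    # the bytes actually on the wire: \"bio\"

print(literal_marker in chunk)  # False: inside the literal every quote is escaped
print(escaped_marker in chunk)  # True
```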
Add \"followersCount\" (escape-quoted) as a second presenseStrs
entry alongside the existing "bio". \"followersCount\" appears 12x on
real profiles and 0x on every 404 / non-profile body inspected
(verified against Wayback Machine snapshots: real Next.js profile
@00397, missing-user 404 @a.karafaren9631200 x2, 404 deep article
@_hbi/q_75). Considered and dropped broader keys like \"hash\" that
could plausibly appear in non-profile JSON contexts (build/asset
hashes) on the same host.
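The selection rule described above (keep a key that shows up on every real profile and on none of the 404 or non-profile bodies) can be written as a small filter. The sample bodies here are synthetic placeholders standing in for the Wayback Machine snapshots, and select_markers is a hypothetical helper, not part of Maigret.

```python
from typing import Dict, List


def select_markers(candidates: List[str],
                   profiles: List[str],
                   non_profiles: List[str]) -> Dict[str, int]:
    """Keep candidates present in every profile body and absent from every
    non-profile body, mapped to their minimum occurrence count on profiles."""
    if not profiles:
        return {}
    selected = {}
    for marker in candidates:
        min_count = min(body.count(marker) for body in profiles)
        if min_count > 0 and not any(marker in body for body in non_profiles):
            selected[marker] = min_count
    return selected


# Synthetic stand-ins, not real Virgool responses.
profiles = ['...\\"followersCount\\":12,...\\"hash\\":\\"ab12\\"...\\"followersCount\\":3...']
non_profiles = [
    "<h1>\u06f4\u06f0\u06f4</h1> complete your profile: bio",
    '<script src="/_next/static/chunks/main.js"></script> \\"hash\\":\\"9f3c\\"',
]
candidates = ['\\"followersCount\\"', '\\"hash\\"']

print(select_markers(candidates, profiles, non_profiles))
```

Here \"hash\" is rejected because it also appears in a non-profile body (an asset-hash-like context), mirroring the reason given for dropping it.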
End-to-end test using MaigretDatabase().load_from_path(...) and the
matcher logic from maigret/checking.py lines 333-382:
- real profile @00397 -> CLAIMED (matched: \"followersCount\")
- missing user @a.karafaren9631200 x2 -> AVAILABLE (matched: none)
- 404 deep article @_hbi/q_75 -> AVAILABLE (matched: none)
- synthetic <noscript> body -> UNKNOWN (errors mapping fired)
absenceStrs and alexaRank are unchanged; this commit only adds the
second presenseStrs entry.
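Because the second marker has to match the backslash-escaped bytes \"followersCount\", its spelling in the JSON site database needs an extra level of escaping. The exact string committed is not shown here, so the entry below is an assumption; the sketch only checks that a JSON encoding of that form decodes to the intended on-the-wire marker.

```python
import json

# Assumed spelling of the updated presenseStrs list as it would appear in JSON.
entry_json = r'{"presenseStrs": ["\"bio\"", "\\\"followersCount\\\""]}'

markers = json.loads(entry_json)["presenseStrs"]
for m in markers:
    print(m)
# The first marker is "bio" with literal quotes; the second is
# \"followersCount\" with a literal backslash before each quote.

# Round-trip: json.dumps produces the escaped form from the wire-level marker.
wire_marker = '\\"followersCount\\"'
print(json.dumps(wire_marker))
```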
Re-enable Virgool with a JS-cookie-aware site check
Context
Virgool (virgool.io, a Persian blogging platform) is currently disabled: true on main with a status_code check. The site returns HTTP 200 with the same JS-challenge HTML for every URL: both the profile page (GET /@{username}) and the previously documented JSON API (POST /api/v1.4/auth/user-existence). The challenge body contains a <noscript> block telling the user to enable JavaScript so the site can set its authenticity cookies. Without those cookies, every lookup looks the same regardless of whether the user exists, which is why the prior status-code check was unreliable and the site was disabled.
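To see why a status-code check cannot work here, consider a simplified version of one (a 2xx response counts as found, anything else as not found). This approximates, but is not copied from, Maigret's status_code logic; the response tuples are made up to mirror the situation described above, where both names receive the same HTTP 200 challenge page.

```python
from typing import Tuple

CHALLENGE_HTML = "<noscript>Please enable JavaScript ...</noscript>"


def status_code_check(response: Tuple[int, str]) -> str:
    """Simplified status-code check: 2xx means the account exists."""
    status, _body = response
    return "CLAIMED" if 200 <= status < 300 else "AVAILABLE"


# Hypothetical responses: the JS-cookie wall answers every URL identically.
responses = {
    "blue": (200, CHALLENGE_HTML),
    "noonewouldeverusethis7": (200, CHALLENGE_HTML),
}

verdicts = {name: status_code_check(r) for name, r in responses.items()}
print(verdicts)  # both names get the same verdict, so the check tells nothing
```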
Why not the POST API (PR #2311)
PR #2311 (closed) attempted to switch the entry to a POST against /api/v1.4/auth/user-existence with presenseStrs: ['"user_exist":true']. That endpoint is on the same origin and is gated by the same JS-cookie wall, so it fails in exactly the same way as the GET profile URL; it only adds urlProbe/requestMethod/requestPayload/Content-Type to the entry without fixing anything. Once the wall is passed (real browser cookies, residential / Iran egress), the human-facing GET profile URL is just as good as the API.
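The argument against the POST probe rests on both URLs sharing an origin, and therefore the same cookies and the same challenge. A standard-library check of the two URLs from this PR (the profile URL template filled with the claimed name, and the API path) confirms they share scheme, host, and port; same_origin is an illustrative helper, not part of Maigret.

```python
from urllib.parse import urlsplit


def same_origin(a: str, b: str) -> bool:
    """Compare (scheme, host, port), which is how the web defines an origin."""
    pa, pb = urlsplit(a), urlsplit(b)
    default = {"http": 80, "https": 443}
    port_a = pa.port or default.get(pa.scheme)
    port_b = pb.port or default.get(pb.scheme)
    return (pa.scheme, pa.hostname, port_a) == (pb.scheme, pb.hostname, port_b)


profile_url = "https://virgool.io/@{username}".format(username="blue")
api_url = "https://virgool.io/api/v1.4/auth/user-existence"

print(same_origin(profile_url, api_url))  # True
```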
What this PR changes
This PR restores the Virgool entry to a clean profile-page message check that:
- classifies users correctly from networks that can pass the JS-cookie wall (real browser cookies, networks that can reach Virgool from inside Iran), and
- reports UNKNOWN (instead of mis-classifying as found / not found) when the wall fires, so users on geo-blocked or bare-container egress see an honest result.
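The two behaviours above come down to an evaluation order: site-specific error markers are checked first and short-circuit to UNKNOWN, and only then are presence and absence markers consulted. The sketch below is a simplified re-implementation modelled on that order, using the Virgool entry's markers; classify and the status strings are illustrative names, not Maigret's API, and the response bodies are synthetic (they do not reproduce the RSC escaping issue discussed in the follow-up commit).

```python
from typing import Dict, List

PRESENSE_STRS: List[str] = ['"bio"']
ABSENCE_STRS: List[str] = ["\u06f4\u06f0\u06f4"]  # Persian digits for 404
ERRORS: Dict[str, str] = {"<noscript>": "JS-generated cookies required"}


def classify(body: str) -> str:
    # 1. A site-specific error marker means the page is not a real answer.
    for flag, message in ERRORS.items():
        if flag in body:
            return f"UNKNOWN ({message})"
    # 2. Presence: any listed marker counts (an empty list means "always present").
    present = not PRESENSE_STRS or any(s in body for s in PRESENSE_STRS)
    # 3. Absence: any listed marker vetoes a match.
    absent = any(s in body for s in ABSENCE_STRS)
    return "CLAIMED" if present and not absent else "AVAILABLE"


samples = {
    "challenge": "<html><noscript>Enable JavaScript</noscript></html>",
    "profile": '<script>{"bio": "writer"}</script>',
    "not found": "<h1>\u06f4\u06f0\u06f4</h1>",
}
for label, body in samples.items():
    print(label, "->", classify(body))
```

Checking errors first is what keeps a challenge page that happens to contain a presence marker from being reported as found.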
Concretely:
- checkType: "message"
- presenseStrs: ["\"bio\""] (the bio JSON key only appears in the real SSR profile payload, not in the challenge HTML)
- absenceStrs: ["۴۰۴"] (Persian digits for 404, shown on the "user not found" page)
- errors: {"<noscript>": "JS-generated cookies required"} (when the JS challenge fires, Maigret reports a clear UNKNOWN instead of guessing)
- No urlProbe, requestMethod: POST, requestPayload, or Content-Type header from the prior attempt: the GET URL is sufficient, and the POST API is behind the same wall.
- No disabled: true.

Diff (sites.Virgool)
     "Virgool": {
    -  "disabled": true,
       "tags": [
         "blog",
         "ir"
       ],
    -  "checkType": "status_code",
    +  "checkType": "message",
    +  "presenseStrs": [
    +    "\"bio\""
    +  ],
       "absenceStrs": [
         "۴۰۴"
       ],
    +  "errors": {
    +    "<noscript>": "JS-generated cookies required"
    +  },
       "alexaRank": 37097,
       "urlMain": "https://virgool.io/",
       "url": "https://virgool.io/@{username}",
       "usernameClaimed": "blue",
       "usernameUnclaimed": "noonewouldeverusethis7"
     }
(alexaRank is unchanged from upstream and shown only as context.)

Verification
… 2def9a2 (without the unnecessary POST API plumbing) and was developed and tested against a local Virgool challenge response and a real blue profile payload.

Running --self-check from a network that cannot pass the JS-cookie wall (most cloud egresses, including standard CI runners and many non-IR networks) is expected to report UNKNOWN for both the claimed and unclaimed names; that is the correct behaviour driven by the errors: {"<noscript>": ...} mapping, not a regression. From a network that can pass the wall, the claimed user blue is found via "bio" and the unclaimed name is rejected via ۴۰۴.

Refs: closes/supersedes #2311 (POST API approach abandoned for the reason above).
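The expected self-check outcomes in the verification notes can be summarised as a small decision table over the two verdicts. interpret_self_check is a hypothetical helper for reading results by hand; it is not something Maigret's --self-check provides.

```python
def interpret_self_check(claimed: str, unclaimed: str) -> str:
    """Map the verdicts for usernameClaimed / usernameUnclaimed to a diagnosis."""
    if claimed == "UNKNOWN" and unclaimed == "UNKNOWN":
        # The errors mapping fired for both names: this network cannot pass the wall.
        return "blocked by JS-cookie wall (expected, not a regression)"
    if claimed == "CLAIMED" and unclaimed == "AVAILABLE":
        return "working"
    return "check needs attention"


print(interpret_self_check("UNKNOWN", "UNKNOWN"))
print(interpret_self_check("CLAIMED", "AVAILABLE"))
print(interpret_self_check("AVAILABLE", "AVAILABLE"))
```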