Adding bypass for Google Cookie Consent if present #105
base: dev
@@ -1,12 +1,37 @@

    from typing import overload

    from primp import Client
    from selectolax.lexbor import LexborHTMLParser

    from .integrations.base import Integration
    from .parser import MetaList, parse
    from .querying import Query

    URL = "https://www.google.com/travel/flights"
    CONSENT_SAVE_URL = "https://consent.google.com/save"


    def _is_consent_page(html: str) -> bool:
        return "consent.google.com/save" in html and "Before you continue" in html


    def _submit_consent(client: Client, html: str) -> None:
        """Parse the consent page, submit the 'Reject all' form so Google sets
        the SOCS cookie on the shared cookie jar, allowing retries to skip the wall.
        """
        parser = LexborHTMLParser(html)
        reject_form = None
        for form in parser.css("form"):
            inputs = {i.attributes.get("name"): i.attributes.get("value", "") for i in form.css("input")}
            # Reject-all is the form with set_eom=true and no set_sc/set_aps
            if inputs.get("set_eom") == "true" and "set_sc" not in inputs:
                reject_form = inputs
                break

        if reject_form is None:
            raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")

        client.post(CONSENT_SAVE_URL, data=reject_form)

Don't silently trust the POST; check the status and guard against a consent-retry loop.
Consider:
🔧 Proposed fix

    client.post(CONSENT_SAVE_URL, data=reject_form)

    res = client.get(URL, params=params)
    - if _is_consent_page(res.text):
    -     _submit_consent(client, res.text)
    -     res = client.get(URL, params=params)
    + if _is_consent_page(res.text):
    +     _submit_consent(client, res.text)
    +     res = client.get(URL, params=params)
    +     if _is_consent_page(res.text):
    +         raise RuntimeError("Google consent wall persisted after submitting 'Reject all'")
    return res.text

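The two checks in this comment (verify the POST status, and fail instead of retrying when the wall persists) can be exercised without network access. The following is only a sketch under stated assumptions: `FakeClient`, `FakeResponse`, `fetch_with_consent_guard`, and the simplified `is_consent_page` are hypothetical stand-ins, not part of this PR or of `primp`, and the sketch assumes the real response object exposes `status_code` and `text`.

```python
from dataclasses import dataclass, field

URL = "https://www.google.com/travel/flights"
CONSENT_SAVE_URL = "https://consent.google.com/save"

WALL_HTML = '<form action="https://consent.google.com/save"></form>'
FLIGHTS_HTML = "<html>flights</html>"


@dataclass
class FakeResponse:
    """Hypothetical response stand-in: only status_code and text."""
    status_code: int
    text: str


@dataclass
class FakeClient:
    """Hypothetical client that serves the consent wall until consent is accepted."""
    post_status: int = 200        # status returned by the consent POST
    accept_consent: bool = True   # whether a successful POST clears the wall
    consented: bool = False
    calls: list = field(default_factory=list)

    def get(self, url, params=None):
        self.calls.append("GET")
        return FakeResponse(200, FLIGHTS_HTML if self.consented else WALL_HTML)

    def post(self, url, data=None):
        self.calls.append("POST")
        if self.post_status < 400 and self.accept_consent:
            self.consented = True
        return FakeResponse(self.post_status, "")


def is_consent_page(html: str) -> bool:
    # Simplified detection for the sketch; the PR's check is stricter.
    return "consent.google.com/save" in html


def fetch_with_consent_guard(client, params, form_data) -> str:
    res = client.get(URL, params=params)
    if not is_consent_page(res.text):
        return res.text

    # Check the POST result instead of silently trusting it.
    post_res = client.post(CONSENT_SAVE_URL, data=form_data)
    if post_res.status_code >= 400:
        raise RuntimeError(f"Consent POST failed with HTTP {post_res.status_code}")

    # Retry exactly once; a second wall means the bypass did not work.
    res = client.get(URL, params=params)
    if is_consent_page(res.text):
        raise RuntimeError("Google consent wall persisted after submitting 'Reject all'")
    return res.text
```

Retrying exactly once keeps the failure mode explicit: the caller gets a `RuntimeError` rather than consent-page HTML passed on to the flight parser.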

    @overload

@@ -90,6 +115,9 @@ def fetch_flights_html(

                params = {"q": q}

            res = client.get(URL, params=params)
            if _is_consent_page(res.text):
                _submit_consent(client, res.text)
                res = client.get(URL, params=params)
            return res.text

        else:

Consent detection is English-only and will silently miss non-English locales.
The `"Before you continue"` substring only matches when Google serves the consent wall in English. Users in other locales (e.g., French `"Avant de continuer"`, German `"Bevor Sie fortfahren"`) will bypass this check: `_submit_consent` will not run, and they will keep hitting the consent wall. Prefer language-independent signals, such as the response URL (a redirect to `consent.google.com`) or the form action, rather than localized body text.
Alternatively, check `res.url` or the response history from the `client.get` call, since Google typically 302-redirects to `consent.google.com/...` before serving the wall.
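A language-independent check along these lines might look like the sketch below. It is not the reviewer's original suggestion (that snippet did not survive on this page), and it assumes the HTTP client exposes the final URL after redirects. The names `is_consent_page`, `CONSENT_HOST`, and `_FormActionCollector` are hypothetical, and the standard-library `html.parser` is used here in place of `selectolax` only to keep the example self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

CONSENT_HOST = "consent.google.com"


class _FormActionCollector(HTMLParser):
    """Collect the action attribute of every <form> in a document."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def is_consent_page(final_url: str, html: str) -> bool:
    """Detect the consent wall without relying on localized body text."""
    # Signal 1: the request ended up on the consent host after redirects.
    if urlparse(final_url).hostname == CONSENT_HOST:
        return True

    # Signal 2: the page contains a form that posts to the consent host.
    collector = _FormActionCollector()
    collector.feed(html)
    return any(urlparse(action).hostname == CONSENT_HOST for action in collector.actions)
```

Because neither signal depends on the page's language, a French or German consent wall is detected the same way as the English one. A page that merely mentions the consent URL in its text is not flagged.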