28 changes: 28 additions & 0 deletions fast_flights/fetcher.py
@@ -1,12 +1,37 @@
from typing import overload

from primp import Client
from selectolax.lexbor import LexborHTMLParser

from .integrations.base import Integration
from .parser import MetaList, parse
from .querying import Query

URL = "https://www.google.com/travel/flights"
CONSENT_SAVE_URL = "https://consent.google.com/save"


def _is_consent_page(html: str) -> bool:
    return "consent.google.com/save" in html and "Before you continue" in html
Comment on lines +14 to +15

⚠️ Potential issue | 🟠 Major

Consent detection is English-only and will silently miss non-English locales.

The "Before you continue" substring only matches when Google serves the consent wall in English. Users in other locales (e.g., French "Avant de continuer", German "Bevor Sie fortfahren", etc.) will bypass this check, _submit_consent will not run, and they’ll continue hitting the consent wall. Prefer language-independent signals — the response URL (redirect to consent.google.com) or the form action — rather than localized body text.

🔧 Suggested approach
-def _is_consent_page(html: str) -> bool:
-    return "consent.google.com/save" in html and "Before you continue" in html
+def _is_consent_page(html: str) -> bool:
+    # Language-independent: the consent wall always posts to consent.google.com/save.
+    return "consent.google.com/save" in html

Or check res.url / response history from the client.get call, since Google typically 302-redirects to consent.google.com/... before serving the wall.
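A minimal sketch of the URL-based check, assuming the HTTP client exposes the final, post-redirect URL as a string (for example via `res.url`; this should be verified against the client in use). The helper name `is_consent_url` and the constant `CONSENT_HOST` are hypothetical and not part of this PR.

```python
from urllib.parse import urlsplit

# Assumption: the consent wall is served from this host, as described in the comment above.
CONSENT_HOST = "consent.google.com"


def is_consent_url(final_url: str) -> bool:
    """Return True when the final response URL points at the consent host.

    Comparing the parsed hostname (rather than searching the whole URL)
    avoids false positives when the host only appears in a query string.
    """
    host = urlsplit(final_url).hostname or ""
    return host == CONSENT_HOST
```

Because only the hostname is compared, the check does not depend on the language the wall is rendered in.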

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fast_flights/fetcher.py` around lines 14 - 15, The _is_consent_page function
only checks for the English phrase "Before you continue" and will miss
non-English consent walls; update _is_consent_page (and any callers) to use
language-independent signals instead — e.g., detect a redirect/response URL
containing "consent.google.com" from the client.get response history or parse
the HTML for a form/action pointing to "consent.google.com/save" (or similar
consent domains) rather than relying on localized body text; ensure the change
is applied where _is_consent_page is used so _submit_consent still runs for
non-English locales.
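The form-action variant could look roughly like the sketch below. It uses only the standard library's `html.parser` so that it stays self-contained; the PR itself already parses with selectolax, which would work equally well. The names `_FormActionCollector` and `has_consent_form` are hypothetical, and the sketch assumes the form's `action` is an absolute URL (a relative action would first need resolving against the response URL).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _FormActionCollector(HTMLParser):
    """Collect the `action` attribute of every <form> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.actions: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def has_consent_form(html: str) -> bool:
    """Return True if any form posts to consent.google.com/save."""
    collector = _FormActionCollector()
    collector.feed(html)
    for action in collector.actions:
        parts = urlsplit(action)
        if parts.hostname == "consent.google.com" and parts.path == "/save":
            return True
    return False
```

Unlike a plain substring test on the HTML, this only matches when the URL is an actual form target, so a flights page that merely mentions the consent URL in a script is not misclassified.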



def _submit_consent(client: Client, html: str) -> None:
    """Parse the consent page, submit the 'Reject all' form so Google sets
    the SOCS cookie on the shared cookie jar, allowing retries to skip the wall.
    """
    parser = LexborHTMLParser(html)
    reject_form = None
    for form in parser.css("form"):
        inputs = {i.attributes.get("name"): i.attributes.get("value", "") for i in form.css("input")}
        # Reject-all is the form with set_eom=true and no set_sc/set_aps
        if inputs.get("set_eom") == "true" and "set_sc" not in inputs:
            reject_form = inputs
            break

    if reject_form is None:
        raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")

    client.post(CONSENT_SAVE_URL, data=reject_form)


⚠️ Potential issue | 🟡 Minor

Don’t silently trust the POST; check status and guard against a consent-retry loop.

The return value of client.post(...) is discarded, so a failed rejection (non-2xx status, captcha, rate limit) goes unnoticed. The subsequent client.get(URL, ...) on line 120 may then still return a consent page, and its HTML is passed to parse() as if it were a flights page, yielding confusing downstream errors.

Consider:

  • Capturing the POST response and raising on non-success status.
  • After the retry GET on line 120, re-checking _is_consent_page(res.text) and raising a clear error if the wall persists, rather than returning consent HTML as flight HTML.
🔧 Proposed fix
     client.post(CONSENT_SAVE_URL, data=reject_form)
         res = client.get(URL, params=params)
-        if _is_consent_page(res.text):
-            _submit_consent(client, res.text)
-            res = client.get(URL, params=params)
+        if _is_consent_page(res.text):
+            _submit_consent(client, res.text)
+            res = client.get(URL, params=params)
+            if _is_consent_page(res.text):
+                raise RuntimeError("Google consent wall persisted after submitting 'Reject all'")
         return res.text
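Both bullets together might be sketched as follows. This is an illustration, not the PR's code: `fetch_with_consent`, `ConsentError`, and the injected callables are hypothetical names, and the sketch assumes the client's responses expose `.text` and `.status_code`. It treats any status of 400 or above as a failed rejection, on the assumption that the client follows redirects on its own.

```python
class ConsentError(RuntimeError):
    """Raised when the consent wall cannot be cleared (hypothetical name)."""


def fetch_with_consent(client, url, params, is_consent_page, pick_reject_form, save_url):
    """GET `url`; if a consent wall is returned, reject it once and retry.

    `is_consent_page(html) -> bool` and `pick_reject_form(html) -> dict`
    stand in for the PR's detection and form-parsing helpers.
    """
    res = client.get(url, params=params)
    if not is_consent_page(res.text):
        return res.text

    # Capture the POST response instead of discarding it.
    post = client.post(save_url, data=pick_reject_form(res.text))
    if post.status_code >= 400:
        raise ConsentError(f"Consent rejection failed with HTTP {post.status_code}")

    # Retry exactly once; never hand consent HTML to the flights parser.
    res = client.get(url, params=params)
    if is_consent_page(res.text):
        raise ConsentError("Google consent wall persisted after submitting 'Reject all'")
    return res.text
```

A dedicated exception type lets callers distinguish a consent failure from a parsing failure; reusing RuntimeError, as the proposed fix does, would also work.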
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fast_flights/fetcher.py` at line 34, Capture and check the response from
client.post(CONSENT_SAVE_URL, data=reject_form) and raise or handle when the
status is not successful (non-2xx) so failures (captcha/rate-limit) aren’t
ignored; after the retry GET (the call that currently feeds parse()), call
_is_consent_page(res.text) again and raise a clear error (or return an explicit
failure) if the consent wall still persists instead of passing consent HTML into
parse(), ensuring you reference client.post, CONSENT_SAVE_URL, reject_form, the
retry client.get call, _is_consent_page, and parse when making these changes.



@overload
@@ -90,6 +115,9 @@ def fetch_flights_html(
        params = {"q": q}

        res = client.get(URL, params=params)
        if _is_consent_page(res.text):
            _submit_consent(client, res.text)
            res = client.get(URL, params=params)
        return res.text

    else: