Scrape product listings, detailed specs, seller offers, and customer reviews from Google Shopping. These scripts reverse-engineer Google's async pagination and the hidden /async/oapv product detail endpoint to get data that never appears in the initial HTML. Uses Scrape.do for proxy rotation and anti-bot bypass.
Find the full technical guide here.
- searchResults.py: Scrapes product cards from Google Shopping search with async pagination (title, price, image, seller, rating, reviews). Handles session token extraction, hex-escaped responses, and deduplication. Saves to CSV.
- singleProductDetail.py: Parses a single /async/oapv response for brand, description, images, reviews, forums, and seller offers. Also used as a module by consistentScraper.py.
- consistentScraper.py: Runs the full pipeline: multi-pass search extraction, then a fetch of extended details for every product with the required data-* attributes. Outputs a uniform JSON schema regardless of whether detail data was available.
- serpApiShopping.py: Calls Scrape.do's AI Mode API for AI-generated product recommendations, shopping results, and references.
- Python 3.7+
- requests and beautifulsoup4: pip install requests beautifulsoup4
- A Scrape.do API token (free 1000 credits/month)
- Set your token and query (via script or env var):
  SCRAPE_DO_TOKEN = "<your_token>"
  QUERY = "wireless gaming headset"
- Run:
  python searchResults.py

Output: google_shopping_search.csv with title, price, image_url, seller_name, rating, review_count.
- Copy a /async/oapv URL from Chrome DevTools (Network tab → click a product → find the oapv request):
  SCRAPE_DO_TOKEN = "<your_token>"
  DETAIL_URL = "https://www.google.com/async/oapv?..."
- Run:
  python singleProductDetail.py

Prints parsed product details (brand, rating, description, offers, reviews, forums) as JSON.
- Set your token and query:
  SCRAPE_DO_TOKEN = "<your_token>"
  QUERY = "pc wireless gaming headset"
  Also configurable via env vars: SCRAPE_DO_TOKEN, GOOGLE_QUERY, OUT_JSON, MAX_PAGES, PAUSE_SECONDS.
- Run:
  python consistentScraper.py

Output: google_shopping_results.json. Each product has card-level fields (title, price, image, seller) plus detail fields when available (brand, description, offers, reviews, forums).
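The environment variables listed above could be read along the following lines. This is a minimal sketch, not the code in consistentScraper.py: the ScraperConfig class, the load_config function, and the default values for MAX_PAGES and PAUSE_SECONDS are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class ScraperConfig:
    token: str
    query: str
    out_json: str
    max_pages: int
    pause_seconds: float


def load_config(env=None):
    """Build a config from environment variables, with illustrative fallbacks."""
    env = os.environ if env is None else env
    return ScraperConfig(
        token=env.get("SCRAPE_DO_TOKEN", "<your_token>"),
        query=env.get("GOOGLE_QUERY", "pc wireless gaming headset"),
        out_json=env.get("OUT_JSON", "google_shopping_results.json"),
        max_pages=int(env.get("MAX_PAGES", "5")),              # assumed default
        pause_seconds=float(env.get("PAUSE_SECONDS", "1.0")),  # assumed default
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise without touching the real environment.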
- Set your token and query:
  token = "<your_token>"
  query = "wireless gaming headset"
- Run:
  python serpApiShopping.py

Output: serp-api-shopping-results.json with AI-curated shopping results, text blocks, and references.
Google Shopping uses udm=28 (Universal Design Mode 28). The initial HTML contains a skeleton with barely any products. The real data loads via async JavaScript requests to /search?async=.... This is why you can't just parse the HTML: you need to extract session tokens and replay the async requests.
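As a rough illustration of the first request, the sketch below builds a udm=28 search URL and wraps it in a Scrape.do call. The function names are hypothetical, and the api.scrape.do endpoint shape (token and url query parameters) is an assumption to check against Scrape.do's documentation.

```python
from urllib.parse import urlencode


def shopping_search_url(query):
    """Google search URL with udm=28, which selects the Shopping view."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "28"})


def via_scrape_do(target_url, token):
    """Wrap a target URL in a Scrape.do API call (endpoint shape assumed)."""
    return "https://api.scrape.do/?" + urlencode({"token": token, "url": target_url})


url = via_scrape_do(shopping_search_url("wireless gaming headset"), "<your_token>")
```

The resulting URL can be passed to requests.get. The HTML that comes back is the skeleton page, which is useful mainly as a source of tokens rather than products.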
The initial page response contains the tokens you need for pagination, buried in script blocks:
- ei: a session identifier from the kEI JavaScript variable
- basejs, basecss, basecomb: asset identifiers from the google.xjs object

Without these, the async pagination URLs return empty responses.
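One way to pull these tokens out is a small regex pass over the page source. The sketch below assumes each token appears as name:'value' or name='value' inside a script block; the real markup may differ, and extract_tokens is a hypothetical helper, not the function used in the scripts.

```python
import re

TOKEN_NAMES = ("kEI", "basejs", "basecss", "basecomb")


def extract_tokens(html):
    """Return the pagination tokens found in the page, keyed by name.

    Assumes each token looks like name:'value' or name="value".
    The kEI value is stored under the key "ei".
    """
    tokens = {}
    for name in TOKEN_NAMES:
        pattern = r"\b" + name + r"""\s*[:=]\s*['"]([^'"]+)['"]"""
        match = re.search(pattern, html)
        if match:
            tokens["ei" if name == "kEI" else name] = match.group(1)
    return tokens
```

A caller can compare the returned keys against the four expected names and stop early if any are missing, since incomplete tokens lead to empty async responses.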
Detailed product data (brand, reviews, seller offers, forum discussions) lives behind Google's OAPV (Open Async Product View) endpoint at /async/oapv. Getting there requires five hidden parameters that are embedded in product card data-* attributes:
catalogid, gpcid, headlineOfferDocid, imageDocid, mid
Not every product card includes all five. Promoted listings and aggregated offers often lack them. consistentScraper.py handles both cases: fetching details when available, keeping card-level data when not, and ensuring every product in the output has the same schema.
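The two cases can be sketched as a check for the five parameters followed by a merge into a fixed set of keys. The helper names and the assumption that card attributes arrive as a plain dict keyed by parameter name are illustrative; the actual structure inside consistentScraper.py may differ.

```python
DETAIL_KEYS = ("catalogid", "gpcid", "headlineOfferDocid", "imageDocid", "mid")
CARD_FIELDS = ("title", "price", "image", "seller")
DETAIL_FIELDS = ("brand", "description", "offers", "reviews", "forums")


def detail_params(card_attrs):
    """Return the five detail parameters, or None if any is missing or empty."""
    params = {key: card_attrs.get(key) for key in DETAIL_KEYS}
    return params if all(params.values()) else None


def uniform_record(card, detail=None):
    """Merge card-level and detail-level data into one fixed set of keys."""
    record = {field: card.get(field) for field in CARD_FIELDS}
    for field in DETAIL_FIELDS:
        # Detail fields stay None when no detail response was available.
        record[field] = (detail or {}).get(field)
    return record
```

Because every record carries the same keys, downstream code can read a field such as brand without first checking whether the detail fetch happened.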
Google Shopping responses come in two flavors:
- Async pagination: JSON-wrapped HTML snippets with hex-encoded characters (\x3d → =, \x22 → ", etc.) that need unescaping before parsing
- Product details: JSPB (JSON Serialized Protocol Buffer) accessed via ProductDetailsResult, with data scattered across deeply nested array indices (brand at [2], rating at [3], reviews at [99]...)
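Both flavors can be handled with small helpers. The sketch below shows one possible approach: a regex that decodes \xNN escapes, and a defensive index walker for the nested arrays. The names unescape_hex and dig are hypothetical, and the sample array in the usage note is invented purely to show the call shape.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_hex(text):
    r"""Replace \xNN escapes with the characters they encode."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def dig(data, *indices, default=None):
    """Walk nested lists by index, returning default if any step is missing."""
    for index in indices:
        if not isinstance(data, list) or not 0 <= index < len(data):
            return default
        data = data[index]
    return data
```

For example, dig(details, 2) would read the brand slot and dig(details, 99) the reviews slot, returning None instead of raising when the array is shorter than expected.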
consistentScraper.py doesn't stop after one pass. Google Shopping surfaces slightly different product slices on each async request, so the script keeps running extraction passes until no new products appear. This consistently surfaces more products than a single-pass approach.
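The stop-when-stable idea can be sketched as a loop that repeats until a pass contributes nothing new. Here run_pass is a hypothetical callable that returns the products from one extraction pass, and deduplicating on (title, seller) is an assumption; the script may use a different key.

```python
def collect_until_stable(run_pass, max_passes=10):
    """Repeat extraction passes until a pass adds no unseen products."""
    seen = {}
    for _ in range(max_passes):
        added = 0
        for product in run_pass():
            key = (product.get("title"), product.get("seller"))  # assumed dedup key
            if key not in seen:
                seen[key] = product
                added += 1
        if added == 0:
            break  # nothing new surfaced, so further passes are unlikely to help
    return list(seen.values())
```

The max_passes cap guards against a feed that keeps returning slightly different results indefinitely.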
- Empty initial page: Expected. Products load via async requests; the scripts handle this automatically.
- Missing data-* attributes: Google doesn't always include the five required detail parameters on every card. Promoted and aggregated listings are especially inconsistent. Card-level data still gets extracted.
- Expired ei tokens: These are session-specific and short-lived. Don't hardcode async URLs; use the scripts to build them from fresh tokens each run.
- Fewer products than expected: Try setting PAGE_SIZE = 1 to surface more product slices per async page.
- 403/429 errors: Check your token and credits. Use the PAUSE_SECONDS setting to add delays between requests.
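For the 403/429 case, a pause between attempts might look like the sketch below. It is not taken from the scripts: fetch is a hypothetical callable returning a (status, body) pair, and the linearly growing delay is one possible policy among many.

```python
import time


def fetch_with_pause(fetch, pause_seconds=1.0, max_attempts=3, sleep=time.sleep):
    """Call fetch() until the status is not 403/429 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status not in (403, 429):
            return status, body
        if attempt < max_attempts:
            sleep(pause_seconds * attempt)  # delay grows with each retry
    return status, body  # last blocked response, for the caller to inspect
```

The sleep parameter exists so the delay can be replaced in tests; in normal use the default time.sleep applies.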
Scrape.do handles proxy rotation, TLS fingerprinting, and anti-bot bypass. Get your free API token (1000 credits/month).