Amazon Review Scraper

The ScrapingBee API handles proxy rotation, headless browser rendering, geo-targeting, and the JavaScript scrolling needed to load Amazon's lazy review widget, so your code stays focused on what you actually want to do with the data.

What is an Amazon review scraper?

An Amazon review scraper is a program that collects publicly visible review data from Amazon product pages: the reviewer's name, star rating, date, headline, and review body. Knowing how to scrape Amazon reviews lets you run sentiment analysis at the product, brand, or category level, watch competitor feedback over time, build training datasets for review classification, or feed dashboards that surface buyer complaints early.

The hard part is not parsing the HTML. It is loading the lazy review widget reliably in a headless browser, rotating IPs so Amazon does not block you, and matching the right regional storefront. Scraping Amazon reviews through an API like ScrapingBee removes all three problems and leaves you with one HTTP request per page.

How it works

You send a GET request to the ScrapingBee API with the product URL (https://www.amazon.com/dp/{ASIN}). The API:

Routes the request through a rotating proxy in the country you specify.
Renders the page with a headless browser.
Runs your js_scenario — scroll, click, wait — so the review widget loads.
Applies your CSS-selector extract_rules to the rendered DOM.
Returns the data as structured JSON.

Your code never touches HTML.
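Under the hood, each call is a plain HTTP GET. The sketch below builds that request URL with only the standard library, to show what travels over the wire. It assumes the endpoint is https://app.scrapingbee.com/api/v1/ and that extract_rules and js_scenario are sent as JSON-encoded query parameters; the SDK in the Quick start builds an equivalent request for you. build_request_url is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the ScrapingBee API docs.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key, target_url, extract_rules, js_scenario, country_code="us"):
    """Hypothetical helper: encode one ScrapingBee request as a GET URL."""
    params = {
        "api_key": api_key,
        "url": target_url,
        # Nested specs are sent as JSON strings (an assumption about the wire format).
        "extract_rules": json.dumps(extract_rules),
        "js_scenario": json.dumps(js_scenario),
        "country_code": country_code,
    }
    # urlencode percent-escapes the target URL and both JSON payloads.
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_request_url(
        "YOUR_API_KEY",
        "https://www.amazon.com/dp/B0CTH2QF23",
        {"product_title": "span.a-size-large.product-title-word-break"},
        {"instructions": [{"wait": 2000}]},
    ))
```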

Prerequisites

Python 3.8 or later
A ScrapingBee API key — sign up for 1,000 free credits

Installation

pip install scrapingbee pandas

scrapingbee is the official Python SDK. pandas is used to write the CSV output.

Quick start

Save this as scrape_reviews.py, replace YOUR_API_KEY, edit asin_list, and run.

from scrapingbee import ScrapingBeeClient
import pandas as pd

# Replace YOUR_API_KEY with the key from your ScrapingBee dashboard.
client = ScrapingBeeClient(api_key='YOUR_API_KEY')

def amazon_reviews(asins):
    # What to pull from the rendered page: the product title plus one object per review.
    extract_rules = {
        "product_title": {
            "selector": "span.a-size-large.product-title-word-break",
            "output": "text"
        },
        "properties": {
            "selector": "#cm-cr-dp-review-list > li",
            "type": "list",
            "output": {
                "name": ".a-profile-name",
                "rating": ".review-rating > span",
                "date": ".review-date",
                "title": ".review-title span:not([class])",
                "content": ".review-text"
            }
        }
    }

    # Let the page load, scroll to trigger the lazy review widget, then give it time to render.
    js_scenario = {
        "instructions": [
            {"wait": 2000},
            {"evaluate": "window.scrollTo(0, document.body.scrollHeight);"},
            {"wait": 2000},
        ]
    }

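    all_reviews = []
    for asin in asins:
        # One API call per product page, routed through a US proxy.
        response = client.get(
            f'https://www.amazon.com/dp/{asin}',
            params={
                "extract_rules": extract_rules,
                "js_scenario": js_scenario,
                "country_code": "us"
            },
            retries=2
        )

        # Skip products whose request failed instead of crashing on response.json().
        if response.status_code != 200:
            print(f"{asin}: {response.status_code}, skipped")
            continue

        data = response.json()

        # Divider row: the product title goes in the "name" column; the rest stay empty.
        all_reviews.append({
            "name": data.get('product_title'),
            "rating": "",
            "date": "",
            "title": "",
            "content": ""
        })

        # One dict per review, with the keys defined under properties -> output.
        reviews = data.get('properties') or []
        all_reviews.extend(reviews)

        print(f"{asin}: {response.status_code}, {len(reviews)} reviews extracted")

    # Write all divider and review rows to a single CSV in the working directory.
    df = pd.DataFrame(all_reviews)
    df.to_csv("all_reviews.csv", index=False)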


asin_list = ["B0CTH2QF23", "B0CCDTPDTQ", "B099WTN2TR"]
amazon_reviews(asin_list)

Run it:

python scrape_reviews.py

You will see one line per ASIN in the console (for example, B0CTH2QF23: 200, 8 reviews extracted), and the script writes all_reviews.csv to the current working directory.

What you get

For each ASIN, the script writes one product-title row followed by one row per review.

Column	Source selector	Example
`name`	`.a-profile-name`	`Jane D.`
`rating`	`.review-rating > span`	`5.0 out of 5 stars`
`date`	`.review-date`	`Reviewed in the United States on January 12, 2026`
`title`	`.review-title span:not([class])`	`Better than expected`
`content`	`.review-text`	`The build quality is solid...`
`name` (divider row)	`span.a-size-large.product-title-word-break`	Product title; the other columns are left empty on this row
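Because the CSV interleaves divider rows with review rows, a little post-processing makes it easier to analyse. The sketch below assumes the column layout in the table above and English rating text such as "5.0 out of 5 stars"; tidy_reviews is a hypothetical helper, and other regional storefronts will need a different rating pattern.

```python
import io

import pandas as pd


def tidy_reviews(df):
    """Hypothetical helper: attach each review to its product and parse the rating."""
    df = df.fillna("")
    # Divider rows have a name but no rating, date, title, or content.
    is_divider = (df[["rating", "date", "title", "content"]] == "").all(axis=1)
    # Carry the product title from each divider row down to the reviews below it.
    df["product"] = df["name"].where(is_divider).ffill()
    reviews = df[~is_divider].copy()
    # "5.0 out of 5 stars" -> 5.0 (English storefronts only).
    reviews["stars"] = (
        reviews["rating"].str.extract(r"^(\d+(?:\.\d+)?)", expand=False).astype(float)
    )
    return reviews


if __name__ == "__main__":
    sample = io.StringIO(
        "name,rating,date,title,content\n"
        "Example Product,,,,\n"
        "Jane D.,5.0 out of 5 stars,\"Reviewed in the United States on January 12, 2026\",Better than expected,Solid build\n"
        "Sam K.,3.0 out of 5 stars,\"Reviewed in the United States on January 10, 2026\",Okay,Average\n"
    )
    # In practice: tidy_reviews(pd.read_csv("all_reviews.csv"))
    reviews = tidy_reviews(pd.read_csv(sample))
    print(reviews.groupby("product")["stars"].agg(["mean", "count"]))
```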

How the script works

Four things do the work.

extract_rules. A declarative spec that tells ScrapingBee what to pull from the rendered page. product_title matches a single element. properties has "type": "list", so the API iterates over every <li> inside #cm-cr-dp-review-list and returns one structured object per review. No HTML parsing on your side.

js_scenario. Amazon loads the review widget lazily, so the script tells the headless browser to wait 2 seconds, scroll to the bottom of the page, then wait 2 more seconds before the extract_rules are applied. Without the scroll, the widget would not be in the DOM.

country_code. Routes the request through a US IP. Amazon's review content varies by country — set this to the locale you care about. The full list of supported countries is in the API docs.

retries=2. If the API responds with a server error (5xx), the SDK retries the request up to two more times. Useful for transient blocks or slow page loads.

Configuration reference

extract_rules

extract_rules is a JSON object (a dict in Python) where each key is a field name and each value is either a selector string or a rule object. It is what lets this scraper work without any HTML parsing.

Shorthand syntax:

{"title": "h1", "subtitle": "#subtitle"}

Full rule object:

Property	Type	Description
`selector`	string, required	CSS or XPath selector. XPath is auto-detected when the selector starts with `/`.
`selector_type`	string	Force `"css"` or `"xpath"` instead of auto-detection.
`output`	string or object	What to extract. See below.
`type`	string	`"item"` (default — first match) or `"list"` (all matches).
`clean`	boolean	Strips whitespace by default. Set `false` to preserve formatting.

Output formats:

`output` value	Returns
`text` (default)	Visible text content
`text_relevant`	Text with scripts, CSS, headers, and footers removed
`markdown_relevant`	Markdown with irrelevant content trimmed
`html`	Inner HTML
`@attribute_name`	An HTML attribute, e.g. `@href` for a link's URL
`table_json`	Parses a `<table>` into JSON objects
`table_array`	Parses a `<table>` into nested arrays

Nested rules — extract a list of structured objects:

{
  "reviews": {
    "selector": "#cm-cr-dp-review-list > li",
    "type": "list",
    "output": {
      "name": ".a-profile-name",
      "rating": ".review-rating > span",
      "link": {"selector": "a.review-title", "output": "@href"}
    }
  }
}

Attribute shorthand — "link": "a@href" is equivalent to {"selector": "a", "output": "@href"}.
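To make the equivalence concrete, here is a tiny, hypothetical normaliser that expands the shorthand forms into the full rule object. It illustrates the rule stated above and is not ScrapingBee's own parser. It leaves XPath selectors (which start with /) untouched, because @ has its own meaning in XPath.

```python
import re

# A trailing "@name" after a CSS selector means "return this attribute".
_ATTR_SUFFIX = re.compile(r"^(?P<selector>.+?)@(?P<attr>[A-Za-z_][\w:-]*)$")


def expand_rule(rule):
    """Hypothetical helper: expand shorthand rules into full rule objects."""
    if isinstance(rule, dict):
        return rule  # Already a full rule object.
    if rule.startswith("/"):
        return {"selector": rule}  # XPath: leave '@' alone.
    match = _ATTR_SUFFIX.match(rule)
    if match:
        return {"selector": match["selector"], "output": "@" + match["attr"]}
    # Plain selector: output defaults to text.
    return {"selector": rule}


if __name__ == "__main__":
    print(expand_rule("a@href"))
    print(expand_rule("h1"))
```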

js_scenario

js_scenario is an object whose instructions list is executed in order before extraction. The whole scenario must finish within 40 seconds.

Instruction	Syntax	Purpose
`wait`	`{"wait": 2000}`	Pause for N milliseconds
`wait_for`	`{"wait_for": ".selector"}`	Pause until an element exists
`wait_for_and_click`	`{"wait_for_and_click": ".selector"}`	Wait, then click
`click`	`{"click": "#buttonId"}`	Click an element
`scroll_x`	`{"scroll_x": 1000}`	Horizontal scroll in pixels
`scroll_y`	`{"scroll_y": 1000}`	Vertical scroll in pixels
`fill`	`{"fill": ["#input", "value"]}`	Type into an input
`evaluate`	`{"evaluate": "window.scrollTo(0, document.body.scrollHeight);"}`	Run arbitrary JS
`infinite_scroll`	`{"infinite_scroll": {"max_count": 0, "delay": 1000}}`	Auto-scroll until page end

All selectors accept CSS or XPath. Set "strict": false on the scenario to allow individual instructions to fail without aborting the whole run.
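As a variation on the Quick start scenario, the sketch below replaces the final fixed pause with wait_for on the review list, so extraction can start as soon as the widget appears, and sets "strict": False so a wait_for that never matches does not abort the run. It assumes #cm-cr-dp-review-list is still the review container (the same selector the Quick start uses); review_scenario is a hypothetical helper.

```python
import json


def review_scenario(list_selector="#cm-cr-dp-review-list", settle_ms=2000):
    """Hypothetical helper: scroll to trigger the lazy widget, then wait for it."""
    return {
        # Let individual instructions fail (e.g. a wait_for that times out)
        # without aborting the whole scenario.
        "strict": False,
        "instructions": [
            {"wait": settle_ms},
            {"evaluate": "window.scrollTo(0, document.body.scrollHeight);"},
            # Continue once the review list exists instead of sleeping a fixed time.
            {"wait_for": list_selector},
        ],
    }


if __name__ == "__main__":
    # Pass the dict as params["js_scenario"]; printing shows its JSON form.
    print(json.dumps(review_scenario(), indent=2))
```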

Common request parameters

These belong on the params argument of client.get(...).

Parameter	Type	Default	Description
`extract_rules`	dict	—	Extraction spec (above).
`js_scenario`	dict	—	Scenario spec (above).
`country_code`	string	—	Two-letter ISO country code, e.g. `"us"`, `"de"`, `"gb"`.
`premium_proxy`	bool	`false`	Route through residential proxies. Use when datacenter IPs get blocked.
`stealth_proxy`	bool	`false`	Toughest-target proxy tier.
`render_js`	bool	`true`	Run a headless browser. Keep it on for review scraping; `js_scenario` requires it.
`wait`	int	—	Milliseconds to wait after the page loads, before extraction.
`wait_for`	string	—	Wait until a selector exists, then extract.

Scrape reviews across Amazon regions

Set country_code to the locale and change the URL host to match the regional storefront. The two should agree.

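# Assumes client, asin, extract_rules, and js_scenario exist at module level
# (in the Quick start, the two dicts live inside amazon_reviews).

# Germany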
response = client.get(
    f'https://www.amazon.de/dp/{asin}',
    params={
        "extract_rules": extract_rules,
        "js_scenario": js_scenario,
        "country_code": "de"
    },
    retries=2
)

# United Kingdom
response = client.get(
    f'https://www.amazon.co.uk/dp/{asin}',
    params={
        "extract_rules": extract_rules,
        "js_scenario": js_scenario,
        "country_code": "gb"
    },
    retries=2
)

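The page structure, and therefore the CSS selectors, is generally shared across regional storefronts, so the extract_rules block usually does not change. Only the URL and country_code move. The extracted text itself (dates, rating strings such as "5.0 out of 5 stars") comes back in the storefront's language.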
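Because the URL host and country_code have to agree, it can help to keep them in one lookup table. The sketch below returns the (url, params) pair for client.get from a country code. The domain map covers a handful of storefronts and can be extended; regional_request is a hypothetical helper, and the optional premium_proxy flag follows the FAQ's advice for regions that return unexpected results.

```python
# Storefront host per ScrapingBee country_code (the UK is "gb", not "uk").
AMAZON_DOMAINS = {
    "us": "www.amazon.com",
    "gb": "www.amazon.co.uk",
    "de": "www.amazon.de",
    "fr": "www.amazon.fr",
    "it": "www.amazon.it",
    "es": "www.amazon.es",
    "ca": "www.amazon.ca",
    "jp": "www.amazon.co.jp",
}


def regional_request(asin, country_code, extract_rules, js_scenario, premium_proxy=False):
    """Hypothetical helper: return (url, params) with matching host and country_code."""
    try:
        host = AMAZON_DOMAINS[country_code]
    except KeyError:
        raise ValueError(f"No storefront configured for country_code={country_code!r}") from None
    params = {
        "extract_rules": extract_rules,
        "js_scenario": js_scenario,
        "country_code": country_code,
    }
    if premium_proxy:
        # Residential IPs, for regions where datacenter IPs get blocked.
        params["premium_proxy"] = True
    return f"https://{host}/dp/{asin}", params


# Usage with the Quick start client:
# url, params = regional_request(asin, "de", extract_rules, js_scenario)
# response = client.get(url, params=params, retries=2)
```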

Load more reviews with infinite scroll

The base script returns whatever reviews Amazon renders on the product page after the bottom-scroll — usually the 8–10 "most helpful". To load more before extraction runs, swap the simple scroll for infinite_scroll:

js_scenario = {
    "instructions": [
        {"wait": 2000},
        {"infinite_scroll": {"max_count": 5, "delay": 1500}},
        {"wait": 2000},
    ]
}

max_count is the number of scroll cycles (0 means scroll until the page stops growing), and delay is the wait between scrolls in milliseconds. Higher values keep the headless browser running longer, so each request takes more time and must still fit within the 40-second scenario limit.
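Since every instruction counts toward the 40-second scenario limit, a quick back-of-the-envelope check helps when tuning max_count and delay. The sketch below adds up the explicit pauses (wait values plus max_count × delay for infinite_scroll). It is a lower bound that ignores page-load and scrolling time, and fixed_wait_ms is a hypothetical helper.

```python
SCENARIO_LIMIT_MS = 40_000  # Maximum scenario runtime from the js_scenario reference.


def fixed_wait_ms(scenario):
    """Hypothetical helper: lower bound on scenario time from explicit pauses.

    Returns None when infinite_scroll has max_count 0, because that scrolls
    until the page stops growing and has no fixed upper bound.
    """
    total = 0
    for step in scenario["instructions"]:
        if "wait" in step:
            total += step["wait"]
        elif "infinite_scroll" in step:
            opts = step["infinite_scroll"]
            if opts.get("max_count", 0) == 0:
                return None
            # A missing delay counts as 0, which keeps this a lower bound.
            total += opts["max_count"] * opts.get("delay", 0)
    return total


if __name__ == "__main__":
    js_scenario = {
        "instructions": [
            {"wait": 2000},
            {"infinite_scroll": {"max_count": 5, "delay": 1500}},
            {"wait": 2000},
        ]
    }
    ms = fixed_wait_ms(js_scenario)
    print(f"{ms} ms of fixed pauses; limit {SCENARIO_LIMIT_MS} ms")
    # prints: 11500 ms of fixed pauses; limit 40000 ms
```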

Use cases

Sentiment analysis. Aggregate star ratings and run NLP over the review bodies to score satisfaction at the product, brand, or category level.
Competitor monitoring. Watch how a rival's product is rated week over week. Flag drops.
Voice of customer. Surface the phrases real buyers use — input for ad copy, landing-page rewrites, and feature prioritisation.
Catalogue QA. Watch your own listings for review-volume changes that signal a fake-review attack or a quality regression.
Training data. Build classification, summarisation, or fine-tuning datasets from honest, unprompted reviews.
Academic research. Source structured review data for studies in marketing, NLP, or e-commerce.

Why ScrapingBee

One API call per page. No proxy pool to maintain, no CAPTCHA solver to wire up, no browser automation framework to manage.
CSS-selector extraction. Get clean JSON without writing HTML parsers that break on every DOM change.
JavaScript scenarios. Click, scroll, wait, and run custom JS before extraction — needed for Amazon's lazy review widget and most other dynamic pages.
Geo-targeting. country_code and premium_proxy parameters return the regional storefront, currency, and stock a real buyer in that location would see.
Built-in retries. The Python SDK retries failed requests for you.
1,000 credits free. No credit card required to evaluate.

Best practices

Pace your requests. Do not send hundreds of requests per second. The free plan throttles at 5 concurrent requests, which is a reasonable starting ceiling even on paid plans.
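Retry transient failures. The SDK's retries=2 argument is enough for most cases. Do not retry 4xx errors, which mean the request itself was wrong; the exception is 429 Too Many Requests, which means you exceeded your concurrency limit and should back off before retrying.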
Match country_code to the URL. A US IP requesting amazon.de is a fingerprint Amazon will spot.
Never log in. This repo scrapes public review data only. Authenticated scraping is against the ToS and out of scope.
Cache results. Reviews rarely change in 24 hours. Cache locally and only re-scrape what has actually moved.
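The pacing and caching advice can be combined in a small wrapper. The sketch below caps parallel requests with a thread pool of five workers (the ceiling suggested above) and skips ASINs scraped in the last 24 hours, using one JSON file per ASIN. fetch_reviews stands in for whatever function performs one client.get call and returns the parsed JSON; the cache layout and helper names are hypothetical.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_WORKERS = 5             # Concurrency ceiling suggested above.
CACHE_TTL_S = 24 * 60 * 60  # Re-scrape an ASIN at most once a day.


def cached_fetch(asin, fetch_reviews, cache_dir):
    """Return cached JSON for asin if it is fresh, otherwise fetch and store it."""
    path = Path(cache_dir) / f"{asin}.json"
    if path.exists() and time.time() - path.stat().st_mtime < CACHE_TTL_S:
        return json.loads(path.read_text())
    data = fetch_reviews(asin)
    path.write_text(json.dumps(data))
    return data


def scrape_all(asins, fetch_reviews, cache_dir="review_cache"):
    """Fetch every ASIN with at most MAX_WORKERS requests in flight."""
    asins = list(asins)  # pool.map consumes the iterable, so keep a copy for zip.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = pool.map(lambda a: cached_fetch(a, fetch_reviews, cache_dir), asins)
        return dict(zip(asins, results))


# Example wiring with the Quick start client (hypothetical):
# def fetch_reviews(asin):
#     resp = client.get(f"https://www.amazon.com/dp/{asin}",
#                       params={"extract_rules": extract_rules,
#                               "js_scenario": js_scenario,
#                               "country_code": "us"},
#                       retries=2)
#     resp.raise_for_status()
#     return resp.json()
```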

Legal note

Scraping publicly visible Amazon data is generally legal in many jurisdictions, but Amazon's terms of service restrict automated access. A few practical rules:

Only collect public, non-authenticated content.
Keep request rates reasonable.
Personal data scraped from reviews (names, opinions) can fall under privacy laws such as GDPR and CCPA. Treat it the way you would any personal data: minimise collection, secure storage, honour deletion requests.

This repository is not legal advice. Review Amazon's ToS and the regulations that apply to your jurisdiction before running anything in production.

FAQ

Can I scrape Amazon reviews without getting blocked?

Usually. The ScrapingBee Web Scraping API rotates proxies and manages headers to reduce blocks, even on JavaScript-heavy pages like Amazon product pages. If requests still fail, switch on premium_proxy or stealth_proxy.

Do I need a headless browser to scrape Amazon?

Yes, but not on your machine. ScrapingBee has built-in JavaScript rendering, so you do not need to install or maintain Puppeteer, Playwright, or Selenium locally.

How many reviews can I scrape from Amazon?

The script has no fixed cap: it requests one product page per ASIN, so the total grows with the length of asin_list. Each page yields the reviews Amazon renders there, and you need to stay within your plan's rate and credit limits.

Can I use this method for different Amazon countries (UK, DE, FR)?

Yes. Set country_code in the API request (e.g. "country_code": "de" for Germany) and change the URL host to match (amazon.de). If a country does not return the expected results, switch on premium_proxy=True for residential IPs.

How do I scrape all the reviews for a product, not just the first page?

Use the infinite_scroll instruction in js_scenario (see Load more reviews). Each scroll cycle loads more reviews into the DOM before extraction runs.

Why am I getting empty results?

Two common causes: the country_code does not match the Amazon regional domain in the URL, or the JavaScript scenario timing is too short for a slow-loading page. Bump the wait values from 2000 to 3500 ms and verify the review widget actually renders.

How much does each request cost?

A standard request with JavaScript rendering and a basic scenario is 25 credits. Using premium_proxy doubles that, and stealth_proxy raises it further. The free 1,000-credit tier gets you roughly 40 of these requests for evaluation.

Can I scrape reviews from third-party sellers?

This script targets the product detail page, so it returns the product's reviews regardless of which seller owns the buy box at request time. Seller-specific review pages are a separate URL pattern and a separate scrape.
