Hybrid Product & Collection Data Scraper Tool for Shopify Dawn

## Overview
Implement a robust, extensible product data scraper in Python that works with Shopify stores built on Dawn. The tool should:
- Scrape single product pages for title, price, description, images, SKU, brand, variants, and raw JSON-LD.
- Crawl collection pages, including multi-page pagination, and extract all product URLs and details.
- Support both static HTML scraping (requests + BeautifulSoup) and JavaScript-rendered pages (Playwright integration).
- Output results to JSON or CSV.
- Identify collection and product pages via CSS selectors or schema.org structured data.
- Include rate limiting, retry logic, and error handling.
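
The raw JSON-LD requirement above could be approached along these lines. This is a minimal sketch, not the proposed implementation: it uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and the names `JsonLdCollector` and `extract_product_jsonld` are hypothetical. It assumes the page exposes a schema.org `Product` object in a `<script type="application/ld+json">` block, which should be verified per store.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_product_jsonld(html):
    """Return the first schema.org Product object in the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return item
    return None
```

Fields such as title, SKU, and price would then be read from the returned dictionary (for example `product.get("name")`), with selector-based extraction as a fallback when no `Product` block is present.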

## Features
- CLI and library API
- Single product scraper (requests + BeautifulSoup, falling back to Playwright when needed)
- Collection crawler (pagination detection via `rel="next"` links, a "Next" anchor, or configurable selectors)
- Configurable concurrency and optional per-request delay
- Output format: JSON/CSV
- Site adapter for Shopify Dawn (structured extraction of variants/pricing/SKU)
- Logs and progress reporting
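
The pagination detection listed above could be sketched as follows. This is an illustration under assumptions, not a confirmed design: it again uses the standard library's `html.parser` in place of BeautifulSoup, the names `NextLinkFinder` and `find_next_page` are hypothetical, and the actual pagination markup of a given Dawn store may differ, which is why a selector-based tier would still be needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Find a next-page href via rel="next" or an anchor whose text starts with "Next"."""

    def __init__(self):
        super().__init__()
        self.rel_next = None   # href from <link rel="next"> or <a rel="next">
        self.text_next = None  # href from an anchor with "Next" as its text
        self._open_href = None
        self._open_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        rels = (attrs.get("rel") or "").lower().split()
        if tag in ("a", "link") and href and "next" in rels and self.rel_next is None:
            self.rel_next = href
        if tag == "a" and href:
            # Start collecting the anchor's visible text.
            self._open_href = href
            self._open_text = []

    def handle_data(self, data):
        if self._open_href is not None:
            self._open_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_href is not None:
            text = "".join(self._open_text).strip().lower()
            if text.startswith("next") and self.text_next is None:
                self.text_next = self._open_href
            self._open_href = None


def find_next_page(html, base_url):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = NextLinkFinder()
    finder.feed(html)
    href = finder.rel_next or finder.text_next  # prefer the explicit rel="next"
    return urljoin(base_url, href) if href else None
```

A crawler would call `find_next_page` in a loop, stopping when it returns `None` or a URL that was already visited.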

## Motivation
Shopify Dawn stores often use dynamic rendering and structured data (JSON-LD, schema.org). Manual extraction is error-prone and slow. This tool will automate data extraction for bulk operations, analytics, and migrations.

## Labels
- Category: Enhancement
- 📁 Section: Featured Product
- 🗂️ Template: Collection

## Acceptance Criteria
- Scrapes all key product data (including variants and images) from product pages
- Crawls collection pages and follows pagination automatically
- Works on both static and JS-heavy Shopify Dawn pages
- CLI exposes options for concurrency, selectors, and output format
- Includes an adapter for the Shopify Dawn page structure
- Error logs for failed pages
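
One possible shape for the CLI named in the criteria above, sketched with `argparse`. Every name here (the `dawn-scraper` program name, the flags, and their defaults) is a hypothetical placeholder for discussion, not an agreed interface.

```python
import argparse


def build_parser():
    """Build a hypothetical CLI parser; all flag names and defaults are assumptions."""
    parser = argparse.ArgumentParser(
        prog="dawn-scraper",
        description="Scrape product or collection data from a Shopify Dawn store.",
    )
    parser.add_argument("url", help="Product or collection URL to start from")
    parser.add_argument("--mode", choices=["product", "collection"], default="product",
                        help="Scrape a single product or crawl a collection")
    parser.add_argument("--concurrency", type=int, default=4,
                        help="Maximum number of pages fetched in parallel")
    parser.add_argument("--delay", type=float, default=0.0,
                        help="Optional delay in seconds between requests")
    parser.add_argument("--product-selector", default=None,
                        help="CSS selector overriding product link detection")
    parser.add_argument("--format", dest="output_format", choices=["json", "csv"],
                        default="json", help="Output format")
    parser.add_argument("--output", default="-",
                        help="Output file path, or '-' for stdout")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

An invocation might look like `dawn-scraper https://example.com/collections/all --mode collection --concurrency 8 --format csv`, where the URL is a placeholder.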

## Out of Scope
- Scraping of non-Shopify storefronts
- Bypassing CAPTCHAs or login walls

---
Let me know if you want Playwright integration or adapters for other platforms to be part of the MVP.

Hybrid Product & Collection Data Scraper Tool for Shopify Dawn #3869