Web scraping fallback for retailers without APIs #491

@kovtcharov

Description

Summary

Add a web scraping fallback using BeautifulSoup (and optionally Playwright from #458) to extract product prices from retailer websites that lack public APIs — extending coverage beyond Best Buy and SerpApi.

Motivation

Many retailers (Amazon, Walmart, Target, Newegg) don't offer free public product APIs. Web scraping fills this gap, enabling the DealAgent to track prices across a wider range of sources. This builds on the BrowserToolsMixin (#458) being developed in the v0.17.0 milestone.

Design

Scraper Architecture

# src/gaia/agents/deals/tools/scraper_tools.py
from abc import ABC, abstractmethod
from typing import Dict, List

# ProductResult is the normalized product schema shared with the API-backed tools.

class RetailerScraper(ABC):
    """Base class for retailer-specific scrapers."""
    name: str = ""
    base_url: str = ""

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> List[ProductResult]: ...

    @abstractmethod
    def get_price(self, url: str) -> ProductResult: ...

class AmazonScraper(RetailerScraper):
    name = "amazon"
    base_url = "https://www.amazon.com"
    # Uses BeautifulSoup for static extraction
    # Falls back to Playwright for JS-rendered content

class WalmartScraper(RetailerScraper):
    name = "walmart"
    base_url = "https://www.walmart.com"

class NeweggScraper(RetailerScraper):
    name = "newegg"
    base_url = "https://www.newegg.com"

class ScraperRegistry:
    """Registry of available scrapers."""
    scrapers: Dict[str, RetailerScraper] = {}

    def register(self, scraper: RetailerScraper): ...
    def get(self, name: str) -> RetailerScraper: ...
    def search_all(self, query: str) -> List[ProductResult]: ...
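A minimal sketch of how the registry could behave. `ProductResult` is stubbed here as a plain dataclass purely for illustration, since its real definition lives elsewhere in the deals agent; the error-handling choices (skip a failing retailer rather than abort the whole search) are assumptions, not decided design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProductResult:  # stand-in for the project's normalized schema
    name: str
    price: float
    url: str
    source: str


class ScraperRegistry:
    """Registry of available scrapers, keyed by retailer name."""

    def __init__(self) -> None:
        self.scrapers: Dict[str, object] = {}

    def register(self, scraper) -> None:
        self.scrapers[scraper.name] = scraper

    def get(self, name: str):
        try:
            return self.scrapers[name]
        except KeyError:
            raise ValueError(f"No scraper registered for {name!r}")

    def search_all(self, query: str) -> List[ProductResult]:
        # Aggregate results across every registered scraper; one failing
        # retailer should not sink the whole search.
        results: List[ProductResult] = []
        for scraper in self.scrapers.values():
            try:
                results.extend(scraper.search(query))
            except Exception:
                continue
        return results
```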

Scraper Tool

class ScraperToolsMixin:
    def register_scraper_tools(self) -> None:
        from gaia.agents.base.tools import tool

        @tool
        def scrape_price(url: str) -> Dict:
            """Extract current price from a product URL.

            Args:
                url: Direct product page URL from any supported retailer
            """

        @tool
        def scrape_search(query: str, retailers: str = "all") -> Dict:
            """Search for products by scraping retailer websites (fallback when APIs unavailable).

            Args:
                query: Product search query
                retailers: Comma-separated retailer names or "all"
            """

Ethical Scraping Practices

  • Respect robots.txt — check before scraping
  • Rate limit: max 1 request/second per domain
  • User-Agent: identify as GAIA bot
  • Cache scraped results for 1 hour to reduce load
  • Terms of Service: document which sites allow scraping
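The first two practices above could be sketched with the standard library alone. `urllib.robotparser` handles the robots.txt check, and a per-domain timestamp map enforces the one-request-per-second limit; the `gaia-bot` user-agent string and helper names are placeholders, not settled project conventions.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "gaia-bot"          # placeholder identifier
MIN_INTERVAL = 1.0               # seconds between requests to one domain

_robots: dict = {}               # domain -> RobotFileParser (cached)
_last_request: dict = {}         # domain -> monotonic timestamp


def allowed_by_robots(url: str) -> bool:
    """Check robots.txt before scraping, caching the parsed file per domain."""
    domain = urlparse(url).netloc
    rp = _robots.get(domain)
    if rp is None:
        rp = RobotFileParser(f"https://{domain}/robots.txt")
        try:
            rp.read()
        except OSError:
            # If robots.txt is unreachable, err on the side of not scraping.
            return False
        _robots[domain] = rp
    return rp.can_fetch(USER_AGENT, url)


def throttle(url: str) -> None:
    """Sleep as needed to keep at most one request per second per domain."""
    domain = urlparse(url).netloc
    elapsed = time.monotonic() - _last_request.get(domain, 0.0)
    if elapsed < MIN_INTERVAL:
        time.sleep(MIN_INTERVAL - elapsed)
    _last_request[domain] = time.monotonic()
```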

Integration with BrowserToolsMixin (#458)

If Playwright is available (from v0.17.0 BrowserToolsMixin), use it for JavaScript-heavy sites. Otherwise, fall back to requests + BeautifulSoup for static HTML.

def _fetch_page(self, url: str) -> str:
    """Fetch page HTML, using Playwright if available, else requests."""
    try:
        # Capability check: BrowserToolsMixin is only importable when the
        # optional Playwright extra is installed.
        from gaia.agents.base.browser_tools import BrowserToolsMixin  # noqa: F401
        return self._fetch_with_playwright(url)
    except ImportError:
        return self._fetch_with_requests(url)
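On the static path, the BeautifulSoup side might look like the sketch below. The `.price` selector and the `extract_price` helper are illustrative assumptions (each `RetailerScraper` would supply its own selector); the stdlib `html.parser` backend is used here so the sketch runs even without lxml installed.

```python
import re

from bs4 import BeautifulSoup


def extract_price(html: str, selector: str = ".price") -> float:
    """Pull the first price-looking value out of a CSS-selected element.

    The selector is illustrative; real retailer pages need per-site selectors.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    if node is None:
        raise ValueError(f"No element matches {selector!r}")
    match = re.search(r"\$?(\d[\d,]*\.?\d*)", node.get_text())
    if match is None:
        raise ValueError("No price found in element text")
    return float(match.group(1).replace(",", ""))
```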

Acceptance Criteria

  • RetailerScraper base class with search() and get_price() methods
  • At least 2 retailer scrapers implemented (e.g., Amazon, Newegg)
  • scrape_price extracts price from a product URL
  • scrape_search searches across scraped retailers
  • robots.txt respected before scraping
  • Rate limiting: 1 req/s per domain
  • Results cached for 1 hour
  • Falls back gracefully if Playwright unavailable
  • Results normalized to ProductResult matching API results
  • Unit tests with saved HTML fixtures (no live scraping in CI)
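The one-hour result cache from the criteria above could be as simple as a timestamped dict; the names and the injectable `now` parameter (which makes expiry testable without sleeping) are assumptions for this sketch.

```python
import time
from typing import Any, Dict, Optional, Tuple

CACHE_TTL = 3600.0  # seconds (1 hour)

_cache: Dict[str, Tuple[float, Any]] = {}


def cache_put(key: str, value: Any, now: Optional[float] = None) -> None:
    """Store a scraped result with its insertion time."""
    now = time.monotonic() if now is None else now
    _cache[key] = (now, value)


def cache_get(key: str, now: Optional[float] = None) -> Optional[Any]:
    """Return a cached value if younger than CACHE_TTL, evicting stale entries."""
    now = time.monotonic() if now is None else now
    entry = _cache.get(key)
    if entry is None:
        return None
    stored_at, value = entry
    if now - stored_at > CACHE_TTL:
        del _cache[key]
        return None
    return value
```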

Phase

Phase 3 — Visualization & Intelligence

Dependencies

Cross-References

New Dependencies

| Package | Version | License | Purpose |
|---|---|---|---|
| beautifulsoup4 | >=4.12 | MIT | HTML parsing for price extraction |
| lxml | >=4.9 | BSD | Fast HTML parser backend |

Metadata

Assignees: none

Labels: agent, deals (DealAgent: price tracking and deal discovery), enhancement (New feature or request), p2 (low priority)
