8 changes: 8 additions & 0 deletions src/oss/python/integrations/providers/all_providers.mdx
@@ -2432,6 +2432,14 @@ Browse the complete collection of integrations available for Python. LangChain P
Web scraping API and proxy service.
</Card>

<Card
title="ScrapingBee"
href="/oss/integrations/providers/scrapingbee"
icon="link"
>
Web scraping and proxy services.
</Card>

<Card
title="SearchAPI"
href="/oss/integrations/providers/searchapi"
60 changes: 60 additions & 0 deletions src/oss/python/integrations/providers/scrapingbee.mdx
@@ -0,0 +1,60 @@
---
title: ScrapingBee
---

The ScrapingBee web scraping API handles headless browsers, rotates proxies for you, and offers AI-powered data extraction.

## Installation and Setup

<CodeGroup>
```bash pip
pip install -U langchain-scrapingbee
```

```bash uv
uv add langchain-scrapingbee
```
</CodeGroup>

Configure credentials by setting the following environment variable:

* `SCRAPINGBEE_API_KEY`

You can get your API key and 1,000 free credits by signing up [here](https://app.scrapingbee.com/account/register).
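If the variable is not set in your shell, a small helper like the following can fail fast with a clear message before any tool is instantiated (`get_scrapingbee_key` is an illustrative helper, not part of the package):

```python
import os


def get_scrapingbee_key() -> str:
    """Return the ScrapingBee API key from the environment, failing fast if missing.

    Illustrative helper, not part of langchain-scrapingbee.
    """
    key = os.environ.get("SCRAPINGBEE_API_KEY")
    if not key:
        raise RuntimeError(
            "Set the SCRAPINGBEE_API_KEY environment variable "
            "before using the ScrapingBee tools."
        )
    return key
```

Each tool also accepts the key directly via its `api_key` argument, as shown in the individual tool guides.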

## Tools

The ScrapingBee integration provides access to the following tools:

* [ScrapeUrlTool](/oss/integrations/tools/scrapingbee_scrapeurl): Scrape the contents of any public website. You can also use this to extract data, capture screenshots, interact with the page before scraping, and capture the internal requests sent by the webpage.
* [GoogleSearchTool](/oss/integrations/tools/scrapingbee_googlesearch): Search Google to obtain the following types of information: regular search (classic), news, maps, and images.
* [CheckUsageTool](/oss/integrations/tools/scrapingbee_checkusage): Monitor your ScrapingBee credit or concurrency usage using this tool.
* [AmazonSearchTool](/oss/integrations/tools/scrapingbee_amazonsearch): Perform a product search on Amazon with options for localization, pagination, and advanced filtering.
* [AmazonProductTool](/oss/integrations/tools/scrapingbee_amazonproduct): Retrieve detailed information, including reviews, for a specific product on Amazon using its ASIN.
* [WalmartSearchTool](/oss/integrations/tools/scrapingbee_walmartsearch): Search for products on Walmart with parameters for sorting and price filtering.
* [WalmartProductTool](/oss/integrations/tools/scrapingbee_walmartproduct): Get specific details and reviews for a Walmart product by its ID.
* [ChatGPTTool](/oss/integrations/tools/scrapingbee_chatgpt): Send your prompt to ChatGPT with an option to enhance its responses with live web search results.
* [YouTubeMetadataTool](/oss/integrations/tools/scrapingbee_youtubemetadata): Retrieve comprehensive metadata for a YouTube video including title, description, view count, likes, channel info, publish date, duration, thumbnails, and tags.
* [YouTubeSearchTool](/oss/integrations/tools/scrapingbee_youtubesearch): Search YouTube with extensive filtering options for video quality (HD, 4K, HDR), duration, upload date, content type (video, channel, playlist), live streams, and more.
* [YouTubeTrainabilityTool](/oss/integrations/tools/scrapingbee_youtubetrainability): Check whether a YouTube video's content can be used for AI/ML training purposes based on the video's settings and permissions.
* [YouTubeTranscriptTool](/oss/integrations/tools/scrapingbee_youtubetranscript): Retrieve transcripts/captions for a YouTube video with support for multiple languages and choice between auto-generated or uploader-provided transcripts.

## Tool Options

Most ScrapingBee tools support the following options that control how results are handled:

* `return_content` (boolean, default: `False`): Controls whether the actual content is returned in the response. When set to `False`, only file information is returned to conserve AI tokens. Set to `True` when the agent needs to read and analyze the contents.
* `results_folder` (string, default: `"scraping_results"`): Base folder path where results are saved. A timestamped subfolder is automatically created for each request.

Example usage:

```python
# Returns only file information (saves tokens)
tool.invoke({"query": "example search"})

# Returns the actual content for analysis
tool.invoke({"query": "example search", "return_content": True})

# Saves results to a custom folder
tool.invoke({"query": "example search", "results_folder": "my_results", "return_content": True})
```
12 changes: 12 additions & 0 deletions src/oss/python/integrations/tools/index.mdx
@@ -213,6 +213,18 @@ The following platforms provide access to multiple tools and services through a
<Card title="Scrapeless Crawl" icon="link" href="/oss/integrations/tools/scrapeless_crawl" arrow="true" cta="View guide" />
<Card title="Scrapeless Scraping API" icon="link" href="/oss/integrations/tools/scrapeless_scraping_api" arrow="true" cta="View guide" />
<Card title="Scrapeless Universal Scraping" icon="link" href="/oss/integrations/tools/scrapeless_universal_scraping" arrow="true" cta="View guide" />
<Card title="ScrapingBee Amazon Product" icon="link" href="/oss/integrations/tools/scrapingbee_amazonproduct" arrow="true" cta="View guide" />
<Card title="ScrapingBee Amazon Search" icon="link" href="/oss/integrations/tools/scrapingbee_amazonsearch" arrow="true" cta="View guide" />
<Card title="ScrapingBee ChatGPT" icon="link" href="/oss/integrations/tools/scrapingbee_chatgpt" arrow="true" cta="View guide" />
<Card title="ScrapingBee Check Usage" icon="link" href="/oss/integrations/tools/scrapingbee_checkusage" arrow="true" cta="View guide" />
<Card title="ScrapingBee Google Search" icon="link" href="/oss/integrations/tools/scrapingbee_googlesearch" arrow="true" cta="View guide" />
<Card title="ScrapingBee Scrape URL" icon="link" href="/oss/integrations/tools/scrapingbee_scrapeurl" arrow="true" cta="View guide" />
<Card title="ScrapingBee Walmart Product" icon="link" href="/oss/integrations/tools/scrapingbee_walmartproduct" arrow="true" cta="View guide" />
<Card title="ScrapingBee Walmart Search" icon="link" href="/oss/integrations/tools/scrapingbee_walmartsearch" arrow="true" cta="View guide" />
<Card title="ScrapingBee YouTube Metadata" icon="link" href="/oss/integrations/tools/scrapingbee_youtubemetadata" arrow="true" cta="View guide" />
<Card title="ScrapingBee YouTube Search" icon="link" href="/oss/integrations/tools/scrapingbee_youtubesearch" arrow="true" cta="View guide" />
<Card title="ScrapingBee YouTube Trainability" icon="link" href="/oss/integrations/tools/scrapingbee_youtubetrainability" arrow="true" cta="View guide" />
<Card title="ScrapingBee YouTube Transcript" icon="link" href="/oss/integrations/tools/scrapingbee_youtubetranscript" arrow="true" cta="View guide" />
<Card title="SearchApi" icon="link" href="/oss/integrations/tools/searchapi" arrow="true" cta="View guide" />
<Card title="SearxNG Search" icon="link" href="/oss/integrations/tools/searx_search" arrow="true" cta="View guide" />
<Card title="Semantic Scholar API" icon="link" href="/oss/integrations/tools/semanticscholar" arrow="true" cta="View guide" />
106 changes: 106 additions & 0 deletions src/oss/python/integrations/tools/scrapingbee_amazonproduct.mdx
@@ -0,0 +1,106 @@
---
title: ScrapingBee AmazonProductTool
---

Use this tool to retrieve detailed information for a specific Amazon product using its ASIN (Amazon Standard Identification Number).

## Overview

### Integration details

| Class | Package | Serializable | JS support | Package latest |
| :--- | :--- | :---: | :---: | :---: |
| `AmazonProductTool` | [langchain-scrapingbee](https://pypi.org/project/langchain-scrapingbee/) | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapingbee?style=flat-square&label=%20) |


## Setup

<CodeGroup>
```bash pip
pip install -U langchain-scrapingbee
```

```bash uv
uv add langchain-scrapingbee
```
</CodeGroup>

### Credentials

Configure credentials by setting the following environment variable:

* `SCRAPINGBEE_API_KEY`

## Instantiation

All ScrapingBee tools require only the API key at instantiation. If the `SCRAPINGBEE_API_KEY` environment variable is not set, you can pass the key directly.

Here is how to instantiate the tool:

```python
import getpass
import os
from langchain_scrapingbee import AmazonProductTool

# if not os.environ.get("SCRAPINGBEE_API_KEY"):
# os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")

amazon_product_tool = AmazonProductTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
```

## Invocation

### Invoke directly with args

This tool accepts `query` (string, the product ASIN) and `params` (dictionary) as arguments. The `query` argument is required, and the `params` argument is optional. You can use the `params` argument to customize the request. For example, to get the HTML along with the response, you can use the following as `params`:

```python
{'add_html': True}
```

For a complete list of acceptable parameters, please visit the [Amazon Product API documentation](https://www.scrapingbee.com/documentation/amazon/#amazon-product-api).

```python
amazon_product_tool.invoke({"query": "B0DPDRNSXV"})

amazon_product_tool.invoke(
{
"query": "B0DPDRNSXV",
"params": {"add_html": True},
}
)
```

## Use within an agent

```python
import os
from langchain_scrapingbee import AmazonProductTool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
raise ValueError(
"Google and ScrapingBee API keys must be set in environment variables."
)

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ["SCRAPINGBEE_API_KEY"]

amazon_product_tool = AmazonProductTool(api_key=scrapingbee_api_key)

agent = create_react_agent(llm, [amazon_product_tool])

user_input = "Get the product details for Amazon product B0DPDRNSXV and tell me the product name, price, rating, and number of reviews"

# Stream the agent's output step-by-step
for step in agent.stream(
{"messages": user_input},
stream_mode="values",
):
step["messages"][-1].pretty_print()
```

## API reference

[Amazon Product API](https://www.scrapingbee.com/documentation/amazon/#amazon-product-api)
106 changes: 106 additions & 0 deletions src/oss/python/integrations/tools/scrapingbee_amazonsearch.mdx
@@ -0,0 +1,106 @@
---
title: ScrapingBee AmazonSearchTool
---

Use this tool to perform product searches on Amazon with options for localization, pagination, and advanced filtering.

## Overview

### Integration details

| Class | Package | Serializable | JS support | Package latest |
| :--- | :--- | :---: | :---: | :---: |
| `AmazonSearchTool` | [langchain-scrapingbee](https://pypi.org/project/langchain-scrapingbee/) | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapingbee?style=flat-square&label=%20) |


## Setup

<CodeGroup>
```bash pip
pip install -U langchain-scrapingbee
```

```bash uv
uv add langchain-scrapingbee
```
</CodeGroup>

### Credentials

Configure credentials by setting the following environment variable:

* `SCRAPINGBEE_API_KEY`

## Instantiation

All ScrapingBee tools require only the API key at instantiation. If the `SCRAPINGBEE_API_KEY` environment variable is not set, you can pass the key directly.

Here is how to instantiate the tool:

```python
import getpass
import os
from langchain_scrapingbee import AmazonSearchTool

# if not os.environ.get("SCRAPINGBEE_API_KEY"):
# os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")

amazon_search_tool = AmazonSearchTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
```

## Invocation

### Invoke directly with args

This tool accepts `query` (string, the search term) and `params` (dictionary) as arguments. The `query` argument is required, and the `params` argument is optional. You can use the `params` argument to customize the request. For example, to search on Amazon's UK site, you can use the following as `params`:

```python
{'domain': 'co.uk'}
```

For a complete list of acceptable parameters, please visit the [Amazon Search API documentation](https://www.scrapingbee.com/documentation/amazon/#amazon-search-api).

```python
amazon_search_tool.invoke({"query": "iphone 16"})

amazon_search_tool.invoke(
{
"query": "laptop",
"params": {"domain": "co.uk", "country": "gb"},
}
)
```

## Use within an agent

```python
import os
from langchain_scrapingbee import AmazonSearchTool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
raise ValueError(
"Google and ScrapingBee API keys must be set in environment variables."
)

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ["SCRAPINGBEE_API_KEY"]

amazon_search_tool = AmazonSearchTool(api_key=scrapingbee_api_key)

agent = create_react_agent(llm, [amazon_search_tool])

user_input = "Search for the top 5 wireless headphones on Amazon and provide me with the product names and prices"

# Stream the agent's output step-by-step
for step in agent.stream(
{"messages": user_input},
stream_mode="values",
):
step["messages"][-1].pretty_print()
```

## API reference

[Amazon Search API](https://www.scrapingbee.com/documentation/amazon/#amazon-search-api)