---
title: Crawleo Crawler
---

[Crawleo](https://crawleo.dev) is a privacy-first web search and crawler API. The Crawler endpoint can be used to extract content from URLs with support for raw HTML and Markdown output.

## Overview

### Integration details

| Class | Package | Serializable | JS support | Version |
|:-----------------------------------------------------------------|:-------------------------------------------------------------------|:---:|:---:|:---:|
| [CrawleoCrawler](https://github.com/crawleo/langchain-crawleo) | [langchain-crawleo](https://pypi.org/project/langchain-crawleo/) | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-crawleo?style=flat-square&label=%20) |

### Tool features

| [Returns artifact](/oss/langchain/tools) | Native async | Return data | Pricing |
|:---:|:---:|:---:|:---:|
| ❌ | ✅ | raw_content, markdown | Pay-per-use credits |

## Setup

The integration lives in the `langchain-crawleo` package.

```bash
pip install -qU langchain-crawleo
```

### Credentials

We need to set our Crawleo API key. You can get an API key by visiting [Crawleo](https://crawleo.dev) and creating an account.

```python
import getpass
import os

if not os.environ.get("CRAWLEO_API_KEY"):
os.environ["CRAWLEO_API_KEY"] = getpass.getpass("Crawleo API key:\n")
```
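If you prefer not to use environment variables, the key can likely be passed directly at construction. A minimal sketch, assuming `langchain-crawleo` follows the common LangChain convention of an `api_key` constructor argument (verify against the package docs before relying on it):

```python
from langchain_crawleo import CrawleoCrawler

# Hypothetical: assumes the constructor accepts an `api_key` argument,
# as is conventional for LangChain integrations. Check the
# langchain-crawleo documentation to confirm the parameter name.
tool = CrawleoCrawler(api_key="...", markdown=True)
```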

## Instantiation

The tool accepts various parameters during instantiation:

- `raw_html` (optional, bool): Whether to return raw HTML content. Default is False.
- `markdown` (optional, bool): Whether to return content in markdown format. Default is False.

For a comprehensive overview of the available parameters, refer to the [Crawleo API documentation](https://crawleo.dev/docs).

```python
from langchain_crawleo import CrawleoCrawler

tool = CrawleoCrawler(
    markdown=True,
    # raw_html=False,
)
```

## Invocation

### [Invoke directly with args](/oss/langchain/tools)

The Crawleo crawler tool accepts the following arguments during invocation:

- `urls` (required): A list of 1 to 20 URLs to crawl
- Both `raw_html` and `markdown` can also be set during invocation

NOTE: The optional arguments are exposed to agents, which can set them dynamically per call. If you set an argument during instantiation and then invoke the tool with a different value, the tool uses the value passed during invocation (see the override example after the output below).

```python
tool.invoke({"urls": ["https://python.langchain.com"]})
```

```output
{
"status": "success",
"data": {
"results": [
{
"url": "https://python.langchain.com",
"raw_content": "...",
"markdown": "# LangChain\n\nLangChain is a framework for developing applications..."
}
],
"credits_used": 1
}
}
```
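Since invocation-time arguments take precedence, a single tool instance can produce different output formats per call. For example, the tool above was instantiated with `markdown=True`, but this call requests raw HTML instead:

```python
# Invocation-time values override the instantiation-time settings
# for this call only: request raw HTML instead of markdown.
result = tool.invoke(
    {
        "urls": ["https://python.langchain.com"],
        "raw_html": True,
        "markdown": False,
    }
)
print(result["data"]["results"][0]["raw_content"][:200])
```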

### Crawl multiple URLs

```python
from langchain_crawleo import CrawleoCrawler

tool = CrawleoCrawler(markdown=True)

result = tool.invoke({
"urls": [
"https://python.langchain.com",
"https://js.langchain.com"
]
})

for item in result["data"]["results"]:
print(f"URL: {item['url']}")
print(f"Content preview: {item['markdown'][:200]}...")
print("---")
```
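Because a single call accepts at most 20 URLs, larger lists need to be split into batches. A minimal sketch, using the per-call limit stated in the invocation docs above:

```python
# Split a long URL list into batches of 20, the per-call maximum.
all_urls = [f"https://example.com/page-{i}" for i in range(50)]  # placeholder URLs

all_results = []
for i in range(0, len(all_urls), 20):
    batch = all_urls[i : i + 20]
    response = tool.invoke({"urls": batch})
    all_results.extend(response["data"]["results"])

print(f"Crawled {len(all_results)} pages")
```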

### [Invoke with ToolCall](/oss/langchain/tools)

We can also invoke the tool with a model-generated ToolCall, in which case a ToolMessage will be returned:

```python
# This is usually generated by a model, but we'll create a tool call directly for demo purposes.
model_generated_tool_call = {
"args": {"urls": ["https://en.wikipedia.org/wiki/Artificial_intelligence"]},
"id": "1",
"name": "crawleo_crawler",
"type": "tool_call",
}
tool_msg = tool.invoke(model_generated_tool_call)

# The content is a JSON string of results
print(tool_msg.content[:400])
```

```output
{"status": "success", "data": {"results": [{"url": "https://en.wikipedia.org/wiki/Artificial_intelligence", "raw_content": "Artificial intelligence - Wikipedia\nJump to content\nMain menu...", "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is the intelligence of machines...
```
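Because the `ToolMessage` content is a JSON string rather than a dict, parse it with `json.loads` before indexing into the results:

```python
import json

# The ToolMessage carries the crawl response as a JSON string.
parsed = json.loads(tool_msg.content)
for item in parsed["data"]["results"]:
    print(item["url"])
```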

## Use within an agent

We can use the tool within an agent by passing it in the agent's tool list. This gives the agent the ability to dynamically crawl URLs.

```python
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY:\n")
```

```python
from langchain.chat_models import init_chat_model

model = init_chat_model(model="gpt-4o", model_provider="openai", temperature=0)
```

```bash
pip install -qU langgraph
```

```python
from langchain_crawleo import CrawleoCrawler
from langgraph.prebuilt import create_react_agent

crawleo_crawler_tool = CrawleoCrawler(markdown=True)

agent = create_react_agent(model, [crawleo_crawler_tool])

user_input = "Extract the content from https://python.langchain.com and summarize what LangChain is."

for step in agent.stream(
{"messages": [("user", user_input)]},
stream_mode="values",
):
step["messages"][-1].pretty_print()
```

```output
================================ Human Message =================================

Extract the content from https://python.langchain.com and summarize what LangChain is.
================================== Ai Message ==================================
Tool Calls:
  crawleo_crawler (call_xyz789)
 Call ID: call_xyz789
  Args:
    urls: ['https://python.langchain.com']
================================= Tool Message =================================
Name: crawleo_crawler

{"status": "success", "data": {"results": [{"url": "https://python.langchain.com", "markdown": "# LangChain..."}]}}
================================== Ai Message ==================================

Based on the extracted content, LangChain is a framework for developing applications powered by language models...
```

## Advanced usage

### Combining search and crawler

Use CrawleoSearch to find relevant URLs, then CrawleoCrawler to extract full content:

```python
from langchain_crawleo import CrawleoSearch, CrawleoCrawler

search = CrawleoSearch(max_pages=1)
crawler = CrawleoCrawler(markdown=True)

# Step 1: Search for relevant pages
search_results = search.invoke({"query": "LangChain documentation"})
urls = [
item["link"]
for item in search_results["data"]["pages"]["1"]["search_results"][:3]
]

# Step 2: Crawl the top results
crawl_results = crawler.invoke({"urls": urls})

for result in crawl_results["data"]["results"]:
print(f"URL: {result['url']}")
print(f"Content length: {len(result.get('markdown', ''))}")
print("---")
```

### Async crawling
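The crawler supports native async execution through `ainvoke`: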

```python
import asyncio
from langchain_crawleo import CrawleoCrawler

async def crawl_pages():
    tool = CrawleoCrawler(markdown=True)

    urls = [
        "https://python.langchain.com",
        "https://js.langchain.com",
    ]

    result = await tool.ainvoke({"urls": urls})

    for item in result["data"]["results"]:
        print(f"Crawled: {item['url']}")
        print(f"Preview: {item['markdown'][:100]}...")

asyncio.run(crawl_pages())
```
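Since each call is awaitable, several crawls can also run concurrently with `asyncio.gather`, for example one call per batch. A short sketch:

```python
import asyncio
from langchain_crawleo import CrawleoCrawler

async def crawl_batches(batches: list[list[str]]):
    tool = CrawleoCrawler(markdown=True)
    # Fire off one crawl per batch and wait for all of them.
    responses = await asyncio.gather(
        *(tool.ainvoke({"urls": batch}) for batch in batches)
    )
    return [
        item
        for response in responses
        for item in response["data"]["results"]
    ]

results = asyncio.run(crawl_batches([
    ["https://python.langchain.com"],
    ["https://js.langchain.com"],
]))
print(f"Crawled {len(results)} pages")
```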

### Extract and summarize with LLM

```python
from langchain_crawleo import CrawleoCrawler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

crawler = CrawleoCrawler(markdown=True)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Crawl a page
result = crawler.invoke({"urls": ["https://python.langchain.com/docs/concepts/"]})
content = result["data"]["results"][0]["markdown"]

# Summarize with LLM
prompt = ChatPromptTemplate.from_template("""
Summarize the following web page content in 3-5 bullet points:

{content}
""")

chain = prompt | llm
summary = chain.invoke({"content": content[:10000]}) # Limit content length
print(summary.content)
```
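Naive truncation can cut mid-sentence and silently drop the rest of the page. One alternative, assuming the `langchain-text-splitters` package is installed, is to split the content into chunks and summarize only as many as fit your budget:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the crawled markdown into overlapping chunks instead of
# truncating at an arbitrary character offset.
splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=200)
chunks = splitter.split_text(content)

# Summarize the first chunk (or loop over chunks for a map-reduce summary).
summary = chain.invoke({"content": chunks[0]})
print(summary.content)
```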

### Building a research pipeline

```python
from langchain_crawleo import CrawleoSearch, CrawleoCrawler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

search = CrawleoSearch(max_pages=1, markdown=True)
crawler = CrawleoCrawler(markdown=True)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def research(topic: str) -> str:
    # Step 1: Search for relevant pages
    search_results = search.invoke({"query": topic})
    urls = [
        item["link"]
        for item in search_results["data"]["pages"]["1"]["search_results"][:2]
    ]

    # Step 2: Crawl the pages
    crawl_results = crawler.invoke({"urls": urls})

    # Step 3: Combine content
    combined_content = "\n\n---\n\n".join([
        f"Source: {r['url']}\n{r.get('markdown', '')[:3000]}"
        for r in crawl_results["data"]["results"]
    ])

    # Step 4: Generate research summary
    prompt = ChatPromptTemplate.from_template("""
Based on the following sources, provide a comprehensive answer about: {topic}

Sources:
{content}

Provide a detailed answer with citations to the sources.
""")

    chain = prompt | llm
    return chain.invoke({"topic": topic, "content": combined_content}).content

# Run research
answer = research("What is LangChain and how does it work?")
print(answer)
```

---

## API reference

For detailed documentation of all Crawleo Crawler API features and configurations, head to the API reference: [crawleo.dev/docs](https://crawleo.dev/docs)
