
[Bug]: Unable to execute arun_many with managed browsers and cdp #1563


Description

@medmahmoudi26

crawl4ai version

v0.7.6

Expected Behavior

Hello! I am trying to use crawl4ai for concurrent authenticated crawling, but I'm running into errors when combining CDP with arun_many.

In a terminal, I launched a browser using the CLI:

  • I created a profile using the CLI with headless turned off.
  • I started a browser over CDP with my profile directory:

crwl cdp -d /home/user/.crawl4ai/profiles/

A new instance was launched, and I confirmed I could interact with it using native Playwright.
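
A minimal sketch of that Playwright check (assuming the CDP endpoint is the default http://localhost:9222 exposed by the command above):

import asyncio
from playwright.async_api import async_playwright

async def check_cdp():
    async with async_playwright() as p:
        # Attach to the already-running Chromium instance over CDP
        browser = await p.chromium.connect_over_cdp("http://localhost:9222")
        # Reuse the existing context if the browser exposes one
        context = browser.contexts[0] if browser.contexts else await browser.new_context()
        page = await context.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await page.close()

asyncio.run(check_cdp())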

Crawl4ai script:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

# Define URLs to crawl
URLS = [
    "https://example.com",
    "https://httpbin.org/html",
    "https://www.python.org",
]

async def main():
    # Configure CDP browser connection
    browser_cfg = BrowserConfig(
        browser_type="cdp",
        cdp_url="http://localhost:9222",
        verbose=True,
    )
    
    # Configure crawler settings
    crawler_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        wait_until="domcontentloaded",
    )
    
    # Crawl all URLs using arun_many
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        results = await crawler.arun_many(urls=URLS, config=crawler_cfg)
        
        for result in results:
            print(f"\nURL: {result.url}")
            if result.success:
                print(f"✓ Success | Content length: {len(result.markdown)}")
            else:
                print(f"✗ Failed: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())

Expected behaviour: multiple tabs should open in Chromium and the crawls should execute in parallel.
Current behaviour: only one tab is opened, only one crawl finishes, and the rest run into errors.

Current Behavior

Only one tab is opened, only one crawl finishes, and the rest run into errors.

I think the bug might come from a race condition in the browser manager: all sessions compete for the same page.

page = pages[0]
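
A possible workaround sketch (untested against this bug; it assumes the documented session_id parameter and clone() helper on CrawlerRunConfig): fire one arun() per URL, each with its own session_id, so the browser manager should not hand every task the same pre-existing page.

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

URLS = [
    "https://example.com",
    "https://httpbin.org/html",
    "https://www.python.org",
]

async def main():
    browser_cfg = BrowserConfig(browser_type="cdp", cdp_url="http://localhost:9222")
    base_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, wait_until="domcontentloaded")

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        # One arun() per URL, each bound to a distinct session_id
        tasks = [
            crawler.arun(url=url, config=base_cfg.clone(session_id=f"session-{i}"))
            for i, url in enumerate(URLS)
        ]
        results = await asyncio.gather(*tasks)
        for result in results:
            status = "✓ ok" if result.success else f"✗ {result.error_message}"
            print(f"{result.url}: {status}")

if __name__ == "__main__":
    asyncio.run(main())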


Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Linux

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Metadata

Labels

⏰ spill-over: Issues that were picked up in past sprints but couldn't be completed
⚙️ In-progress: Issues and feature requests that are in progress
🐞 Bug: Something isn't working
📌 Root caused: Identified the root cause of the bug
