[Bug]: Cannot scrape local webfiles using crawl4ai, getting cannot access local variable 'captured_console' where it is not associated with a value #1072

Open
@saipavanmeruga7797

Description

crawl4ai version

0.6.2

Expected Behavior

When I try to run code that scrapes local HTML files, the expected outcome is markdown text generated from the given HTML input.

Current Behavior

Currently, when I pass a local HTML file path to crawler.arun(url=fileurl, config=crawler_config), the crawl fails with the following error: Error: cannot access local variable 'captured_console' where it is not associated with a value

Is this reproducible?

Yes

Inputs Causing the Bug

- URL(s): a local HTML file URL (file:// scheme with an absolute path)
- In the crawler_config, pass capture_console_messages=False (it is False by default)

Steps to Reproduce

1. Get a local HTML file path and convert it to a file:// URL.
2. Pass the URL to the crawler.arun() function.
3. Call the main function.

Code snippets

import asyncio
import os
from typing import List

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig


async def crawl_local_files(files: List[str], max_concurrent: int = 5):
    browser_config = BrowserConfig(
        headless=True,
        verbose=False,
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
    )
    # capture_console_messages is False by default; setting it explicitly
    # still triggers the error.
    crawler_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        capture_console_messages=False,
    )
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()

    try:
        # Limit the number of files crawled concurrently.
        semaphore = asyncio.Semaphore(max_concurrent)

        async def process_local_file(fileurl: str):
            async with semaphore:
                result = await crawler.arun(
                    url=fileurl,
                    config=crawler_config
                )
                if result.success:
                    # Hand the markdown off for storage (helper defined elsewhere).
                    await process_and_store_document(fileurl, result.markdown)
                else:
                    print(f"Failed : {fileurl} - Error: {result.error_message}")
        await asyncio.gather(*[process_local_file(fileurl) for fileurl in files])
    finally:
        await crawler.close()


def get_file_paths() -> List[str]:
    """
    
    Get a list of file paths from the local directory.
    """
    print("Getting file paths from the local directory...")    
    file_paths = []
    for file in os.listdir("./assets/"):
        print(file)
        if file.endswith(".html"):
            file_path = os.path.join("./assets/", file)
            abs_path = os.path.abspath(file_path)
            fileurl = f"file://{abs_path}"
            file_paths.append(fileurl)
    return file_paths


async def main():
    print("Starting the main function...")
    files = get_file_paths()

    await crawl_local_files(files)


if __name__ == "__main__":
    asyncio.run(main())
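As an aside, step 1's file:// URL construction can also be done with pathlib, which percent-escapes spaces and other special characters that a plain f"file://{abs_path}" string would miss. A minimal sketch (the local_html_urls helper name is ours, not part of crawl4ai):

```python
from pathlib import Path
from typing import List


def local_html_urls(directory: str) -> List[str]:
    # Resolve each .html file to an absolute path and convert it to a
    # file:// URL; Path.as_uri() handles the scheme and escaping.
    return sorted(p.resolve().as_uri() for p in Path(directory).glob("*.html"))
```

This is equivalent to the os.listdir/os.path.abspath loop above and could be dropped in as a replacement for get_file_paths().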

OS

macOS Sonoma 14.7.4

Python version

3.11.7

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

(Screenshot of the error traceback attached in the original issue.)

Labels

⚙️ In-progress · 🐞 Bug (something isn't working) · 📌 Root caused (root cause of bug identified)
