This repository was archived by the owner on May 8, 2020. It is now read-only.

Connection unexpectedly closed yields another exception: asyncio.base_futures.InvalidStateError #149

@cnicodeme

Hi!

I stumbled upon this issue by accident and tried to find a solution on my own, but couldn't.
While working on my code, I made a mistake that caused Chrome to close the connection abnormally. What follows is the part I believe is relevant to Pyppeteer: the next call hangs, and the program never responds until it receives a kill signal.

Here's a sample code showing my issue, with comments where necessary:

import asyncio
from pyppeteer import launch
from pyppeteer.navigator_watcher import NavigatorWatcher
from pyppeteer.network_manager import NetworkManager
from pyppeteer import helper

async def browse(page, source):
    # The timeouts here are low because the result is the same with longer ones, and we don't have 48h in a day
    navigate_options = {'waitUntil': ['load', 'domcontentloaded', 'networkidle0'], 'timeout': 1500}

    navigation_requests = {}
    def set_request(req):
        print("Got requests !")
        print(req._requestId)

        # The following will fail because "requests" does not exist; it should be "navigation_requests"
        # This is not the bug I'm looking to fix (duh ;)), but the error at this point causes a later, more important issue
        if req.url not in requests:
            navigation_requests[req._requestId] = req

    # I'm doing the following because setContent works for raw HTML,
    # but Chrome doesn't wait for the resources it references (external CSS/JS) to finish loading.
    # By implementing this code, I was hoping it would.
    eventListeners = [helper.addEventListener(
        page._networkManager,
        NetworkManager.Events.Request,
        set_request,
    )]

    watcher = NavigatorWatcher(page._frameManager, page.mainFrame, navigate_options.get('timeout'), navigate_options)
    await page.setContent(source)
    result = await watcher.navigationPromise()
    watcher.cancel()
    helper.removeEventListeners(eventListeners)

    # Again, this would fail. We need to use "navigation_requests", not "requests"
    request = requests.get(page.mainFrame._navigationURL)
    return await page.screenshot({'path': '/tmp/example.png'})

async def main(source):
    browser = await launch()
    context = await browser.createIncogniteBrowserContext()
    page = await context.newPage()

    task = asyncio.ensure_future(browse(page, source))
    try:
        # TODO: #NOPASS
        await asyncio.wait_for(task, timeout=2)
        return task.result()
    except Exception as e:
        # We end up here because the timeout from asyncio.wait_for is reached

        if not task.cancelled():
            task.cancel()

        try:
            await task
        except Exception as sub:
            # We have the correct error here!
            # This works great
            raise sub

        raise e
    finally:
        # This is where the REAL ISSUE arises
        # Since our previous code had an issue, Chrome closed the connection abnormally
        # So when we call setJavaScriptEnabled, it won't work

        # But I didn't find a way to catch the inside exception
        # (I'll post the stack trace later)
        await page.setJavaScriptEnabled(False)
        # THE BIG ISSUE is that the previous line hangs indefinitely that way, and the program is stuck...
        await page.goto('about:blank')
        await page.close()
        await context.close()


try:
    asyncio.get_event_loop()
except RuntimeError:
    asyncio.set_event_loop(asyncio.new_event_loop())

asyncio.get_event_loop().run_until_complete(main('<html><body><img src="https://avatars0.githubusercontent.com/u/317142?s=40&v=4" alt="hello" />Hello World</body></html>'))

The main issue is at await page.setJavaScriptEnabled(False). Since the connection was closed earlier, that call fails, but I haven't found a way to get Pyppeteer to handle that exception.

So in its current state, the code hangs there forever.
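As a possible stopgap (a minimal sketch, not a fix for the underlying problem), each cleanup call could be given its own deadline so that a reply that never arrives cannot block the program. The helper name call_with_deadline and the stand-in hung_call coroutine are hypothetical and exist only for illustration; the only library call used is asyncio.wait_for.

```python
import asyncio


async def call_with_deadline(awaitable, timeout=5.0):
    """Await `awaitable`, but give up after `timeout` seconds.

    Returns (True, result) on success and (False, exception) on failure
    or timeout, so cleanup code can carry on either way.
    (Hypothetical helper, not part of pyppeteer.)
    """
    try:
        return True, await asyncio.wait_for(awaitable, timeout=timeout)
    except Exception as exc:  # includes asyncio.TimeoutError
        return False, exc


async def hung_call():
    # Stand-in for a protocol call whose reply never arrives.
    await asyncio.get_running_loop().create_future()


def demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(
            call_with_deadline(hung_call(), timeout=0.1)
        )
    finally:
        loop.close()
```

In the finally block above, this would look like await call_with_deadline(page.setJavaScriptEnabled(False), timeout=2), and likewise for the calls after it. Whether the cancellation that wait_for performs is enough to unblock Pyppeteer in this situation is an assumption I have not verified.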

Here's the stack trace displayed when the error occurs:

[E:pyppeteer.connection] connection unexpectedly closed
Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /home/cnicodem/www/project//env/lib/python3.6/site-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state',)>
Traceback (most recent call last):
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/pyppeteer/connection.py", line 73, in _async_send
await self.connection.send(msg)
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/websockets/protocol.py", line 334, in send
yield from self.ensure_open()
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/websockets/protocol.py", line 470, in ensure_open
raise ConnectionClosed(self.close_code, self.close_reason)
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/pyppeteer/connection.py", line 79, in _async_send
await self.dispose()
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/pyppeteer/connection.py", line 170, in dispose
await self._on_close()
File "/home/cnicodem/www/project//env/lib/python3.6/site-packages/pyppeteer/connection.py", line 153, in _on_close
f'Protocol error {cb.method}: Target closed.', # type: ignore
asyncio.base_futures.InvalidStateError: invalid state

I tried modifying pyppeteer/connection.py to catch the exception, but didn't succeed. I must say I'm not an asyncio expert either ;)
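For reference, the InvalidStateError itself can be reproduced with plain asyncio: calling set_exception on a future that is already done (for example, one that was cancelled) raises it. The sketch below assumes, based only on the traceback, that _on_close sets an exception on each pending callback future; the function reject_pending and the use of ConnectionError are hypothetical stand-ins, not Pyppeteer's actual code.

```python
import asyncio


def reject_pending(callbacks, message):
    """Set an exception on every future that is still pending.

    Futures that are already done (resolved, failed, or cancelled) are
    skipped, since set_exception would raise InvalidStateError on them.
    Returns the number of futures that were rejected.
    """
    rejected = 0
    for fut in callbacks.values():
        if fut.done():
            continue
        fut.set_exception(ConnectionError(message))
        rejected += 1
    callbacks.clear()
    return rejected


def demo():
    loop = asyncio.new_event_loop()
    try:
        pending = loop.create_future()
        cancelled = loop.create_future()
        cancelled.cancel()

        # Unguarded: rejecting an already-cancelled future fails.
        try:
            cancelled.set_exception(ConnectionError("Target closed."))
            unguarded_raised = False
        except asyncio.InvalidStateError:
            unguarded_raised = True

        # Guarded: only the pending future is rejected.
        rejected = reject_pending({1: pending, 2: cancelled}, "Target closed.")
        # Retrieve the exception so asyncio does not warn about it later.
        error = pending.exception()
        return unguarded_raised, rejected, type(error).__name__
    finally:
        loop.close()
```

If that assumption holds, the future being rejected in the traceback was probably already finished or cancelled (perhaps by the task.cancel() in my sample), and a done() check of this kind might avoid the second exception. It would not by itself explain why the next call hangs.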
