WebSocket connections fail in multi-worker environments due to missing io_loop initialisation #15

Open
@quartzar

Problem Description

When running a Django application with Bokeh integration under multiple Gunicorn workers, WebSocket connections frequently fail to establish. With n workers, a page that uses Bokeh autoload apps loads successfully only about 1/n of the time (e.g., a 25% success rate with 4 workers). Tested with Django 5.1 and bokeh-django 0.2.0.

Root Cause

The current implementation assumes that the Tornado io_loop has already been initialised on the worker handling the WebSocket connection. In a multi-worker environment, however, there is no guarantee that this worker is the same one that handled the initial HTTP request, which is where the io_loop was first initialised.

Specifically, in WSConsumer.application_context, the code raises a RuntimeError if io_loop is not set:

if self._application_context.io_loop is None:
    raise RuntimeError("io_loop should already been set")
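
To make the 1/n figure concrete, here is a toy model rather than a measurement. It assumes that the initial HTTP request and the WebSocket connection are each routed to a worker independently and uniformly, and that only the worker that served the HTTP request has its io_loop set. The function name success_fraction is hypothetical and not part of bokeh-django.

```python
from fractions import Fraction
from itertools import product


def success_fraction(n_workers: int) -> Fraction:
    """Fraction of (HTTP worker, WebSocket worker) pairings in which the
    WebSocket lands on the worker whose io_loop was already initialised."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    workers = range(n_workers)
    # Enumerate every equally likely pairing of the two requests.
    pairs = list(product(workers, repeat=2))
    # The connection succeeds only when both requests hit the same worker.
    hits = sum(1 for http_worker, ws_worker in pairs if http_worker == ws_worker)
    return Fraction(hits, len(pairs))


for n in (1, 2, 4):
    print(n, success_fraction(n))
```

Under these assumptions, 4 workers give a fraction of 1/4, which matches the 25% success rate reported above.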

Reproduction Steps

  • Set up a Django project with bokeh-django and an autoload Bokeh app
  • Configure Gunicorn to run with multiple Uvicorn workers (e.g., gunicorn --workers=4 --worker-class uvicorn_worker.UvicornWorker)
  • Load a page with some Bokeh autoload app(s)
  • Observe that the Bokeh apps load successfully only a fraction of the time
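
The multi-worker setup from the second step could equally be written as a Gunicorn configuration file. This is only a sketch: workers and worker_class mirror the command-line flags given above, while the file name gunicorn.conf.py and the bind address are assumptions made for illustration.

```python
# gunicorn.conf.py (illustrative sketch)

# Same values as the --workers and --worker-class flags in the steps above.
workers = 4
worker_class = "uvicorn_worker.UvicornWorker"

# Assumed local bind address; not taken from the original report.
bind = "127.0.0.1:8000"
```

It would be passed to Gunicorn with the -c option, followed by the project's own ASGI application path.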

Impact

This issue makes bokeh-django autoloading apps unreliable in production environments where multiple workers are necessary for performance and availability.

Proposed Solution

Initialise the io_loop using IOLoop.current() if it doesn't exist, rather than assuming it should always be initialised beforehand. This would make each worker capable of handling WebSocket connections independently, regardless of which worker handled the initial HTTP request.

The key change would be replacing the error-raising code with initialisation code:

if self._application_context._loop is None:
    self._application_context._loop = IOLoop.current()
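
As a rough illustration of how this change behaves, the sketch below uses stand-in classes instead of the real bokeh-django and Bokeh types. ContextStandIn, ConsumerStandIn, and the get_loop parameter are all hypothetical. The sketch assumes that io_loop is a read-only property backed by _loop, as the two snippets above suggest, and it injects the loop getter so that it runs without Tornado installed. In the real code the getter would be IOLoop.current.

```python
from typing import Callable, Optional


class ContextStandIn:
    """Hypothetical stand-in for the application context: a private _loop
    attribute exposed through a read-only io_loop property (an assumption)."""

    def __init__(self, io_loop: Optional[object] = None) -> None:
        self._loop = io_loop

    @property
    def io_loop(self) -> Optional[object]:
        return self._loop


class ConsumerStandIn:
    """Hypothetical stand-in for WSConsumer, reduced to the one property."""

    def __init__(self, context: ContextStandIn, get_loop: Callable[[], object]) -> None:
        self._application_context = context
        # In the real code this would be tornado.ioloop.IOLoop.current.
        self._get_loop = get_loop

    @property
    def application_context(self) -> ContextStandIn:
        # Proposed behaviour: initialise lazily instead of raising RuntimeError.
        if self._application_context._loop is None:
            self._application_context._loop = self._get_loop()
        return self._application_context


# A worker that never served the initial HTTP request starts with no loop.
worker_loop = object()
consumer = ConsumerStandIn(ContextStandIn(), get_loop=lambda: worker_loop)
assert consumer.application_context.io_loop is worker_loop

# A loop that is already set is left untouched.
existing_loop = object()
consumer = ConsumerStandIn(ContextStandIn(existing_loop), get_loop=lambda: worker_loop)
assert consumer.application_context.io_loop is existing_loop
```

The lazy check keeps the existing behaviour for workers whose loop is already set and only fills the gap on workers that receive a WebSocket connection first.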

I will raise a PR shortly with the changes we have made in a local version of bokeh-django, which has been working well for us in a production environment using Gunicorn with 4 Uvicorn workers. Suggestions are welcome! Other than this, bokeh-django has been invaluable for us and we appreciate all the work that has gone into it.
