TL;DR
When using Gunicorn with `preload_app = True`, `setup` must be called in Gunicorn's `post_fork` callback. Otherwise, events pushed into `Client.queue` will never be seen by `Consumer` and won't be sent to PostHog.
Gunicorn
Gunicorn is a commonly used web server for Python projects that uses a pre-fork worker model to achieve concurrency. One of Gunicorn's settings, `preload_app = True`, tells Gunicorn to load application code before forking worker processes.
From the Gunicorn docs:
> `preload_app`
>
> Command line: `--preload`
>
> Default: `False`
>
> Load application code before the worker processes are forked.
>
> By preloading an application you can save some RAM resources as well as speed up server boot times. Although, if you defer application loading to each worker process, you can reload your application code easily by restarting workers.
The operating system uses a copy-on-write model when forking processes, so this setting can significantly reduce memory usage in certain scenarios.
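For reference, preloading is enabled with a single setting in a Gunicorn config file (or `--preload` on the command line); the worker count here is just an example:

```python
# gunicorn.conf.py -- minimal example of enabling preloading
preload_app = True
workers = 4  # example worker count
```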
posthog-python
NOTE: We're going to get into how `fork()` works in Python, which isn't an area of expertise for me, so please forgive me if anything here is wrong.
`Client.capture` calls `Client._enqueue`, which calls `self.queue.put`. `Client.queue` is an instance of `queue.Queue`, a thread-safe queue provided by the Python standard library.
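As a simplified sketch of that pattern (not the library's actual code), the client is essentially a thread-safe queue feeding a background consumer thread:

```python
# Simplified sketch of the Client/Consumer pattern (not the real code).
import queue
import threading

q = queue.Queue(maxsize=10000)  # stands in for Client.queue

def consume():
    while True:
        event = q.get()  # blocks until an event is enqueued
        # ... batch events and send them to the PostHog API here ...
        q.task_done()

# Stands in for Consumer.start(): a daemon thread draining the queue.
threading.Thread(target=consume, daemon=True).start()

q.put({"event": "user signed up"})  # what Client._enqueue boils down to
```

Within a single process this works fine; the problem only appears once `fork()` enters the picture.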
When a `Client` instance is created before Gunicorn forks its worker processes, the `fork()` call shares the memory of the `Client` instance, including `Client.queue`, between the master and worker processes. When a worker process writes to `Client.queue`, copy-on-write kicks in and the worker process gets its own copy of the queue containing the event that was just added. That event is never added to `Client.queue` in the master process.
`Client.__init__` creates zero or more `Consumer` instances and stores them in `Client.consumers`. `Consumer.start` is called for each instance, and no other changes are made to `Consumer` until `Client.shutdown` is called. If `Client.__init__` is called before Gunicorn forks and no writes to `Client.consumers` happen after the fork, then `Client.consumers` shares memory across the master and worker processes, which means `Consumer.queue` will always be the `queue.Queue` instance created in the master process. But because writing to `Client.queue` gives each worker process its own copy of the queue, the `queue.Queue` instance created in the master process will always be empty. No events will ever be sent to PostHog.
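To make this concrete, here's a minimal reproduction of the mechanism, assuming a Unix-like OS where `os.fork` is available. It stands in for what Gunicorn does and isn't posthog-python code:

```python
# A queue.Queue created before fork() never sees events enqueued by the
# child: the child's put() lands in its copy-on-write copy of the queue.
import os
import queue

q = queue.Queue()  # created in the "master" process, like Client.queue

pid = os.fork()
if pid == 0:
    # Worker process: the event goes into the worker's copy of the queue.
    q.put("captured event")
    print("worker sees:", q.qsize())  # -> 1
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # Master process: its queue (the one the Consumer reads) stays empty.
    print("master sees:", q.qsize())  # -> 0
```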
Solutions
I have a few ideas for potential solutions, with varying degrees of confidence:
- Initialize the default client by calling `setup()` inside Gunicorn's `post_fork` callback. This ensures that every worker process gets its own set of `Consumer` instances and everything behaves as expected; a sketch follows this list. You could also use `post_fork` to initialize a custom `Client` instance, but I haven't tried that myself.
- Replace `preload_app = True` in Gunicorn with `preload_app = False`. `preload_app = True` is useful in certain situations, but isn't necessary in every situation. As an example, as far as I can tell, PostHog/posthog uses Gunicorn but doesn't use `preload_app = True` (the cloud deployment may be different, though). This is definitely the easiest option in situations where it works, but I don't like it as a solution because nobody wants to waste RAM.
- Replace `queue.Queue` with `multiprocessing.JoinableQueue`. `multiprocessing.JoinableQueue` is a process-safe version of `queue.Queue` that uses pipes to communicate between processes. In the spirit of full disclosure, I haven't tried this change myself, so I'm not sure it will work; a rough sketch also follows this list. This change may replace one set of problems with a different set of problems, and I recommend reading Pipes and Queues to get an idea of what that might look like. One specific thing of note is this warning from "Pipes and Queues":

  > Warning: If a process is killed using Process.terminate() or os.kill() while it is trying to use a Queue, then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on.

  This is particularly relevant here because Gunicorn has ways of killing workers if it thinks they aren't working correctly, or after a certain number of requests as specified by the `max_requests` and `max_requests_jitter` settings. We could get around that by using `multiprocessing.Manager`, but that's how we get to the idea that maybe this is just replacing one set of problems with another.
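Here's a sketch of option 1 as a Gunicorn config file. It assumes the module-level `setup()` this issue describes and uses a placeholder API key; treat it as a starting point rather than a tested recipe:

```python
# gunicorn.conf.py -- sketch of option 1 (untested; placeholder values).
import posthog

preload_app = True  # keep the memory/boot-time benefits of preloading

def post_fork(server, worker):
    # Gunicorn runs this in each worker after fork(), so the default
    # client -- and with it Client.queue and the Consumer threads -- is
    # created inside the worker process rather than inherited from the
    # master.
    posthog.project_api_key = "phc_placeholder"  # hypothetical key
    posthog.setup()
```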
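And a rough sketch of what option 3 relies on: a `multiprocessing.JoinableQueue` crosses the process boundary through a pipe, which a plain `queue.Queue` cannot do after a fork. This is a standalone demonstration, not a patch to posthog-python:

```python
# Demonstrates that multiprocessing.JoinableQueue is visible across
# processes, unlike queue.Queue after a fork.
import multiprocessing

def worker(q):
    # Runs in a child process; the event travels through a pipe
    # instead of staying in process-local memory.
    q.put("captured event")
    q.join()  # block until the consumer calls task_done()

if __name__ == "__main__":
    q = multiprocessing.JoinableQueue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # -> "captured event"
    q.task_done()
    p.join()
```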