
Scaling JupyterHub beyond one replica #7

@minrk


Problem Statement

As a JupyterHub administrator with thousands of users, I want to serve my users with JupyterHub without it slowing down or crashing. JupyterHub's current architecture prohibits running more than one Hub for a given set of users, so to serve a large number of users I must split them across independent Hubs. This adds to the complexity of configuring and operating my deployment.

Proposed Solution

Update the JupyterHub architecture to allow multiple concurrent JupyterHub instances (replicas) to share the load. This will increase the number of users a single JupyterHub deployment can reasonably support, and running multiple replicas mitigates the impact of a slow operation blocking one Hub.

Proposed Implementation

This is a substantial undertaking. The first step is to make the db session lifetime per-request (like most normal web apps!). That means removing all long-lived ORM objects (mainly the User objects in user_dict, and orm.Spawner) in favor of methods and functions that take the current session as an argument. The second step, which is what actually allows multiple Hubs, is to deal with 'ownership' of running Spawners for the purposes of spawning and polling.
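Both steps can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `spawners` table with `owner` and `lease_expires` columns and uses the stdlib `sqlite3` module instead of SQLAlchemy so it runs on its own; JupyterHub's real schema and ORM differ. The names `request_session`, `get_spawner_state`, and `claim_spawner` are hypothetical. Ownership is modeled here with a time-limited lease claimed by an atomic conditional `UPDATE`, which is one possible approach, not the mechanism this proposal specifies.

```python
import sqlite3
import time
from contextlib import contextmanager

# Hypothetical schema: one row per user's spawner, plus which Hub replica
# currently "owns" it (is responsible for spawning/polling) and until when.
SCHEMA = """
CREATE TABLE IF NOT EXISTS spawners (
    user_name TEXT PRIMARY KEY,
    state TEXT NOT NULL DEFAULT 'stopped',
    owner TEXT,
    lease_expires REAL NOT NULL DEFAULT 0
)
"""


@contextmanager
def request_session(db_path):
    """Open a database session that lives only for one request.

    Commits if the request body succeeds, rolls back if it raises,
    and always closes, so no connection or row outlives the request.
    """
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def get_spawner_state(session, user_name):
    """Read state through the caller's session instead of a cached ORM object."""
    row = session.execute(
        "SELECT state FROM spawners WHERE user_name = ?", (user_name,)
    ).fetchone()
    return row[0] if row else None


def claim_spawner(session, user_name, replica_id, lease_seconds=30.0, now=None):
    """Try to become the owner of a user's spawner.

    The single conditional UPDATE succeeds only if the spawner is unowned,
    already owned by this replica, or the previous owner's lease has expired,
    so two replicas racing for the same spawner cannot both win.
    """
    now = time.time() if now is None else now
    cur = session.execute(
        """
        UPDATE spawners
           SET owner = ?, lease_expires = ?
         WHERE user_name = ?
           AND (owner IS NULL OR owner = ? OR lease_expires < ?)
        """,
        (replica_id, now + lease_seconds, user_name, replica_id, now),
    )
    return cur.rowcount == 1
```

Each request handler would open its own `request_session(...)`, pass the session to helpers such as `get_spawner_state`, and call `claim_spawner` before spawning or polling; a replica that stops renewing its lease (for example because it crashed) lets another replica take over once the lease expires. The database must be shared by all replicas (a file or server database, not a per-connection `:memory:` database).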

How will this fit in the ecosystem?

This is likely to have breaking consequences for Spawners, as the ORM objects Spawners access will need to change or go away. Most basic Spawners should be unaffected, but any Spawner that accesses the underlying orm_spawner and/or spawner.user will likely need updating. It may also open a new area of development in 'spawn pools' that run outside the Hub.

Endorsements
