Problem Statement
As a JupyterHub administrator with thousands of users, I want to be able to serve my users with JupyterHub without slowing down or crashing. JupyterHub's current architecture prohibits running more than one Hub for a given set of users, so to serve a large number of users I must split them across independent Hubs. This adds to the complexity of configuring and operating my deployment.
Proposed Solution
Update the JupyterHub architecture so that multiple JupyterHub replicas can run concurrently and share the load. This will increase the number of users a single JupyterHub deployment can reasonably support, and it mitigates the impact of slow operations blocking one Hub.
Proposed Implementation
This is a substantial undertaking. The first pass is to make the db session lifetime per-request (like most normal webapps!). That will mean removing all long-lived ORM objects (mainly: User in the user_dict and orm.Spawner), in favor of methods and functions that take a current session as an argument. The second step, which is what actually allows multiple Hubs, is to deal with 'ownership' of running Spawners for the purposes of spawning and polling.
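To make the two steps concrete, here is a minimal sketch, not JupyterHub code. It assumes the standard-library sqlite3 module as a stand-in for the real ORM session, and every name in it (request_session, claim_spawner, the spawners table, the hub-1/hub-2 ids) is hypothetical. It shows a session that lives for a single request, helper functions that take that session as an argument, and one possible way to record 'ownership': a conditional UPDATE that only one Hub can win.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical stand-in for the Hub database. A file is used so that
# separate connections (one per request) see the same data.
DB_PATH = Path(tempfile.mkdtemp()) / "hub-sketch.sqlite"


@contextmanager
def request_session():
    """Open a session for one request: commit on success, roll back on error."""
    session = sqlite3.connect(DB_PATH)
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def init_db(session):
    session.execute(
        "CREATE TABLE IF NOT EXISTS spawners ("
        "user_name TEXT PRIMARY KEY, owner TEXT)"
    )


def add_spawner(session, user_name):
    session.execute(
        "INSERT INTO spawners (user_name, owner) VALUES (?, NULL)", (user_name,)
    )


def claim_spawner(session, user_name, hub_id):
    """Claim an unowned spawner; return True if this Hub now owns it."""
    cursor = session.execute(
        "UPDATE spawners SET owner = ? WHERE user_name = ? AND owner IS NULL",
        (hub_id, user_name),
    )
    return cursor.rowcount == 1


def get_owner(session, user_name):
    row = session.execute(
        "SELECT owner FROM spawners WHERE user_name = ?", (user_name,)
    ).fetchone()
    return row[0] if row else None


# Each "request" gets its own short-lived session; nothing is kept between them.
with request_session() as session:
    init_db(session)
    add_spawner(session, "alice")

with request_session() as session:
    first = claim_spawner(session, "alice", "hub-1")

with request_session() as session:
    second = claim_spawner(session, "alice", "hub-2")
```

In this sketch the second claim does not succeed because the row already has an owner, so only one Hub would go on to spawn or poll that server. A real design would also need a way to release or expire ownership when a Hub goes away, which is left out here.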
How will this fit in the ecosystem?
It is likely that this will have breaking consequences for Spawners, as the ORM objects that Spawners access will need to change or go away. Most basic Spawners should be unaffected, but anything accessing the underlying orm_spawner and/or spawner.user will likely need updating. It may also create a new area of development in 'spawn pools' for running outside the Hub.
Endorsements