Description
I installed Zero to JupyterHub on EKS, and it worked fine at first. After 2 or 3 days of use, however, the hub starts failing to start servers and returns a 500 error. Restarting the hub deployment fixes the issue temporarily.
I would like to report this issue and request a solution to keep the hub service running without any failures. Any help would be greatly appreciated.
Steps to reproduce:
Start JupyterHub and wait 2 or 3 days
Try to start a server
Observe the error 500 message
Restart the hub deployment and try to start a server again
Observe that the server starts successfully
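For anyone trying to reproduce this without the UI, here is a minimal sketch of the request from step 2, sent straight to the hub REST API endpoint that appears in the logs (`POST /hub/api/users/{name}/server`). The hub URL, username, and token passed in are placeholders I made up for illustration, and the function names are hypothetical helpers, not part of JupyterHub.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def build_spawn_request(hub_url: str, username: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks the hub to start a user's default server."""
    # Same path as in the traceback: /hub/api/users/<name>/server
    url = f"{hub_url.rstrip('/')}/hub/api/users/{quote(username, safe='')}/server"
    return urllib.request.Request(
        url,
        data=b"",  # empty body, matching Content-Length: 0 in the logged request
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


def start_server(hub_url: str, username: str, token: str) -> int:
    """Send the spawn request and return the HTTP status code."""
    request = build_spawn_request(hub_url, username, token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as the 500 described here are raised as HTTPError
        return err.code
```

Calling `start_server(...)` with an admin API token after the hub has been up for a few days should return 500 if the problem is present, and a 2xx status after the hub deployment is restarted.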
Environment:
Helm Chart version: 2.0.0
Database: Amazon Aurora MySQL
Config.yml:
singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            cp -n /home/.Rprofile /home/jovyan/.Rprofile;
            chown jovyan:users /home/jovyan/.Rprofile;
            mkdir -p /home/jovyan/.config/pip;
            cp -n /etc/pip.conf /home/jovyan/.config/pip/pip.conf;
  memory:
    limit: 16G
    guarantee: 1G
  cpu:
    limit: 4
    guarantee: 0.5
  defaultUrl: "/lab"
  image:
    name: CUSTOM IMAGE
    tag: TAG
  storage:
    type: "static"
    static:
      pvcName: "efs-jhub"
      subPath: "home/{username}"
  extraEnv:
    CHOWN_HOME: "yes"
    JULIA_DEPOT_PATH: "/home/jovyan/.julia"
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"
hub:
  db:
    type: mysql
    url: URL
    upgrade: true
  config:
    Authenticator:
      admin_users:
        - admin
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
  nodeSelector:
    hub.jupyter.org/node-purpose: hub
proxy:
  service:
    type: NodePort
  chp:
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  traefik:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
  secretSync:
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
scheduling:
  userScheduler:
    resources:
      requests:
        cpu: 30m
        memory: 512Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  userPods:
    nodeAffinity:
      matchNodePurpose: require
prePuller:
  resources:
    requests:
      cpu: 10m
      memory: 8Mi
  hook:
    resources:
      requests:
        cpu: 10m
        memory: 8Mi
cull:
  enabled: true
  timeout: 600
  every: 120
  removeNamedServers: true
debug:
  enabled: true
Logs:
Uncaught exception POST /hub/api/users/geccaxpkce3wj7y/server (::ffff:34.57.72.158)
HTTPServerRequest(protocol='http', host='premium', method='POST', uri='/hub/api/users/geccaxpkce3wj7y/server', version='HTTP/1.1', remote_ip='::ffff:34.57.72.158')
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/tornado/web.py", line 1713, in _execute
result = await result
File "/usr/local/lib/python3.9/site-packages/jupyterhub/apihandlers/users.py", line 539, in post
await self.spawn_single_user(user, server_name, options=options)
File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 878, in spawn_single_user
active_counts = self.users.count_active_users()
File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 233, in count_active_users
if spawner.active:
File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 158, in active
return bool(self.pending or self.ready)
File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 148, in ready
if self.server is None:
File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 229, in server
orm_server = self.orm_spawner.server
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
return self.impl.get(state, dict_)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
value = self._fire_loader_callables(state, key, passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 978, in _fire_loader_callables
return self.callable_(state, passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 872, in _load_for_state
primary_key_identity = self._get_ident_for_use_get(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 931, in _get_ident_for_use_get
return [
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 932, in <listcomp>
get_attr(state, dict_, self._equated_columns[pk], passive=passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/mapper.py", line 2983, in _get_state_attr_by_column
return state.manager[prop.key].impl.get(state, dict_, passive=passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
value = self._fire_loader_callables(state, key, passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 973, in _fire_loader_callables
return state._load_expired(state, passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
self.manager.expired_attribute_loader(self, toload, passive)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1465, in load_scalar_attributes
raise orm_exc.ObjectDeletedError(state)
sqlalchemy.orm.exc.ObjectDeletedError: Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.
[W 2023-02-05 08:01:48.637 JupyterHub base:166] Rolling back session due to database error Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.
[E 2023-02-05 08:01:48.642 JupyterHub log:178] {
"X-Forwarded-Host": "premium",
"Accept-Encoding": "gzip",
"Content-Type": "application/json",
"Authorization": "Token [secret]",
"User-Agent": "Go-http-client/1.1",
"Content-Length": "0",
"X-Amzn-Trace-Id": "Root=1-63df626c-6b46e67621f8b37f4f14ac4d",
"Host": "premium",
"X-Forwarded-Port": "80,80",
"X-Forwarded-Proto": "http,http",
"X-Forwarded-For": "192.168.65.151,::ffff:34.57.72.158",
"Connection": "close"
}
[E 2023-02-05 08:01:48.642 JupyterHub log:186] 500 POST /hub/api/users/geccaxpkce3wj7y/server (admin::ffff:34.57.72.158) 9.50ms
Thank you.