
Hub service failure after some time of usage #3012

Open
@waaldev

Description


I installed Zero to JupyterHub on EKS, and it worked fine for a while. After some time in use, however, the hub fails to start user servers and returns a 500 error. Restarting the hub deployment seems to fix the issue temporarily.

I would like to report this issue and request a solution to keep the hub service running without any failures. Any help would be greatly appreciated.

Steps to reproduce:

1. Start JupyterHub and leave it running for 2 or 3 days.
2. Try to start a server.
3. Observe the 500 error.
4. Restart the hub deployment and try to start a server again.
5. Observe that the server starts successfully.
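
The "try to start a server" step can be scripted against the hub REST API. The sketch below builds the same kind of request that appears in the logs (`POST /hub/api/users/<name>/server` with a token in the `Authorization` header). The hub URL, user name, token, and the helper name `build_spawn_request` are placeholders chosen for illustration, not values from this deployment.

```python
import urllib.request


def build_spawn_request(hub_url, username, token):
    """Build the POST request that asks the hub to start a user's server.

    hub_url, username, and token are placeholders supplied by the caller.
    """
    url = f"{hub_url.rstrip('/')}/hub/api/users/{username}/server"
    return urllib.request.Request(
        url,
        data=b"",  # empty body, matching Content-Length: 0 in the logs
        method="POST",
        headers={"Authorization": f"token {token}"},
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real hub address and API token.
    req = build_spawn_request("http://hub.example.invalid", "someuser", "API_TOKEN")
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) raises
    # urllib.error.HTTPError with code 500 when the hub is in the failing state.
```

Building the request separately from sending it makes it easy to check the URL and headers before pointing the script at a live hub.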

Environment:
Helm Chart version: 2.0.0
Database: Amazon Aurora MySQL

Config.yml:

singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            cp -n /home/.Rprofile /home/jovyan/.Rprofile;
            chown jovyan:users /home/jovyan/.Rprofile;
            mkdir -p /home/jovyan/.config/pip;
            cp -n /etc/pip.conf /home/jovyan/.config/pip/pip.conf;
  memory:
    limit: 16G
    guarantee: 1G
  cpu:
    limit: 4
    guarantee: 0.5
  defaultUrl: "/lab"
  image:
    name: CUSTOM IMAGE
    tag: TAG
  storage:
    type: "static"
    static:
      pvcName: "efs-jhub"
      subPath: "home/{username}"
  extraEnv:
    CHOWN_HOME: "yes"
    JULIA_DEPOT_PATH: "/home/jovyan/.julia"
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"
hub:
  db:
    type: mysql
    url: URL
    upgrade: true
  config:
    Authenticator:
      admin_users:
        - admin
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
  nodeSelector:
    hub.jupyter.org/node-purpose: hub

proxy:
  service:
    type: NodePort
  chp:
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  traefik:
    resources:
      requests:
        cpu: 500m 
        memory: 512Mi
  secretSync: 
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
scheduling:
  userScheduler:
    resources:
      requests:
        cpu: 30m
        memory: 512Mi
    nodeSelector:
      hub.jupyter.org/node-purpose: hub
  userPods:
    nodeAffinity:
      matchNodePurpose: require
prePuller:
  resources:
    requests:
      cpu: 10m
      memory: 8Mi
  hook: 
    resources:
      requests:
        cpu: 10m
        memory: 8Mi

cull:
  enabled: true
  timeout: 600
  every: 120
  removeNamedServers: true

debug:
  enabled: true

Logs:

Uncaught exception POST /hub/api/users/geccaxpkce3wj7y/server (::ffff:34.57.72.158)
    HTTPServerRequest(protocol='http', host='premium', method='POST', uri='/hub/api/users/geccaxpkce3wj7y/server', version='HTTP/1.1', remote_ip='::ffff:34.57.72.158')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/tornado/web.py", line 1713, in _execute
        result = await result
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/apihandlers/users.py", line 539, in post
        await self.spawn_single_user(user, server_name, options=options)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/handlers/base.py", line 878, in spawn_single_user
        active_counts = self.users.count_active_users()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/user.py", line 233, in count_active_users
        if spawner.active:
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 158, in active
        return bool(self.pending or self.ready)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 148, in ready
        if self.server is None:
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/spawner.py", line 229, in server
        orm_server = self.orm_spawner.server
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
        return self.impl.get(state, dict_)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
        value = self._fire_loader_callables(state, key, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 978, in _fire_loader_callables
        return self.callable_(state, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 872, in _load_for_state
        primary_key_identity = self._get_ident_for_use_get(
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 931, in _get_ident_for_use_get
        return [
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/strategies.py", line 932, in <listcomp>
        get_attr(state, dict_, self._equated_columns[pk], passive=passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/mapper.py", line 2983, in _get_state_attr_by_column
        return state.manager[prop.key].impl.get(state, dict_, passive=passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
        value = self._fire_loader_callables(state, key, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 973, in _fire_loader_callables
        return state._load_expired(state, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
        self.manager.expired_attribute_loader(self, toload, passive)
      File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1465, in load_scalar_attributes
        raise orm_exc.ObjectDeletedError(state)
    sqlalchemy.orm.exc.ObjectDeletedError: Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.

[W 2023-02-05 08:01:48.637 JupyterHub base:166] Rolling back session due to database error Instance '<Spawner at 0x7ff4bce43070>' has been deleted, or its row is otherwise not present.
[E 2023-02-05 08:01:48.642 JupyterHub log:178] {
      "X-Forwarded-Host": "premium",
      "Accept-Encoding": "gzip",
      "Content-Type": "application/json",
      "Authorization": "Token [secret]",
      "User-Agent": "Go-http-client/1.1",
      "Content-Length": "0",
      "X-Amzn-Trace-Id": "Root=1-63df626c-6b46e67621f8b37f4f14ac4d",
      "Host": "premium",
      "X-Forwarded-Port": "80,80",
      "X-Forwarded-Proto": "http,http",
      "X-Forwarded-For": "192.168.65.151,::ffff:34.57.72.158",
      "Connection": "close"
    }
[E 2023-02-05 08:01:48.642 JupyterHub log:186] 500 POST /hub/api/users/geccaxpkce3wj7y/server (admin::ffff:34.57.72.158) 9.50ms
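
For reference, the `ObjectDeletedError` in the traceback is what SQLAlchemy raises when a session still holds an object whose row has disappeared from the database and then tries to reload it. The sketch below shows one way that can happen, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The `Spawner` class here is a simplified, hypothetical stand-in, not JupyterHub's actual model, and this is not confirmed to be the cause of the failure above.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import ObjectDeletedError

Base = declarative_base()


class Spawner(Base):
    """Hypothetical, simplified stand-in for an ORM-mapped spawner row."""

    __tablename__ = "spawners"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


def demo():
    """Return the error message raised when a held object's row is gone."""
    engine = create_engine("sqlite://")  # in-memory database for the demo
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        spawner = Spawner(name="demo")
        session.add(spawner)
        session.commit()

        # Delete the row with raw SQL, so the ORM never marks the
        # in-memory object as deleted.
        session.execute(text("DELETE FROM spawners"))
        session.commit()  # the object's attributes are expired after commit

        try:
            spawner.name  # attribute access triggers a reload from the database
        except ObjectDeletedError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

The message printed has the same form as the one in the hub log: the instance "has been deleted, or its row is otherwise not present".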

Thank you.
