
SSH connection problems with latest asyncssh version (2.15.0) #97

@mlpgwdg

Description


Environment

  • Covalent version: 0.232.0.post1
  • Covalent-Slurm plugin version: 0.18.0
  • Python version: 3.8
  • Operating system: Ubuntu 22.04.4 LTS

Installed with conda

What is happening?

Recently (two days before this was written), asyncssh was updated to v2.15.0, and something in that release appears to have broken the Covalent SLURM plugin. Specifically, job submission fails at around line 524 of slurm.py, right after scp is used to copy the pickle files to the remote server. The pickle files do arrive on the remote server, but no files after that point are copied, and no SLURM jobs are started.

The errors are cryptic and vary between runs, ranging from "SSH connection closed" to NoneType errors raised from a failed asyncssh connection object. The stack trace confirms the failure location and points to the asyncssh library as the source. After setting the log level to debug in Covalent's config and checking error.log for the failed SLURM executor node, the following trace appears:

Traceback (most recent call last):
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_dispatcher/_core/runner.py", line 182, in _run_task
    output, stdout, stderr, status = await executor._execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 695, in _execute
    return await self.execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 724, in execute
    result = await self.run(function, args, kwargs, task_metadata)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 592, in run
    remote_paths = await self._copy_files(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 537, in _copy_files
    await asyncssh.scp(temp_g.name, (conn, remote_py_script_filename))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 1041, in scp
    reader, writer = await _start_remote(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 190, in _start_remote
    writer, reader, _ = await conn.open_session(command, encoding=None)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4198, in open_session
    chan, session = await self.create_session(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4173, in create_session
    session = await chan.create(session_factory, command, subsystem,
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1207, in create
    result = await self._make_request(b'exec', String(command))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 740, in _make_request
    return await waiter
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1329, in data_received
    while self._inpbuf and self._recv_handler():
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1594, in _recv_packet
    processed = handler.process_packet(pkttype, seq, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/packet.py", line 237, in process_packet
    self._packet_handlers[pkttype](self, pkttype, pktid, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 656, in _process_request
    self._service_next_request()
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 416, in _service_next_request
    result = cast(Optional[bool], handler(packet))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1246, in _process_exit_status_request
    self._session.exit_status_received(status)
AttributeError: 'NoneType' object has no attribute 'exit_status_received'
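
To narrow down whether asyncssh 2.15.0 fails on its own, independent of Covalent, a standalone script could copy several files in sequence over one SSH connection, mimicking the reported pattern (the pickle files copy fine, then a later scp fails). This is a sketch under stated assumptions, not the plugin's actual code: `remote_targets`, `copy_files_sequentially`, and `run_repro` are hypothetical names, and the host, username, key file, and paths are placeholders. Only the `asyncssh.scp(local, (conn, remote))` call shape is taken from the traceback above.

```python
import asyncio
import os
import posixpath
from typing import List, Tuple


def remote_targets(local_paths: List[str], remote_dir: str) -> List[Tuple[str, str]]:
    """Pair each local file with a destination path inside remote_dir."""
    # posixpath is used because the remote SLURM host is assumed to be Unix-like.
    return [(p, posixpath.join(remote_dir, os.path.basename(p))) for p in local_paths]


async def copy_files_sequentially(
    host: str, username: str, key_file: str, local_paths: List[str], remote_dir: str
) -> None:
    """Copy files one at a time over a single SSH connection (hypothetical repro)."""
    # Imported lazily so the helper above can be used without asyncssh installed.
    import asyncssh

    async with asyncssh.connect(host, username=username, client_keys=[key_file]) as conn:
        for local, remote in remote_targets(local_paths, remote_dir):
            # Same call shape as the failing line in slurm.py's _copy_files:
            # asyncssh.scp(<local file>, (conn, <remote file>))
            await asyncssh.scp(local, (conn, remote))
            print(f"copied {local} -> {remote}")


def run_repro(
    host: str, username: str, key_file: str, local_paths: List[str], remote_dir: str
) -> None:
    """Synchronous entry point for running the sketch from a plain script."""
    asyncio.run(copy_files_sequentially(host, username, key_file, local_paths, remote_dir))
```

For example, calling `run_repro("cluster.example.org", "myusername", "/home/myusername/.ssh/id_rsa", ["a.pkl", "b.pkl", "script.py"], "/scratch/myusername")` once under asyncssh 2.14.0 and once under 2.15.0 could help show whether a later copy fails only on the newer version.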


How can we reproduce the issue?

  • Install the latest versions of covalent and covalent-slurm-plugin
  • Check that asyncssh is at version 2.15.0
  • Attempt to run any simple, minimal Covalent job through the SLURM plugin
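
The version check in the second step can be done from Python in the same environment Covalent runs in. This is a minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8, the version listed under Environment); the helper names and messages are hypothetical, and only the two version numbers come from this report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions named in this report (constants chosen for illustration):
REPORTED_BROKEN = "2.15.0"   # job submission fails with the SLURM plugin
REPORTED_WORKING = "2.14.0"  # reverting to this version works around the failure


def installed_asyncssh() -> Optional[str]:
    """Return the installed asyncssh version string, or None if it is absent."""
    try:
        return version("asyncssh")
    except PackageNotFoundError:
        return None


def classify(v: Optional[str]) -> str:
    """Describe an asyncssh version relative to what this report observed."""
    if v is None:
        return "asyncssh is not installed"
    if v == REPORTED_BROKEN:
        return f"asyncssh {v}: reported broken with the SLURM plugin"
    if v == REPORTED_WORKING:
        return f"asyncssh {v}: reported working"
    return f"asyncssh {v}: not covered by this report"


print(classify(installed_asyncssh()))
```

Run inside the conda environment that hosts Covalent, it prints a single line such as `asyncssh 2.15.0: reported broken with the SLURM plugin` when the affected version is installed.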

What should happen?

The job should run successfully. Instead, it fails with either an "SSH connection closed" error or AttributeError: 'NoneType' object has no attribute 'exit_status_received'.

Any suggestions?

For a temporary fix: revert to asyncssh v2.14.0 (restarting the Covalent server afterwards as needed).

For a more permanent fix: the plugin's code needs to be updated to be compatible with the latest version of asyncssh.
