tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ab42ba925d0>>, <Task finished coro=<Worker.heartbeat() done #389

Open
@MSKazemi

I am trying to analyze 9900 parquet files that total 100 GB.
After 70K garbage collection warnings like this one:
distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)

my job was killed with the following error.

distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 59% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 62% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 61% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 59% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 61% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ab42ba925d0>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2b3465022590>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2adcf6fabb50>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ba64a584990>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 920, in heartbeat
    raise e
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py", line 893, in heartbeat
    metrics=await self.get_metrics(),
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 391, in retry_operation
    operation=operation,
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/utils_comm.py", line 379, in retry
    return await coro()
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 757, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/core.py", line 540, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 208, in read
    convert_stream_closed_error(self, e)
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 121, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x2ac978e74f90>>, <Task finished coro=<Worker.heartbeat() done, defined at /galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/worker.py:883> exception=CommClosedError('in <closed TCP>: ConnectionResetError: [Errno 104] Connection reset by peer')>)
Traceback (most recent call last):
  File "/galileo/home/userexternal/mseyedka/miniconda3/lib/python3.7/site-packages/distributed/comm/tcp.py", line 188, in read
    n_frames = await stream.read_bytes(8)
tornado.iostream.StreamClosedError: Stream is closed
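
For anyone triaging a log like the one above, here is a minimal sketch of how the garbage collection warnings could be tallied. It uses only the Python standard library and matches the warning format exactly as it appears in this log. The function name `summarize_gc_warnings` and the `sample` list are hypothetical, introduced only for illustration; this is not part of `distributed`.

```python
import re

# Pattern for the warning line exactly as it appears in the log above.
GC_WARNING = re.compile(
    r"distributed\.utils_perf - WARNING - full garbage collections took "
    r"(\d+)% CPU time recently \(threshold: (\d+)%\)"
)


def summarize_gc_warnings(lines):
    """Count GC warnings and summarize the CPU percentages they report.

    Hypothetical helper for illustration; `lines` is any iterable of log lines.
    """
    percents = []
    for line in lines:
        match = GC_WARNING.search(line)
        if match:
            percents.append(int(match.group(1)))
    if not percents:
        return {"count": 0}
    return {
        "count": len(percents),
        "min": min(percents),
        "max": max(percents),
        "mean": sum(percents) / len(percents),
    }


# A few lines copied from the log above, plus one line that is not a GC warning.
sample = [
    "distributed.utils_perf - WARNING - full garbage collections took 60% CPU time recently (threshold: 10%)",
    "distributed.utils_perf - WARNING - full garbage collections took 59% CPU time recently (threshold: 10%)",
    "distributed.utils_perf - WARNING - full garbage collections took 56% CPU time recently (threshold: 10%)",
    "distributed.worker - INFO - Connection to scheduler broken.  Reconnecting...",
]

print(summarize_gc_warnings(sample))
```

To run it on a real worker log, pass an open file object instead of `sample`, since the function accepts any iterable of lines.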
