Hello,
I am running RLlib on SageMaker on an 8-core instance with num_workers set to 7. After a long run, the trial fails with "The actor died unexpectedly before finishing this task."
Failure # 1 (occurred at 2021-10-21_01-10-25)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trial_runner.py", line 812, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 767, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/usr/local/lib/python3.6/dist-packages/ray/_private/client_mode_hook.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/worker.py", line 1623, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
However, whenever I change num_workers to 1, the problem goes away.
Any idea how I can fix this issue?
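For reference, here is a minimal sketch of the two worker settings being compared. The actual training script is not shown in this post, so the helper name `rllib_worker_config` is hypothetical. The CPU accounting (one CPU for the trainer process plus one per rollout worker) is an assumption based on RLlib's default resource settings, not something confirmed for this setup.

```python
def rllib_worker_config(total_cpus, num_workers):
    """Return the worker part of an RLlib config dict after a CPU budget check.

    Hypothetical helper for illustration only. Assumes one CPU for the
    trainer (driver) process and one CPU per rollout worker.
    """
    cpus_needed = 1 + num_workers
    if cpus_needed > total_cpus:
        raise ValueError(
            f"{num_workers} workers need {cpus_needed} CPUs, "
            f"but only {total_cpus} are available"
        )
    # "num_workers" is the RLlib config key mentioned in the post.
    return {"num_workers": num_workers}


# The setting that eventually fails after a long run (8 cores, 7 workers).
failing_config = rllib_worker_config(total_cpus=8, num_workers=7)

# The setting that runs without the error.
working_config = rllib_worker_config(total_cpus=8, num_workers=1)
```

Both settings fit within 8 cores under this assumption, so the CPU count alone does not seem to explain the failure.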