Description
Hi, I was trying to run evaluation with the pretrained models in Docker but ran into some errors. Here is the error output:
"Loading /root/data/objaverse_houses/houses_2023_07_28/val.jsonl.gz: 16000it [00:02, 6342.99it/s]
200 tasks in queue
Starting worker 0
Starting worker 1
Loading ckpt /root/data/pretrained_models/SigLIP-ViTb-3-double-det-CHORES-S/checkpoint_final.ckpt using ckpt_prefix='model.' ...
WARNING: worker 0 failed to stop with non-None task_sampler
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/root/spoc/tasks/abstract_task_sampler.py", line 131, in controller
return self.controller_type(**self.controller_args)
File "/root/spoc/environment/stretch_controller.py", line 64, in __init__
self.controller = Controller(**kwargs)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 545, in __init__
self.start(
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1545, in start
self._start_unity_thread(env, width, height, unity_params, image_name)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1268, in _start_unity_thread
raise Exception(message)
Exception: Unity process has exited - check ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 57, in start_worker
worker.distribute_evaluate(agent, tasks_queue, results_queue)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 484, in distribute_evaluate
task = self.task_sampler.next_task()
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 165, in next_task
self.increment_task_and_reset_house(
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 150, in increment_task_and_reset_house
self.reset_controller_in_current_house_and_cache_house_data(
File "/root/spoc/tasks/abstract_task_sampler.py", line 168, in reset_controller_in_current_house_and_cache_house_data
self.reset_scene_with_timeout_handler()
File "/root/spoc/tasks/abstract_task_sampler.py", line 237, in reset_scene_with_timeout_handler
self.controller.reset(scene=self.current_house)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/torch/distributions/utils.py", line 112, in __get__
value = self.wrapped(instance)
File "/root/spoc/tasks/abstract_task_sampler.py", line 134, in controller
raise TaskSamplerInInvalidStateError("Controller has closed.")
utils.data_generation_utils.exception_utils.TaskSamplerInInvalidStateError: Controller has closed.
Logging and waiting for proccesses to finish
Loading ckpt /root/data/pretrained_models/SigLIP-ViTb-3-double-det-CHORES-S/checkpoint_final.ckpt using ckpt_prefix='model.' ...
WARNING: worker 1 failed to stop with non-None task_sampler
Process ForkServerProcess-2:
Traceback (most recent call last):
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 57, in start_worker
worker.distribute_evaluate(agent, tasks_queue, results_queue)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 484, in distribute_evaluate
task = self.task_sampler.next_task()
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 165, in next_task
self.increment_task_and_reset_house(
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 150, in increment_task_and_reset_house
self.reset_controller_in_current_house_and_cache_house_data(
File "/root/spoc/tasks/abstract_task_sampler.py", line 168, in reset_controller_in_current_house_and_cache_house_data
self.reset_scene_with_timeout_handler()
File "/root/spoc/tasks/abstract_task_sampler.py", line 237, in reset_scene_with_timeout_handler
self.controller.reset(scene=self.current_house)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/torch/distributions/utils.py", line 112, in __get__
value = self.wrapped(instance)
File "/root/spoc/tasks/abstract_task_sampler.py", line 131, in controller
return self.controller_type(**self.controller_args)
File "/root/spoc/environment/stretch_controller.py", line 64, in __init__
self.controller = Controller(**kwargs)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 545, in __init__
self.start(
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1545, in start
self._start_unity_thread(env, width, height, unity_params, image_name)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1237, in _start_unity_thread
command = self.unity_command(width, height, self.headless)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1147, in unity_command
raise RuntimeError(
RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).
"
Following the suggestion in the error message, I modified the Dockerfile to make sure vulkan-tools is installed. However, after rebuilding the image, the problem is still there. Any ideas about what causes this? Or should vulkan-tools be installed on the host rather than inside the Docker container?
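To narrow this down, a small check run inside the container could tell apart "vulkaninfo is not installed" from "vulkaninfo is installed but fails to run". The sketch below is only a diagnostic I put together, not part of spoc or ai2thor; the helper name `check_vulkaninfo` and the shape of its return value are my own assumptions.

```python
import shutil
import subprocess


def check_vulkaninfo(binary: str = "vulkaninfo", timeout: float = 30.0) -> dict:
    """Report whether `binary` is on PATH and, if so, how it exits.

    Hypothetical helper for debugging; not an ai2thor or spoc API.
    """
    path = shutil.which(binary)
    if path is None:
        # The binary is not installed (or not on PATH) in this environment.
        return {"found": False, "path": None, "returncode": None, "output": ""}
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The binary exists but could not be executed or did not finish in time.
        return {"found": True, "path": path, "returncode": None, "output": str(exc)}
    return {
        "found": True,
        "path": path,
        "returncode": proc.returncode,
        "output": proc.stdout + proc.stderr,
    }


if __name__ == "__main__":
    result = check_vulkaninfo()
    if not result["found"]:
        print("vulkaninfo is not on PATH inside this environment")
    elif result["returncode"] != 0:
        print(f"vulkaninfo at {result['path']} did not exit cleanly "
              f"(returncode={result['returncode']})")
        print(result["output"][-2000:])  # tail of the output for the bug report
    else:
        print(f"vulkaninfo at {result['path']} ran successfully")
```

If this reports that the binary is found but exits with a non-zero code, the package install in the Dockerfile probably worked and the failure is more likely in how the container sees the GPU driver, which would be useful to know before changing anything on the host.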
Thanks so much!