An issue about the evaluation script running in Docker #36

@kevinqyh0827

Hi, I was trying to run evaluation with the pretrained models in Docker but ran into some errors. Here is the error output:

"Loading /root/data/objaverse_houses/houses_2023_07_28/val.jsonl.gz: 16000it [00:02, 6342.99it/s]
200 tasks in queue
Starting worker 0
Starting worker 1
Loading ckpt /root/data/pretrained_models/SigLIP-ViTb-3-double-det-CHORES-S/checkpoint_final.ckpt using ckpt_prefix='model.' ...
WARNING: worker 0 failed to stop with non-None task_sampler
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/root/spoc/tasks/abstract_task_sampler.py", line 131, in controller
return self.controller_type(**self.controller_args)
File "/root/spoc/environment/stretch_controller.py", line 64, in __init__
self.controller = Controller(**kwargs)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 545, in __init__
self.start(
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1545, in start
self._start_unity_thread(env, width, height, unity_params, image_name)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1268, in _start_unity_thread
raise Exception(message)
Exception: Unity process has exited - check ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 57, in start_worker
worker.distribute_evaluate(agent, tasks_queue, results_queue)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 484, in distribute_evaluate
task = self.task_sampler.next_task()
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 165, in next_task
self.increment_task_and_reset_house(
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 150, in increment_task_and_reset_house
self.reset_controller_in_current_house_and_cache_house_data(
File "/root/spoc/tasks/abstract_task_sampler.py", line 168, in reset_controller_in_current_house_and_cache_house_data
self.reset_scene_with_timeout_handler()
File "/root/spoc/tasks/abstract_task_sampler.py", line 237, in reset_scene_with_timeout_handler
self.controller.reset(scene=self.current_house)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/torch/distributions/utils.py", line 112, in __get__
value = self.wrapped(instance)
File "/root/spoc/tasks/abstract_task_sampler.py", line 134, in controller
raise TaskSamplerInInvalidStateError("Controller has closed.")
utils.data_generation_utils.exception_utils.TaskSamplerInInvalidStateError: Controller has closed.
Logging and waiting for proccesses to finish
Loading ckpt /root/data/pretrained_models/SigLIP-ViTb-3-double-det-CHORES-S/checkpoint_final.ckpt using ckpt_prefix='model.' ...
WARNING: worker 1 failed to stop with non-None task_sampler
Process ForkServerProcess-2:
Traceback (most recent call last):
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/miniconda3/envs/spoc/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 57, in start_worker
worker.distribute_evaluate(agent, tasks_queue, results_queue)
File "/root/spoc/online_evaluation/online_evaluator_worker.py", line 484, in distribute_evaluate
task = self.task_sampler.next_task()
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 165, in next_task
self.increment_task_and_reset_house(
File "/root/spoc/tasks/multi_task_eval_sampler.py", line 150, in increment_task_and_reset_house
self.reset_controller_in_current_house_and_cache_house_data(
File "/root/spoc/tasks/abstract_task_sampler.py", line 168, in reset_controller_in_current_house_and_cache_house_data
self.reset_scene_with_timeout_handler()
File "/root/spoc/tasks/abstract_task_sampler.py", line 237, in reset_scene_with_timeout_handler
self.controller.reset(scene=self.current_house)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/torch/distributions/utils.py", line 112, in __get__
value = self.wrapped(instance)
File "/root/spoc/tasks/abstract_task_sampler.py", line 131, in controller
return self.controller_type(**self.controller_args)
File "/root/spoc/environment/stretch_controller.py", line 64, in __init__
self.controller = Controller(**kwargs)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 545, in __init__
self.start(
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1545, in start
self._start_unity_thread(env, width, height, unity_params, image_name)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1237, in _start_unity_thread
command = self.unity_command(width, height, self.headless)
File "/opt/miniconda3/envs/spoc/lib/python3.10/site-packages/ai2thor/controller.py", line 1147, in unity_command
raise RuntimeError(
RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).
"

Following the suggestion in the error message, I modified the Dockerfile and added a line to make sure vulkan-tools is installed. However, after rebuilding the image, the problem is still there. Any ideas about this issue? Or should I install vulkan-tools on the host rather than inside the Docker container?
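
The log shows two different failures: worker 0 gets past the vulkaninfo check but Unity exits with returncode=-11, while worker 1 fails because vulkaninfo itself could not run. To narrow this down, a small check like the one below could be run inside the container to see whether the binary is on PATH and whether it exits cleanly. This is only a sketch: `check_vulkaninfo` is a hypothetical helper, not part of spoc or AI2-THOR, and it may not match the exact check AI2-THOR performs.

```python
import shutil
import subprocess


def check_vulkaninfo(binary="vulkaninfo", timeout=30):
    """Report whether `binary` is on PATH and how it exits.

    Hypothetical diagnostic helper (not from spoc or AI2-THOR).
    Returns a dict with keys: found, returncode, output.
    """
    path = shutil.which(binary)
    if path is None:
        # The package providing the binary is not installed in this environment.
        return {"found": False, "returncode": None, "output": ""}

    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        # The binary exists but could not be executed or hung.
        return {"found": True, "returncode": None, "output": str(exc)}

    # Keep only the tail of the combined output; vulkaninfo can be very verbose.
    combined = proc.stdout + proc.stderr
    return {"found": True, "returncode": proc.returncode, "output": combined[-2000:]}
```

Calling `check_vulkaninfo()` inside the running container and printing the result should distinguish "not installed" (`found` is False) from "installed but failing" (`found` is True with a non-zero or missing `returncode`). My assumption is that the second case would point to the GPU driver or Vulkan configuration visible to the container rather than to the vulkan-tools package itself.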

Thanks so much!
