@Bensk1 Hi, I have run into the following two problems while training the agent.
- The storage consumption of the chosen index configuration exceeded the budget marginally, and therefore the training process was terminated.
I am confused about why this happens: if a chosen index would violate the storage constraint, it should be treated as an invalid action, as described in your paper (see the sketch after the traceback below).
File "../swirl/stable_baselines/ppo2/ppo2.py", line 520, in _run
if self.callback.on_step() is False:
File "../swirl/stable_baselines/common/callbacks.py", line 94, in on_step
return self._on_step()
File "../swirl/stable_baselines/common/callbacks.py", line 170, in _on_step
continue_training = callback.on_step() and continue_training
File "***/swirl/stable_baselines/common/callbacks.py", line 94, in on_step
return self._on_step()
File "***/swirl/stable_baselines/common/callbacks.py", line 539, in _on_step
return_episode_rewards=True)
File "../swirl/stable_baselines/common/evaluation.py", line 41, in evaluate_policy
obs, reward, done, _info = env.step(action)
File "../swirl/stable_baselines/common/vec_env/base_vec_env.py", line 150, in step
return self.step_wait()
File "../swirl/stable_baselines/common/vec_env/vec_normalize.py", line 91, in step_wait
obs, rews, news, infos = self.venv.step_wait()
File "../swirl/stable_baselines/common/vec_env/dummy_vec_env.py", line 44, in step_wait
self.envs[env_idx].step(self.actions[env_idx])
File "***/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 13, in step
observation, reward, done, info = self.env.step(action)
File "***/swirl/gym_db/envs/db_env_v1.py", line 99, in step
init=False, new_index=new_index, old_index_size=old_index_size
File "***/gym_db/envs/db_env_v1.py", line 204, in _update_return_env_state
"Storage consumption exceeds budget: "
AssertionError: Storage consumption exceeds budget: 500.08883199999997 > 500
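For context, here is a minimal, purely hypothetical sketch of the mismatch I suspect (all variable names and numbers are mine, not SWIRL's): the mask considers the action valid based on one size value, while the environment step later accounts for a slightly different one, so the budget is exceeded by a fraction and the assertion in `_update_return_env_state` fires.

```python
# Hypothetical illustration (not SWIRL's actual code; all names and numbers are
# made up): if the action mask is built from an *estimated* index size, but the
# environment later accounts for a slightly larger *actual* size, the budget
# check can fail marginally even though the action passed the mask.
BUDGET_MB = 500.0

current_consumption_mb = 499.90     # storage already used by created indexes
estimated_new_index_mb = 0.05       # size assumed when the mask was computed
actual_new_index_mb = 0.188832      # size accounted for in the environment step

# Masking step: against the estimate, the action looks valid and stays unmasked.
print(current_consumption_mb + estimated_new_index_mb <= BUDGET_MB)  # True

# Environment step: with the realized size, the assertion fires,
# matching the error above (500.088832 > 500).
new_consumption_mb = current_consumption_mb + actual_new_index_mb
assert new_consumption_mb <= BUDGET_MB, (
    f"Storage consumption exceeds budget: {new_consumption_mb} > {BUDGET_MB}"
)
```

Is something like this what happens here, or is the marginal overshoot expected and only the assertion too strict?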
- An invalid action was still chosen, and therefore the training process was terminated.
Specifically, action 0 was chosen even though it was invalid; I checked the action mask and its entry for action 0 was 0. Note that approxkl and policy_loss are nan in the log below, so I suspect the policy output itself became NaN (see the sketch after the traceback below).
--------------------------------------
| approxkl | nan |
| clipfrac | 0.1796875 |
| explained_variance | -2.03 |
| fps | 0 |
| n_updates | 250 |
| policy_entropy | 0.14278165 |
| policy_loss | nan |
| serial_timesteps | 16000 |
| time_elapsed | 1.51e+03 |
| total_timesteps | 16000 |
| value_loss | 0.00048095174 |
--------------------------------------
Traceback (most recent call last):
File "main.py", line 141, in <module>
tb_log_name=experiment.id) # the name of the run for tensorboard log
File "../swirl/stable_baselines/ppo2/ppo2.py", line 342, in learn
rollout = self.runner.run(callback)
File "../swirl/stable_baselines/common/runners.py", line 59, in run
return self._run()
File "../swirl/stable_baselines/ppo2/ppo2.py", line 497, in _run
self.obs[:], rewards, self.dones, infos = self.env.step(clipped_actions)
File "../swirl/stable_baselines/common/vec_env/base_vec_env.py", line 150, in step
return self.step_wait()
File "../swirl/stable_baselines/common/vec_env/vec_normalize.py", line 91, in step_wait
obs, rews, news, infos = self.venv.step_wait()
File "../swirl/stable_baselines/common/vec_env/dummy_vec_env.py", line 44, in step_wait
self.envs[env_idx].step(self.actions[env_idx])
File "***/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 13, in step
observation, reward, done, info = self.env.step(action)
File "***/swirl/gym_db/envs/db_env_v1.py", line 79, in step
self._step_asserts(action)
File "***/swirl/gym_db/envs/db_env_v1.py", line 67, in _step_asserts
), f"Agent has chosen invalid action: {action}"
AssertionError: Agent has chosen invalid action: 0
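Here is a small, hypothetical numpy sketch of the mechanism I suspect (I do not know how SWIRL applies the mask internally, so this is only my guess): once any logit is NaN, every probability becomes NaN after normalization, and the resulting action can be exactly the masked index 0.

```python
# Hypothetical illustration (not SWIRL's masking code): once any policy logit is
# NaN -- consistent with approxkl = nan and policy_loss = nan in the log above --
# additive masking no longer guarantees that invalid actions receive (near-)zero
# probability, and an argmax/sampling step can return the masked action 0.
import numpy as np

logits = np.array([-0.5, np.nan, 1.2])  # NaN produced by a diverged update
valid_actions = np.array([0, 1, 1])     # action 0 is invalid (mask entry 0)

masked_logits = np.where(valid_actions == 1, logits, -1e8)  # additive masking
probs = np.exp(masked_logits - np.nanmax(masked_logits))
probs = probs / probs.sum()

print(probs)             # [nan nan nan] -- one NaN logit poisons every probability
print(np.argmax(probs))  # 0 -- argmax over an all-NaN array returns index 0,
                         # i.e. exactly the masked action from the traceback
```

So my suspicion is that the invalid action is a symptom of the policy diverging to NaN rather than of the masking itself. Does that sound plausible, and is there a recommended way to avoid it?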