[Bug]: Action Spaces > 1 not working in Rewards w/ IM Loss #52
Open
Description
🐛 Bug
When calculating im_loss (such as in ICM, E3B, RIDE, and Pseudo-counts), the calculation
# use a random mask to select a subset of the training data
mask = th.rand(len(im_loss), device=self.device)
mask = (mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * mask).sum() / th.max(
mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
(as seen on line 221 in icm.py) raises an error because im_loss has shape (BATCH_SIZE, N_ACTIONS) while mask is one-dimensional with shape (BATCH_SIZE,), so the two cannot be broadcast together and the multiplication fails.
Croip3 claimed to have a solution in RLE-Foundation/RLeXplore#21.
Alternatively, I have two potential solutions, depending on the intended behavior.
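- Use the same mask for all actions at time t.
# repeat the per-sample mask across the action dimension instead of hard-coding 3
im_mask = mask.unsqueeze(1).repeat(1, im_loss.shape[1])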
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
- Create unique mask values for all actions (which would be different from the fm_loss mask)
# use a random mask to select a subset of the training data
im_mask = th.rand(im_loss.shape, device=self.device)
im_mask = (im_mask < self.update_proportion).type(th.FloatTensor).to(self.device)
fm_mask = th.rand(len(im_loss), device=self.device) # or could be len(fm_loss) or fm_loss.shape
fm_mask = (fm_mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
fm_loss = (fm_loss * fm_mask).sum() / th.max(
fm_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
To Reproduce
RLE-Foundation/RLeXplore#21 describes a way to reproduce the issue that may be simple. I have no way to reproduce it myself without a large amount of code.
I was running ICM on an environment with a continuous action space of 3 actions and have had the same result with E3B.
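The shape mismatch itself can likely be reproduced outside the library with random tensors. The sketch below is not taken from rllte: it assumes im_loss reaches the masking step with shape (BATCH_SIZE, N_ACTIONS), as described in this report, and uses illustrative sizes.

```python
import torch as th

batch_size, n_actions = 256, 3  # illustrative sizes, not read from rllte
update_proportion = 1.0

# stand-in for an inverse-model loss with one value per action dimension
im_loss = th.rand(batch_size, n_actions)

# same masking steps as in the report: the mask is 1-D with shape (batch_size,)
mask = th.rand(len(im_loss))
mask = (mask < update_proportion).type(th.FloatTensor)

try:
    (im_loss * mask).sum()
    error = None
except RuntimeError as exc:
    # broadcasting aligns trailing dimensions: n_actions vs. batch_size
    error = str(exc)

# giving the mask a trailing singleton dimension lets it broadcast
fixed = (im_loss * mask.unsqueeze(1)).sum()
```

Here `error` holds the RuntimeError message from the failed multiplication, while the `unsqueeze(1)` variant runs because a (BATCH_SIZE, 1) mask broadcasts across the action dimension.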
Relevant log output / Error message
`File "/home/longarm_wsl/anaconda3/envs/metaworld3.12/lib/python3.11/site-packages/rllte/xplore/reward/icm.py", line 225, in update im_loss = (im_loss * mask).sum() / th.max( ~~~~~~~~^~~~~~ RuntimeError: The size of tensor a (8) must match the size of tensor b (256) at non-singleton dimension 1`
System Info
No response
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- I have provided a minimal working example to reproduce the bug
- I've used the markdown code blocks for both code and stack traces.