[RLlib] Fix bug in PrioritizedEpisodeReplayBuffer. (#54284) #60065
MrKWatkins wants to merge 1 commit into ray-project:master
Conversation
Code Review
This pull request addresses a bug in PrioritizedEpisodeReplayBuffer that could cause a crash when an evicted episode was also present in a new batch being added. The fix is straightforward and correct, replacing an incorrect variable to prevent a ValueError. A regression test is included to cover this scenario. The changes are solid, and I've added a couple of suggestions to make the new test case more robust and clearer.
```python
# Test for issue #54284.
buffer.add(self._get_episode(id_="H", episode_len=9))
self.assertTrue(buffer.get_num_timesteps() == 100)
```
For better test failure messages, it's recommended to use self.assertEqual(a, b) instead of self.assertTrue(a == b). assertEqual will show the differing values on failure, which is more informative.
```diff
- self.assertTrue(buffer.get_num_timesteps() == 100)
+ self.assertEqual(buffer.get_num_timesteps(), 100)
```
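To see the difference in failure messages the suggestion refers to, here is a small stand-alone sketch (the values 99 and 100 are made up for illustration):

```python
import unittest

# A bare TestCase instance just to borrow the assert methods.
tc = unittest.TestCase()

try:
    tc.assertTrue(99 == 100)
except AssertionError as exc:
    msg_true = str(exc)   # the differing values are lost

try:
    tc.assertEqual(99, 100)
except AssertionError as exc:
    msg_equal = str(exc)  # shows both values on failure

print(msg_true)
print(msg_equal)
```

`assertTrue` only reports "False is not true", while `assertEqual` reports "99 != 100", which is far more useful when debugging a failing buffer test.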
```python
buffer.add([
    self._get_episode(id_="3", episode_len=1),
    self._get_episode(id_="I", episode_len=1)
])
```
The added test case is great for ensuring the bug that caused a crash is fixed. However, it currently only checks that the add operation doesn't raise an exception. To make the test more robust and prevent future regressions, it would be beneficial to add assertions to verify the final state of the buffer after the add call.
Based on my analysis, after the operations, the buffer should contain 9 episodes with a total of 91 timesteps. Adding assertions for the number of timesteps, number of episodes, and the set of episode IDs would make this test much stronger.
```python
buffer.add([
    self._get_episode(id_="3", episode_len=1),
    self._get_episode(id_="I", episode_len=1)
])
self.assertEqual(buffer.get_num_timesteps(), 91)
self.assertEqual(buffer.get_num_episodes(), 9)
self.assertEqual(
    {eps.id_ for eps in buffer.episodes},
    {"4", "5", "6", "7", "8", "9", "G", "H", "I"},
)
```
Could we add here a test for what we would expect to be in the buffer, please?
pseudo-rnd-thoughts left a comment:
Thanks for the PR. I don't see an issue with it, but I'm interested in what the original bug was, how this fixes it, and how the test checks this.
Under Ray 2.53.0 I've been getting errors when training with Rainbow DQN: the buffer was trying to find an episode index in a list of IDs. It occurred when an evicted episode's ID was also present in the incoming episodes.
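A minimal illustration of that failure mode, independent of the RLlib internals (the names and values here are hypothetical stand-ins for the real variables):

```python
# list.index() is handed a positional index instead of an episode ID, so it
# raises ValueError whenever that integer is not itself an element of the
# ID list -- which is essentially always, since the list holds strings.
new_episode_ids = ["G", "H", "I"]  # hypothetical IDs of incoming episodes
evicted_idx = 7                    # positional index of an evicted episode

try:
    new_episode_ids.index(evicted_idx)  # bug: index used where an ID belongs
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```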
OK, how did you test that this fixes the problem? What do you understand the fix to be doing? Have you inspected what the values of `eps_evicted_ids` and `eps_evicted_idxs` are, and what effect this has?
No worries, completely understand. My understanding is that the fix now uses an ID to look up in the list of IDs, whereas before it used an index, which makes no sense. I've tested it by running training locally: before the patch I was hitting the error every 20 minutes or so; with the patch I left it running overnight and didn't hit the issue. And of course I ran the existing tests. I haven't been able to inspect the values, as I haven't managed to get the test running with a debugger yet (I had quite a few setup issues, to be honest). I haven't gone through the rest of the method in detail, though; I just focused on the part that threw the error and seemed obviously wrong to me. I can go back through it all in more detail if you want.
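A self-contained sketch of the corrected lookup (the real variables live in `PrioritizedEpisodeReplayBuffer.add()`; the data below is made up, and lists stand in for episode objects so that `len()` plays the role of an episode's length):

```python
new_episode_ids = ["3", "H", "I"]  # IDs of the incoming episodes
episodes = [[0], [0] * 9, [0]]     # stand-in episodes with lengths 1, 9, 1
eps_evicted_ids = ["3"]            # evicted episode also in the new batch
eps_evicted_idxs = [7]             # its positional index inside the buffer

# Buggy lookup: new_episode_ids.index(eps_evicted_idxs[-1]) raises
# ValueError, because 7 is a buffer index, not an ID in the list.
# Fixed lookup: use the evicted episode's ID, which the list does contain.
len_to_subtract = len(
    episodes[new_episode_ids.index(eps_evicted_ids[-1])]
)
print(len_to_subtract)
```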
@MrKWatkins Could you rename `eps_evicted_idxs`? Do you know if there is any other part of the replay buffer that uses similar variable naming?
simonsays1980 left a comment:
Thanks a lot for this fix @MrKWatkins! There are just two small changes requested to improve testing. Could you check if you could implement them? Then we are ready to go.
```diff
  # TODO (simon): Apply the same logic as in the MA-case.
  len_to_subtract = len(
-     episodes[new_episode_ids.index(eps_evicted_idxs[-1])]
+     episodes[new_episode_ids.index(eps_evicted_ids[-1])]
```
```python
# Test for issue #54284.
buffer.add(self._get_episode(id_="H", episode_len=9))
self.assertTrue(buffer.get_num_timesteps() == 100)
buffer.add([
    self._get_episode(id_="3", episode_len=1),
    self._get_episode(id_="I", episode_len=1)
])
```
Could we add here a test for what we would expect to be in the buffer, please?
Force-pushed from 30cbae2 to 4c688f7.
Thanks both for your feedback. I've rebased and addressed your comments:

@pseudo-rnd-thoughts: Renamed `eps_evicted_idxs` to `eps_evicted_indices` rather than `eps_evicted_indexes`; this matches the naming of `new_indices` and `_indices` already used in the class. There are lots of instances of `idx` being used instead of `index`; I haven't changed those, as it would be a much larger change.

One thing, however: I'm still not sure the buffer is behaving correctly. I reduced the test a bit, so it now just adds an episode with the same ID ("3") as the one being evicted. This results in the buffer evicting the episode, so "3" is not in the buffer at all, despite having just been added and there being capacity in the buffer for it. Note that there is already a TODO related to this in the code. Ideally I'd like to merge this fix as-is, as that will unblock me, and then revisit the behaviour later on, but obviously it's up to you guys.
@MrKWatkins I agree that it would be great to merge this fix, but if the solution is still inherently broken then I would prefer to fix that now if possible. If it is completely unrelated then we should be able to merge as-is; I just want to check first.
@pseudo-rnd-thoughts I can't easily share my actual code; however, I've created a repro at https://github.com/MrKWatkins/ReplayBufferBug. Instead of my actual env it uses a test env with the same observation size. Training parameters are the same as in my actual code. It runs fine for a while as-is; however, if you reduce the capacity, it immediately triggers the bug. Details are in the README.
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave a comment, and the stale label will be removed.
Description
Fixes an exception that occasionally occurs in PrioritizedEpisodeReplayBuffer.
Related issues
Fixes #54284.
Additional information
I've tested this locally with a model that would hit the exception every half hour or so; with the fix I haven't hit the exception.