Skip to content

[RLlib] Fix bug in PrioritizedEpisodeReplayBuffer. (#54284)#60065

Open
MrKWatkins wants to merge 1 commit intoray-project:masterfrom
MrKWatkins:fix-priority-episode-replay-buffer
Open

[RLlib] Fix bug in PrioritizedEpisodeReplayBuffer. (#54284)#60065
MrKWatkins wants to merge 1 commit intoray-project:masterfrom
MrKWatkins:fix-priority-episode-replay-buffer

Conversation

@MrKWatkins
Copy link

Description

Fixes an exception that occasionally occurs in PrioritizedEpisodeReplayBuffer.

Related issues

Fixes #54284.

Additional information

I've tested this locally with a model that would hit the exception every half hour or so; with the fix I haven't hit the exception.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug in PrioritizedEpisodeReplayBuffer that could cause a crash when an evicted episode was also present in a new batch being added. The fix is straightforward and correct, replacing an incorrect variable to prevent a ValueError. A regression test is included to cover this scenario. The changes are solid, and I've added a couple of suggestions to further improve the new test case for better robustness and clarity.


# Test for issue #54284.
buffer.add(self._get_episode(id_="H", episode_len=9))
self.assertTrue(buffer.get_num_timesteps() == 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better test failure messages, it's recommended to use self.assertEqual(a, b) instead of self.assertTrue(a == b). assertEqual will show the differing values on failure, which is more informative.

Suggested change
self.assertTrue(buffer.get_num_timesteps() == 100)
self.assertEqual(buffer.get_num_timesteps(), 100)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrKWatkins could we use here self.asserEqual?

Comment on lines 94 to 103
buffer.add([
self._get_episode(id_="3", episode_len=1),
self._get_episode(id_="I", episode_len=1)
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The added test case is great for ensuring the bug that caused a crash is fixed. However, it currently only checks that the add operation doesn't raise an exception. To make the test more robust and prevent future regressions, it would be beneficial to add assertions to verify the final state of the buffer after the add call.

Based on my analysis, after the operations, the buffer should contain 9 episodes with a total of 91 timesteps. Adding assertions for the number of timesteps, number of episodes, and the set of episode IDs would make this test much stronger.

        buffer.add([
            self._get_episode(id_="3", episode_len=1),
            self._get_episode(id_="I", episode_len=1)
        ])
        self.assertEqual(buffer.get_num_timesteps(), 91)
        self.assertEqual(buffer.get_num_episodes(), 9)
        self.assertEqual(
            {eps.id_ for eps in buffer.episodes},
            {"4", "5", "6", "7", "8", "9", "G", "H", "I"},
        )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add here a test for what we would expect to be in the buffer, please?

@pseudo-rnd-thoughts pseudo-rnd-thoughts added the rllib RLlib related issues label Jan 12, 2026
Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I don't see an issue with the PR but I'm interested what the original bug was and how this fixes it plus how the test checks this

@MrKWatkins
Copy link
Author

Under Ray 2.53.0 I've been getting errors like:

  File "/mnt/Development/OakEmu-AI/.venv/lib/python3.13/site-packages/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 257, in add
    episodes[new_episode_ids.index(eps_evicted_idxs[-1])]
             ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
ValueError: 5009 is not in list

when training with Rainbow DQN. It was trying to find an episode index in a list of IDs. It occurred when an evicted episode ID is also present in the incoming episodes.

@pseudo-rnd-thoughts
Copy link
Member

Ok, how did you test that this fixes the problem? What do you understand the fix doing? Have you inspected what the values of eps_evicted_ids and eps_evicted_idxs are and what effect this has?
Sorry for the numerous questions, I would just like to understand the fix more and why it works before we merge

@MrKWatkins
Copy link
Author

No worries, completely understand.

I understand the fix is now using an ID to lookup in the list of IDs, whereas it was using an index to lookup before, which makes no sense.

I've tested it by running it locally whilst training. I was hitting it every 20 minutes or so before the patch, I left it running overnight with the patch and didn't hit the issue. And of course I ran the existing tests.

I haven't been able to inspect the values as I haven't managed to get the test running with a debugger yet. (Had quite a few setup issues to be honest) I haven't gone through the rest of the code in the method in detail though, I just focused on this part which threw the error and seemed obviously wrong to me. I can go back through it all in a bit more detail if you want?

@pseudo-rnd-thoughts
Copy link
Member

@MrKWatkins Could you rename eps_evicted_ids and eps_evicted_idxs as more easily differentiate them, probably to eps_evicted_ids and eps_evicted_indexes if those make sense.

Do you know if there is any other part of the replay buffer that uses similar variable naming?
Finally, you mentioned that it didn't raise an error after a longtime, did you observe if the agent learnt successfully?

Copy link
Contributor

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this fix @MrKWatkins ! There are just two small changes requested to improve testing. Could you check, if you could implement this. We are ready to go then.

# TODO (simon): Apply the same logic as in the MA-case.
len_to_subtract = len(
episodes[new_episode_ids.index(eps_evicted_idxs[-1])]
episodes[new_episode_ids.index(eps_evicted_ids[-1])]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Catch!


# Test for issue #54284.
buffer.add(self._get_episode(id_="H", episode_len=9))
self.assertTrue(buffer.get_num_timesteps() == 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrKWatkins could we use here self.asserEqual?

Comment on lines 94 to 103
buffer.add([
self._get_episode(id_="3", episode_len=1),
self._get_episode(id_="I", episode_len=1)
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add here a test for what we would expect to be in the buffer, please?

@MrKWatkins MrKWatkins force-pushed the fix-priority-episode-replay-buffer branch from 30cbae2 to 4c688f7 Compare January 18, 2026 14:21
@MrKWatkins
Copy link
Author

Thanks both for your feedback. I've rebased and addressed your comments:

@pseudo-rnd-thoughts: Renamed eps_evicted_idxs to eps_evicted_indices, rather than eps_evicted_indexes. This matches the naming of new_indices and _indices already used in the class. There are lots of instances of idx being used instead of index; I haven't changed these as it would be a much larger change.
@simonsays1980: I've left it as assertTrue for consistency, as that is the style used on the lines just before my additions. However it looks like there are a mix of styles in that test case. Do you want me to tidy them up to all use assertEquals whilst I'm in the neighbourhood? Let me know.
@simonsays1980: Added assertions for what we expect in the buffer, along with some comments explaining what we're testing.

One thing however... I'm still not sure the buffer is behaving correctly. I reduced the test a bit so now it just adds an episode with the same ID ("3") as that being evicted. This results in the buffer evicting the episode, so "3" is not in the buffer at all, despite having just been added and there being capacity in the buffer for it. Note that there is already a TODO related to this in the code:

# TODO (sven, simon): Should we just treat such an episode chunk
.

Ideally I'd like to merge this fix as-is as that will unblock me and then revisit the behaviour later on, but obviously up to you guys.

@pseudo-rnd-thoughts
Copy link
Member

@MrKWatkins I agree that it would be great to merge this fix but if the solution is still inheritly broken then I would prefer to fix now if possible. If it completely unrelated then we should be able to merge as is, I just want to check first.
Could you share the full testing code that you have thats showing the potential bug?

@MrKWatkins
Copy link
Author

@pseudo-rnd-thoughts I can't easily share my actual code, however I've created a repro at https://github.com/MrKWatkins/ReplayBufferBug. Instead of my actual env it uses a test env with the same observation size. Parameters for training are the same as my actual code.

It runs fine for a while as is. However, if you reduce the capacity then it immediately triggers the bug. Details in the ReadMe.

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rllib RLlib related issues stale The issue is stale. It will be closed within 7 days unless there are further conversation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RLlib] Unexpected KeyError while training SAC

3 participants