[RLlib] Fix bug in PrioritizedEpisodeReplayBuffer. (#54284) by MrKWatkins · Pull Request #60065 · ray-project/ray

MrKWatkins · 2026-01-12T20:08:27Z

Description

Fixes an exception that occasionally occurs in PrioritizedEpisodeReplayBuffer.

Related issues

Additional information

I've tested this locally with a model that would hit the exception every half hour or so; with the fix I haven't hit the exception.

gemini-code-assist

Code Review

This pull request addresses a bug in PrioritizedEpisodeReplayBuffer that could cause a crash when an evicted episode was also present in a new batch being added. The fix is straightforward and correct, replacing an incorrect variable to prevent a ValueError. A regression test is included to cover this scenario. The changes are solid, and I've added a couple of suggestions to further improve the new test case for better robustness and clarity.

gemini-code-assist · 2026-01-12T20:10:18Z

rllib/utils/replay_buffers/tests/test_prioritized_episode_buffer.py


+        # Test for issue #54284.
+        buffer.add(self._get_episode(id_="H", episode_len=9))
+        self.assertTrue(buffer.get_num_timesteps() == 100)


For better test failure messages, it's recommended to use self.assertEqual(a, b) instead of self.assertTrue(a == b). assertEqual will show the differing values on failure, which is more informative.

Suggested change

self.assertTrue(buffer.get_num_timesteps() == 100)

self.assertEqual(buffer.get_num_timesteps(), 100)
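To illustrate the difference the bot describes, here is a standalone sketch with made-up values that are unrelated to the buffer. It runs one deliberately failing assertion of each style through unittest and prints the last line of each failure report. The class and helper names are hypothetical. The methods are named check_* rather than test_* so that a test runner does not pick up the intentional failures.

```python
import unittest


class AssertStyleDemo(unittest.TestCase):
    # Both checks fail on purpose; only the failure messages differ.
    def check_assert_true(self):
        self.assertTrue(99 == 100)

    def check_assert_equal(self):
        self.assertEqual(99, 100)


def failure_message(method_name):
    """Run one method of AssertStyleDemo and return the last line of its failure report."""
    result = unittest.TestResult()
    AssertStyleDemo(method_name).run(result)
    # result.failures holds (test, formatted_traceback) pairs.
    return result.failures[0][1].strip().splitlines()[-1]


print(failure_message("check_assert_true"))   # AssertionError: False is not true
print(failure_message("check_assert_equal"))  # AssertionError: 99 != 100
```

The assertTrue failure only reports that the comparison was false. The assertEqual failure shows both values, which is usually what you need when a timestep count is off.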


gemini-code-assist · 2026-01-12T20:10:19Z

rllib/utils/replay_buffers/tests/test_prioritized_episode_buffer.py

+        buffer.add([
+            self._get_episode(id_="3", episode_len=1),
+            self._get_episode(id_="I", episode_len=1)
+        ])


The added test case is great for ensuring the bug that caused a crash is fixed. However, it currently only checks that the add operation doesn't raise an exception. To make the test more robust and prevent future regressions, it would be beneficial to add assertions to verify the final state of the buffer after the add call.

Based on my analysis, after the operations, the buffer should contain 9 episodes with a total of 91 timesteps. Adding assertions for the number of timesteps, number of episodes, and the set of episode IDs would make this test much stronger.

buffer.add([
    self._get_episode(id_="3", episode_len=1),
    self._get_episode(id_="I", episode_len=1)
])
self.assertEqual(buffer.get_num_timesteps(), 91)
self.assertEqual(buffer.get_num_episodes(), 9)
self.assertEqual(
    {eps.id_ for eps in buffer.episodes},
    {"4", "5", "6", "7", "8", "9", "G", "H", "I"},
)


pseudo-rnd-thoughts

Thanks for the PR. I don't see an issue with it, but I'm interested in what the original bug was, how this fixes it, and how the test checks for it.

MrKWatkins · 2026-01-12T20:40:48Z

Under Ray 2.53.0 I've been getting errors like:

  File "/mnt/Development/OakEmu-AI/.venv/lib/python3.13/site-packages/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 257, in add
    episodes[new_episode_ids.index(eps_evicted_idxs[-1])]
             ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
ValueError: 5009 is not in list

when training with Rainbow DQN. The code was looking up an episode index in a list of episode IDs. This happens when the ID of an evicted episode is also present in the incoming episodes.
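The following minimal sketch shows the failure mode with plain lists. The variable names mirror the ones in the diff. The values are hypothetical stand-ins, chosen to reproduce the message in the traceback above, and are not real buffer state. Looking up an integer buffer index in a list of episode IDs raises ValueError, while looking up the evicted episode's ID finds its position in the incoming batch.

```python
# Hypothetical, simplified stand-ins for the local variables in
# PrioritizedEpisodeReplayBuffer.add(); these are NOT real buffer contents.
new_episode_ids = ["3", "I"]  # IDs of the incoming episodes
eps_evicted_ids = ["3"]       # IDs of episodes evicted during this add()
eps_evicted_idxs = [5009]     # internal buffer indices of those episodes


def buggy_lookup():
    # Pre-fix: searches the ID list for an integer index, which does not match here.
    return new_episode_ids.index(eps_evicted_idxs[-1])


def fixed_lookup():
    # Post-fix: searches the ID list for the evicted episode's ID.
    return new_episode_ids.index(eps_evicted_ids[-1])


try:
    buggy_lookup()
except ValueError as err:
    print("buggy:", err)  # buggy: 5009 is not in list

print("fixed:", fixed_lookup())  # fixed: 0
```

The faulty line only runs when an evicted episode's ID also appears in the incoming batch. That would explain why the error showed up only occasionally during training rather than on every add().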

pseudo-rnd-thoughts · 2026-01-12T20:48:33Z

OK, how did you test that this fixes the problem? What do you understand the fix to be doing? Have you inspected the values of eps_evicted_ids and eps_evicted_idxs, and what effect this has?
Sorry for all the questions; I would just like to understand the fix, and why it works, before we merge.

MrKWatkins · 2026-01-12T20:58:22Z

No worries, completely understand.

My understanding is that the fix now looks up an ID in the list of IDs, whereas before it looked up an index in that list, which makes no sense.

I've tested it by running it locally whilst training. Before the patch I was hitting it every 20 minutes or so; with the patch I left it running overnight and didn't hit the issue. And of course I ran the existing tests.

I haven't been able to inspect the values, as I haven't managed to get the test running under a debugger yet (I had quite a few setup issues, to be honest). I also haven't gone through the rest of the code in the method in detail; I just focused on the part that threw the error, which seemed obviously wrong to me. I can go back through it all in more detail if you want?

pseudo-rnd-thoughts · 2026-01-13T11:00:17Z

@MrKWatkins Could you rename eps_evicted_ids and eps_evicted_idxs to differentiate them more easily, probably to eps_evicted_ids and eps_evicted_indexes if those make sense?

Do you know if any other part of the replay buffer uses similar variable naming?
Finally, you mentioned that it didn't raise an error after a long time. Did you observe whether the agent learnt successfully?

simonsays1980

Thanks a lot for this fix @MrKWatkins! There are just two small changes requested to improve testing. Could you check whether you can implement these? Then we're ready to go.

simonsays1980 · 2026-01-13T14:29:09Z

rllib/utils/replay_buffers/prioritized_episode_buffer.py

                # TODO (simon): Apply the same logic as in the MA-case.
                len_to_subtract = len(
-                    episodes[new_episode_ids.index(eps_evicted_idxs[-1])]
+                    episodes[new_episode_ids.index(eps_evicted_ids[-1])]


Great catch!

simonsays1980 · 2026-01-13T14:30:22Z

rllib/utils/replay_buffers/tests/test_prioritized_episode_buffer.py


+        # Test for issue #54284.
+        buffer.add(self._get_episode(id_="H", episode_len=9))
+        self.assertTrue(buffer.get_num_timesteps() == 100)


@MrKWatkins could we use self.assertEqual here?

simonsays1980 · 2026-01-13T14:33:43Z

rllib/utils/replay_buffers/tests/test_prioritized_episode_buffer.py

+        buffer.add([
+            self._get_episode(id_="3", episode_len=1),
+            self._get_episode(id_="I", episode_len=1)
+        ])


Could we add here a test for what we would expect to be in the buffer, please?

MrKWatkins · 2026-01-18T14:34:32Z

Thanks both for your feedback. I've rebased and addressed your comments:

@pseudo-rnd-thoughts: Renamed eps_evicted_idxs to eps_evicted_indices, rather than eps_evicted_indexes. This matches the naming of new_indices and _indices already used in the class. There are lots of instances of idx being used instead of index; I haven't changed these as it would be a much larger change.
@simonsays1980: I've left it as assertTrue for consistency, as that is the style used on the lines just before my additions. However, there seems to be a mix of styles in that test case. Do you want me to tidy them all up to use assertEqual whilst I'm in the neighbourhood? Let me know.
@simonsays1980: Added assertions for what we expect in the buffer, along with some comments explaining what we're testing.

One thing however... I'm still not sure the buffer is behaving correctly. I reduced the test a bit so now it just adds an episode with the same ID ("3") as that being evicted. This results in the buffer evicting the episode, so "3" is not in the buffer at all, despite having just been added and there being capacity in the buffer for it. Note that there is already a TODO related to this in the code:

ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py, line 252 in d2b55a4:

    # TODO (sven, simon): Should we just treat such an episode chunk

Ideally I'd like to merge this fix as-is as that will unblock me and then revisit the behaviour later on, but obviously up to you guys.
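The sketch below is a deliberately simplified toy model, not RLlib's implementation, of one way the behaviour described above could arise. Suppose a new chunk is merged into an already-stored episode and keeps that episode's old position. Oldest-first eviction can then throw away the chunk that was just added, even though it would fit on its own. The class name and its rules are hypothetical assumptions for illustration. The TODO quoted above hints at the alternative of treating such a chunk as a new episode.

```python
from collections import OrderedDict


class ToyEpisodeBuffer:
    """Hypothetical FIFO episode buffer; NOT RLlib's PrioritizedEpisodeReplayBuffer.

    Assumed rules (for illustration only):
    - episodes are stored as episode ID -> number of timesteps;
    - a chunk whose ID is already stored is merged into that episode and
      keeps the episode's original (old) position;
    - while over capacity, the oldest whole episode is evicted.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.episodes = OrderedDict()  # insertion order == age, oldest first

    def num_timesteps(self):
        return sum(self.episodes.values())

    def add(self, episode_id, num_timesteps):
        if episode_id in self.episodes:
            # Continuation chunk: merged into the existing, older episode.
            self.episodes[episode_id] += num_timesteps
        else:
            self.episodes[episode_id] = num_timesteps
        # Evict oldest-first until the buffer fits its capacity again.
        while self.num_timesteps() > self.capacity:
            self.episodes.popitem(last=False)


buf = ToyEpisodeBuffer(capacity=10)
buf.add("3", 5)
buf.add("4", 5)
buf.add("3", 1)  # continuation chunk of the oldest episode
print(dict(buf.episodes))  # {'4': 5}
```

In this toy run, the 1-step chunk of "3" would fit alongside "4" (6 of 10 timesteps). It still disappears together with the rest of episode "3", which matches the symptom reported in the reduced test.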

pseudo-rnd-thoughts · 2026-01-19T13:27:28Z

@MrKWatkins I agree that it would be great to merge this fix, but if the solution is still inherently broken then I would prefer to fix it now if possible. If it's completely unrelated then we should be able to merge as is; I just want to check first.
Could you share the full testing code that you have that's showing the potential bug?

MrKWatkins · 2026-01-19T19:31:07Z

@pseudo-rnd-thoughts I can't easily share my actual code, however I've created a repro at https://github.com/MrKWatkins/ReplayBufferBug. Instead of my actual env it uses a test env with the same observation size. Parameters for training are the same as my actual code.

It runs fine for a while as is. However, if you reduce the capacity then it immediately triggers the bug. Details in the ReadMe.

github-actions · 2026-02-03T00:52:03Z

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

MrKWatkins requested a review from a team as a code owner January 12, 2026 20:08

MrKWatkins mentioned this pull request Jan 12, 2026

[RLlib] Unexpected KeyError while training SAC #54284

Closed

gemini-code-assist bot reviewed Jan 12, 2026


pseudo-rnd-thoughts added the rllib label Jan 12, 2026

pseudo-rnd-thoughts reviewed Jan 12, 2026


simonsays1980 requested changes Jan 13, 2026


[RLlib] Fix bug in PrioritizedEpisodeReplayBuffer. (ray-project#54284)

4c688f7

MrKWatkins force-pushed the fix-priority-episode-replay-buffer branch from 30cbae2 to 4c688f7 Compare January 18, 2026 14:21

github-actions bot added the stale label Feb 3, 2026

pseudo-rnd-thoughts mentioned this pull request Feb 5, 2026

[rllib] Fix ValueError in PrioritizedEpisodeReplayBuffer.add() when eps_evicted_idxs contains 0 not in new_episode_ids #60775

Open
