We should be able to set batches with supersets of reserved_keys; there's no reason not to allow that.

That would lead to duplicated initialization of parent classes, in particular `nn.Module`, which is problematic.
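The concern about duplicated parent initialization can be illustrated without Tianshou or torch. A toy stand-in (the `ModuleLike`/`Policy` classes below are hypothetical, used only to mimic how `nn.Module.__init__` resets its parameter registry):

```python
class ModuleLike:
    """Stand-in for nn.Module: __init__ resets the parameter registry."""

    def __init__(self):
        self._parameters = {}


class Policy(ModuleLike):
    def __init__(self):
        super().__init__()
        self._parameters["w"] = 0.5  # state registered after parent init


p = Policy()
assert "w" in p._parameters

# If some code path re-ran the parent initializer, the registered
# state would be silently wiped:
ModuleLike.__init__(p)
print(p._parameters)  # -> {}
```

This is why a code path that triggers a second initialization of `nn.Module` parents is problematic: registered parameters, buffers, and submodules would be lost.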
After a long investigation, we understood that this is fundamentally the wrong approach. `preprocess_batch` of the off-policy algo uses the learned value function to estimate returns, and the value function changes all the time with gradient updates. So it's not possible to just compute the returns once; or rather, doing so gives a poor estimate. It would work with purely Monte Carlo estimates of returns, but we don't control how an algorithm estimates returns, and we currently don't have a single algorithm implementation of that kind. Closing this PR.
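The distinction above can be sketched in a few lines (a toy illustration, not Tianshou code; the function names are made up): bootstrapped targets depend on the current value function and go stale after every gradient step, while Monte Carlo returns depend only on the data and could in principle be computed once per buffer.

```python
def bootstrapped_returns(rewards, next_values, gamma=0.99):
    """One-step TD targets r_t + gamma * V(s_{t+1}).

    These depend on the current value function, so a one-shot
    precomputation becomes stale as soon as V is updated.
    """
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted cumulative rewards; depend only on the stored data."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


rewards = [1.0, 0.0, 1.0]
# Same data, two snapshots of the (changing) value function:
print(bootstrapped_returns(rewards, [0.5, 0.5, 0.0]))
print(bootstrapped_returns(rewards, [0.9, 0.1, 0.0]))  # different targets
print(monte_carlo_returns(rewards))  # fixed given the data
```

Since the algorithms in question bootstrap from a learned value function, precomputing returns in `process_buffer` cannot reproduce what repeated `preprocess_batch` calls would compute.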
For @arnaujc91

Background: currently the pre-processing of most offline learning algorithms is done in `_preprocess_batch`, which is highly suboptimal. Instead, it should be done in `process_buffer`. In this PR, a general class is introduced that implements `process_buffer` using `_preprocess_batch`, allowing an `OffPolicyAlgorithm` to be converted into an efficient `OfflineAlgorithm`.

Current status: when using it to improve the performance of `TD3BC`, the algorithm converges and tests pass, but the determinism test fails, meaning something changed in the processing or at least in the random number generation. If it's the latter, the failure is not a problem, but I currently don't see why anything rng-related should have changed. The implementation that changes the buffer's managed batch is rather hacky. I suspect something goes wrong with the indexing, but after 20 minutes of debugging I haven't yet pinned down what causes the determinism test to fail. Understanding this will require reading through `sample_indices` in `ReplayBuffer` and `ReplayBufferManager`. Note that there is an inconsistency in how `sample_indices(None)` is handled between the two, but it shouldn't play a role for this PR.

In the course of this PR, determinism snapshots should be created from the dev-v2 branch; then, after switching to this branch, the determinism test of `TD3BC` should succeed. See the docstring of `AlgorithmDeterminismTest` in `determinism_test.py` for further details.

The PR is finished when all relevant offline algorithms inherit from `OfflineAlgorithmFromOffPolicyAlgorithm` and either the determinism tests pass, or a source of different random number generation caused by the refactoring has been identified.
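The adapter idea can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual implementation: only the names `OfflineAlgorithmFromOffPolicyAlgorithm`, `process_buffer`, and `_preprocess_batch` come from the PR; the stand-in algorithm and the dict-based buffer are invented for the example.

```python
class DummyOffPolicy:
    """Stand-in off-policy algorithm whose _preprocess_batch
    computes per-sample returns (here a toy doubling)."""

    def _preprocess_batch(self, batch):
        batch["returns"] = [r * 2 for r in batch["rew"]]
        return batch


class OfflineAlgorithmFromOffPolicyAlgorithm:
    """Adapter: implements process_buffer by applying the wrapped
    algorithm's _preprocess_batch once to the buffer's full batch,
    instead of on every sampled minibatch."""

    def __init__(self, algo):
        self.algo = algo

    def process_buffer(self, buffer):
        # The buffer is modeled as a dict holding its managed batch;
        # replacing the managed batch in place is the part the PR
        # description calls "rather hacky".
        buffer["batch"] = self.algo._preprocess_batch(buffer["batch"])
        return buffer


buf = {"batch": {"rew": [1.0, 2.0]}}
OfflineAlgorithmFromOffPolicyAlgorithm(DummyOffPolicy()).process_buffer(buf)
print(buf["batch"]["returns"])  # -> [2.0, 4.0]
```

The design intent is that the expensive pre-processing runs once over the whole buffer up front, rather than being repeated for every minibatch during training; as the closing comment explains, this only yields correct results when the pre-processing does not depend on quantities that change during training, such as a learned value function.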