MC-PILCO for TorchRL #3538

alektebel · 2026-03-01T16:03:32Z

alektebel
Mar 1, 2026

I've been doing model-based RL for robotics control (my thesis was PPO/DDPG on a prosthetic hand in MuJoCo), so sample efficiency is something I care about in practice. MC-PILCO recently won the AI Olympics at ICRA 2025 on underactuated pendulum tasks, which shows it's still very much alive.

TorchRL already has ModelBasedEnvBase, WorldModelWrapper and the rest of the MBRL infrastructure, but no PILCO variant in sota-implementations/. I'd like to add one.

Roughly: ProbabilisticDynamicsModel + MCPILCOLoss + a training script benchmarking data efficiency against PPO/SAC on Pendulum-v1.

A few things I'd like to check before starting:

- Is MBRL for low-dimensional control still in scope for sota-implementations/, given the current LLM focus?
- Should the dynamics model be a GP or an NN ensemble?
- Is there any ongoing work I might be duplicating?

Happy to share a draft design or prototype if useful.

vmoens · 2026-03-01T18:29:17Z

vmoens
Mar 1, 2026
Collaborator

Hey @alektebel, great timing -- we just received PR #3537 which adds a vanilla PILCO implementation. A few thoughts:

MBRL is definitely in scope. Low-dimensional control with model-based methods is very much welcome in sota-implementations/. The LLM work is additive, not a replacement for the core RL mission.

Re: duplication with #3537. The current PILCO PR uses analytical moment matching (Deisenroth & Rasmussen's original formulation). Interestingly, the author mentioned they tried MC moment matching but couldn't get it to stabilize. MC-PILCO is a distinct enough algorithm that it warrants its own implementation -- especially given the ICRA 2025 results you mention.

What would be most useful:

- We're planning to move several components from PR #3537 into core (BoTorchGPWorldModel, RBFController, a generic saturating cost module). If you build MC-PILCO on top of those same core primitives, we'd end up with a clean shared foundation for GP-based MBRL methods. Could you coordinate with PSXBRosa (here or on Discord) on how to work together?
- If you get MC moment matching working reliably, that's independently valuable: it could be an alternative forward mode on the GP world model (the analytical path is O(D^2) in Python loops right now and doesn't scale).
- GP vs. NN ensemble: both are interesting. The current PR uses BoTorch/GPyTorch. If you'd prefer NN ensembles for the dynamics model, that's a different flavor and also welcome, but I'd suggest starting with GP to share infrastructure with PR #3537.
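For readers unfamiliar with the distinction above, here is a minimal sketch of one step of Monte Carlo moment matching, assuming a 1-D state. It is not taken from PR #3537 or from any TorchRL API: `predict` is a hypothetical stand-in for a GP posterior that returns a predictive mean and standard deviation, and plain Python is used instead of tensors only to keep the example dependency-free.

```python
import math
import random
import statistics


def predict(x):
    """Hypothetical stand-in for a GP posterior at input x: returns (mean, std)."""
    mean = x + 0.1 * math.sin(x)   # assumed learned dynamics mean
    std = 0.05 + 0.02 * abs(x)     # assumed input-dependent predictive std
    return mean, std


def mc_moment_match(mu, sigma, n_particles=2000, seed=0):
    """Estimate the mean and std of the next state by sampling.

    The current state belief is N(mu, sigma^2). Each particle is pushed
    through the model and a next state is drawn from the model's predictive
    distribution; the sample moments then summarize the next-state belief.
    """
    rng = random.Random(seed)
    next_states = []
    for _ in range(n_particles):
        x = rng.gauss(mu, sigma)             # sample from the current belief
        m, s = predict(x)                    # model's predictive distribution at x
        next_states.append(rng.gauss(m, s))  # sample a next state
    return statistics.fmean(next_states), statistics.stdev(next_states)


mu_next, sigma_next = mc_moment_match(mu=1.0, sigma=0.2)
print(f"next-state belief: mean={mu_next:.3f}, std={sigma_next:.3f}")
```

The analytical path computes these moments in closed form from the GP kernel; the sampled version trades that for a cost that grows with the number of particles and an estimate that carries Monte Carlo noise.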

Would recommend taking a look at #3537, maybe commenting there on the MC approach, and then opening a draft PR. Happy to review early.

1 reply

alektebel Mar 1, 2026
Author

Thanks for the thorough breakdown @vmoens!
I'm aiming to have a draft PR up within 3–4 days (definitely this week).
Here's my current plan of action:

1. Review PR #3537 in depth and comment there on the MC moment matching approach, specifically where the instability issues tend to arise.
2. Ping @PSXBRosa on Discord to align on the shared primitives (BoTorchGPWorldModel, RBFController, saturating cost module) so that MC-PILCO layers cleanly on top without duplication.
3. Implement MC moment matching as an alternative forward mode on the GP world model.
4. Validate on low-dimensional benchmarks (CartPole, Pendulum), with the goal of reproducing the stability results from the ICRA 2025 paper before opening the draft.
5. Open a draft PR.
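To make the implementation step concrete, a rough sketch of a particle-based cost estimate is below. Everything in it is an assumption made for illustration: the 1-D `step` dynamics stand in for a sampled GP prediction, the linear policy `u = -gain * x` stands in for a learned controller, and `saturating_cost` uses the usual `1 - exp(-d^2 / (2 * width^2))` form with an arbitrary width. In this sketch the particles are carried across time steps as they are, without refitting a Gaussian at each step.

```python
import math
import random


def saturating_cost(x, target=0.0, width=0.5):
    """Saturating cost in [0, 1): 1 - exp(-0.5 * ((x - target) / width) ** 2)."""
    return 1.0 - math.exp(-0.5 * ((x - target) / width) ** 2)


def step(x, u, rng):
    """Hypothetical stochastic 1-D dynamics standing in for a sampled GP prediction."""
    return x + 0.1 * u + rng.gauss(0.0, 0.01)


def expected_cost(gain, horizon=30, n_particles=500, seed=0):
    """Monte Carlo estimate of the cumulative cost under the policy u = -gain * x."""
    rng = random.Random(seed)
    # Initial particles drawn from an assumed initial-state distribution N(1.0, 0.1^2).
    particles = [rng.gauss(1.0, 0.1) for _ in range(n_particles)]
    total = 0.0
    for _ in range(horizon):
        particles = [step(x, -gain * x, rng) for x in particles]
        # Average the per-particle cost to estimate the expected cost at this step.
        total += sum(saturating_cost(x) for x in particles) / n_particles
    return total


for gain in (0.0, 1.0, 3.0):
    print(f"gain={gain}: estimated cost={expected_cost(gain):.2f}")
```

A real implementation would compute this estimate with differentiable tensor operations so that the policy parameters can be optimized by gradient descent; here the gain is only compared across a few fixed values.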

I'll start with GP dynamics to share infrastructure, and can revisit NN ensembles as a follow-up if there's appetite for it.

alektebel · 2026-03-04T22:03:12Z

alektebel
Mar 4, 2026
Author

Hi @vmoens,

Update: I left a comment on PSXBRosa's PILCO PR #3537 to coordinate on shared primitives, since, as of today, opening a draft PR would duplicate the core primitives.
