Replies: 5 comments 11 replies
-
It has been a long time since I worked on the mountain car example, but as far as I remember the agent simply underestimates its own momentum, which leads to it overshooting the goal and then slowly moving back towards it. To verify whether that is true I would have to spend some time on the example again (which I can do in a few days at the earliest), or you can plot its predictions (mean ± std from the posterior predictive variance) and see how well it actually estimates.
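A minimal sketch of that check, assuming the per-step predictions have been collected as `Normal` distributions (the `predictions` and `observed` names below are stand-ins for whatever your experiment logs, not names from the example):

```julia
using Distributions, Plots

# Stand-in data; in practice these would come from the agent's one-step-ahead
# posterior predictive distributions and the environment's logged positions.
predictions = [Normal(sin(t / 10), 0.1) for t in 1:100]
observed    = [sin(t / 10) + 0.05 * randn() for t in 1:100]

μ = mean.(predictions)   # posterior predictive means
σ = std.(predictions)    # posterior predictive standard deviations

plot(1:100, μ; ribbon = σ, label = "predicted mean ± std", xlabel = "time step")
scatter!(1:100, observed; label = "observed position", markersize = 2)
```

If the observed trajectory leaves the ribbon around the goal, the agent's predictive model is indeed underestimating the momentum there.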
-
Summoning @ThijsvdLaar to this thread
-
This can be as simple as an approximation error introduced by the
-
I've been summoned. This is a good question, and examples like these can be notoriously hard to debug. The main reason is that the dynamic behaviour of the agent results from an interplay between all kinds of precisions in convoluted ways. I'll write some thoughts below that may give some direction.
[1] Van de Laar, T. W., & De Vries, B. (2019). Simulating active inference processes by message passing. Frontiers in Robotics and AI, 6, 20.
-
Alright, I did the hard part and debugged the example. I spent some time creating a Pluto notebook that loads the results from running an experiment of the "active inference" agent on the mountain car environment, and then plots the posteriors for every random variable in the model at every time step.
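The core of such a notebook is small. Roughly (a sketch, with a hypothetical `experiment.jld2` file and the assumption that each variable's posteriors were saved as a vector of `Normal`s):

```julia
using JLD2, Distributions, Plots

# Load a Dict mapping variable names to their per-step posteriors
# (hypothetical file name and layout; adjust to however the results were saved).
results = load("experiment.jld2")

# One panel per random variable: posterior mean over time, ± std as a ribbon.
panels = [plot(mean.(post); ribbon = std.(post), title = name, label = nothing)
          for (name, post) in results]
plot(panels...; layout = length(panels))
```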
We plot the control at the first step, and we also plot the estimated engine forces stored in the model. If we plot the estimates of the state transitions and look at the first dimension (the change in position, i.e. the velocity caused by the control), the estimates are good in the beginning, but there is also a spike of bad estimates later on.

This becomes apparent when we look at the plots of the estimates for the (future) observations: the estimates for our position are OFF when we stop moving!

This is due to bad estimates of the dynamics, and especially the bad estimates of the change in position caused by the controls. This means that when we plot the landscape of the agent at that point in time, the agent plans from a position estimate that does not match reality.
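One way to surface exactly this failure (a sketch with stand-in data; `true_positions` and `Δx_posteriors` are hypothetical names for the logged environment states and the agent's posteriors over the control-induced change in position):

```julia
using Distributions, Plots

# Stand-in data; replace with the logged trajectory and the corresponding
# posteriors from the experiment.
true_positions = cumsum(0.01 .* randn(100))
Δx_posteriors  = [Normal(0.01 * randn(), 0.02) for _ in 1:99]

true_Δx = diff(true_positions)     # actual per-step change in position
est_Δx  = mean.(Δx_posteriors)     # agent's estimated change in position

plot(true_Δx; label = "actual Δposition", xlabel = "time step")
plot!(est_Δx; ribbon = std.(Δx_posteriors), label = "estimated Δposition ± std")
```

Wherever the two curves diverge and the ribbon fails to cover the actual values, the agent's dynamics model is confidently wrong, which is exactly the region where the behavior breaks down.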
The actual position of the car is not where the agent estimates it to be.

Note that there is another UI element to enable/disable showing the variances of each posterior (disabled by default, because the large variances dominate the plot and flatten the means). Also, just to advertise our own work here.¹

¹ Just to rant a little bit: more than 3 years ago I spent a long time implementing a framework out of my frustrations with debugging the mountain car agent-environment interaction. My framework allowed you to run any predefined (active inference) agent on any predefined environment, and it then automatically provided you with plots such as agent-environment interaction plots and, for probabilistic agents, their beliefs over time, including, at every step, the beliefs over the horizon. This was meant to help people new to active inference understand the decision-making of the agent, but also to satisfy curiosity, e.g. by playing around with hyperparameters and seeing their effects on the behavior of the agents. The idea for this framework came from RL Baselines3 Zoo. Something else came up, and a year later I could no longer publish the idea as a paper without additional work. However, since this problem is still not solved, I might spend some time working on a solution again.
-
Hi, I've been looking at the Active Inference Mountain Car example and I was wondering about the final state of the car after it reaches the goal. From my understanding of the goal prior, the agent will try to find a course of action to get there by time t = T and then maintain that position. However, in the example results, the car travels past the target position and slowly drifts back towards it. Why does the agent do this? Is there a way to modify the goal prior or another part of the model so that the car stays at the target position indefinitely?

I'm new to active inference and would appreciate any insight :)
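For concreteness, here is my rough mental model of the goal prior (a sketch of what I understand, not the example's actual code; `x_target`, `T`, and the tiny variance are placeholders). Would replacing the single end-of-horizon goal with a goal held at every future step be the right kind of change?

```julia
using Distributions

x_target = 0.5    # placeholder target position
T        = 20     # placeholder planning horizon

# My understanding of the current setup: a tight goal prior on the final step only.
goal_at_T = Normal(x_target, 1e-4)

# The kind of change I have in mind: hold the goal at every step of the horizon,
# so the agent is also penalized for drifting away after arriving.
goals_held = [Normal(x_target, 1e-4) for t in 1:T]
```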
Thank you!