Inference resampling buffers #289

@turion

Description

@reubenharry your ideas in #281 have given me an idea in turn. We've discussed a few times that resampling buffers in rhine-bayes should really arise from inference. I think I have a proposal for how this could look.

Principle

A resampling buffer is basically an effectful Moore machine: it has a method to put new data into it (thus altering its state), and a method to get data out of its state. In rhine-bayes, these two methods should have the following meaning:

  • The internal state of the buffer is the current estimate of the state of some stochastic process. (This may be a particle filter, but other methods of inference make sense as well.)
  • put: Time has passed and a new observation has been made. The state of the buffer should be updated to reflect this new information in two steps:
    • The simulation of the process has to advance to the given timestamp
    • The state of the process has to be conditioned on the observation
  • get: Time has passed, but no new observation has been made, and an estimate of the state is requested. The simulation is advanced to the new timestamp, and the current state estimate is returned.
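The put/get semantics above can be sketched independently of rhine's actual ResamplingBuffer type as a small effectful Moore machine. All names here (SimpleBuffer, extrapolating) are illustrative, not rhine-bayes API; the toy instance just extrapolates a position linearly instead of doing inference:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- A buffer as an effectful Moore machine: 'put' feeds in a timestamped
-- observation, 'get' requests a timestamped estimate. Both advance the
-- internal state to the given time.
data SimpleBuffer m time a b = SimpleBuffer
  { put :: time -> a -> m (SimpleBuffer m time a b)
  , get :: time -> m (b, SimpleBuffer m time a b)
  }

-- Toy instance: track a position, estimate velocity from consecutive
-- observations, and extrapolate linearly when an estimate is requested.
extrapolating :: Double -> Double -> Double -> SimpleBuffer Identity Double Double Double
extrapolating t0 pos vel = SimpleBuffer
  { put = \t obs ->
      let dt   = t - t0
          vel' = if dt > 0 then (obs - pos) / dt else vel
      in pure (extrapolating t obs vel')
  , get = \t ->
      let pos' = pos + vel * (t - t0)
      in pure (pos', extrapolating t pos' vel)
  }

main :: IO ()
main = do
  let b0       = extrapolating 0 0 0
      b1       = runIdentity (put b0 1 2)  -- observation 2.0 arrives at t=1
      (est, _) = runIdentity (get b1 2)    -- estimate requested at t=2
  print est  -- extrapolates to 4.0
```

In the real buffer, `put` would additionally condition on the observation and `get` would return a state estimate of the stochastic process, but the two-method shape is the same.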

Pseudocode

To implement a buffer as a particle filter, we need this data:

```haskell
-- | A stochastic process whose development in time we want to use for resampling
model :: StochasticProcess time a
model = ...

-- | Given state @a@, what is the likelihood of @b@ occurring?
--   This is used to weight the different particles later.
observation :: MonadFactor m => a -> b -> m ()
observation = ...

myInferenceBuffer :: MonadMeasure m => ResamplingBuffer m clA clB b [(a, Probability)]
myInferenceBuffer = inferenceBuffer model observation
```
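To make the pseudocode concrete, here is a self-contained, purely functional sketch of the two particle-filter steps such a buffer would perform. `advance` and `likelihood` are stand-ins for the model and the observation function above; none of this is existing rhine-bayes API, and resampling proper is omitted:

```haskell
-- One weighted particle: a state hypothesis and its probability weight.
type Particle a = (a, Double)

-- Renormalise weights so they sum to 1.
normalize :: [Particle a] -> [Particle a]
normalize ps = [ (a, w / total) | (a, w) <- ps ]
  where total = sum (map snd ps)

-- 'put': advance every particle by the elapsed time dt, then condition
-- on the observation by multiplying each weight with its likelihood.
putStep
  :: (Double -> a -> a)  -- advance a particle by a time difference
  -> (a -> b -> Double)  -- likelihood of observation b given state a
  -> Double -> b -> [Particle a] -> [Particle a]
putStep advance likelihood dt obs particles =
  normalize [ (a', w * likelihood a' obs) | (a, w) <- particles, let a' = advance dt a ]

-- 'get': only advance the particles; the weights are unchanged.
getStep :: (Double -> a -> a) -> Double -> [Particle a] -> [Particle a]
getStep advance dt = map (\(a, w) -> (advance dt a, w))

main :: IO ()
main = do
  let advance dt x     = x + dt                       -- constant drift model
      likelihood x obs = exp (negate ((x - obs) ^ 2)) -- Gaussian-shaped
      posterior = putStep advance likelihood 1 1.0 [(0, 0.5), (2, 0.5)]
  -- the particle that lands near the observation dominates the posterior
  print posterior
```

In rhine-bayes these two steps would run in a `MonadMeasure` instance rather than in pure code, with `likelihood` expressed via scoring, but the division of labour between put and get is exactly the one described above.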

Example in use

Let's think about how the Brownian motion example would simplify. If, for the moment, we disregard the varying temperature, the whole rhine currently looks schematically like this:

simulation >-- keepLast default --> inference >-- keepLast default --> visualization

This setup is unsatisfactory in a few ways: the usage of keepLast is ad hoc; inference may run several times on the same values if it ticks more often than the simulation; and we keep creating estimates of the current state more often than we can visualize them, which is wasteful.

I think it would be much better if inference were activated exactly on every simulation step, and the current estimate retrieved exactly on every step of the visualization! This would be achieved with this setup:

simulation >-- myInferenceBuffer --> visualization

Much simpler, no ad hoc choices, and, I believe, better performance in both runtime and estimation quality.

Variation

This opens the door for a fun game we could implement: instead of the simulation creating sensor readings, we let the user try to click on the green dot (the latent position), and the clicked position becomes the sensor reading. This way users can try to get a feel for how Bayesian inference works.

Open questions

  1. If the model contains a simulation that uses imprecise integration, e.g. Euler integration, then the inference step may produce bad estimates if its simulation isn't called often enough. This may be the case if the visualization has a low frame rate and the sensor readings don't come in fast enough. One can solve this problem by polling not only at every visualization step, but also at a regular, high enough rate, by combining the visualization clock with a constant-rate clock using ParallelClock.
  2. I don't understand how to pass an additional external parameter like the temperature to the buffer. If it is to be estimated, then this is fine (we use the existing workaround to make a stochastic process out of it), but if the buffer should use it as input, then I don't know how to pass it on, because the get method doesn't take input. One solution might be extending resampling buffers to do input and output at the same time. But I don't want to extend the framework just to accommodate one use case. With the existing framework, I can see two ad hoc solutions:
  • Using a global StateT, update the parameter from one component and read it from the buffer. This should work but it clutters the type level and doesn't feel idiomatic.
  • put could accept Either b p where p is the extra parameter. But then some extra organisation of data has to happen before the buffer, to intermingle measurements and parameters.
  3. I don't understand what the clocks for the resampling buffer should be. Are they completely arbitrary, or can model & observation likelihood contain clock information?
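On question 1, the effect of step size on Euler integration is easy to demonstrate in isolation. This toy example (not tied to rhine) integrates x' = -x over one time unit and compares one coarse step, as a slowly ticking clock would take, against many fine steps:

```haskell
-- Explicit Euler steps for x' = -x, starting from x and stepping
-- with step size dt until the total time t is used up.
euler :: Double -> Double -> Double -> Double
euler dt t x
  | t <= 0    = x
  | otherwise = euler dt (t - dt) (x + dt * negate x)

main :: IO ()
main = do
  let exact  = exp (-1)        -- analytic x(1) for x(0) = 1
      coarse = euler 1    1 1  -- one big step: lands exactly on 0
      fine   = euler 0.01 1 1  -- many small steps: close to exp (-1)
  print (abs (coarse - exact), abs (fine - exact))
```

The coarse estimate is off by roughly the entire value, which is why the buffer's internal simulation would need the extra constant-rate polling when observations and visualization steps are sparse.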
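The Either b p workaround from question 2 can be sketched with a toy state update: measurements and parameter updates arrive intermingled on the same input, and the buffer dispatches on the constructor. All names here (BufferState, putEither) are illustrative only:

```haskell
-- Toy buffer state: the current external parameter (e.g. temperature)
-- and the measurements received so far.
data BufferState b p = BufferState
  { parameter    :: p
  , measurements :: [b]
  } deriving Show

-- 'put' dispatches on the input: Left is a measurement,
-- Right replaces the external parameter.
putEither :: Either b p -> BufferState b p -> BufferState b p
putEither (Left obs)   s = s { measurements = obs : measurements s }
putEither (Right temp) s = s { parameter = temp }

main :: IO ()
main = do
  let s0 = BufferState { parameter = 20.0 :: Double, measurements = [] :: [Double] }
      s1 = foldl (flip putEither) s0 [Left 1.0, Right 25.0, Left 2.0]
  print (parameter s1, measurements s1)  -- (25.0,[2.0,1.0])
```

This shows the cost mentioned above: something upstream of the buffer has to merge the measurement stream and the parameter stream into one Either stream.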
