@reubenharry your ideas in #281 have given me an idea in turn. We've discussed a few times that resampling buffers in rhine-bayes should really arise from inference. I think I have a proposal for how this could look.
## Principle
A resampling buffer is basically an effectful Moore machine: it has a method to put new data into it (and thus alter its state), and a method to get data out of its state. In rhine-bayes, these two methods should have the following meaning:
- The internal state of the buffer is the current estimate of the state of some stochastic process. (This may be a particle filter, but other methods of inference make sense as well.)
- `put`: Time has passed and a new observation has been made. The state of the buffer should be updated to reflect this new information in two steps:
  - The simulation of the process has to advance to the given timestamp.
  - The state of the process has to be conditioned on the observation.
- `get`: Time has passed, but no new observation has been made, yet an estimate of the state is requested. The simulation is advanced to the new timestamp, and the current state estimate is returned.
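The put/get principle can be sketched as an effectful Moore machine. This is a hypothetical, deliberately simplified record, not the actual rhine API; the names `Buffer`, `put`, `get` and `lastObservation` are mine:

```haskell
import Data.Functor.Identity (Identity (..))

-- A resampling buffer as an effectful Moore machine over a monad m
-- (hypothetical, simplified signature; not the rhine API):
data Buffer m time obs est = Buffer
  { put :: time -> obs -> m (Buffer m time obs est)
    -- ^ advance the simulation to the new timestamp, then condition on the observation
  , get :: time -> m (est, Buffer m time obs est)
    -- ^ advance the simulation to the new timestamp and return the current estimate
  }

-- A toy instance in Identity: the "estimate" is the last observation
-- together with the time elapsed since it arrived.
lastObservation :: Double -> Buffer Identity Double Double (Double, Double)
lastObservation = go 0
  where
    go tObs s = Buffer
      { put = \t obs -> pure (go t obs)
      , get = \t -> pure ((s, t - tObs), go tObs s)
      }
```

In a real inference buffer the state would be a particle population instead of a single value, and `m` would be a measure monad, but the shape of the two methods stays the same.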
## Pseudocode
To implement a buffer as a particle filter, we need this data:

```haskell
-- | A stochastic process whose development in time we want to use for resampling
model :: StochasticProcess time a
model = ...

-- | Given state @a@, what is the likelihood of @b@ occurring?
-- This is used to weight the different particles later.
observation :: MonadFactor m => a -> b -> m ()
observation = ...

myInferenceBuffer :: MonadMeasure m => ResamplingBuffer m clA clB b [(a, Probability)]
myInferenceBuffer = inferenceBuffer model observation
```

## Example in use
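To make the intended semantics concrete, here is a self-contained, deliberately simplified bootstrap-filter update. It is pure and uses a deterministic transition for brevity; the names `putStep`, `getStep`, `advance` and `likelihood` are illustrative, and a real implementation would run the transition and conditioning in `MonadMeasure`:

```haskell
type Particle a = (a, Double)  -- a state hypothesis and its normalized weight

-- One 'put': advance every particle to the new timestamp,
-- reweight by the observation likelihood, and renormalize.
putStep
  :: (Double -> a -> a)   -- ^ advance a state by dt (deterministic here, for simplicity)
  -> (a -> b -> Double)   -- ^ likelihood of observation b given state a
  -> Double               -- ^ time passed since the last step
  -> b                    -- ^ the new observation
  -> [Particle a] -> [Particle a]
putStep advance likelihood dt obs particles = normalize
  [ (x', w * likelihood x' obs) | (x, w) <- particles, let x' = advance dt x ]
  where
    normalize ps = let z = sum (map snd ps) in [ (x, w / z) | (x, w) <- ps ]

-- A 'get' with this representation just advances the particles without reweighting.
getStep :: (Double -> a -> a) -> Double -> [Particle a] -> [Particle a]
getStep advance dt = map (\(x, w) -> (advance dt x, w))
```

This also shows why the buffer's output type is `[(a, Probability)]` in the pseudocode above: the estimate handed out by `get` is the whole weighted particle population.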
Let's think about how the Brownian motion example would simplify. If we disregard the varying temperature for the moment, the whole rhine currently looks like this, schematically:

```haskell
simulation >-- keepLast default --> inference >-- keepLast default --> visualization
```

This setup is unsatisfactory in a few ways: the usage of `keepLast` is ad hoc, inference may run several times on the same values if it ticks more often than the simulation, and we keep creating estimates of the current state more often than we can visualize them, which is wasteful.
I think it would be much better if inference were activated exactly on every simulation step, and the current estimate retrieved exactly on every step of the visualization! This would be achieved with this setup:

```haskell
simulation >-- myInferenceBuffer --> visualization
```

Much simpler, no ad hoc choices, and I believe better performance at runtime as well as in quality.
## Variation
This opens the door for a funny game we could implement: instead of the simulation creating sensor readings, we let the user try to click on the green dot (the latent position), and the clicked position is the sensor reading. This way users can try to get a feel for how Bayesian inference works.
## Open questions
- If the model contains a simulation that uses imprecise integration, e.g. Euler integration, then the inference step may produce bad estimates if its simulation isn't called often enough. This may be the case if the visualization has a low frame rate and the sensor readings don't come in fast enough. One can solve this problem by polling not only at every visualization step, but also at a regular, high enough rate, by combining the visualization clock with a constant-rate clock using `ParallelClock`.
- I don't understand how to pass an additional external parameter like the temperature to the buffer. If it is to be estimated, then this is fine (we use the existing workaround to make a stochastic process out of it), but if the buffer should use it as input, then I don't know how to pass it on, because the `get` method doesn't take input. One solution might be extending resampling buffers to do input and output at the same time. But I don't want to extend the framework just to accommodate one use case. With the existing framework, I can see two ad hoc solutions:
  - Using a global `StateT`, update the parameter from one component and read it from the buffer. This should work, but it clutters the type level and doesn't feel idiomatic.
  - `put` could accept `Either b p`, where `p` is the extra parameter. But then some extra organisation of data has to go on before the buffer, to intermingle measurements and parameters together.
- I don't understand what the clocks for the resampling buffer should be. Are they completely arbitrary, or can model & observation likelihood contain clock information?
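The `Either b p` workaround mentioned above could look like this in miniature. Everything here is hypothetical: the state type `S`, its fields, and the toy conditioning step are just for illustration:

```haskell
-- Toy buffer state: an extra parameter plus a running estimate.
data S = S { temperature :: Double, estimate :: Double } deriving (Eq, Show)

-- 'put' accepts either a measurement (Left) or a parameter update (Right).
putEither :: Either Double Double -> S -> S
putEither (Left obs) s = s { estimate = (estimate s + obs) / 2 }  -- toy conditioning step
putEither (Right t)  s = s { temperature = t }
```

The price of this encoding is visible even in the sketch: measurements and parameter updates have to be interleaved into a single stream upstream of the buffer.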