implement generalized policy iteration in `ValueFunction`

Currently, `ValueFunction` runs value iteration, but there is no way to decide what should happen at each level of the loop: within a sweep, at the end of each sweep, and across sweeps.

One option is to hard-code specific optimizations. Another is to write a set of functions, one per level, that show what happens at that level.
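
As a rough illustration of the second option, here is a minimal sketch of generalized policy iteration with one optional callback per level. It does not use the existing `ValueFunction` code. The function name `generalized_policy_iteration`, the hooks `on_state`, `on_sweep`, and `on_iteration`, and the dictionary layout of `transitions` are all hypothetical. It assumes a small tabular MDP in which every state has at least one action and terminal states are modeled as zero-reward self-loops.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = int
Action = int
# Hypothetical layout: transitions[s][a] -> list of (probability, next_state, reward)
Transitions = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def q_value(transitions: Transitions, values: Dict[State, float],
            s: State, a: Action, gamma: float) -> float:
    """Expected one-step return of taking action a in state s."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])


def generalized_policy_iteration(
    transitions: Transitions,
    gamma: float = 0.9,
    eval_sweeps: int = 1,          # evaluation sweeps per improvement step (>= 1)
    theta: float = 1e-8,           # convergence threshold on the largest update
    max_iterations: int = 1000,
    on_state: Optional[Callable[[State, float, float], None]] = None,
    on_sweep: Optional[Callable[[int, int, float], None]] = None,
    on_iteration: Optional[Callable[[int, bool], None]] = None,
) -> Tuple[Dict[State, float], Dict[State, Action]]:
    values = {s: 0.0 for s in transitions}
    # Start from an arbitrary policy: the first listed action in each state.
    policy = {s: next(iter(actions)) for s, actions in transitions.items()}

    for iteration in range(max_iterations):
        delta = float("inf")
        # Policy evaluation: at most eval_sweeps in-place sweeps.
        for sweep in range(eval_sweeps):
            delta = 0.0
            for s in transitions:
                old = values[s]
                values[s] = q_value(transitions, values, s, policy[s], gamma)
                delta = max(delta, abs(values[s] - old))
                if on_state:                      # level 1: within a sweep
                    on_state(s, old, values[s])
            if on_sweep:                          # level 2: end of each sweep
                on_sweep(iteration, sweep, delta)
            if delta < theta:
                break

        # Policy improvement: switch only on a strict gain to avoid oscillation.
        stable = True
        for s, actions in transitions.items():
            best = max(actions, key=lambda a: q_value(transitions, values, s, a, gamma))
            current = q_value(transitions, values, s, policy[s], gamma)
            if q_value(transitions, values, s, best, gamma) > current + 1e-12:
                policy[s] = best
                stable = False

        if on_iteration:                          # level 3: across sweeps
            on_iteration(iteration, stable)
        if stable and delta < theta:
            break

    return values, policy
```

In this sketch, `eval_sweeps=1` behaves much like value iteration, while a large `eval_sweeps` approaches classic policy iteration. The hooks only observe, so a version that lets them decide what to do next (for example, stop a sweep early) would need them to return a signal.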