Defining subset of available actions when using ValueIteration #82

@anachristinaac

Hi,

I'm trying to implement a problem in which only a subset of all actions is available at each state. I'm doing this by making the return of the get_all_actions() function inside the PolicyModel class dependent on the state, as shown below:

import random

import pomdp_py

# Action is assumed to be defined elsewhere in the complete script (not shown here).


class PolicyModel(pomdp_py.RolloutPolicy):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def sample(self, state):
        # Pick uniformly among the actions available in this state.
        return random.choice(self.get_all_actions(state))

    def rollout(self, state, history=None):
        return self.sample(state)

    def get_all_actions(self, state=None, history=None):
        # With no state given (or in "work"), every action is returned.
        if state is None or state.name == "work":
            return [Action(a) for a in ["rail", "car", "cycle", "wait",
                                        "go-home", "relax", "drive"]]
        if state.name == "home":
            return [Action(a) for a in ["rail", "car", "cycle"]]
        if state.name == "wait-room":
            return [Action(a) for a in ["wait", "go-home"]]
        if state.name == "train":
            return [Action("relax")]
        if state.name in ("light-traffic", "medium-traffic", "heavy-traffic"):
            return [Action("drive")]
        # Fail loudly instead of silently returning None for an unknown state.
        raise ValueError(f"Unknown state: {state.name}")

This doesn't seem to work when I solve the problem with Value Iteration (planner = pomdp_py.ValueIteration(horizon=1, discount_factor=0.9)). Below is an example of the output. In the first step, the chosen action ("wait") is not one of the actions available in the current state ("home"). This then results in a belief with zero probability for every state.

==== Step 1 ====
True state: home
Belief: {State: home: 1.0, State: wait-room: 0.0, State: train: 0.0, State: light-traffic: 0.0, State: medium-traffic: 0.0, State: heavy-traffic: 0.0, State: work: 0.0}
Action: wait
Reward: -1000.0
>> Observation: home
==== Step 2 ====
True state: home
Belief: {State: home: 0.0, State: wait-room: 0.0, State: train: 0.0, State: light-traffic: 0.0, State: medium-traffic: 0.0, State: heavy-traffic: 0.0, State: work: 0.0}
Action: rail
Reward: -2.0
>> Observation: train
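
To illustrate why the belief in Step 2 can collapse to zero everywhere, here is a minimal sketch of a discrete belief update. It is not pomdp_py's implementation; `update_belief`, `AVAILABLE`, `NEXT_STATE`, and both probability functions are hypothetical names, and the toy model assumes that an action which is unavailable in a state has zero transition probability to every next state.

```python
def update_belief(belief, action, observation, transition_prob, observation_prob):
    """One discrete Bayes-filter step over a dict {state: probability}."""
    new_belief = {}
    for next_state in belief:
        # Probability of reaching next_state under the current belief.
        predicted = sum(
            transition_prob(next_state, state, action) * prob
            for state, prob in belief.items()
        )
        new_belief[next_state] = observation_prob(observation, next_state) * predicted
    total = sum(new_belief.values())
    if total > 0:  # Normalize only when some probability mass survived.
        new_belief = {s: p / total for s, p in new_belief.items()}
    return new_belief


# Hypothetical toy model (an assumption, not the model from the script).
AVAILABLE = {"home": {"rail"}, "train": {"relax"}, "work": set()}
NEXT_STATE = {("home", "rail"): "train", ("train", "relax"): "work"}


def transition_prob(next_state, state, action):
    # Assumption: an action that is unavailable in `state` leads nowhere.
    if action not in AVAILABLE[state]:
        return 0.0
    return 1.0 if NEXT_STATE[(state, action)] == next_state else 0.0


def observation_prob(observation, next_state):
    # Assumption: the next state is observed exactly.
    return 1.0 if observation == next_state else 0.0


belief = {"home": 1.0, "train": 0.0, "work": 0.0}
after_invalid = update_belief(belief, "wait", "home", transition_prob, observation_prob)
after_valid = update_belief(belief, "rail", "train", transition_prob, observation_prob)
print(after_invalid)  # every entry stays at zero, so there is nothing to normalize
print(after_valid)    # all mass moves to "train"
```

Under these assumptions, an unavailable action removes all probability mass in the prediction step, and the normalization cannot recover it.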

From what I understood while debugging, the ValueIteration planner does not call the sample() function of the PolicyModel, so get_all_actions() is always called with the default argument state=None, and all actions are treated as available.
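
The behaviour described above can be reproduced without pomdp_py. The sketch below is only an illustration: `enumerate_without_state` is a hypothetical stand-in for any caller that asks for the actions without passing a state, and plain strings replace the Action objects.

```python
ALL_ACTIONS = ["rail", "car", "cycle", "wait", "go-home", "relax", "drive"]

# State-dependent subsets, mirroring the PolicyModel above (strings only).
AVAILABLE = {
    "home": ["rail", "car", "cycle"],
    "wait-room": ["wait", "go-home"],
    "train": ["relax"],
    "light-traffic": ["drive"],
    "medium-traffic": ["drive"],
    "heavy-traffic": ["drive"],
    "work": ALL_ACTIONS,
}


def get_all_actions(state_name=None):
    # Same fallback as in the PolicyModel: no state means every action.
    if state_name is None:
        return list(ALL_ACTIONS)
    return list(AVAILABLE[state_name])


def enumerate_without_state(get_actions):
    # Hypothetical caller that never passes a state (an assumption about
    # how the planner behaves, based on the debugging described above).
    return get_actions()


seen_by_caller = enumerate_without_state(get_all_actions)
print("wait" in seen_by_caller)           # the full action set is returned
print("wait" in get_all_actions("home"))  # the state-dependent subset excludes it
```

So the restriction only takes effect for callers that actually pass the state, which matches "wait" being chosen in "home".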

How can I define these different subsets of actions when using ValueIteration then?

Here is the complete script. I'm running it using Python 3.9.6 with version 1.3.3 of the pomdp-py package.

Thank you!

Kind regards,
Ana
