Defining subset of available actions when using ValueIteration

Hi, 

I'm trying to implement a problem in which only a subset of all actions is available in each state. I'm doing this by making the return value of the ```get_all_actions()``` function inside the PolicyModel class depend on the state, as shown below:

```python
import random

import pomdp_py


# Action is defined elsewhere in the complete script (linked at the end of this post).
class PolicyModel(pomdp_py.RolloutPolicy):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def sample(self, state):
        # Pick uniformly among the actions available in this state.
        return random.choice(self.get_all_actions(state))

    def rollout(self, state, history=None):
        return self.sample(state)

    def get_all_actions(self, state=None, history=None):
        # With no state given (or at "work"), every action is returned.
        if state is None or state.name == "work":
            return [Action(a) for a in ["rail", "car", "cycle", "wait",
                                        "go-home", "relax", "drive"]]
        if state.name == "home":
            return [Action(a) for a in ["rail", "car", "cycle"]]
        if state.name == "wait-room":
            return [Action(a) for a in ["wait", "go-home"]]
        if state.name == "train":
            return [Action("relax")]
        if state.name in ("light-traffic", "medium-traffic", "heavy-traffic"):
            return [Action("drive")]
        # Fail loudly instead of silently returning None for an unknown state.
        raise ValueError(f"Unknown state: {state.name}")
```

This doesn't seem to work when I use Value Iteration (```planner = pomdp_py.ValueIteration(horizon=1, discount_factor=0.9)```) to solve the problem. Below is an example of the output. In the first step, the chosen action ("wait") is not one of the actions available in the current state ("home"). This then produces a belief that assigns zero probability to every state.

```
==== Step 1 ====
True state: home
Belief: {State: home: 1.0, State: wait-room: 0.0, State: train: 0.0, State: light-traffic: 0.0, State: medium-traffic: 0.0, State: heavy-traffic: 0.0, State: work: 0.0}
Action: wait
Reward: -1000.0
>> Observation: home
==== Step 2 ====
True state: home
Belief: {State: home: 0.0, State: wait-room: 0.0, State: train: 0.0, State: light-traffic: 0.0, State: medium-traffic: 0.0, State: heavy-traffic: 0.0, State: work: 0.0}
Action: rail
Reward: -2.0
>> Observation: train
```
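One way an all-zero belief like the one in Step 2 can arise is sketched below. This is a toy discrete Bayes update, not the pomdp_py implementation. The two-state model, the function names, and the assumption that an unavailable action has zero transition probability to every successor state are all hypothetical and chosen only for illustration.

```python
def update_belief(belief, action, observation, transition_prob, observation_prob):
    """One discrete Bayes filter step over a dict {state: probability}."""
    new_belief = {}
    for next_state in belief:
        # Probability mass predicted to arrive in next_state under the action.
        predicted = sum(
            transition_prob(next_state, state, action) * p
            for state, p in belief.items()
        )
        new_belief[next_state] = (
            observation_prob(observation, next_state, action) * predicted
        )
    total = sum(new_belief.values())
    # Normalize only if some mass survived; otherwise the zeros are kept.
    if total > 0:
        new_belief = {s: p / total for s, p in new_belief.items()}
    return new_belief


def transition_prob(next_state, state, action):
    # Hypothetical model: only "rail" is defined at "home".
    if state == "home" and action == "rail":
        return 1.0 if next_state == "train" else 0.0
    # Assumption: an unavailable action leads nowhere with positive probability.
    return 0.0


def observation_prob(observation, next_state, action):
    # Hypothetical fully observable model.
    return 1.0 if observation == next_state else 0.0


belief = {"home": 1.0, "train": 0.0}
after_wait = update_belief(belief, "wait", "home", transition_prob, observation_prob)
after_rail = update_belief(belief, "rail", "train", transition_prob, observation_prob)
```

Under these assumptions, `after_wait` has zero mass in every state because no transition under "wait" has positive probability, while `after_rail` moves all mass to "train".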

From what I understood while debugging, ValueIteration does not call the ```sample()``` function of the PolicyModel, so the ```get_all_actions()``` function is always called with the default argument ```state=None```, and all actions are treated as available.
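The suspected call pattern can be reproduced without pomdp_py. The sketch below is a stand-in, not the library's source: `ToyPolicyModel` and both helper functions are hypothetical names, and the stateless call only mirrors what the debugging above suggests ValueIteration does.

```python
class ToyPolicyModel:
    """Stand-in policy model with state-dependent actions (plain strings)."""

    ALL_ACTIONS = ["rail", "car", "cycle", "relax"]
    ACTIONS_BY_STATE = {"home": ["rail", "car", "cycle"], "train": ["relax"]}

    def get_all_actions(self, state=None, history=None):
        # Without a state, fall back to the full action set.
        if state is None:
            return list(self.ALL_ACTIONS)
        return list(self.ACTIONS_BY_STATE[state])


def candidate_actions_without_state(policy_model):
    # Hypothetical planner step that never passes the state along.
    return policy_model.get_all_actions()


def candidate_actions_with_state(policy_model, state):
    # What a state-aware planner step would need to do instead.
    return policy_model.get_all_actions(state=state)


policy = ToyPolicyModel()
unfiltered = candidate_actions_without_state(policy)
filtered = candidate_actions_with_state(policy, "home")
```

Here `unfiltered` still contains "relax" even though it is not available at "home", whereas `filtered` contains only the three actions defined for that state.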

How can I define these different subsets of actions when using ValueIteration then?

[Here](https://github.com/user-attachments/files/22143926/home_work_example.py) is the complete script. I'm running it using Python 3.9.6 with version 1.3.3 of the pomdp-py package.

Thank you!

Kind regards,
Ana
