Simulating sequential interaction #104
4 comments · 2 replies
-
Hi Nitay, thanks for writing. You should make the following changes. First, write a new memo model that performs only the belief-update step: it should take the current belief as input and return the updated belief state b' after conditioning on the observed prize. With that model in hand, you can write a Python simulation loop where you maintain a belief variable and update it with the model at each iteration.
Does that plan make sense?
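The loop structure suggested above can be sketched in plain Python. Everything here is hypothetical scaffolding: the `REWARD_P` table, the greedy arm-choice policy, and the stand-in `belief_update` function (a one-step Bayes update in NumPy, rather than a call into the actual memo model) are my own assumptions for illustration, not code from this thread.

```python
import numpy as np

# Hypothetical table P(prize = 1 | arm, bandit_type); invented numbers,
# standing in for the model's reward_probability.
REWARD_P = np.array([[0.9, 0.1],   # arm 0: P(win | type 0), P(win | type 1)
                     [0.2, 0.8]])  # arm 1

def belief_update(b, arm, prize):
    """Stand-in for the memo belief-update model: one Bayes step.
    b is P(bandit_type = 0); returns the posterior after seeing prize."""
    like0 = REWARD_P[arm, 0] if prize == 1 else 1.0 - REWARD_P[arm, 0]
    like1 = REWARD_P[arm, 1] if prize == 1 else 1.0 - REWARD_P[arm, 1]
    post0 = b * like0
    post1 = (1.0 - b) * like1
    return post0 / (post0 + post1)

# Simulation loop: keep the belief in an ordinary Python variable and
# re-invoke the one-step model each iteration, logging the result.
rng = np.random.default_rng(0)
true_type = 0
b = 0.5                                  # uniform prior over bandit types
history = []
for t in range(20):
    arm = 0 if b > 0.5 else 1            # placeholder greedy policy
    prize = rng.random() < REWARD_P[arm, true_type]
    b = belief_update(b, arm, int(prize))
    history.append(b)
```

In the real setup, the body of `belief_update` would instead call the compiled memo model with the current belief and the sampled (arm, prize) pair; the surrounding loop stays the same.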
-
Hi Kartik,

```python
@memo(cache=True)
def belief_update[bandit_type: BanditType, arm: Arms, prize: Prize](b):
    agent: knows(arm, prize)
    agent: thinks[
        bandit: knows(arm, prize),
        bandit: chooses(bandit_type in BanditType, wpp=reward_probability(prize, arm, bandit_type))
    ]
    agent: observes [bandit.bandit_type] is bandit_type
    # Update belief using Bayes' rule
    agent: chooses(b in B, wpp=exp(b * bandit.bandit_type))
    return agent.b
```

This returns wrong values, but it is also unclear to me where the conditioning takes place. Shouldn't the new model take (b, arm, prize) as input?
-
I don't think there is an example of such a model in the demo directory. In your model, you should have:
Does that general plan make sense?
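Since the original question also asks for the Q-values at each iteration, here is a minimal sketch of how expected arm values fall out of the current belief. The two-arm/two-type setup and the `REWARD_P` numbers are invented for illustration; in the real model they would come from `reward_probability`.

```python
import numpy as np

# Hypothetical P(prize = 1 | arm, bandit_type); placeholder numbers,
# not taken from the thread's actual model.
REWARD_P = np.array([[0.9, 0.1],   # arm 0 under type 0 / type 1
                     [0.2, 0.8]])  # arm 1

def q_values(b):
    """Expected one-step reward of each arm under belief b = P(type 0)."""
    return REWARD_P[:, 0] * b + (1.0 - b) * REWARD_P[:, 1]

# Inside the simulation loop, append (belief, q_values(belief)) to a log
# after every belief update, e.g.:
log = [(b, q_values(b)) for b in (0.5, 0.8)]
```

With certainty about the type, the Q-values reduce to that type's column of the reward table, which is a quick sanity check on the bookkeeping.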
-
Hi Nitay! Just checking in — did that help you solve your problem? Please feel free to ask follow-up questions if my response wasn't clear, or if more issues come up! :)
-
I have code implementing a POMDP solver. I want to run it in a loop, simulating the interaction with the environment rather than solving the entire problem up front, and collect the updated belief and Q-values after each iteration.
Here's my code:
What updates to the code are needed?
Thanks!