
Discrepancy between the default reward function rewards/reward and the default probability transition matrix in the no crashing case. #48

@alexsieusahai

Description


Consider the default setup (as configured in exe/road.py), where allow_crashing is set to false, and suppose the car is in column 0 moving at speed 2 with an obstacle at (0, 0).

If I choose LEFT here, the probability transition matrix treats this action as equivalent to CRUISE | NO_OP, but the reward function treats it as reducing forward movement by 1, which is not the same as CRUISE.
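To make the mismatch concrete, here is a minimal Python sketch of one way to keep the two consistent: resolve the action once, then compute the reward from the action the transition model actually applies. Everything in it is hypothetical and not taken from the repository. That includes the Action enum, the helper names, the lateral_blocked flag (standing in for whatever condition makes LEFT fall back to CRUISE | NO_OP), and the assumption that the reward equals forward movement, with a lateral move costing one unit.

```python
from enum import Enum


class Action(Enum):
    CRUISE = 0  # stands in for CRUISE | NO_OP in this sketch
    LEFT = 1
    RIGHT = 2


def effective_action(action: Action, lateral_blocked: bool, allow_crashing: bool) -> Action:
    """Action the transition model applies (hypothetical rule).

    With crashing disabled, a blocked lateral move falls back to CRUISE,
    matching the transition behaviour described in the issue.
    """
    if action in (Action.LEFT, Action.RIGHT) and lateral_blocked and not allow_crashing:
        return Action.CRUISE
    return action


def forward_movement(action: Action, speed: int) -> int:
    """Forward distance covered (hypothetical rule): a lateral move costs one unit."""
    if action in (Action.LEFT, Action.RIGHT):
        return max(speed - 1, 0)
    return speed


def reward_from_raw_action(action: Action, speed: int) -> int:
    # Mirrors the reported behaviour: scores the requested action and
    # ignores the transition model's fallback to CRUISE.
    return forward_movement(action, speed)


def reward_from_effective_action(
    action: Action, speed: int, lateral_blocked: bool, allow_crashing: bool
) -> int:
    # Consistent variant: scores the same action the transition model applies.
    applied = effective_action(action, lateral_blocked, allow_crashing)
    return forward_movement(applied, speed)


if __name__ == "__main__":
    # Scenario from the issue: speed 2, LEFT blocked, allow_crashing = False.
    print(reward_from_raw_action(Action.LEFT, 2))                     # 1
    print(reward_from_effective_action(Action.LEFT, 2, True, False))  # 2
```

Under these assumptions, the raw-action reward for LEFT at speed 2 is 1, while the transition behaves like CRUISE (forward movement 2); scoring the resolved action removes that gap.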
