Currently, the agent can get stuck in extremely long loops over intermediate states that do not improve the trajectory. Similarly, some heuristic passes do not always find a solution, or they may time out, which blocks training.
Add a customizable step limit and truncation for failed or timed-out passes to speed up training.
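
A minimal sketch of the idea, not the code in this PR: the names `StepLimit`, `PassFailed`, `step_fn`, and `max_steps`, as well as the default limit of 100, are hypothetical and chosen only for illustration. It assumes a step function that returns `(observation, reward, terminated)` and raises an exception when a pass fails or times out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical step signature: action -> (observation, reward, terminated)
StepFn = Callable[[Any], Tuple[Any, float, bool]]


class PassFailed(Exception):
    """Hypothetical error raised when a heuristic pass finds no solution."""


@dataclass
class StepLimit:
    """Wraps a step function and truncates the episode early.

    The episode is truncated when `max_steps` is reached, or when the
    underlying pass fails or times out.
    """

    step_fn: StepFn
    max_steps: int = 100  # assumed default, not taken from the PR
    steps_taken: int = 0

    def reset(self) -> None:
        """Start a new episode by clearing the step counter."""
        self.steps_taken = 0

    def step(self, action: Any) -> Tuple[Any, float, bool, bool]:
        """Return (observation, reward, terminated, truncated)."""
        self.steps_taken += 1
        try:
            obs, reward, terminated = self.step_fn(action)
        except (PassFailed, TimeoutError):
            # A failed or timed-out pass ends the episode instead of blocking.
            return None, 0.0, False, True
        # Truncate only if the episode did not already end on its own.
        truncated = not terminated and self.steps_taken >= self.max_steps
        return obs, reward, terminated, truncated
```

With this shape, a training loop can treat `truncated` the same way as `terminated` when deciding to reset, so a looping agent or a stalled pass costs at most `max_steps` steps per episode.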