Commit de26164

docs: fix errors and add missing files

1 parent 6831629 commit de26164

3 files changed, 157 additions & 2 deletions

clips_executive/cx_docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (c) 2024-2025 Carologistics
+# Copyright (c) 2024-2026 Carologistics
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
RL Node
#######

Reinforcement learning in this extension is enabled via a ROS node, which manages an RL environment and an RL algorithm.

The core is a `Gymnasium`_ environment derived from the ``CXRLGym`` class, which exchanges all information regarding observations and actions via ROS and hence provides a generic interface.

Along with the environment, we provide a suitable RL algorithm and bundle both in the ``CXRLNode``.
CXRLNode
********

The ``CXRLNode`` is a node that dynamically loads a ``CXRLGym`` environment and manages training and execution of a multi-robot reinforcement learning agent using a maskable PPO algorithm.

The node supports both training and execution modes and interfaces with ROS 2 services and actions through the dynamically loaded environment.
Depending on the configured RL mode, it performs the following tasks:

* **Training**

  * Create or load a PPO agent
  * Train for a specified number of timesteps or episodes
  * Save checkpoints and the final trained model

* **Execution**

  * Load an existing trained agent
  * Execute the policy until shutdown is requested
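The mode handling described above can be sketched as follows. This is an illustrative simplification, not the actual ``CXRLNode`` implementation; ``RLConfig`` and ``run_rl_node`` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RLConfig:
    mode: str             # "training" or "execution" (hypothetical config field)
    total_timesteps: int  # training budget
    model_path: str       # where checkpoints / the final model are stored


def run_rl_node(config: RLConfig) -> str:
    """Dispatch on the configured RL mode, mirroring the task list above."""
    if config.mode == "training":
        # create or load a PPO agent, train, save checkpoints and final model
        return f"trained {config.total_timesteps} steps -> {config.model_path}"
    elif config.mode == "execution":
        # load an existing trained agent and execute the policy until shutdown
        return f"executing policy from {config.model_path}"
    raise ValueError(f"unknown RL mode: {config.mode}")
```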
CXRLGym
*******

``CXRLGym`` provides a Gymnasium-compatible reinforcement learning environment that integrates tightly with ROS 2.
It supports both single-robot and multi-robot scenarios, including settings where a single shared policy is trained and deployed across multiple robots.
Observation Space
-----------------

The observation space encodes the symbolic world state as a fixed-size numerical feature vector.

* The symbolic state is defined using predicates and objects.
* Each feature corresponds to the truth value of a grounded fluent and is represented as a binary value in :math:`\{0, 1\}`.
* The observation vector is exposed as a Gymnasium ``Box`` space with shape ``(n_obs,)``.

Two mechanisms for defining observables are supported:

* **Automatically grounded predicates**

  Predicates are grounded over all compatible object combinations, yielding a complete propositional encoding of the symbolic state. Each grounded predicate corresponds to one feature in the observation vector.
  This is realized using the ROS services ``GetObservableObjects`` and ``GetObservablePredicates``.

* **Predefined observables**

  In addition to automatic grounding, predefined observables with fixed predicate arguments can be specified explicitly. These observables do not span the full object space and allow the definition of domain-specific or abstracted features.
  This is realized using the ROS service ``GetPredefinedPredicates``.

Both types of observables are combined into a single observation vector, enabling a trade-off between representational completeness and compactness.
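The encoding described above can be sketched as follows; the function names, predicates, and objects are illustrative, not the actual ``CXRLGym`` code.

```python
from itertools import product


def ground_predicates(predicates, objects_by_type):
    """Ground each predicate over all compatible object combinations."""
    grounded = []
    for name, param_types in predicates:
        for combo in product(*(objects_by_type[t] for t in param_types)):
            grounded.append((name,) + combo)
    return grounded


def encode_observation(grounded, true_fluents):
    """One binary feature per grounded fluent, in a fixed order."""
    return [1.0 if g in true_fluents else 0.0 for g in grounded]


# Example: a single predicate "at(robot, location)" over 2 robots x 2 locations
predicates = [("at", ("robot", "location"))]
objects = {"robot": ["r1", "r2"], "location": ["depot", "machine"]}
features = ground_predicates(predicates, objects)  # 4 grounded fluents
obs = encode_observation(features, {("at", "r1", "depot")})
# obs has shape (n_obs,) = (4,), suitable for a Box(0, 1, (4,)) space
```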
Action Space
------------

The action space is represented as a discrete set of action names and is exposed as a Gymnasium ``Discrete(n_actions)`` space. Each discrete value corresponds to a named high-level action.

.. todo::

   THIS IS STILL MISSING

The complete action set is initialized via the ROS 2 service ``GetActionSpace``, which provides the list of all actions supported by the system.

CXRLGym is designed to support multi-robot scenarios. In this setting, actions are assumed to be assignable to individual robots. Action masking is used to indicate which actions are executable for a given robot at each decision step. The set of currently executable actions for a robot is obtained through the ``GetActionListRobot`` service.
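The per-robot mask over the discrete action space can be sketched like this; ``build_action_mask`` and the example action names are hypothetical, while the service names above come from the text.

```python
def build_action_mask(action_space, executable):
    """True where the discrete action index is executable for the robot.

    `action_space` is the ordered list of action names backing Discrete(n);
    `executable` is the set of names reported for the robot.
    """
    return [name in executable for name in action_space]


action_space = ["move", "pick", "place", "wait"]  # Discrete(4), illustrative
mask = build_action_mask(action_space, {"move", "wait"})
```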
Step Function
-------------

The environment step proceeds as follows:

* The selected action is checked for executability with respect to the currently assigned robot.
* If the action is not executable:

  * The current observation is returned.
  * The reward is computed.
  * The episode termination condition is evaluated via the ``GetEpisodeEnd`` service.

* If the action is executable:

  * The action is dispatched using the ``ActionSelection`` ROS action interface.
  * After execution, the reward, updated observation, and termination status are retrieved.
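A minimal sketch of this step logic, with the ROS service and action calls replaced by plain callables (all function names here are illustrative):

```python
def env_step(action, robot, is_executable, dispatch, get_obs, get_reward, episode_end):
    """Mirror the branching above: skip dispatch for non-executable actions."""
    if not is_executable(action, robot):
        # not executable: return current observation, reward, and the
        # termination flag from the episode-end check
        return get_obs(), get_reward(), episode_end()
    dispatch(action, robot)  # execute via the action interface
    # after execution, retrieve reward, updated observation, termination status
    return get_obs(), get_reward(), episode_end()


# Example with stubbed callables standing in for the ROS interfaces
obs, reward, terminated = env_step(
    "pick", "r1",
    is_executable=lambda a, r: False,
    dispatch=lambda a, r: None,
    get_obs=lambda: [0.0, 1.0],
    get_reward=lambda: -1.0,
    episode_end=lambda: False,
)
```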
CXRLGym Interfaces
------------------

The ``CXRLGym`` node provides and consumes various ROS 2 services and actions. The table below indicates whether it is a **service provider** or **client** for each endpoint.

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Service / Action
     - Description
   * - ``/exec_action_selection`` (service provider)
     - Executes an action selection step using the RL model and returns the selected action.
   * - ``/create_rl_env_state`` (client)
     - Requests the current environment state as a serialized string of facts.
   * - ``/set_rl_mode`` (client)
     - Requests to set the RL mode (e.g., train, test) and receives confirmation.
   * - ``/get_action_list_executable`` (client)
     - Requests all actions currently executable in the environment.
   * - ``/get_action_list_executable_for_robot`` (client)
     - Requests the executable actions for a given robot.
   * - ``/get_observable_objects`` (client)
     - Requests the observable objects in the environment for a given type.
   * - ``/get_observable_predicates`` (client)
     - Requests the predicates and their parameters observable in the environment.
   * - ``/get_predefined_observables`` (client)
     - Requests the predefined observables available in the environment.
   * - ``/reset_cx`` (action client)
     - Resets the environment or agent and receives confirmation.
   * - ``/get_free_robot`` (action client)
     - Determines an available robot in multi-robot scenarios.
   * - ``/action_selection`` (action client)
     - Performs the execution of the selected action.
Multi-Robot Maskable Proximal Policy Optimization
*************************************************

``MultiRobotMaskablePPO`` is an extension of the `MaskableActorCriticPolicy`_ from `sb3_contrib`_, designed for multi-robot reinforcement learning scenarios.

The algorithm enables multiple robots to act concurrently while sharing a single policy, combining invalid action masking with parallel rollout collection.

Key Features
------------

* Shared policy across multiple robots
* Concurrent action execution using parallel threads
* Support for invalid action masking via ``MaskableActorCriticPolicy``
* Time-based and step-based rollout collection modes
* Custom rollout buffers with action mask support
Multi-Robot Rollout Collection
------------------------------

Rollouts are collected concurrently by spawning up to ``n_robots`` worker threads. Each thread executes a single environment step using the current policy and records the resulting transition.

Depending on the configuration, rollout collection terminates either when a fixed number of steps is reached or when a predefined time budget expires. An optional synchronization barrier ensures that all robot threads complete before advantage computation.
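The thread-per-robot scheme with a final barrier might look like the following sketch; it is illustrative and not the actual ``MultiRobotMaskablePPO`` code.

```python
import threading


def collect_rollouts(n_robots, n_steps_per_robot, step_fn):
    """Collect transitions from n_robots worker threads (step-based mode)."""
    transitions = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_robots)

    def worker(robot_id):
        for t in range(n_steps_per_robot):
            transition = step_fn(robot_id, t)  # one environment step
            with lock:  # the shared buffer is appended under a lock
                transitions.append(transition)
        barrier.wait()  # all robots finish before advantage computation

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_robots)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return transitions


rollout = collect_rollouts(3, 4, lambda rid, t: (rid, t))
```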
Action Masking
--------------

Invalid action masking is supported during both training and inference. At each decision step, the environment provides a binary action mask indicating the currently executable actions. The policy restricts action selection accordingly, ensuring that only valid actions are sampled.
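A common way to realize such masking, shown here as a sketch rather than the sb3_contrib internals, is to push the logits of invalid actions to negative infinity before the softmax, so masked actions receive probability zero.

```python
import math


def masked_probs(logits, mask):
    """Softmax over logits with invalid actions forced to probability 0."""
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - mx) if l != float("-inf") else 0.0 for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_probs([1.0, 2.0, 0.5], [True, False, True])
# the masked action (index 1) gets probability 0; the rest renormalize
```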
Custom Rollout Buffers
----------------------

The algorithm employs specialized rollout buffers that store action masks alongside observations, actions, rewards, and value estimates. This allows masked actions to be handled correctly during policy optimization and advantage computation.
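A minimal sketch of such a buffer (hypothetical; the actual buffers extend the maskable rollout buffers shipped with sb3_contrib):

```python
class MaskedRolloutBuffer:
    """Store action masks alongside the usual PPO rollout quantities."""

    def __init__(self):
        self.obs, self.actions, self.rewards = [], [], []
        self.values, self.action_masks = [], []

    def add(self, obs, action, reward, value, action_mask):
        # the mask recorded here is reused during policy optimization
        self.obs.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.values.append(value)
        self.action_masks.append(action_mask)

    def __len__(self):
        return len(self.actions)


buffer = MaskedRolloutBuffer()
buffer.add(obs=[0.0, 1.0], action=2, reward=0.5, value=0.1,
           action_mask=[True, False, True])
```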

clips_executive/cx_docs/links.rst

Lines changed: 0 additions & 1 deletion

@@ -13,7 +13,6 @@
 .. _Unified Planning Framework (UPF): https://unified-planning.readthedocs.io/en/latest/
 .. _z3 constraint solver: https://github.com/Z3Prover/z3/wiki
 .. _NEXTFLAP planner: https://github.com/aiplan4eu/up-nextflap
-.. _Gymnasium: https://gymnasium.farama.org/index.html#
 .. _sb3_contrib: https://sb3-contrib.readthedocs.io/en/master/index.html
 .. _MaskableActorCriticPolicy: https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
