### 🐛 Bug
I am developing a custom features extractor type (based on DeepSets) for SB3 and want to train and optimize it with the RL Zoo. To do so, I add the following to a custom `config.py` file:
```python
import gymnasium as gym

# FeatureExtractorSet is the custom DeepSets features extractor defined elsewhere
gym.register(
    "env-name",
    entry_point=...,  # custom environment class
    kwargs=...,       # environment kwargs
)

hyperparams = {
    "env-name": dict(
        policy="MultiInputPolicy",
        policy_kwargs={
            "features_extractor_class": FeatureExtractorSet,
            "features_extractor_kwargs": {
                "features_dim": 10,
            },
        },
    ),
}
```
This works fine with the standard `train.py` (arguments: `--algo a2c --conf-file path/to/config.py --gym-packages path.to.config --n-timesteps 100 --device cpu -P --env env-name ...`).
When adding `-optimize`, the training fails (the actions contain NaN, since I encode invalid observations with NaN and rely on the custom `FeatureExtractorSet` to discard them). Closer investigation shows that the objective function updates `self._hyperparams`, which contains the sub-dict `{'policy_kwargs': {'features_extractor_class': FeatureExtractorSet}}`, with the sampled hyperparameters, which also set `policy_kwargs` entries other than `features_extractor_class`. Because the update is shallow, the sampled `policy_kwargs` replace the preconfigured dict entirely, so the custom features extractor is silently dropped.
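A minimal sketch of the suspected mechanism (names are stand-ins, not the actual `exp_manager.py` code): a shallow `dict.update` with the sampled hyperparameters replaces the whole nested `policy_kwargs` dict instead of merging into it.

```python
class FeatureExtractorSet:  # stand-in for the custom DeepSets extractor
    pass

# Preconfigured hyperparams from config.py
hyperparams = {
    "policy_kwargs": {"features_extractor_class": FeatureExtractorSet},
}

# Hypothetical hyperparameters sampled during -optimize
# that also touch policy_kwargs
sampled_hyperparams = {
    "policy_kwargs": {"net_arch": [64, 64]},
}

# Shallow update: the entire nested dict is replaced, not merged
hyperparams.update(sampled_hyperparams)
print(hyperparams["policy_kwargs"])
# {'net_arch': [64, 64]}  <- features_extractor_class is gone
```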
I would suggest replacing the hyperparameter update in `rl_zoo3/exp_manager.py` (line 741 at commit `28dc228`) so that the nested `policy_kwargs` from the config are merged with the sampled ones rather than overwritten.
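As a hedged sketch of the direction I have in mind (this is not the actual zoo code; `merge_sampled_hyperparams` is a hypothetical helper), the sampled `policy_kwargs` could be merged into the preconfigured ones instead of replacing them:

```python
from typing import Any, Dict


def merge_sampled_hyperparams(
    base: Dict[str, Any], sampled: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge sampled hyperparameters into the base config,
    combining nested policy_kwargs instead of overwriting them."""
    merged = base.copy()
    for key, value in sampled.items():
        if key == "policy_kwargs" and isinstance(merged.get(key), dict):
            # Keep preconfigured entries (e.g. features_extractor_class)
            # and add/override only the sampled keys
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged
```

With a merge like this, sampled entries (e.g. `net_arch`) would be added alongside the preconfigured `features_extractor_class` instead of discarding it.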
### To Reproduce

_No response_

### Relevant log output / Error message

_No response_
### System Info
- OS: Linux-5.15.0-91-generic-x86_64-with-glibc2.31 # 101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023
- Python: 3.9.18
- Stable-Baselines3: 2.2.1
- PyTorch: 2.1.1+cu121
- GPU Enabled: True
- Numpy: 1.26.2
- Cloudpickle: 3.0.0
- Gymnasium: 0.29.1
### Checklist

- [x] I have checked that there is no similar issue in the repo
- [x] I have read the SB3 documentation
- [x] I have read the RL Zoo documentation
- [x] I have provided a minimal and working example to reproduce the bug
- [x] I've used the markdown code blocks for both code and stack traces.