Describe the bug
NemoRL's pyproject.toml pins transformers==5.3.0 but only requires accelerate>=0.26. However, this version of transformers requires accelerate>=1.1.0.
I hit the following crash because of the mismatch:
(XXXWorker pid=1783492) File "/opt/ray_venvs/nemo_rl.models.policy.workers.XXX/lib/python3.12/site-packages/accelerate/big_modeling.py", line 135, in register_empty_parameter [repeated 6x across cluster]
(XXXWorker pid=1783492) module._parameters[name] = param_cls(module._parameters[name].to(device), **kwargs) [repeated 6x across cluster]
(XXXWorker pid=1783492) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 6x across cluster]
(XXXWorker pid=1783492) TypeError: Parameter.__new__() got an unexpected keyword argument '_is_hf_initialized' [repeated 6x across cluster]
(XXXWorker pid=1783492) [rank6]:[W406 20:06:50.993913153 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
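To confirm whether a given worker environment is affected, a quick check of the installed accelerate version against the 1.1.0 minimum mentioned above may help. This is only a rough sketch: the helper names (release_tuple, meets_minimum, check_accelerate) are made up for illustration and are not part of NemoRL, and the comparison looks only at the numeric release segment, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

# Minimum accelerate version, taken from the report above (assumption: this
# is the bound that transformers==5.3.0 needs).
MIN_ACCELERATE = "1.1.0"


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '1.1.0rc1' -> (1, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments numerically, padding the shorter with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_accelerate(minimum: str = MIN_ACCELERATE) -> str:
    """Describe whether the accelerate in this environment meets the minimum."""
    try:
        installed = metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        return "accelerate is not installed"
    if meets_minimum(installed, minimum):
        return f"accelerate {installed} satisfies >={minimum}"
    return f"accelerate {installed} is older than the required {minimum}"


if __name__ == "__main__":
    print(check_accelerate())
```

Running this with the interpreter from the affected worker venv (under /opt/ray_venvs/ in the trace above) should show whether that environment resolved an accelerate older than 1.1.0.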