@natinew77-creator

Problem

Fixes #305

The CQLLearner.step() method hardcodes a check for the 'learner_steps' key when deciding whether it is still in the behavior cloning (BC) phase:

if 'learner_steps' not in counts:
    cur_step = 0
else:
    cur_step = counts['learner_steps']

However, self._counter.increment(steps=1) stores the count under the key returned by get_steps_key(), which is 'steps' by default (or '{prefix}_steps' when a prefix is set). When the counter has no 'learner' prefix, 'learner_steps' is never present in counts, so cur_step stays at 0 and the behavior cloning phase never ends.

Solution

Use self._counter.get_steps_key() to dynamically retrieve the correct key:

steps_key = self._counter.get_steps_key()
cur_step = counts.get(steps_key, 0)

This ensures proper transition from behavior cloning to CQL training regardless of how the counter is configured.
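
The mismatch and the fix can be reproduced in isolation. The sketch below is illustrative only: SketchCounter is a hypothetical stand-in that mimics the key behavior described above, not Acme's Counter, and the helper names and num_bc_iters value are assumptions made for the example.

```python
class SketchCounter:
    """Hypothetical stand-in for a prefixed step counter (not Acme's Counter)."""

    def __init__(self, prefix=''):
        self._prefix = prefix
        self._counts = {}

    def get_steps_key(self):
        # 'steps' by default, '{prefix}_steps' when a prefix is set.
        return f'{self._prefix}_steps' if self._prefix else 'steps'

    def increment(self, steps=1):
        # Store the count under the prefix-dependent key and return a copy.
        key = self.get_steps_key()
        self._counts[key] = self._counts.get(key, 0) + steps
        return dict(self._counts)


def in_bc_phase_hardcoded(counts, num_bc_iters):
    # Old check: only ever looks at 'learner_steps'.
    cur_step = counts['learner_steps'] if 'learner_steps' in counts else 0
    return cur_step < num_bc_iters


def in_bc_phase_fixed(counter, counts, num_bc_iters):
    # New check: asks the counter which key it actually uses.
    cur_step = counts.get(counter.get_steps_key(), 0)
    return cur_step < num_bc_iters


counter = SketchCounter()  # no 'learner' prefix, so the key is 'steps'
for _ in range(5):
    counts = counter.increment(steps=1)

stuck = in_bc_phase_hardcoded(counts, num_bc_iters=3)
fixed = in_bc_phase_fixed(counter, counts, num_bc_iters=3)
print(stuck, fixed)
```

With an unprefixed counter, the hardcoded check still reports the BC phase after five steps, while the key-based check reports that it has ended. With SketchCounter(prefix='learner') both checks agree, which is why the bug only shows up for some counter configurations.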

This PR addresses issue google-deepmind#297 by updating the README to:

1. Add a prominent warning that the PyPI package may be outdated
2. Recommend installing from source as the primary method for running examples
3. Reorganize installation steps to emphasize source installation
4. Add a troubleshooting note for import errors after pip install

The PyPI package (dm-acme) was last updated in February 2022, while the
GitHub repository has continued to evolve with new agents and features.
This mismatch causes import errors when users try to run the examples
after installing via pip.
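
When debugging such import errors, one quick check is to see which dm-acme distribution, if any, is registered in the current environment. This is a minimal sketch using only the standard library; the helper name installed_version is an assumption for illustration and is not part of the README change.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version('dm-acme')
if version is None:
    print('No dm-acme distribution metadata found in this environment.')
else:
    # An old version here suggests the environment has the PyPI release
    # rather than a recent source checkout.
    print(f'dm-acme distribution found, version {version}')
```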

Fixes google-deepmind#297

The CQLLearner.step() method hardcoded a check for the 'learner_steps'
key, but the counter stores step counts under the key returned by
get_steps_key(), which varies with the counter's prefix configuration.

This caused the behavior cloning phase to never end when the counter was
initialized without the 'learner' prefix (e.g., in the run_cql_jax.py example).

The fix uses self._counter.get_steps_key() to dynamically retrieve the
correct key, ensuring proper transition from BC to CQL training.

Development

Successfully merging this pull request may close these issues.

The program always executes behavior clone when running CQLLearner.
