
fix(exploration): typo self.ago -> self.algo in sample_func_logits#117

Open
jmtoepperwien wants to merge 1 commit into automl:main from jmtoepperwien:fix/typo-sample-nondeterministic-logprobs

@jmtoepperwien
Contributor

Summary

In MightyExplorationPolicy.sample_func_logits, the sac flag passed to sample_nondeterministic_logprobs used self.ago instead of self.algo.

# Before (broken)
log_prob = sample_nondeterministic_logprobs(
    z=out[1], mean=out[2], log_std=out[3], sac=self.ago == "sac"
)

# After (fixed)
log_prob = sample_nondeterministic_logprobs(
    z=out[1], mean=out[2], log_std=out[3], sac=self.algo == "sac"
)
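A regression test could pin this down by checking which `sac` value actually reaches the log-prob helper. The sketch below does not import Mighty: `StandInPolicy`, its injected `logprob_fn`, and `sac_flag_passed` are hypothetical stand-ins that only mirror the call shape of the fixed line above, while `unittest.mock.Mock` records the keyword arguments it is called with.

```python
from unittest.mock import Mock


class StandInPolicy:
    """Hypothetical stand-in mirroring the call shape of sample_func_logits."""

    def __init__(self, algo, logprob_fn):
        self.algo = algo
        # Plays the role of sample_nondeterministic_logprobs (assumption).
        self.logprob_fn = logprob_fn

    def sample_func_logits(self, out):
        # Same expression as the fixed line: the flag is derived from self.algo.
        return self.logprob_fn(
            z=out[1], mean=out[2], log_std=out[3], sac=self.algo == "sac"
        )


def sac_flag_passed(algo):
    """Return the `sac` keyword the stand-in forwards for a given algo name."""
    fake_logprob = Mock(return_value=0.0)
    policy = StandInPolicy(algo, fake_logprob)
    # Dummy tuple; only indices 1..3 are read, as in the snippet above.
    policy.sample_func_logits((None, 0.1, 0.0, -1.0))
    return fake_logprob.call_args.kwargs["sac"]


assert sac_flag_passed("sac") is True
assert sac_flag_passed("ppo") is False
```

Injecting the helper keeps the test independent of how the real log-prob function is implemented; it only checks the flag that the policy derives from its own `algo` attribute.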

Impact

Because self.ago does not exist, the expression always evaluated to False, so the tanh-squash log-probability correction was never applied for SAC in this code path, even though SAC requires it. As a result, the old log-probs stored in the rollout buffer were plain Gaussian log-probs without the tanh correction, which introduced a systematic bias into the PPO importance-sampling ratio.
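The tanh correction referred to above is the change-of-variables term for a squashed action a = tanh(z): log π(a) = log N(z; μ, σ) − Σ_i log(1 − tanh(z_i)² + ε). The standard-library sketch below illustrates why dropping it biases the ratio. `gaussian_logprob`, `tanh_correction`, `squashed_logprob`, and the `eps=1e-6` stabilizer are illustrative assumptions, not Mighty's implementation of `sample_nondeterministic_logprobs`. It also assumes that the log-probs recomputed at update time do include the correction, in which case the log-ratio is shifted by exactly the missing term.

```python
import math


def gaussian_logprob(z, mean, log_std):
    """Sum of per-dimension diagonal-Gaussian log-densities log N(z_i; mean_i, std_i)."""
    total = 0.0
    for zi, mu, ls in zip(z, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((zi - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return total


def tanh_correction(z, eps=1e-6):
    """Change-of-variables term sum_i log(1 - tanh(z_i)^2 + eps) for a = tanh(z)."""
    return sum(math.log(1.0 - math.tanh(zi) ** 2 + eps) for zi in z)


def squashed_logprob(z, mean, log_std, sac):
    """Log-prob of a = tanh(z); the correction is only subtracted when sac is True."""
    logp = gaussian_logprob(z, mean, log_std)
    if sac:
        logp -= tanh_correction(z)
    return logp


# Same pre-squash sample and same policy parameters for both evaluations.
z, mean, log_std = [0.8, -1.5], [0.5, -1.0], [-0.5, 0.0]

old_logp = squashed_logprob(z, mean, log_std, sac=False)  # flag always False (the bug)
new_logp = squashed_logprob(z, mean, log_std, sac=True)   # correction applied

# Each log(1 - tanh(z_i)^2 + eps) term is negative unless z_i is (near) zero, so here
# the corrected log-prob is larger and exp(new - old) exceeds 1 although the policy
# itself has not changed between the two evaluations.
ratio = math.exp(new_logp - old_logp)
print(f"ratio = {ratio:.4f}")
```

With a consistent flag on both sides, the ratio for an unchanged policy would be exactly 1; the gap grows as |z_i| increases, i.e. as the squashed actions approach the ±1 bounds.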

jmtoepperwien force-pushed the fix/typo-sample-nondeterministic-logprobs branch from ec3778b to 2759cfb on May 28, 2026 08:12
jmtoepperwien requested a review from amsks on May 28, 2026 08:45
jmtoepperwien added the bug (Something isn't working) label on May 28, 2026