Actor Loss Explodes to 7e4 with Rapidly Dropping Q_discriminator (-200+) and Overpowered Discriminator #18

@Younch-15

Description

Problem Description
During training, I observe severe instability:

- Q_discriminator drops rapidly from around -40 to below -200, eventually settling near -300.
- The total actor loss explodes to ~7e4, far outside any reasonable range.
- The discriminator keeps an extremely low loss while reaching over 96% accuracy at distinguishing expert samples from policy-generated samples. Training becomes lopsided: the discriminator dominates the policy and prevents it from learning expert-like behavior.
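For intuition on why a dominant discriminator drags Q_discriminator down this far, here is a minimal numpy sketch. It assumes a GAIL-style imitation reward of the form r = log D(s, a), where D is the discriminator's probability that a sample came from the expert (the exact reward form in this repo may differ); the critic then bootstraps that reward into a Q-value roughly bounded by r / (1 - gamma):

```python
import numpy as np

# Hypothetical GAIL-style imitation reward r = log D(s, a). When the
# discriminator confidently labels policy samples as fake, D -> 0 and the
# reward diverges toward -inf.
def imitation_reward(d_prob):
    return np.log(np.clip(d_prob, 1e-8, 1.0))

# With a constant per-step reward r, the discounted return the critic
# bootstraps toward is approximately r / (1 - gamma).
def approx_q(reward, gamma=0.99):
    return reward / (1.0 - gamma)

for d_prob in (0.67, 0.13, 0.05):
    print(f"D={d_prob:.2f}  r={imitation_reward(d_prob):.2f}  "
          f"Q~={approx_q(imitation_reward(d_prob)):.1f}")
```

With gamma = 0.99, discriminator outputs of roughly 0.67, 0.13, and 0.05 on policy samples already put the bootstrapped Q near -40, -200, and -300, which matches the magnitudes reported above: the Q collapse can follow directly from discriminator confidence, without any critic bug.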
Core Questions
1. Why does Q_discriminator keep dropping rapidly and drive the actor loss to explode, even with an extremely low discriminator learning rate?
2. Should the discriminator's capacity be constrained further, or should the actor loss weighting logic be adjusted (e.g. setting cfg.train.scale_reg to False)?
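On question 2: independent of the project-specific cfg.train.scale_reg flag, two standard ways to rein in an overpowering discriminator are one-sided label smoothing on expert targets and clipping the imitation reward before it enters the critic's bootstrap target. A minimal numpy sketch (function names and the r_min bound are illustrative, not from this repo):

```python
import numpy as np

def bce(pred, target, eps=1e-8):
    # Binary cross-entropy; target may be a scalar label.
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def smoothed_disc_loss(d_expert, d_policy, smooth=0.9):
    # One-sided label smoothing: expert targets are 0.9 instead of 1.0,
    # so the discriminator is penalized for becoming maximally confident.
    return bce(d_expert, smooth) + bce(d_policy, 0.0)

def clipped_reward(d_prob, r_min=-5.0):
    # Bound log D from below so one confident discriminator output cannot
    # inject an unbounded negative reward into the Q target.
    return np.maximum(np.log(np.clip(d_prob, 1e-8, 1.0)), r_min)
```

With the reward floored at r_min, the bootstrapped Q is bounded by r_min / (1 - gamma), so the actor loss can no longer be blown up by the discriminator alone. A gradient penalty on the discriminator (as in WGAN-GP or AMP) is a stronger alternative if smoothing and clipping are not enough.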
