Open
Description
Because of #17, I turned off the sanity check in the Trainer code to skip validation.
Training finished and produced tfevents log files that look correct, but:
- Validation failed, as expected, because of #17 ("Training phase sanity check fails by loading "../../data/imagenet/val" as an image").
- My checkpoint path is empty. It is an existing directory on my machine, set here: https://github.com/TencentARC/Open-MAGVIT2/blob/main/configs/imagenet_lfqgan_128_B.yaml#L15. Training ran for about 16000 steps in total.
The potential causes I can think of are:
- Validation failed, and checkpoint saving is part of validation or depends on it.
- The total number of steps was too low to trigger a checkpoint.
- I did not configure the checkpoint save path correctly.
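To narrow down the last two causes, here is a minimal sketch using only the Python standard library. It assumes checkpoints are written with a `.ckpt` extension and that saving is triggered every fixed number of training steps. Both are assumptions about the setup, not something confirmed from the repository. The helper names `find_checkpoints` and `expected_step_saves` are hypothetical.

```python
from pathlib import Path


def find_checkpoints(root):
    """Return every *.ckpt file under `root`, searched recursively.

    The path is resolved to an absolute one first, because a relative
    checkpoint path is interpreted against the current working directory
    and may point somewhere other than the directory that was intended.
    """
    root = Path(root).expanduser().resolve()
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.ckpt"))


def expected_step_saves(total_steps, save_every_n_steps):
    """Number of step-triggered saves expected after `total_steps`.

    Assumes one save each time the step count reaches a multiple of
    `save_every_n_steps` (a hypothetical interval, not read from the config).
    """
    if save_every_n_steps <= 0:
        return 0
    return total_steps // save_every_n_steps
```

Calling `find_checkpoints(".")` from the directory where training was launched would show whether checkpoints ended up somewhere other than the configured path. `expected_step_saves(16000, n)` returns 0 for any interval `n` larger than 16000, which would mean the run was too short for a step-based save.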
Has anyone encountered this problem?