Training log is printed multiple times, apparently once per GPU. Is this a bug in the code? Does the model also get saved multiple times? #383

@cqray1990

Description

        for batch in train_data_loader:
            self.update_learning_rate(optimizer, epoch, self.steps)

            self.logger.report_time("Data loading")

            if self.experiment.validation and\
                    self.steps % self.experiment.validation.interval == 0 and \
                    self.steps > self.experiment.validation.exempt:
                self.validate(validation_loaders, model, epoch, self.steps)
            self.logger.report_time('Validating ')
            if self.logger.verbose:
                torch.cuda.synchronize()

            self.train_step(model, optimizer, batch,
                            epoch=epoch, step=self.steps)
            if self.logger.verbose:
                torch.cuda.synchronize()
            self.logger.report_time('Forwarding ')

            self.model_saver.maybe_save_model(
                model, epoch, self.steps, self.logger)

            self.steps += 1
            self.logger.report_eta(self.steps, self.total, epoch)

        epoch += 1
        if epoch > self.experiment.train.epochs:
            self.model_saver.save_checkpoint(model, 'final')
            if self.experiment.validation:
                self.validate(validation_loaders, model, epoch, self.steps)
            self.logger.info('Training done')
            break
        iter_delta = 0
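
A possible explanation, assuming the job is launched with one process per GPU (for example via `torchrun` or `torch.distributed.launch`): every process runs this same loop, so each one reaches the `self.logger` and `self.model_saver` calls, and the output repeats once per GPU. A common remedy is to let only rank 0 log and save. The sketch below is not code from this repository; `get_rank`, `is_main_process`, and `run_on_main` are hypothetical helpers, and reading the `RANK` environment variable assumes the launcher sets it.

```python
import os


def get_rank() -> int:
    """Global rank of this process; 0 when RANK is unset (single-process run).

    Assumption: the launcher exports RANK for every worker process.
    """
    return int(os.environ.get("RANK", "0"))


def is_main_process() -> bool:
    """True only for the process that should log and write checkpoints."""
    return get_rank() == 0


def run_on_main(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) on rank 0 only; other ranks skip it and get None."""
    if is_main_process():
        return fn(*args, **kwargs)
    return None
```

With helpers like these, the save call in the loop above could be written as `run_on_main(self.model_saver.maybe_save_model, model, epoch, self.steps, self.logger)`, and the logging calls could be guarded the same way. This only suppresses duplicates; it does not change what each process computes.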
