Hard error on EarlyStopping() #159

@cirobr

Cheers. When training a model with FluxTraining.fit!(learner, epochs), I get a hard error as soon as an early stop condition is met. The error terminates the Julia script, so the code placed after the fit! call never runs. I believe this is unintended behavior; please kindly verify. Thanks in advance.

Code is as follows (early stop parameters purposely set to small numbers):

ms = [accuracy,
      t.Metric(LibML.IoU, device=gpu, name="IoU"),
]

cbs = [ToGPU(),
       StopOnNaNLoss(),
       Checkpointer(modelsfolder),
       EarlyStopping(1),
       EarlyStopping(NumberSinceBest(1)),
       EarlyStopping(Threshold(0.5)),
       Metrics(ms...),
       LogMetrics(TensorBoardBackend(tbfolder)),
       ]

learner = t.Learner(model, lossFunction;
                    data=(trainset, validset),
                    optimizer=modelOptimizer,
                    callbacks=cbs,
)

epochs = 100
FluxTraining.fit!(learner, epochs)
@info "project finished"

Error message as follows:

ERROR: CancelFittingException("Stop triggered by EarlyStopping.Patience(1) stopping criterion. ")
Stacktrace:
 [1] on(::FluxTraining.Events.EpochEnd, phase::ValidationPhase, cb::EarlyStopping, learner::FluxTraining.Protected{Learner})
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/callbacks/earlystopping.jl:72
 [2] _on(e::FluxTraining.Events.EpochEnd, p::ValidationPhase, cb::EarlyStopping, learner::Learner)
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/callbacks/callback.jl:254
 [3] handle(runner::FluxTraining.LinearRunner, event::FluxTraining.Events.EpochEnd, phase::ValidationPhase, learner::Learner)
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/callbacks/execution.jl:12
 [4] (::FluxTraining.var"#handlefn#81"{Learner, ValidationPhase})(e::FluxTraining.Events.EpochEnd)
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/training.jl:102
 [5] runepoch(epochfn::FluxTraining.var"#71#72"{…}, learner::Learner, phase::ValidationPhase)
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/training.jl:106
 [6] epoch!
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/training.jl:22 [inlined]
 [7] fit!(learner::Learner, nepochs::Int64, ::Tuple{MLUtils.DataLoader{…}, MLUtils.DataLoader{…}})
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/training.jl:169
 [8] fit!(learner::Learner, nepochs::Int64)
   @ FluxTraining ~/.julia/packages/FluxTraining/xCOPx/src/training.jl:174
 [9] top-level scope
   @ ~/projects/pascalvoc-segmentation/8-training.jl:123
Some type information was truncated. Use `show(err)` to see complete types.

julia> 
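
A possible workaround, sketched under assumptions: wrap the fit! call in try/catch so that an early-stop exception is treated as a normal way to finish. The struct and `fake_fit!` below are hypothetical stand-ins, defined only so the snippet runs without FluxTraining. In the real script the catch would test for the `CancelFittingException` type named in the error message (assuming it is reachable as `FluxTraining.CancelFittingException`) around `FluxTraining.fit!(learner, epochs)`.

```julia
# Stand-in for the exception named in the error message above.
# Hypothetical: defined here only so the sketch runs without FluxTraining.
struct CancelFittingException <: Exception
    msg::String
end

# Hypothetical placeholder for FluxTraining.fit!(learner, epochs):
# loops over epochs and throws once the stop condition fires.
function fake_fit!(epochs::Int; stop_at::Int=2)
    for epoch in 1:epochs
        if epoch == stop_at
            throw(CancelFittingException("Stop triggered at epoch $epoch"))
        end
    end
    return nothing
end

# Run `fitfn` and treat an early-stop exception as a normal finish.
# Returns true if training stopped early, false if all epochs ran.
# Any other exception is rethrown unchanged.
function fit_allowing_early_stop(fitfn)
    try
        fitfn()
        return false
    catch err
        err isa CancelFittingException || rethrow()
        @info "training stopped early" reason = err.msg
        return true
    end
end

stopped = fit_allowing_early_stop(() -> fake_fit!(100))
@info "project finished" stopped
```

With this pattern the lines after the training call still run, and unrelated errors (for example a NaN loss or an out-of-memory failure) are not silently swallowed.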
