20 changes: 19 additions & 1 deletion watchdog/external_endpoint.go
@@ -8,11 +8,28 @@ import (
"github.com/TwiN/gatus/v5/config/endpoint"
"github.com/TwiN/gatus/v5/metrics"
"github.com/TwiN/gatus/v5/storage/store"
"github.com/TwiN/gatus/v5/storage/store/common/paging"
"github.com/TwiN/logr"
)

func monitorExternalEndpointHeartbeat(ee *endpoint.ExternalEndpoint, cfg *config.Config, extraLabels []string, ctx context.Context) {
ticker := time.NewTicker(ee.Heartbeat.Interval)
timeToNextCheck := ee.Heartbeat.Interval

lastStatus, err := store.Get().GetEndpointStatusByKey(ee.Key(), paging.NewEndpointStatusParams().WithResults(0, 1))
Contributor Author:
Is this an appropriate way to do this here, using this store method with a results size of 1?

if err != nil {
logr.Errorf("[watchdog.monitorExternalEndpointHeartbeat] Failed to get last endpoint status: %s", err.Error())
} else if lastStatus != nil {
if results := lastStatus.Results; len(results) > 0 {
timeSinceLastResult := time.Since(results[0].Timestamp)
logr.Debugf("[watchdog.monitorExternalEndpointHeartbeat] Time since last result: '%s'", timeSinceLastResult)
if timeSinceLastResult < timeToNextCheck {
Contributor Author:
One question still open for me is how to handle the case where the time since the last result is longer than the configured heartbeat interval. Currently, the code waits for the full heartbeat interval before the first check.

Alternatively, a failure could be created immediately. However, I suspect this would often lead to false positives, especially for very short intervals.

I think I prefer the currently implemented behaviour; just wanted to throw the different possible approaches out there!

timeToNextCheck -= timeSinceLastResult
}
}
}
logr.Debugf("[watchdog.monitorExternalEndpointHeartbeat] Waiting for duration=%s before checking heartbeat for group=%s endpoint=%s (key=%s)", timeToNextCheck, ee.Group, ee.Name, ee.Key())

ticker := time.NewTicker(timeToNextCheck)
defer ticker.Stop()
for {
select {
@@ -21,6 +38,7 @@ func monitorExternalEndpointHeartbeat(ee *endpoint.ExternalEndpoint, cfg *config
return
case <-ticker.C:
executeExternalEndpointHeartbeat(ee, cfg, extraLabels)
ticker.Reset(ee.Heartbeat.Interval)
Contributor Author:
Not sure if this is considered a "hack". I'd love to learn how to do this in a cleaner-looking way.

}
}
}