Pingdom duplicate checks creation #293

Open
@artemlive

Description

Hello, community!
I have found a major issue with the Pingdom API integration. At some point I noticed that the number of free Pingdom monitoring slots in my account had dropped significantly. I checked my Pingdom account and found many checks with the same name.
Here is a screenshot of a duplicated check from the Pingdom UI: https://prnt.sc/vbyhjm
During the investigation, I found the following lines in the IMC log:

time="2020-11-01T23:40:07Z" level=info msg="Error received while listing checks: invalid character '<' looking for beginning of value"
time="2020-11-01T23:40:07Z" level=info msg="Creating Monitor: mycompany-software-master-xxx.com"

This means that the API call failed and IMC re-created an existing monitor because of an API error (it could be a problem with the API service, a transport problem, etc.).
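The error text in the log is what Go's encoding/json reports when the body it is asked to parse starts with '<', which typically means an HTML error page came back instead of JSON. The sketch below reproduces that message in isolation. The check type and the decodeChecks helper are hypothetical stand-ins, not the actual Pingdom client code, and the HTML body is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// check is a hypothetical, minimal stand-in for a Pingdom check record.
type check struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// decodeChecks parses a response body that is expected to be a JSON array
// of checks and returns the decoding error, if any, to the caller.
func decodeChecks(body []byte) ([]check, error) {
	var checks []check
	if err := json.Unmarshal(body, &checks); err != nil {
		return nil, err
	}
	return checks, nil
}

func main() {
	// An HTML error page (for example from a proxy or gateway) instead of JSON.
	_, err := decodeChecks([]byte("<html><body>Bad Gateway</body></html>"))
	fmt.Println(err) // invalid character '<' looking for beginning of value

	// A well-formed JSON body decodes without error.
	checks, err := decodeChecks([]byte(`[{"id": 1, "name": "example"}]`))
	fmt.Println(len(checks), err) // 1 <nil>
}
```

In other words, the listing call can fail for reasons unrelated to whether the check exists, so a failed listing should not be read as "no checks found".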
Another important point is that Pingdom allows multiple checks to be created with the same name.
Here is the method that returns all existing checks from Pingdom. It cannot report the error to its caller, because GetAll in the MonitorService interface has no error among its return values:

func (service *PingdomMonitorService) GetAll() []models.Monitor {
	var monitors []models.Monitor
	checks, err := service.client.Checks.List()
	if err != nil {
		log.Println("Error received while listing checks: ", err.Error())
		return nil
	}
	for _, mon := range checks {
		newMon := models.Monitor{
			URL:  mon.Hostname,
			ID:   fmt.Sprintf("%v", mon.ID),
			Name: mon.Name,
		}
		monitors = append(monitors, newMon)
	}
	return monitors
}

Then, while processing the result of that method, IMC generates an error if the monitor wasn't found:
return match, fmt.Errorf("Unable to locate monitor with name %v", name)

But in ReconcileEndpointMonitor the error itself is not checked, so a duplicate monitor can be created:

monitor := findMonitorByName(r.monitorServices[index], monitorName)

I decided to implement findMonitorByName so that it no longer ignores the error, which should help avoid creating duplicate checks. I've refactored the error handling so that only real API call failures are reported as errors. If a monitor doesn't exist, that is not treated as an error.

As I mentioned before, there is one more big issue that prevents me from correctly implementing GetAll with an error return value.
GetAll is part of the MonitorService interface, and its method signature has no error among the return values. Of course, I could change the interface, but then I couldn't guarantee that my changes to the other monitor implementations would be correct, because I don't have enough expertise with providers other than Pingdom. I also found that the uptimerobot monitor avoids using GetAll in the same way.
Here is my PR #294 (still a work in progress at the moment).

Labels

kind/bug (Something isn't working), kind/help wanted (Extra attention is needed)