HQ seencheck panics from query timeout #516

@willmhowes

Problem

If the seencheck in the preprocessor returns an error of any kind, Zeno panics:

if err := preprocess(workerID, seed); err != nil {
	panic(fmt.Sprintf("preprocess failed with err: %v", err))
}

This behavior makes sense as a guarantee that fundamental problems are not occurring during seencheck. However, Zeno will also panic and crash if the HQ seencheck request times out, which can happen when HQ is operating with degraded performance. Because HQ running slower does not threaten the validity of the data generated by Zeno, there should be a way to avoid panicking on this specific error so that crawling can continue.

Solution

A couple of ideas:

  1. changing default behavior to retry if there is a timeout
  2. increasing the timeout value (or allowing it to be set as a runtime flag)
  3. logging the timeout but not returning an error to prevent panic (this could be enabled/disabled via runtime flag)

Alongside any of the above solutions, Prometheus reporting should be improved to track the number of seencheck attempts that fail, are retried, or exceed a certain threshold.
