# HQ seencheck panics from query timeout (#516)

## Problem

If the seencheck in the preprocessor returns an error of any kind, Zeno panics:

https://github.com/internetarchive/Zeno/blob/2abbcd32abdf62cab1eb0031b27f9b13c48c41c7/internal/pkg/preprocessor/preprocessor.go#L140-L142

This behavior makes sense as a guarantee that fundamental problems are not occurring during the seencheck. However, Zeno will also panic and crash if the HQ seencheck request times out, which can happen when HQ is operating with degraded performance. A slower HQ does not threaten the validity of the data Zeno generates, so there should be a way to avoid panicking on this specific error and continue crawling.

## Solution

A few ideas:

1. changing the default behavior to retry if there is a timeout
2. increasing the timeout value (or allowing it to be set as a runtime flag)
3. logging the timeout instead of returning an error, so that no panic occurs (this could be enabled/disabled via a runtime flag)

Alongside any of these solutions, Prometheus reporting should be improved to count the seencheck attempts that fail, are retried, or exceed a certain threshold.

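For reference, the linked lines (`internal/pkg/preprocessor/preprocessor.go`, L140-L142):

```go
if err := preprocess(workerID, seed); err != nil {
	panic(fmt.Sprintf("preprocess failed with err: %v", err))
}
```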
