Problem
If the seencheck in preprocessor returns an error of any kind, there is a panic:
    if err := preprocess(workerID, seed); err != nil {
        panic(fmt.Sprintf("preprocess failed with err: %v", err))
    }
This behavior makes sense as a guard against fundamental problems occurring during seencheck. However, Zeno will also panic and crash if the HQ seencheck request times out, which can happen when HQ is operating with degraded performance. Because HQ running slower does not threaten the validity of the data generated by Zeno, there should be a way to avoid panicking on this specific error and continue crawling.
Solution
A couple of ideas:
- changing default behavior to retry if there is a timeout
- increasing the timeout value (or allowing it to be set as a runtime flag)
- logging the timeout but not returning an error to prevent panic (this could be enabled/disabled via runtime flag)
Alongside the solutions above, there should be improved Prometheus reporting on the number of seencheck attempts that fail, are retried, or exceed a certain threshold.
Code reference for the panic: Zeno/internal/pkg/preprocessor/preprocessor.go, lines 140 to 142 in 2abbcd3.