fix(nvd): retry transient body-read errors and 524 responses by DmitriyLewen · Pull Request #459 · aquasecurity/vuln-list-update

DmitriyLewen · 2026-06-18T13:31:32Z

Description

The NVD updater could fail an entire run on transient errors that were not covered by the retry loop:

The response body was read outside fetchURL's retry loop, so an HTTP/2 INTERNAL_ERROR (the stream aborted mid-body after a 200 OK) was fatal instead of retried.
Cloudflare's non-standard 524 (origin timeout) fell into the default branch and returned an "unexpected status code" error.

This PR reads the body inside the retry loop and adds 524 to the retryable statuses.
A single request attempt is extracted into a doRequest helper that closes the response body via defer.

Context

On 2026-06-17 NVD rolled out a schema/metric expansion (SSVC data from CISA-ADP in addition to CVSS, plus an "affected" block per the CVE Record Format).
Every updated record received a new Last Modified date, which means a large share of the database was re-stamped in a single day.

As a result, the updater's per-day window started returning almost the whole database at once.
Compare a normal day to the affected day.

Successful run (run 27657694905):

2026/06/17 00:39:15 INFO Fetched NVD entries total=169 start_index=0

Failing run (run 27744143884):

2026/06/18 07:38:54 INFO Fetched NVD entries total=261624 start_index=0

Total number of records currently stored:

➜  api find . -type f | wc -l
  357760

So a single day's window covered 261624 / 357760 ≈ 73% of all records — roughly 131 pages of 2000 entries.
Fetching that many pages takes 40–60 minutes, runs into the NVD rate limit, and eventually hits a transient server error (503, 524, or an HTTP/2 INTERNAL_ERROR).
Before this change such a transient error failed the whole run and reverted it, so lastUpdatedDate never advanced and the next run pulled an even larger window — a self-sustaining loop.

This is not a regression on our side; it is the fallout of the NVD update.
Making the transient errors retryable lets a run survive the spike and finish, after which the window shrinks back to normal.

Tests

TestUpdate — added two 524 cases: a happy path that succeeds after one reconnect (retry: 1), and a sad path that exhausts retries (confirming 524 is now retried instead of returning "unexpected status code").
TestUpdate_RetryOnBodyReadError — hijacks the connection to send a 200 OK with a truncated body (declared Content-Length larger than the bytes sent, then closes the connection), forcing an io.ReadAll error after the status code, and verifies the next attempt succeeds.

The NVD updater could fail an entire run on transient errors that were not covered by the retry loop: - The response body was read outside `fetchURL`'s retry loop, so an HTTP/2 `INTERNAL_ERROR` (stream aborted mid-body after a 200 OK) was fatal instead of retried. - Cloudflare's non-standard 524 (origin timeout) fell into the default branch and returned an "unexpected status code" error. Read the body inside the retry loop and add 524 to the retryable statuses. A single request attempt is extracted into `doRequest`, which closes the response body via `defer`.

DmitriyLewen · 2026-06-18T15:04:36Z

test run in fork - https://github.com/DmitriyLewen/vuln-list-update/actions/runs/27763761063/job/82144770085

knqyf263 · 2026-06-19T08:30:38Z

+// doRequest performs a single NVD request attempt and closes the response body
+// before returning. It returns the response body on success. A nil body with a
+// nil error means the request should be retried after waiting for `wait`.
+func (u Updater) doRequest(c *http.Client, url string, attempt int) (body []byte, wait time.Duration, err error) {


nit: passing attempt into doRequest leaks the backoff responsibility into a per-attempt function. attempt is loop state owned by fetchURL, and it's only used here to compute the linear backoff time.Duration(attempt) * time.Second.

The only thing doRequest really needs to hand back to the loop is the server-mandated wait (the rate-limit Retry-After); the backoff itself is the loop's policy. One way to express that is to drop attempt and signal "retryable" via a wrapped error that also carries the optional retry-after, so fetchURL owns the backoff:

// errRetry signals fetchURL to retry. retryAfter is the server-mandated // minimum wait (rate limit); zero means apply the caller's backoff. type errRetry struct{ retryAfter time.Duration } func (errRetry) Error() string { return "retryable" } func (u Updater) fetchURL(url string) ([]byte, error) { var c http.Client for i := 0; i <= u.retry; i++ { body, err := u.doRequest(&c, url) var re errRetry switch { case err == nil: return body, nil case errors.As(err, &re): wait := re.retryAfter if wait == 0 { wait = time.Duration(i) * time.Second } time.Sleep(wait) default: return nil, err } } return nil, xerrors.Errorf("unable to fetch url. Retry limit exceeded.") }

doRequest then returns ([]byte, error): errRetry{retryAfter: ra} for 403/429, errRetry{} for 503/524/timeout/network/body-read errors, the body for 200, and a plain error for the unexpected-status case. As a bonus this also removes the body != nil retry sentinel, since success is now distinguished by err == nil.

That said, adding a dedicated error type might be overkill for this single spot, so I'll leave the call to you. Not blocking.

Good idea — that's more accurate.

Updated in 38049fd.

New test run from fork - https://github.com/DmitriyLewen/vuln-list-update/actions/runs/27869235304/job/82478367450

Replace the implicit "nil body + nil error means retry" contract and the attempt parameter passed into doRequest with a dedicated errRetry type. doRequest now returns ([]byte, error): the body on success, errRetry on a retryable condition (with retryAfter for rate limits), and a plain error otherwise. fetchURL owns the backoff policy.

DmitriyLewen marked this pull request as ready for review June 18, 2026 15:04

knqyf263 previously approved these changes Jun 19, 2026

View reviewed changes

DmitriyLewen dismissed knqyf263’s stale review via 38049fd June 19, 2026 12:35

knqyf263 approved these changes Jun 22, 2026

View reviewed changes

DmitriyLewen merged commit 0839fbb into aquasecurity:main Jun 22, 2026
3 checks passed

DmitriyLewen deleted the fix/nvd-retry-body-read-and-524 branch June 22, 2026 14:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(nvd): retry transient body-read errors and 524 responses#459

fix(nvd): retry transient body-read errors and 524 responses#459
DmitriyLewen merged 2 commits into
aquasecurity:mainfrom
DmitriyLewen:fix/nvd-retry-body-read-and-524

DmitriyLewen commented Jun 18, 2026

Uh oh!

DmitriyLewen commented Jun 18, 2026

Uh oh!

knqyf263 Jun 19, 2026

Uh oh!

DmitriyLewen Jun 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

DmitriyLewen commented Jun 18, 2026

Description

Context

Tests

Uh oh!

DmitriyLewen commented Jun 18, 2026

Uh oh!

knqyf263 Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

DmitriyLewen Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DmitriyLewen Jun 19, 2026 •

edited

Loading