fix(nimbus): isolate and stream warm_api_caches so v6/v7/v8 actually warm #15622
Merged
…warm
Because
* The hourly warm_api_caches Celery task OOM-kills the worker pod
(~3.5 GB anon-rss) about 7 seconds after logging "Warmed v5:csv",
meaning v6/v7/v8 are never pre-warmed and only get populated by
ad-hoc cache-miss requests
* When the cache is cold, real requests time out at gunicorn's worker
limit and surface in Sentry as SystemExit:1 on
/api/v{6,7,8}/experiments/
* The single-task design propagates one endpoint's failure to all
later endpoints in the loop, and materializing the full Python data
list plus the full JSON bytestring at once was the source of the
memory blowup for the heavier endpoints
This commit
* Splits warm_api_caches into a thin dispatcher that fires one
warm_api_cache_endpoint sub-task per endpoint via .delay(), so a
single endpoint failing (OOM, timeout, etc.) no longer prevents the
others from warming
* Streams the warm path: warm_api_cache now iterates the queryset
via .iterator(chunk_size=N) and writes per-experiment rendered JSON
bytes to a BytesIO buffer, eliminating the materialized list of
serialized dicts that previously co-existed with the final bytes
* Adds per-endpoint markus metrics
(warm_api_cache_endpoint.<key>.started/completed/failed) so an
endpoint silently not warming is alertable
* Updates tests to cover the dispatcher pattern, the failure-isolation
semantics, and the unknown-key path; enables eager Celery mode in
the affected test classes so existing assertions still hold
Fixes #15621
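The dispatcher and per-endpoint metrics described above can be sketched without Celery or markus installed. This is a minimal illustration under stated assumptions, not the PR's code: `EagerTask` is a hypothetical stand-in for a Celery task whose `.delay()` runs inline, `metrics` is a `Counter` standing in for markus counters, and the `ENDPOINTS` registry and its warm functions are invented for the example. Only the metric key shape (`warm_api_cache_endpoint.<key>.started/completed/failed`) and the task names come from the commit message.

```python
from collections import Counter

# Stand-in for markus counters (assumption: the real code calls a metrics client).
metrics = Counter()


def _warm_ok():
    return b"[]"


def _warm_fails():
    # A real OOM kill is a SIGKILL, not an exception; this only simulates
    # "this endpoint's sub-task died".
    raise MemoryError("simulated failure")


# Hypothetical registry: endpoint key -> function that warms that endpoint.
ENDPOINTS = {"v6": _warm_ok, "v7": _warm_fails, "v8": _warm_ok}


class EagerTask:
    """Hypothetical stand-in for a Celery task; .delay() runs inline."""

    def __init__(self, fn):
        self.fn = fn

    def delay(self, *args):
        # A failure stays inside the sub-task and never reaches the caller.
        try:
            return self.fn(*args)
        except Exception:
            return None


@EagerTask
def warm_api_cache_endpoint(key):
    prefix = f"warm_api_cache_endpoint.{key}"
    metrics[f"{prefix}.started"] += 1
    try:
        ENDPOINTS[key]()
    except Exception:
        metrics[f"{prefix}.failed"] += 1
        raise
    metrics[f"{prefix}.completed"] += 1


def warm_api_caches():
    """Thin dispatcher: one sub-task per endpoint, no shared failure path."""
    for key in ENDPOINTS:
        warm_api_cache_endpoint.delay(key)


warm_api_caches()
print(sorted(metrics.items()))
```

In this sketch v7 fails while v6 and v8 still complete, and a `started` count without a matching `completed` is the signal an alert could key on.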
…amArray
Because
* The previous BytesIO + manual `[`/`,`/`]` framing in stream_render_queryset was hand-rolled rather than the idiomatic stdlib pattern for the "stream a large JSON array" problem
* The well-documented stdlib idiom (wrap a generator-of-dicts in a list-subclass that lies about __len__ and pipe it through json.JSONEncoder.iterencode) produces the same byte output with the same memory characteristics in a more recognizable shape
This commit
* Replaces the BytesIO buffer in stream_render_queryset with a _StreamArray(list) wrapper around the per-item serializer-data generator, fed to json.JSONEncoder.iterencode so the encoder emits the array framing itself
* Peeks the underlying iterator at construction so the empty case reports __len__ == 0, which makes iterencode short-circuit to `[]` instead of dropping the opening bracket (the standard pure-Python _iterencode_list emits the opening `[` only inside its for-body)
* Builds the encoder with DRFJSONEncoder + SHORT_SEPARATORS pulled from rest_framework so the streamed bytes are byte-identical to JSONRenderer().render(serializer(qs, many=True).data), pinned by the new test_stream_render_queryset_matches_drf_json_renderer test
* Local tracemalloc profile (500 factory recipes against the v8 viewset queryset+serializer) shows peak RSS drops from 71.75 MB on the pre-PR full-materialize path to 49.62 MB on the streamed path (-31%), with byte-identical output across all variants tested
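A stdlib-only sketch of the list-subclass idiom this commit describes is below. It is an illustration, not the PR's `_StreamArray`: the class and function names are chosen for the example, and it uses a plain `json.JSONEncoder` with short separators rather than the DRF encoder, so it makes no claim about byte-identity with `JSONRenderer`.

```python
import json
from itertools import chain


class StreamArray(list):
    """List subclass that lets json encode a one-shot iterator as an array."""

    def __init__(self, iterable):
        super().__init__()
        iterator = iter(iterable)
        # Peek one item so the empty case can report a length of zero.
        try:
            first = next(iterator)
        except StopIteration:
            self._items = iter(())
            self._nonempty = False
        else:
            self._items = chain([first], iterator)
            self._nonempty = True

    def __iter__(self):
        return self._items

    def __len__(self):
        # Only truthiness matters to the encoder, not the real count.
        return 1 if self._nonempty else 0


def stream_json_array(items):
    """Encode an iterable of JSON-ready objects without building a list."""
    encoder = json.JSONEncoder(separators=(",", ":"))
    # For brevity the chunks are joined here; a real caller could write
    # each chunk to a buffer or socket as it is produced.
    return "".join(encoder.iterencode(StreamArray(items))).encode("utf-8")


rows = ({"id": i} for i in range(3))
print(stream_json_array(rows))
print(stream_json_array(iter(())))
```

The peek matters because the encoder tests the list's truthiness first: a non-empty generator must look truthy so the encoder iterates it, and an empty one must look falsy so the encoder emits `[]` directly.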
Because
* The streaming refactor stripped the pre-existing docstrings on get_api_cache_key, warm_api_cache, and CachedListMixin, which weren't part of the actual change
* _StreamArray, _drf_compatible_encoder, and stream_render_queryset are not obvious from their bodies alone; the StreamArray __len__ trick in particular relies on a documented stdlib quirk that's worth calling out
This commit
* Restores the three pre-existing docstrings unchanged
* Adds brief docstrings on the three new symbols explaining what they do and why the shape is the way it is
mikewilli
approved these changes
May 14, 2026
Comment on lines 82 to +88
      if sort_key is not None:
    -     qs = sorted(qs, key=sort_key, reverse=True)
    - data = serializer_class(qs, many=True).data
    - rendered = renderer.render(data)
    +     qs = sorted(queryset.all(), key=sort_key, reverse=True)
    +     data = serializer_class(qs, many=True).data
    +     rendered = renderer.render(data)
    + else:
    +     rendered = stream_render_queryset(queryset, serializer_class)
Contributor
So will the v5:csv still fail since it has a sort_key defined? But it just won't kill the tasks for the other endpoints now?
Collaborator
Author
No, v5:csv wasn't failing. It was succeeding, and the task then moved on to the later endpoints, which were the ones OOMing. v5:csv should still warm normally.
Because
* The hourly `warm_api_caches` Celery task OOM-kills the worker (~3.5 GB) seven seconds after warming v5:csv, so v6/v7/v8 are never pre-warmed
* Cold-cache requests then time out (`SystemExit:1` on `/api/v{6,7,8}/experiments/` in Sentry)
This commit
* Splits `warm_api_caches` into a dispatcher that fires one sub-task per endpoint, so one failure can't tear down the others
* Streams `warm_api_cache` via `qs.iterator()` + `JSONEncoder.iterencode` so the full dict tree and the full JSON string never co-exist in memory. Output is byte-identical to the non-streamed render, pinned by a new test
Local profile (500 factory experiments, v8 viewset): peak RSS 71.75 → 49.62 MB (−31%).
Fixes #15621
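A local comparison of this kind could be set up roughly as follows. This is a sketch under stated assumptions, not the profile script used for the PR: the row shape and function names are invented, the streamed variant uses the manual BytesIO framing from the first commit, and `tracemalloc` reports traced Python allocations rather than process RSS, so absolute numbers will not match the figures quoted above.

```python
import io
import json
import tracemalloc


def make_rows(n):
    # Hypothetical stand-in for per-experiment serializer data.
    for i in range(n):
        yield {"id": i, "slug": f"experiment-{i}", "enabled": i % 2 == 0}


def render_materialized(n):
    # The full list of dicts is alive while the full JSON string is built.
    data = list(make_rows(n))
    return json.dumps(data, separators=(",", ":")).encode("utf-8")


def render_streamed(n):
    # One row is alive at a time; only the output bytes accumulate.
    buf = io.BytesIO()
    buf.write(b"[")
    for index, row in enumerate(make_rows(n)):
        if index:
            buf.write(b",")
        buf.write(json.dumps(row, separators=(",", ":")).encode("utf-8"))
    buf.write(b"]")
    return buf.getvalue()


def peak_bytes(render, n):
    """Return (output, peak traced bytes) for one render call."""
    tracemalloc.start()
    try:
        output = render(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return output, peak


materialized, materialized_peak = peak_bytes(render_materialized, 5000)
streamed, streamed_peak = peak_bytes(render_streamed, 5000)
print("identical output:", materialized == streamed)
print("materialized peak bytes:", materialized_peak)
print("streamed peak bytes:", streamed_peak)
```

Checking output equality alongside the peak keeps the comparison honest: a lower peak only counts if the bytes written to the cache are unchanged.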