Sanitize NaN/Infinity in metrics JSON before parsing#139
Sanitize NaN/Infinity in metrics JSON before parsing#139abhijeet-dhumal merged 1 commit intoopendatahub-io:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR adds a JSON sanitization helper Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Security and Code Quality Findings
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/rhai/progression/progression.go (1)
281-284:⚠️ Potential issue | 🟠 MajorMissing
io.LimitReader— potential memory exhaustion (CWE-400).Unbounded
io.ReadAllon an HTTP response body can exhaust memory if the metrics endpoint returns unexpectedly large payloads, whether maliciously or due to misconfiguration.Proposed fix
- body, err := io.ReadAll(resp.Body) + const maxResponseSize = 1 << 20 // 1 MiB limit for metrics JSON + body, err := io.ReadAll(io.LimitReader(resp.Body, maxResponseSize))As per coding guidelines: "Use io.LimitReader for HTTP response bodies (prevent memory exhaustion)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rhai/progression/progression.go` around lines 281 - 284, Replace the unbounded io.ReadAll(resp.Body) call with a limited reader to prevent memory exhaustion: define a constant like maxResponseBodySize (e.g., 1<<20 for 1MiB) and call io.ReadAll(io.LimitReader(resp.Body, maxResponseBodySize)) instead of io.ReadAll(resp.Body); after reading, if len(body) == maxResponseBodySize return an error indicating the response is too large. Update the code around the resp.Body read (the variables resp.Body and body and the existing error handling) to use the LimitReader and the new size constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rhai/progression/progression_test.go`:
- Around line 618-623: Add a new failing unit test that exercises the regex bug
by putting ": NaN" inside a string and ensuring only the actual NaN value is
replaced; specifically add a test case alongside the existing table entry (the
one named "NaN in string value not replaced") with name like "NaN in string with
colon should not be replaced", input `{"message": "Error: NaN detected",
"grad_norm": NaN}` and expected output `{"message": "Error: NaN detected",
"grad_norm": null}` so the test highlights the regex failing to respect string
boundaries in the code exercised by the progression tests.
In `@pkg/rhai/progression/progression.go`:
- Around line 296-306: The current sanitizeJSON uses regexes (nanPattern,
infPattern) that can match inside JSON strings; update sanitizeJSON to only
replace unquoted tokens by scanning the input and skipping quoted string regions
(handle escapes) and replacing bare NaN or Infinity tokens with null; keep the
nanPattern/infPattern as simple literals or remove them and perform the
replacement in the scanner logic inside sanitizeJSON so only unquoted
occurrences (e.g., the grad_norm: NaN token) are changed while string values
like "Error: NaN detected" remain untouched.
---
Outside diff comments:
In `@pkg/rhai/progression/progression.go`:
- Around line 281-284: Replace the unbounded io.ReadAll(resp.Body) call with a
limited reader to prevent memory exhaustion: define a constant like
maxResponseBodySize (e.g., 1<<20 for 1MiB) and call
io.ReadAll(io.LimitReader(resp.Body, maxResponseBodySize)) instead of
io.ReadAll(resp.Body); after reading, if len(body) == maxResponseBodySize return
an error indicating the response is too large. Update the code around the
resp.Body read (the variables resp.Body and body and the existing error
handling) to use the LimitReader and the new size constant.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: d20a1720-b647-4924-89b7-833c5c9fbd63
📒 Files selected for processing (2)
pkg/rhai/progression/progression.gopkg/rhai/progression/progression_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/rhai/progression/progression_test.go (1)
618-622: Missing edge case: Infinity in string value.Test coverage includes NaN-in-string but not Infinity-in-string. For symmetry with the regex patterns.
Optional test case
+ { + name: "Infinity in string value not replaced", + input: `{"message": "limit: Infinity reached", "value": Infinity}`, + want: `{"message": "limit: Infinity reached", "value": null}`, + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rhai/progression/progression_test.go` around lines 618 - 622, Add a parallel test case for Infinity-in-string symmetry with the NaN case: locate the table entry with name "NaN in string value not replaced" in progression_test.go and add a new case named "Infinity in string value not replaced" with input `{"message": "Error: Infinity detected", "grad_norm": Infinity}` and expected want `{"message": "Error: Infinity detected", "grad_norm": null}` so the regex/path that replaces numeric Infinity is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/rhai/progression/progression_test.go`:
- Around line 618-622: Add a parallel test case for Infinity-in-string symmetry
with the NaN case: locate the table entry with name "NaN in string value not
replaced" in progression_test.go and add a new case named "Infinity in string
value not replaced" with input `{"message": "Error: Infinity detected",
"grad_norm": Infinity}` and expected want `{"message": "Error: Infinity
detected", "grad_norm": null}` so the regex/path that replaces numeric Infinity
is covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5a299bab-38ef-4d62-826a-da2f25081aab
📒 Files selected for processing (2)
pkg/rhai/progression/progression.gopkg/rhai/progression/progression_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/rhai/progression/progression.go
154e00e to
d5100d1
Compare
Python's json.dumps() allows NaN and Infinity by default, but these are not valid JSON per the spec. When training metrics contain these values (e.g., grad_norm when loss=0, eval_loss with bf16 on ROCm), Go's json.Unmarshal fails, causing the operator to never capture training progress. Add sanitizeJSON() to replace NaN/Infinity with null before parsing in both PollTrainingProgress and CaptureMetricsFromTerminationMessage. The regex patterns are anchored to trailing JSON delimiters to avoid corrupting string values. Ref: RHOAIENG-56898 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python's json.dumps() allows NaN and Infinity by default, but these are not valid JSON per the spec. When training metrics contain these values (e.g., grad_norm when loss=0, eval_loss with bf16 on ROCm), Go's json.Unmarshal fails, causing the operator to never capture training progress.
Add sanitizeJSON() to replace NaN/Infinity with null before parsing in both PollTrainingProgress and CaptureMetricsFromTerminationMessage.
Ref: RHOAIENG-56898
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #
Checklist:
Summary by CodeRabbit
Bug Fixes
Tests