fix backend listener by xutongNV · Pull Request #681 · NVIDIA/OSMO

xutongNV · 2026-03-11T17:10:29Z

Description

Issue #None

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Summary by CodeRabbit

Bug Fixes
- Improved time handling by switching internal timestamp parsing and timeout calculations to timezone-aware UTC. This makes container error timeout detection and related status reporting consistent and reliable across different system timezones, reducing false or inconsistent backend error markings.

coderabbitai · 2026-03-11T17:10:56Z

📝 Walkthrough

Updated time handling to use timezone-aware UTC now() in runtime logic and corresponding tests: calculate_pod_status now computes time differences with UTC, and test helpers return UTC-aware datetimes.

Changes

Cohort / File(s)	Summary
Time Zone Adjustment (runtime) `src/operator/backend_listener.py`	Replaced naive local `datetime.datetime.now()` usage with `datetime.datetime.now(datetime.timezone.utc)` in `calculate_pod_status` for the CreateContainerConfigError waiting/timeout computation.
Time Zone Adjustment (tests) `src/operator/tests/test_pod_status_calculator.py`	Updated `parse_time_string` to produce timezone-aware UTC datetimes (`utc_now = datetime.datetime.now(datetime.timezone.utc)`) and to use that value for "now", relative parsing, and fallback cases; docstring updated accordingly.
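
To illustrate why an aware UTC "now" makes the elapsed-time computation independent of the host timezone, here is a minimal sketch. The helper name `elapsed_since` and the UTC+9 example zone are assumptions for illustration only and are not taken from `backend_listener.py`.

```python
import datetime


def elapsed_since(transition_time: datetime.datetime) -> datetime.timedelta:
    """Return the time elapsed since an aware timestamp.

    Hypothetical helper: it measures against an aware UTC "now", so the
    result does not depend on the timezone of the machine running it.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return now - transition_time


# An aware timestamp 45 minutes in the past, expressed in a UTC+9 zone.
utc_plus_9 = datetime.timezone(datetime.timedelta(hours=9))
transition = (
    datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(minutes=45)
).astimezone(utc_plus_9)

# Subtracting two aware datetimes accounts for their UTC offsets,
# so the elapsed time is still about 45 minutes.
elapsed = elapsed_since(transition)
```
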

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I hopped the clock from local to blue,
UTC now guides what I do,
One careful change, both test and core,
Time aligned — I twitched my whiskers more. 🕰️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is vague and does not specify what aspect of the backend listener was fixed or what problem was addressed.	Use a more descriptive title that explains the specific fix, such as 'Use UTC timezone for pod status timeout calculations' or 'Fix timezone handling in backend listener status checks'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/operator/backend_listener.py (2)

933-941: ⚠️ Potential issue | 🔴 Critical

Fix timezone mismatch causing pipeline failure.

The change to use datetime.datetime.now(datetime.timezone.utc) is correct in intent, but condition.last_transition_time from the Kubernetes API may be timezone-naive, causing the subtraction at line 936 to fail with TypeError: can't subtract offset-naive and offset-aware datetimes.

You need to ensure last_transition_time is timezone-aware before comparison.

🐛 Proposed fix to handle timezone mismatch

                     if condition.type == 'Ready' and condition.status == 'False':
                         now = datetime.datetime.now(datetime.timezone.utc)
                         last_transition_time = condition.last_transition_time
                         if last_transition_time:
+                            # Ensure timezone-aware comparison
+                            if last_transition_time.tzinfo is None:
+                                last_transition_time = last_transition_time.replace(
+                                    tzinfo=datetime.timezone.utc)
                             time_diff = now - last_transition_time

Note: The same issue likely exists at lines 952-955 for the ContainerStatusUnknown handling, which uses the same pattern. Consider applying the same fix there for consistency and safety.
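
The failure mode described in this comment can be reproduced in isolation. The sketch below uses fixed example timestamps and a hypothetical `ensure_utc` helper; it assumes, as the proposed fix does, that a naive value should be interpreted as UTC.

```python
import datetime


def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
    """Return an aware UTC datetime (hypothetical helper).

    Assumption: a naive timestamp already represents UTC, so the UTC
    tzinfo is attached without shifting the clock time.
    """
    if ts.utcoffset() is None:
        return ts.replace(tzinfo=datetime.timezone.utc)
    return ts.astimezone(datetime.timezone.utc)


naive = datetime.datetime(2026, 3, 11, 17, 0, 0)
aware_now = datetime.datetime(2026, 3, 11, 17, 45, 0, tzinfo=datetime.timezone.utc)

# Mixing aware and naive datetimes in a subtraction raises TypeError.
try:
    aware_now - naive
    mixed_subtraction_failed = False
except TypeError:
    mixed_subtraction_failed = True

# After normalization both operands are aware and the subtraction works.
time_diff = aware_now - ensure_utc(naive)
```
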

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/operator/backend_listener.py` around lines 933 - 941,
condition.last_transition_time from the k8s API can be timezone-naive, causing
TypeError when subtracting from datetime.now(timezone.utc); update the check
around now = datetime.datetime.now(datetime.timezone.utc) and time_diff = now -
last_transition_time (the block that sets
task.TaskGroupStatus.FAILED_BACKEND_ERROR and exit_code =
task.ExitCode.FAILED_BACKEND_ERROR.value) to first coerce last_transition_time
to an aware timestamp (e.g., if last_transition_time.tzinfo is None then treat
it as UTC by attaching datetime.timezone.utc or convert it) before computing
time_diff, and apply the same fix in the ContainerStatusUnknown handling that
uses the same pattern so both comparisons are timezone-aware and safe.

952-960: ⚠️ Potential issue | 🟠 Major

Same timezone mismatch vulnerability exists here.

This code block has the identical pattern and will fail with the same TypeError if last_transition_time is timezone-naive. Apply the same timezone handling fix for consistency.

🐛 Proposed fix

                     if condition.type == 'Ready' and condition.status == 'False':
                         now = datetime.datetime.now(datetime.timezone.utc)
                         last_transition_time = condition.last_transition_time
                         if last_transition_time:
+                            # Ensure timezone-aware comparison
+                            if last_transition_time.tzinfo is None:
+                                last_transition_time = last_transition_time.replace(
+                                    tzinfo=datetime.timezone.utc)
                             time_diff = now - last_transition_time

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/operator/backend_listener.py` around lines 952 - 960, The same
timezone-naive vulnerability appears when computing time_diff using now =
datetime.datetime.now(datetime.timezone.utc) and condition.last_transition_time;
ensure last_transition_time is made timezone-aware before subtracting: check
condition.last_transition_time.tzinfo (or use hasattr) and if it's naive, set
its tzinfo to datetime.timezone.utc (e.g., via replace) so the subsequent
time_diff > datetime.timedelta(minutes=30) comparison is between two aware
datetimes; update the block that sets status =
task.TaskGroupStatus.FAILED_BACKEND_ERROR and exit_code =
task.ExitCode.FAILED_BACKEND_ERROR.value to use this normalized
last_transition_time.
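
Because both blocks repeat the same pattern, one option is to factor the check into a single function. The following is only a sketch: `exceeded_timeout` and `TIMEOUT` are hypothetical names, the 30-minute threshold is the one quoted in the comment above, and treating a naive timestamp as UTC is an assumption.

```python
import datetime
from typing import Optional

# Threshold quoted in the review comment.
TIMEOUT = datetime.timedelta(minutes=30)


def exceeded_timeout(
    last_transition_time: Optional[datetime.datetime],
    now: Optional[datetime.datetime] = None,
    timeout: datetime.timedelta = TIMEOUT,
) -> bool:
    """Return True if the condition has persisted longer than `timeout`.

    Hypothetical helper; a naive timestamp is assumed to be UTC.
    """
    if last_transition_time is None:
        return False
    if last_transition_time.utcoffset() is None:
        last_transition_time = last_transition_time.replace(
            tzinfo=datetime.timezone.utc)
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - last_transition_time > timeout


# Usage with a fixed "now" so the outcome is deterministic.
fixed_now = datetime.datetime(2026, 3, 11, 18, 0, tzinfo=datetime.timezone.utc)
stale = exceeded_timeout(datetime.datetime(2026, 3, 11, 17, 0), now=fixed_now)
fresh = exceeded_timeout(
    datetime.datetime(2026, 3, 11, 17, 45, tzinfo=datetime.timezone.utc),
    now=fixed_now)
```

Accepting `now` as a parameter keeps the function easy to test without patching the clock.
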

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 044b4958-d17d-4d25-90f2-883927bd9d5d

📥 Commits

Reviewing files that changed from the base of the PR and between b53f053 and a380b4b.

📒 Files selected for processing (1)

src/operator/backend_listener.py

ethany-nv

Please fix title and description

coderabbitai

🧹 Nitpick comments (1)

src/operator/tests/test_pod_status_calculator.py (1)
36-49: LGTM! Timezone-aware UTC datetime changes are consistent.

The refactoring correctly ensures all code paths return timezone-aware UTC datetimes, aligning with the production code changes in backend_listener.py.

One minor consideration: the fallback on line 49 silently returns utc_now for invalid format strings (e.g., "now-5d" or malformed input). Since this is a test helper, you might want to raise an error for unsupported formats to catch test configuration mistakes early.
💡 Optional: Add explicit error for unsupported formats
         elif parts.endswith('h'):
             hours = int(parts[:-1])
             return utc_now - datetime.timedelta(hours=hours)
-    return utc_now
+        else:
+            raise ValueError(f"Unsupported time format: {time_str}")
+    raise ValueError(f"Unsupported time format: {time_str}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/operator/tests/test_pod_status_calculator.py` around lines 36 - 49, The
helper parse_time_string currently falls back to returning utc_now for unknown
or malformed inputs; instead change it to raise a clear ValueError for
unsupported formats so test misconfiguration fails fast. In parse_time_string,
after handling 'now', 'now-<n>m', and 'now-<n>h' cases, replace the final return
utc_now with raising ValueError(f"Unsupported time string: {time_str}"), and
ensure any int conversion/parsing paths propagate errors (or convert parse
errors into the same ValueError) so inputs like "now-5d" or malformed strings
trigger a test failure.
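
A fail-fast version of the helper might look like the sketch below. The full body of `parse_time_string` is not shown in this thread, so the parsing logic here is an assumption reconstructed from the formats named above ('now', 'now-<n>m', 'now-<n>h'), not the repository's actual implementation.

```python
import datetime


def parse_time_string(time_str: str) -> datetime.datetime:
    """Parse 'now', 'now-<n>m', or 'now-<n>h' into an aware UTC datetime.

    Sketch only: raises ValueError for anything else instead of silently
    falling back to the current time.
    """
    utc_now = datetime.datetime.now(datetime.timezone.utc)
    if time_str == 'now':
        return utc_now
    if time_str.startswith('now-'):
        parts = time_str[len('now-'):]
        unit = parts[-1:]
        try:
            amount = int(parts[:-1])
        except ValueError as exc:
            # Convert parse errors into the same, clearer error.
            raise ValueError(f'Unsupported time string: {time_str}') from exc
        if unit == 'm':
            return utc_now - datetime.timedelta(minutes=amount)
        if unit == 'h':
            return utc_now - datetime.timedelta(hours=amount)
    raise ValueError(f'Unsupported time string: {time_str}')
```

With this shape, a typo such as "now-5d" in a test fixture surfaces as an immediate error rather than as a timestamp equal to the current time.
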

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4b0e1931-4749-49a3-8702-0711632884ec

📥 Commits

Reviewing files that changed from the base of the PR and between a380b4b and 796ddf5.

📒 Files selected for processing (1)

src/operator/tests/test_pod_status_calculator.py

xutongNV requested a review from a team as a code owner March 11, 2026 17:10

xutongNV had a problem deploying to internal-ci March 11, 2026 17:10 — with GitHub Actions Error

fix backend listener

a380b4b

xutongNV force-pushed the xutongr/fic branch from c9e1ab3 to a380b4b Compare March 11, 2026 17:11

xutongNV had a problem deploying to internal-ci March 11, 2026 17:11 — with GitHub Actions Failure

coderabbitai bot reviewed Mar 11, 2026

ethany-nv reviewed Mar 11, 2026

ethany-nv reviewed Mar 11, 2026

lint

796ddf5

xutongNV temporarily deployed to internal-ci March 11, 2026 19:49 — with GitHub Actions Inactive

coderabbitai bot reviewed Mar 11, 2026

xutongNV enabled auto-merge (squash) March 11, 2026 20:02

cypres approved these changes Mar 11, 2026

vvnpn-nv approved these changes Mar 11, 2026

xutongNV merged commit a484225 into main Mar 11, 2026
9 checks passed

xutongNV deleted the xutongr/fic branch March 11, 2026 20:29
