
Anirud/metrics improvements #606

Open
anirudTT wants to merge 7 commits into dev from anirud/metrics-improvements

Conversation

@anirudTT
Collaborator


- Introduced a new `InferenceMetricsTracker` class in both backend and frontend to accurately track inference performance metrics, including Time to First Token (TTFT) and Time Per Output Token (TPOT); a minimal sketch follows this list.
- Updated `stream_to_cloud_model` and `stream_response_from_external_api` functions to utilize the new metrics tracker for logging and calculating performance statistics.
- Enhanced frontend components to display detailed inference statistics, including client-side TTFT and per-token timing metrics.
- Added support for progressive statistics during streaming, allowing real-time performance monitoring.
- Updated related types to accommodate new metrics and statistics structures.
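
The description above mentions trackers on both sides; a minimal sketch of what the backend (Python) side could look like follows. The class name comes from this PR, but the method names, timing source, and stat keys here are illustrative assumptions rather than the PR's actual implementation:

```python
import time
from typing import Dict, List, Optional


class InferenceMetricsTracker:
    """Minimal sketch: track TTFT and TPOT during a streaming response."""

    def __init__(self) -> None:
        self.start_time = time.perf_counter()
        self.first_token_time: Optional[float] = None
        self.token_times: List[float] = []

    def record_token(self) -> None:
        now = time.perf_counter()
        if self.first_token_time is None:
            self.first_token_time = now  # first token marks TTFT
        self.token_times.append(now)

    def get_stats(self) -> Dict[str, float]:
        # TTFT: delay from request start to the first streamed token
        ttft = (
            self.first_token_time - self.start_time
            if self.first_token_time is not None
            else 0.0
        )
        n = len(self.token_times)
        # TPOT: mean gap between consecutive tokens after the first
        tpot = (self.token_times[-1] - self.token_times[0]) / (n - 1) if n > 1 else 0.0
        return {"ttft": ttft, "tpot": tpot, "num_tokens": float(n)}
```

Note that under this definition a single-token response legitimately reports a TPOT of 0.0, which is why the final-stats check discussed in the review below must not treat 0.0 as falsy.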
Contributor

Copilot AI left a comment


Pull request overview

This PR implements comprehensive metrics improvements for inference performance tracking, adding both client-side and backend measurement capabilities, along with UI enhancements for displaying "thinking" processes in LLM responses.

Changes:

  • Adds new metrics tracking infrastructure on both frontend (TypeScript) and backend (Python) to measure TTFT, TPOT, network latency, and per-token timing
  • Implements thinking block visualization in the streaming message UI, allowing users to see and toggle the LLM's reasoning process
  • Enhances the InferenceStats component to display new metrics including client TTFT, network latency, and per-token timing statistics

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 15 comments.

Summary per file:

- app/frontend/tsconfig.json: Adds explicit type declarations for vite/client, react, and react-dom
- app/frontend/src/components/chatui/types.ts: Defines new interfaces for token timestamps, progressive stats, and enhanced inference metrics
- app/frontend/src/components/chatui/metricsTracker.ts: Implements client-side metrics tracker for measuring TTFT, recording token timestamps, and calculating network latency
- app/frontend/src/components/chatui/runInference.ts: Integrates metrics tracker into the inference flow to capture timing data during streaming
- app/frontend/src/components/chatui/StreamingMessage.tsx: Adds thinking block detection, extraction, and visualization with collapsible UI
- app/frontend/src/components/chatui/InferenceStats.tsx: Enhances the stats display with new client-side metrics, network latency, and per-token timing percentiles
- app/backend/model_control/metrics_tracker.py: Implements backend metrics tracker with TTFT/TPOT calculation and per-token timing
- app/backend/model_control/model_utils.py: Integrates the backend metrics tracker into streaming functions for both cloud and external API calls


@anirudTT anirudTT self-assigned this Feb 9, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.



// SPDX-FileCopyrightText: © 2025 Tenstorrent AI ULC
// SPDX-FileCopyrightText: © 2025 Tenstorrent AI ULC
// SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC
// SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC

Copilot AI Feb 10, 2026


The file header contains the SPDX-FileCopyrightText line twice. This can cause header/CI checks to fail and should be deduplicated to a single line.

Suggested change
// SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC

continue; // Skip processing this chunk as part of the generated text
}
// Handle final statistics from backend (after [DONE])
if (!isAgentSelected && jsonData.ttft && jsonData.tpot) {

Copilot AI Feb 10, 2026


The final-stats detection uses a truthy check (jsonData.ttft && jsonData.tpot). Since the backend can legitimately emit 0.0 for TTFT/TPOT (e.g., no tokens or only one token), this condition will skip attaching stats. Use an explicit numeric/undefined check (e.g., typeof jsonData.ttft === "number" and typeof jsonData.tpot === "number").

Suggested change
if (!isAgentSelected && jsonData.ttft && jsonData.tpot) {
if (
  !isAgentSelected &&
  typeof jsonData.ttft === "number" &&
  typeof jsonData.tpot === "number"
) {

Comment on lines +121 to +125
def get_stats(self) -> Dict[str, float]:
    """
    Get final statistics

    Returns:

Copilot AI Feb 10, 2026


get_stats() is annotated as returning Dict[str, float], but the returned dict includes integer values for token counts/context length. Update the annotation to a mixed type (e.g., Dict[str, float | int]) or normalize values to floats to keep typing accurate.

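A minimal sketch of the normalization option, with illustrative field names (not necessarily the PR's actual stat keys):

```python
from typing import Dict


class StatsExample:
    def __init__(self, ttft: float, num_tokens: int) -> None:
        self.ttft = ttft
        self.num_tokens = num_tokens  # an int by nature

    def get_stats(self) -> Dict[str, float]:
        # Coercing the count to float keeps the declared return type accurate;
        # the alternative is widening the annotation, e.g. Dict[str, float | int].
        return {"ttft": self.ttft, "num_tokens": float(self.num_tokens)}
```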
}) {
const [renderedContent, setRenderedContent] = useState("");
const showThinking = externalShowThinking ?? false;
const [showThinking, setShowThinking] = useState(false);

Copilot AI Feb 10, 2026


externalShowThinking is destructured from props but never used. With noUnusedLocals: true, this will fail type-checking; remove the destructuring or wire it into the new internal showThinking state (e.g., treat it as a controlled prop).

Suggested change
const [showThinking, setShowThinking] = useState(false);
const [showThinking, setShowThinking] = useState(externalShowThinking ?? false);

Comment on lines +50 to +51
for _ in range(tokens_arrived):
    self.token_times.append(current_time)

Copilot AI Feb 10, 2026


For token count jumps, record_token() appends multiple identical timestamps (same current_time). This biases get_tpot() downward by treating batch-arrived tokens as zero-time intervals. Consider deriving per-token time from (delta_time / delta_tokens) or using the median-based approach (get_accurate_tpot()) for reported stats.

Suggested change
for _ in range(tokens_arrived):
    self.token_times.append(current_time)
# Distribute timestamps evenly between last token time and now
if self.token_times:
    last_time = self.token_times[-1]
else:
    # Fallback: if for some reason we have no prior timestamps,
    # treat all tokens as arriving at current_time.
    last_time = current_time
if tokens_arrived == 1 or current_time <= last_time:
    # Single token or no positive time delta: record every
    # arrival at current_time (not just one timestamp)
    for _ in range(tokens_arrived):
        self.token_times.append(current_time)
else:
    delta_time = current_time - last_time
    step = delta_time / tokens_arrived
    for i in range(1, tokens_arrived + 1):
        self.token_times.append(last_time + step * i)
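
One arithmetic note on the bias: even distribution does not change the mean interval. If three tokens arrive in one chunk 300 ms after the previous token, identical timestamps give deltas of [300, 0, 0] ms while even distribution gives [100, 100, 100] ms; both average to 100 ms, but the zero-width intervals drag the median and low percentiles used by the median-based get_accurate_tpot() toward zero.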

Comment on lines +17 to +18
class InferenceMetricsTracker:
    """Track inference metrics during streaming"""

Copilot AI Feb 10, 2026


There are existing backend tests under app/backend/model_control/, but the new InferenceMetricsTracker logic is untested. Adding unit tests for record_token() (including token-count jumps) and get_stats() would help prevent regressions in TTFT/TPOT calculations.

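A hedged pytest sketch of such tests follows. The record_token signature (taking a running cumulative token count) and the ttft/tpot stat keys are assumptions inferred from the snippets quoted in this review, and the import path is illustrative:

```python
import time

# Hypothetical import path; adjust to the project's actual module layout.
from model_control.metrics_tracker import InferenceMetricsTracker


def test_single_token_reports_zero_tpot():
    tracker = InferenceMetricsTracker()
    tracker.record_token(1)  # assumed signature: running token count
    stats = tracker.get_stats()
    assert stats["ttft"] >= 0.0
    assert stats["tpot"] == 0.0


def test_token_count_jump_keeps_positive_intervals():
    tracker = InferenceMetricsTracker()
    tracker.record_token(1)
    time.sleep(0.01)
    tracker.record_token(4)  # three tokens arrive in a single chunk
    stats = tracker.get_stats()
    # With evenly distributed timestamps, the reported TPOT stays > 0
    assert stats["tpot"] > 0.0
```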
"module": "ESNext",
"skipLibCheck": true,
"baseUrl": ".",
"types": ["vite/client", "react", "react-dom"],

Copilot AI Feb 10, 2026


Setting compilerOptions.types overrides TypeScript’s automatic inclusion of @types/* packages. With the current list, @types/node is no longer included, but the frontend source uses NodeJS.* types (e.g. NodeJS.Timeout), which will cause type-check/build failures. Consider either removing the types override or adding "node" to this list (or refactoring those call sites to use ReturnType<typeof setTimeout>).

Suggested change
"types": ["vite/client", "react", "react-dom"],
"types": ["vite/client", "react", "react-dom", "node"],

const [showThinking, setShowThinking] = useState(false);
const [isThinkingActive, setIsThinkingActive] = useState(false);
const contentRef = useRef(processContent(content).cleanedContent);
const contentRef = useRef(processContent(content, isStreamFinished).cleanedContent);

Copilot AI Feb 10, 2026


processContent is declared as (content: string), but it’s called as processContent(content, isStreamFinished) here. This is a TS compile error; either update processContent to accept the second parameter (and use it) or remove the extra argument.

Suggested change
const contentRef = useRef(processContent(content, isStreamFinished).cleanedContent);
const contentRef = useRef(processContent(content).cleanedContent);

@anirudTT anirudTT requested a review from rfatimaTT February 11, 2026 16:44