Conversation
- Introduced a new `InferenceMetricsTracker` class in both backend and frontend to accurately track inference performance metrics, including Time to First Token (TTFT) and Time Per Output Token (TPOT); a minimal sketch of these two metrics follows this list.
- Updated `stream_to_cloud_model` and `stream_response_from_external_api` functions to utilize the new metrics tracker for logging and calculating performance statistics.
- Enhanced frontend components to display detailed inference statistics, including client-side TTFT and per-token timing metrics.
- Added support for progressive statistics during streaming, allowing real-time performance monitoring.
- Updated related types to accommodate new metrics and statistics structures.
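For readers new to these metrics, the sketch below shows one way TTFT and TPOT can be derived from token arrival timestamps. The class and method names are hypothetical and are not the PR's actual `InferenceMetricsTracker` implementation.

```python
# Minimal sketch only: hypothetical names, not the PR's InferenceMetricsTracker.
import time
from typing import List


class SimpleTokenTimer:
    def __init__(self) -> None:
        self.start = time.monotonic()
        self.token_times: List[float] = []

    def record_token(self) -> None:
        # Record the arrival time of each streamed token.
        self.token_times.append(time.monotonic())

    def ttft(self) -> float:
        # Time to First Token: first token arrival minus request start.
        return self.token_times[0] - self.start if self.token_times else 0.0

    def tpot(self) -> float:
        # Time Per Output Token: average gap between consecutive tokens.
        if len(self.token_times) < 2:
            return 0.0
        gaps = [b - a for a, b in zip(self.token_times, self.token_times[1:])]
        return sum(gaps) / len(gaps)
```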
Pull request overview
This PR implements comprehensive metrics improvements for inference performance tracking, adding both client-side and backend measurement capabilities, along with UI enhancements for displaying "thinking" processes in LLM responses.
Changes:
- Adds new metrics tracking infrastructure on both frontend (TypeScript) and backend (Python) to measure TTFT, TPOT, network latency, and per-token timing
- Implements thinking block visualization in the streaming message UI, allowing users to see and toggle the LLM's reasoning process
- Enhances the InferenceStats component to display new metrics including client TTFT, network latency, and per-token timing statistics (see the sketch after this list for one way the network-latency figure can relate to the two TTFT measurements)
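For context, one plausible way a network-latency estimate could be derived from the client-side and backend TTFT measurements is sketched below. This is an illustrative assumption, not necessarily the formula this PR uses.

```python
# Illustrative assumption only; the PR may derive network latency differently.
def estimate_network_latency(client_ttft_s: float, backend_ttft_s: float) -> float:
    # Time the first token spent outside the model server (network transit,
    # proxying, client-side parsing). Clamped at zero to absorb clock skew.
    return max(0.0, client_ttft_s - backend_ttft_s)
```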
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 15 comments.
Summary per file
| File | Description |
|---|---|
| app/frontend/tsconfig.json | Adds explicit type declarations for vite/client, react, and react-dom |
| app/frontend/src/components/chatui/types.ts | Defines new interfaces for token timestamps, progressive stats, and enhanced inference metrics |
| app/frontend/src/components/chatui/metricsTracker.ts | Implements client-side metrics tracker for measuring TTFT, recording token timestamps, and calculating network latency |
| app/frontend/src/components/chatui/runInference.ts | Integrates metrics tracker into inference flow to capture timing data during streaming |
| app/frontend/src/components/chatui/StreamingMessage.tsx | Adds thinking block detection, extraction, and visualization with collapsible UI |
| app/frontend/src/components/chatui/InferenceStats.tsx | Enhances stats display with new client-side metrics, network latency, and per-token timing percentiles |
| app/backend/model_control/metrics_tracker.py | Implements backend metrics tracker with TTFT/TPOT calculation and per-token timing |
| app/backend/model_control/model_utils.py | Integrates backend metrics tracker into streaming functions for both cloud and external API calls |
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

    // SPDX-FileCopyrightText: © 2025 Tenstorrent AI ULC
    // SPDX-FileCopyrightText: © 2025 Tenstorrent AI ULC
    // SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC
    // SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC
The file header contains the SPDX-FileCopyrightText line twice. This can cause header/CI checks to fail and should be deduplicated to a single line.
Suggested change:

    // SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC

      continue; // Skip processing this chunk as part of the generated text
    }
    // Handle final statistics from backend (after [DONE])
    if (!isAgentSelected && jsonData.ttft && jsonData.tpot) {
The final-stats detection uses a truthy check (jsonData.ttft && jsonData.tpot). Since the backend can legitimately emit 0.0 for TTFT/TPOT (e.g., no tokens or only one token), this condition will skip attaching stats. Use an explicit numeric/undefined check (e.g., typeof jsonData.ttft === "number" and typeof jsonData.tpot === "number").
Suggested change:

    -if (!isAgentSelected && jsonData.ttft && jsonData.tpot) {
    +if (
    +  !isAgentSelected &&
    +  typeof jsonData.ttft === "number" &&
    +  typeof jsonData.tpot === "number"
    +) {

    def get_stats(self) -> Dict[str, float]:
        """
        Get final statistics

        Returns:
get_stats() is annotated as returning Dict[str, float], but the returned dict includes integer values for token counts/context length. Update the annotation to a mixed type (e.g., Dict[str, float | int]) or normalize values to floats to keep typing accurate.
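As a sketch of the two options this comment describes (keys and helper names here are illustrative, not the tracker's exact stats dict):

```python
# Illustrative only: keys are hypothetical, not the tracker's actual fields.
from typing import Dict, Union


def mixed_stats(ttft: float, tpot: float, num_tokens: int) -> Dict[str, Union[float, int]]:
    # Option A: widen the annotation (Python 3.10+ can write Dict[str, float | int]).
    return {"ttft": ttft, "tpot": tpot, "num_tokens": num_tokens}


def float_only_stats(ttft: float, tpot: float, num_tokens: int) -> Dict[str, float]:
    # Option B: keep Dict[str, float] and normalize integer counts to float.
    return {"ttft": ttft, "tpot": tpot, "num_tokens": float(num_tokens)}
```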

    }) {
      const [renderedContent, setRenderedContent] = useState("");
      const showThinking = externalShowThinking ?? false;
      const [showThinking, setShowThinking] = useState(false);
externalShowThinking is destructured from props but never used. With noUnusedLocals: true, this will fail type-checking; remove the destructuring or wire it into the new internal showThinking state (e.g., treat it as a controlled prop).
Suggested change:

    -const [showThinking, setShowThinking] = useState(false);
    +const [showThinking, setShowThinking] = useState(externalShowThinking ?? false);

    for _ in range(tokens_arrived):
        self.token_times.append(current_time)
For token count jumps, record_token() appends multiple identical timestamps (same current_time). This biases get_tpot() downward by treating batch-arrived tokens as zero-time intervals. Consider deriving per-token time from (delta_time / delta_tokens) or using the median-based approach (get_accurate_tpot()) for reported stats.
Suggested change:

    -for _ in range(tokens_arrived):
    -    self.token_times.append(current_time)
    +# Distribute timestamps evenly between last token time and now
    +if self.token_times:
    +    last_time = self.token_times[-1]
    +else:
    +    # Fallback: if for some reason we have no prior timestamps,
    +    # treat all tokens as arriving at current_time.
    +    last_time = current_time
    +if tokens_arrived == 1 or current_time <= last_time:
    +    # Single token or no positive time delta: keep current behavior
    +    self.token_times.append(current_time)
    +else:
    +    delta_time = current_time - last_time
    +    step = delta_time / tokens_arrived
    +    for i in range(1, tokens_arrived + 1):
    +        self.token_times.append(last_time + step * i)

    class InferenceMetricsTracker:
        """Track inference metrics during streaming"""
There are existing backend tests under app/backend/model_control/, but the new InferenceMetricsTracker logic is untested. Adding unit tests for record_token() (including token-count jumps) and get_stats() would help prevent regressions in TTFT/TPOT calculations.
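A possible starting point for such tests is sketched below. It assumes `record_token()` accepts a cumulative token count, that the class can be constructed with no arguments, and that `get_stats()` exposes `ttft`/`tpot` keys; all of these are assumptions to check against the real API.

```python
# Hypothetical test sketch: import path, constructor, and record_token()
# signature are assumptions; adapt to the real InferenceMetricsTracker API.
from app.backend.model_control.metrics_tracker import InferenceMetricsTracker


def test_record_token_handles_count_jumps():
    tracker = InferenceMetricsTracker()
    tracker.record_token(1)  # first token arrives
    tracker.record_token(4)  # jump: three tokens arrive in a single chunk
    stats = tracker.get_stats()
    # Batch arrivals should not collapse TPOT to ~0 via zero-width intervals.
    assert stats["tpot"] >= 0.0


def test_get_stats_with_single_token():
    tracker = InferenceMetricsTracker()
    tracker.record_token(1)
    stats = tracker.get_stats()
    # With a single token there are no inter-token gaps; TPOT should be 0.0,
    # and TTFT should still be reported as a number.
    assert stats["tpot"] == 0.0
    assert isinstance(stats["ttft"], (int, float))
```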
| "module": "ESNext", | ||
| "skipLibCheck": true, | ||
| "baseUrl": ".", | ||
| "types": ["vite/client", "react", "react-dom"], |
Setting compilerOptions.types overrides TypeScript’s automatic inclusion of @types/* packages. With the current list, @types/node is no longer included, but the frontend source uses NodeJS.* types (e.g. NodeJS.Timeout), which will cause type-check/build failures. Consider either removing the types override or adding "node" to this list (or refactoring those call sites to use ReturnType<typeof setTimeout>).
| "types": ["vite/client", "react", "react-dom"], | |
| "types": ["vite/client", "react", "react-dom", "node"], |

    const [showThinking, setShowThinking] = useState(false);
    const [isThinkingActive, setIsThinkingActive] = useState(false);
    const contentRef = useRef(processContent(content).cleanedContent);
    const contentRef = useRef(processContent(content, isStreamFinished).cleanedContent);
processContent is declared as (content: string), but it’s called as processContent(content, isStreamFinished) here. This is a TS compile error; either update processContent to accept the second parameter (and use it) or remove the extra argument.
Suggested change:

    -const contentRef = useRef(processContent(content, isStreamFinished).cleanedContent);
    +const contentRef = useRef(processContent(content).cleanedContent);