feat: add query phase tracking for SHOW QUERIES #34706

Open
yihaoDeng wants to merge 5 commits into 3.0 from feat/addShowQuery

@yihaoDeng

Add current_phase and action_start_time fields to track query execution stages:

  • 0=query, 1=fetch, 2=query_callback, 3=fetch_callback

This helps monitor what phase a query is in and how long each phase takes.
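
As a rough illustration of the "how long each phase takes" part, the time spent in the current phase can be derived from action_start_time. This is a minimal sketch, assuming the value is an epoch timestamp in milliseconds (as the struct comment in this PR states) and that the observer's clock is comparable to the client's; `phase_elapsed_ms` is a hypothetical helper, not part of the PR.

```python
import time


def phase_elapsed_ms(action_start_time_ms, now_ms=None):
    """Return milliseconds spent in the current phase (hypothetical helper).

    Assumes action_start_time_ms is an epoch timestamp in milliseconds.
    The result is clamped at zero so small clock skew cannot go negative.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(0, now_ms - action_start_time_ms)


# A phase that started at t=1000 ms, observed at t=4500 ms
print(phase_elapsed_ms(1_000, now_ms=4_500))  # 3500
```

The clamp is a design choice for display purposes; a monitoring tool might prefer to surface negative values as a clock-skew warning instead.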

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings March 7, 2026 11:54
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of query execution by adding detailed phase tracking. It introduces new fields to monitor the current stage of a query and the timestamp when that stage began, making this information accessible through the SHOW QUERIES command. This improvement allows for better performance analysis, bottleneck identification, and a clearer understanding of query lifecycle, ultimately aiding in debugging and optimization efforts.

Highlights

  • Query Phase Tracking: Introduced currentPhase and actionStartTime fields in SQueryDesc and SRequestObj to track the current execution stage and its start time for queries.
  • Visibility in SHOW QUERIES: Exposed the new current_phase and action_start_time fields in the SHOW QUERIES output, providing users with real-time insights into query progress.
  • Phase Lifecycle Management: Implemented logic to update the currentPhase and actionStartTime during various stages of query execution, including initial query, data fetching, and their respective callbacks.
  • Comprehensive Testing: Added a new Python test suite to validate the functionality, schema, and accuracy of the query phase tracking feature under different scenarios, including long-running and concurrent queries.

Changelog
  • include/common/tmsg.h
    • Added currentPhase and actionStartTime fields to the SQueryDesc structure.
  • source/client/inc/clientInt.h
    • Added currentPhase and actionStartTime fields to the SRequestObj structure.
  • source/client/src/clientEnv.c
    • Initialized currentPhase and actionStartTime for new SRequestObj instances.
    • Removed an unnecessary memory allocation for msgBuf.
  • source/client/src/clientHb.c
    • Populated currentPhase and actionStartTime in SQueryDesc from the SRequestObj during heartbeat request building.
  • source/client/src/clientImpl.c
    • Updated currentPhase and actionStartTime when doRequestCallback is invoked for query and fetch callbacks.
  • source/client/src/clientMain.c
    • Set currentPhase to 'query' and recorded actionStartTime at the beginning of doAsyncQuery.
    • Set currentPhase to 'fetch' and recorded actionStartTime at the beginning of taos_fetch_rows_a.
  • source/common/src/msg/tmsg.c
    • Modified tSerializeSClientHbReq to serialize the new currentPhase and actionStartTime fields.
    • Modified tDeserializeSClientHbReq to deserialize the new currentPhase and actionStartTime fields.
  • source/common/src/systable.c
    • Added current_phase (VARCHAR) and action_start_time (TIMESTAMP) columns to the querySchema for SHOW QUERIES.
  • source/dnode/mnode/impl/src/mndProfile.c
    • Implemented logic to convert currentPhase integer to a human-readable string and added it to the SHOW QUERIES output.
    • Added actionStartTime to the SHOW QUERIES output.
  • test/cases/24-Users/test_query_phase_tracking.py
    • Added a new Python test file test_query_phase_tracking.py to validate the query phase tracking feature.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces query phase tracking for SHOW QUERIES by adding current_phase and action_start_time fields. The changes are well-contained and correctly implemented across the data structures, client logic, and server-side display logic. My main suggestion is to introduce an enum for the query phases to replace the magic numbers currently used, which will enhance code readability and maintainability. I've also provided a suggestion to strengthen the new test case for timing accuracy.

Note: Security Review did not run due to the size of the PR.

Comment on lines +5077 to +5078
int32_t currentPhase; // Current execution phase: 0=query, 1=fetch, 2=query callback, 3=fetch callback
int64_t actionStartTime; // When current action started (timestamp precision ms)

medium

To improve code readability and maintainability, it's better to define an enum for the query phases instead of using magic numbers. This enum can then be used across the codebase where these phases are referenced.

I suggest adding the following enum definition before the SQueryDesc struct. Subsequent comments will suggest where to use this enum.

typedef enum EQueryPhase {
  QUERY_PHASE_NOT_STARTED = -1,
  QUERY_PHASE_QUERY = 0,
  QUERY_PHASE_FETCH = 1,
  QUERY_PHASE_QUERY_CALLBACK = 2,
  QUERY_PHASE_FETCH_CALLBACK = 3,
} EQueryPhase;

Comment on lines +2032 to +2033
pRequest->currentPhase = 0; // 0 = query phase
pRequest->actionStartTime = taosGetTimestampMs();

medium

As suggested in the comment on include/common/tmsg.h, this magic number 0 should be replaced with the QUERY_PHASE_QUERY enum value for better code clarity.

  pRequest->currentPhase = QUERY_PHASE_QUERY;  // 0 = query phase
  pRequest->actionStartTime = taosGetTimestampMs();

Comment on lines +1330 to +1336
switch (pQuery->currentPhase) {
  case 0: phaseStr = "query"; break;
  case 1: phaseStr = "fetch"; break;
  case 2: phaseStr = "query_callback"; break;
  case 3: phaseStr = "fetch_callback"; break;
  default: phaseStr = "unknown"; break;
}

medium

To avoid using magic numbers and improve readability, please replace the case values with the suggested EQueryPhase enum values.

    switch (pQuery->currentPhase) {
      case QUERY_PHASE_QUERY: phaseStr = "query"; break;
      case QUERY_PHASE_FETCH: phaseStr = "fetch"; break;
      case QUERY_PHASE_QUERY_CALLBACK: phaseStr = "query_callback"; break;
      case QUERY_PHASE_FETCH_CALLBACK: phaseStr = "fetch_callback"; break;
      default: phaseStr = "unknown"; break;
    }
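
For readers who want to mirror this mapping on the test side, the same enum-to-string conversion could be modelled in Python. This is only a sketch: the class and function names are hypothetical, the values copy the EQueryPhase suggestion above, and any value without a display string falls back to "unknown", as in the default branch of the switch.

```python
from enum import IntEnum


class QueryPhase(IntEnum):
    """Python mirror of the suggested EQueryPhase enum (hypothetical)."""
    NOT_STARTED = -1
    QUERY = 0
    FETCH = 1
    QUERY_CALLBACK = 2
    FETCH_CALLBACK = 3


# Display strings used in the SHOW QUERIES output, per the switch above
_PHASE_NAMES = {
    QueryPhase.QUERY: "query",
    QueryPhase.FETCH: "fetch",
    QueryPhase.QUERY_CALLBACK: "query_callback",
    QueryPhase.FETCH_CALLBACK: "fetch_callback",
}


def phase_name(value):
    """Map a raw phase integer to its display string, or "unknown"."""
    try:
        return _PHASE_NAMES.get(QueryPhase(value), "unknown")
    except ValueError:
        # Value is not a member of the enum at all
        return "unknown"


print(phase_name(2))   # query_callback
print(phase_name(99))  # unknown
```

Keeping the names in one table means a test can assert that every value reported in current_phase is one of the known strings.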

Comment on lines +213 to +215
tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
# The query time should be between before and after (with some tolerance)
# Note: The timestamp might be in different precision, so we just verify it's reasonable

medium

The test test_phase_timing_accuracy currently only logs the timestamps but doesn't perform any verification. To make the test more effective, an assertion should be added to verify that action_start_time is within the expected range.

Suggested change
- tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
- # The query time should be between before and after (with some tolerance)
- # Note: The timestamp might be in different precision, so we just verify it's reasonable
+ tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
+ # The query time should be between before and after (with some tolerance)
+ assert before_time <= query_time <= after_time

Copilot AI left a comment

Pull request overview

This PR adds query execution phase tracking for the SHOW QUERIES command in TDengine. It introduces two new columns (current_phase and action_start_time) to the query schema, tracking which execution stage (query, fetch, query_callback, fetch_callback) a query is in and when that stage began.

Changes:

  • New currentPhase and actionStartTime fields added to SRequestObj and SQueryDesc structs, with lifecycle tracking at each execution phase
  • Heartbeat serialization/deserialization updated to transmit the new fields to the MNode, and MNode updated to pack them into the SHOW QUERIES block
  • New test file added to validate the new columns and phase values

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| include/common/tmsg.h | Adds currentPhase and actionStartTime fields to SQueryDesc |
| source/client/inc/clientInt.h | Adds the same fields to SRequestObj |
| source/client/src/clientEnv.c | Initializes new fields; accidentally removes msgBuf allocation |
| source/client/src/clientMain.c | Sets phase=0 at query start, phase=1 at fetch start |
| source/client/src/clientImpl.c | Transitions phase to 2/3 in doRequestCallback |
| source/client/src/clientHb.c | Copies new fields into heartbeat descriptor |
| source/common/src/msg/tmsg.c | Encodes/decodes new fields in heartbeat (breaking wire change) |
| source/common/src/systable.c | Adds two new columns to querySchema |
| source/dnode/mnode/impl/src/mndProfile.c | Packs phase string and start time into SHOW QUERIES result block |
| test/cases/24-Users/test_query_phase_tracking.py | New test file for the feature |

Comments suppressed due to low confidence (1)

source/client/src/clientEnv.c:604

  • The line (*pRequest)->msgBuf = taosMemoryCalloc(1, ERROR_MSG_BUF_DEFAULT_SIZE); was accidentally removed from createRequest(). Since *pRequest is zero-initialized via taosMemoryCalloc, msgBuf will always be NULL, causing the null-check on line 601 to always trigger and createRequest to always fail. This breaks all query requests, as msgBuf is used by the parse context in multiple places (e.g., clientMain.c:1964, clientImpl.c:378, clientImpl.c:600).
  if (NULL == (*pRequest)->msgBuf) {
    code = terrno;
    goto _return;
  }

Comment on lines +404 to +408
code = tDecodeI32(pDecoder, &desc.currentPhase);
TAOS_CHECK_GOTO(code, &line, _error);

code = tDecodeI64(pDecoder, &desc.actionStartTime);
TAOS_CHECK_GOTO(code, &line, _error);

Copilot AI Mar 7, 2026

The heartbeat serialization format change is backward-incompatible. The new currentPhase and actionStartTime fields are encoded unconditionally in the middle of the per-query descriptor stream (after subPlanNum, before snum). There is no tDecodeIsEnd guard on the decoder side to handle messages from older clients. If a new server receives a heartbeat from an old client (which doesn't encode these fields), it will try to read 4+8 = 12 bytes that don't exist, causing a decode failure. This will break rolling upgrades where clients and servers may be at different versions. The new fields should be added at the end of the per-query descriptor and guarded with tDecodeIsEnd checks on deserialization, following the pattern used elsewhere in the same file (e.g., lines 463–476).

Suggested change
- code = tDecodeI32(pDecoder, &desc.currentPhase);
- TAOS_CHECK_GOTO(code, &line, _error);
- code = tDecodeI64(pDecoder, &desc.actionStartTime);
- TAOS_CHECK_GOTO(code, &line, _error);
+ /* currentPhase and actionStartTime were added later; guard with tDecodeIsEnd
+  * so we can still decode heartbeats from older clients that don't send them.
+  */
+ if (!tDecodeIsEnd(pDecoder)) {
+   code = tDecodeI32(pDecoder, &desc.currentPhase);
+   TAOS_CHECK_GOTO(code, &line, _error);
+   code = tDecodeI64(pDecoder, &desc.actionStartTime);
+   TAOS_CHECK_GOTO(code, &line, _error);
+ }
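
The trailing-field pattern described in this comment can be modelled outside TDengine. The sketch below is a toy in Python: it uses a fixed-width little-endian layout via `struct`, which is an assumption for illustration and not the actual tEncoder/tDecoder wire format, and `encode_desc`/`decode_desc` are hypothetical names. The point it shows is that the "is there anything left?" check is only safe when the new fields are the last thing in the buffer.

```python
import struct


def encode_desc(sub_plan_num, phase=None, start_ms=None):
    """Encode a toy descriptor; old senders omit the trailing fields."""
    buf = struct.pack("<i", sub_plan_num)
    if phase is not None:
        # New fields go at the very end: int32 phase + int64 start time
        buf += struct.pack("<iq", phase, start_ms)
    return buf


def decode_desc(buf):
    """Decode a toy descriptor from either an old or a new sender."""
    offset = 0
    (sub_plan_num,) = struct.unpack_from("<i", buf, offset)
    offset += 4

    # Defaults used when the sender predates the new fields
    phase, start_ms = -1, 0
    if offset < len(buf):  # plays the role of !tDecodeIsEnd(pDecoder)
        phase, start_ms = struct.unpack_from("<iq", buf, offset)
        offset += 12  # 4 + 8 bytes, matching the comment above

    return sub_plan_num, phase, start_ms


print(decode_desc(encode_desc(3)))                      # old client
print(decode_desc(encode_desc(3, 1, 1700000000000)))    # new client
```

If other data followed the new fields, the remaining-bytes check would succeed for old senders too and the decoder would misread that data as phase and start time, which is why the review asks for the fields to be appended at the end.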

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +215
if tdSql.getRows() > 0:
    col_names = [row[0] for row in tdSql.getColNames()]
    time_idx = col_names.index("action_start_time") if "action_start_time" in col_names else -1

    if time_idx >= 0:
        query_time = tdSql.getData(0, time_idx)
        # Convert to milliseconds if in different unit
        tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
        # The query time should be between before and after (with some tolerance)
        # Note: The timestamp might be in different precision, so we just verify it's reasonable

Copilot AI Mar 7, 2026

The test_phase_timing_accuracy test collects the timing values and logs them but never performs any actual assertion on them. The docstring states it will "Verify action_start_time is within reasonable range of recorded time," but the test body only logs the values without checking anything. This means the test will always pass regardless of correctness, providing no real validation of the timing behavior.

Suggested change
- if tdSql.getRows() > 0:
-     col_names = [row[0] for row in tdSql.getColNames()]
-     time_idx = col_names.index("action_start_time") if "action_start_time" in col_names else -1
-     if time_idx >= 0:
-         query_time = tdSql.getData(0, time_idx)
-         # Convert to milliseconds if in different unit
-         tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
-         # The query time should be between before and after (with some tolerance)
-         # Note: The timestamp might be in different precision, so we just verify it's reasonable
+ # Ensure we have at least one query to inspect
+ assert tdSql.getRows() > 0, "show queries returned no rows to validate action_start_time"
+ col_names = [row[0] for row in tdSql.getColNames()]
+ assert "action_start_time" in col_names, "show queries output has no action_start_time column"
+ time_idx = col_names.index("action_start_time")
+ query_time = tdSql.getData(0, time_idx)
+ # Log values for debugging
+ tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}")
+ # Convert query_time to int if possible (some clients may return string/decimal)
+ try:
+     query_time_int = int(query_time)
+ except (TypeError, ValueError):
+     raise AssertionError(f"action_start_time has non-numeric value: {query_time!r}")
+ # Allow some tolerance in case of slight clock/precision differences (e.g. ±60s)
+ tolerance_ms = 60 * 1000
+ lower_bound = before_time - tolerance_ms
+ upper_bound = after_time + tolerance_ms
+ assert lower_bound <= query_time_int <= upper_bound, (
+     f"action_start_time {query_time_int} not within expected range "
+     f"[{lower_bound}, {upper_bound}] based on before/after timestamps"
+ )
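
One way to keep this check reusable is to separate value normalization from the window assertion. The sketch below is a standalone illustration, not code from the PR: `to_epoch_ms` and `assert_in_window` are hypothetical helpers, and the handling of `datetime` values is an assumption, since the type the test framework returns for a TIMESTAMP column is not shown in this thread.

```python
from datetime import datetime, timezone


def to_epoch_ms(value):
    """Normalize a timestamp-like value to integer epoch milliseconds.

    Assumption: the value is a datetime, an int, or a numeric string.
    Naive datetimes are interpreted in local time (Python's default).
    """
    if isinstance(value, datetime):
        return round(value.timestamp() * 1000)
    return int(value)


def assert_in_window(before_ms, value, after_ms, tolerance_ms=1000):
    """Assert that value lies in [before - tol, after + tol]; return it in ms."""
    value_ms = to_epoch_ms(value)
    lower = before_ms - tolerance_ms
    upper = after_ms + tolerance_ms
    assert lower <= value_ms <= upper, (
        f"timestamp {value_ms} not within expected range [{lower}, {upper}]"
    )
    return value_ms


# Accepts plain integers, numeric strings, and datetimes alike
ts = datetime.fromtimestamp(1_700_000_000, tz=timezone.utc)
print(assert_in_window(1_699_999_999_000, ts, 1_700_000_001_000))
```

A tighter default tolerance than ±60s makes the check more meaningful when the client and the test run on the same host; the right value depends on the environment.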

@@ -0,0 +1,221 @@
import time
import platform

Copilot AI Mar 7, 2026

The platform module is imported but never used in this file. This is an unused import that adds unnecessary noise.

Suggested change
- import platform

@@ -0,0 +1,221 @@
import time
import platform
from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom

Copilot AI Mar 7, 2026

The imports sc, clusterComCheck, and tdCom from new_test_framework.utils are never used in this file. These are unused imports.

Suggested change
- from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom
+ from new_test_framework.utils import tdLog, tdSql

Use EQueryExecPhase enum (none/parse/catalog/plan/schedule/execute/fetch/done)
instead of raw integer phases. Fix field name mismatches, serialization order,
and backward-compatible deserialization for SHOW QUERIES phase tracking.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 10, 2026 14:28
Extend SQuerySubDesc with startTs/endTs from scheduler task profile.
Update sub_status format to tid:status:startMs:endMs for each sub-task.
Backward-compatible serialization via tDecodeIsEnd guard.

Made-with: Cursor
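
A consumer of the new sub_status format might parse each entry along these lines. This is a sketch under stated assumptions: the commit message only gives the shape tid:status:startMs:endMs, so the separator between entries, the numeric base of tid, and the set of status strings are not known here; `parse_sub_status` and the sample status value are hypothetical.

```python
from typing import NamedTuple


class SubTaskStatus(NamedTuple):
    tid: int
    status: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        """Elapsed time of the sub-task in milliseconds."""
        return self.end_ms - self.start_ms


def parse_sub_status(entry):
    """Parse one 'tid:status:startMs:endMs' entry (assumed decimal fields)."""
    # Split tid off the front and the two timestamps off the back, so a
    # status string that itself contains ':' would still parse.
    tid, rest = entry.split(":", 1)
    status, start_ms, end_ms = rest.rsplit(":", 2)
    return SubTaskStatus(int(tid), status, int(start_ms), int(end_ms))


task = parse_sub_status("12:succeed:1700000000000:1700000000250")
print(task.status, task.duration_ms)  # succeed 250
```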

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
