feat: add query phase tracking for SHOW QUERIES#34706
Conversation
Add current_phase and action_start_time fields to track query execution stages: - 0=query, 1=fetch, 2=query_callback, 3=fetch_callback This helps monitor what phase a query is in and how long each phase takes.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability of query execution by adding detailed phase tracking. It introduces new fields to monitor the current stage of a query and the timestamp when that stage began, making this information accessible through the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces query phase tracking for SHOW QUERIES by adding current_phase and action_start_time fields. The changes are well-contained and correctly implemented across the data structures, client logic, and server-side display logic. My main suggestion is to introduce an enum for the query phases to replace the magic numbers currently used, which will enhance code readability and maintainability. I've also provided a suggestion to strengthen the new test case for timing accuracy.
Note: Security Review did not run due to the size of the PR.
include/common/tmsg.h
Outdated
| int32_t currentPhase; // Current execution phase: 0=query, 1=fetch, 2=query callback, 3=fetch callback | ||
| int64_t actionStartTime; // When current action started (timestamp precision ms) |
There was a problem hiding this comment.
To improve code readability and maintainability, it's better to define an enum for the query phases instead of using magic numbers. This enum can then be used across the codebase where these phases are referenced.
I suggest adding the following enum definition before the SQueryDesc struct. Subsequent comments will suggest where to use this enum.
typedef enum EQueryPhase {
QUERY_PHASE_NOT_STARTED = -1,
QUERY_PHASE_QUERY = 0,
QUERY_PHASE_FETCH = 1,
QUERY_PHASE_QUERY_CALLBACK = 2,
QUERY_PHASE_FETCH_CALLBACK = 3,
} EQueryPhase;
source/client/src/clientMain.c
Outdated
| pRequest->currentPhase = 0; // 0 = query phase | ||
| pRequest->actionStartTime = taosGetTimestampMs(); |
There was a problem hiding this comment.
| switch (pQuery->currentPhase) { | ||
| case 0: phaseStr = "query"; break; | ||
| case 1: phaseStr = "fetch"; break; | ||
| case 2: phaseStr = "query_callback"; break; | ||
| case 3: phaseStr = "fetch_callback"; break; | ||
| default: phaseStr = "unknown"; break; | ||
| } |
There was a problem hiding this comment.
To avoid using magic numbers and improve readability, please replace the case values with the suggested EQueryPhase enum values.
switch (pQuery->currentPhase) {
case QUERY_PHASE_QUERY: phaseStr = "query"; break;
case QUERY_PHASE_FETCH: phaseStr = "fetch"; break;
case QUERY_PHASE_QUERY_CALLBACK: phaseStr = "query_callback"; break;
case QUERY_PHASE_FETCH_CALLBACK: phaseStr = "fetch_callback"; break;
default: phaseStr = "unknown"; break;
}| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | ||
| # The query time should be between before and after (with some tolerance) | ||
| # Note: The timestamp might be in different precision, so we just verify it's reasonable |
There was a problem hiding this comment.
The test test_phase_timing_accuracy currently only logs the timestamps but doesn't perform any verification. To make the test more effective, an assertion should be added to verify that action_start_time is within the expected range.
| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | |
| # The query time should be between before and after (with some tolerance) | |
| # Note: The timestamp might be in different precision, so we just verify it's reasonable | |
| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | |
| # The query time should be between before and after (with some tolerance) | |
| assert before_time <= query_time <= after_time |
There was a problem hiding this comment.
Pull request overview
This PR adds query execution phase tracking for the SHOW QUERIES command in TDengine. It introduces two new columns (current_phase and action_start_time) to the query schema, tracking which execution stage (query, fetch, query_callback, fetch_callback) a query is in and when that stage began.
Changes:
- New
currentPhaseandactionStartTimefields added toSRequestObjandSQueryDescstructs, with lifecycle tracking at each execution phase - Heartbeat serialization/deserialization updated to transmit the new fields to the MNode, and MNode updated to pack them into the
SHOW QUERIESblock - New test file added to validate the new columns and phase values
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
include/common/tmsg.h |
Adds currentPhase and actionStartTime fields to SQueryDesc |
source/client/inc/clientInt.h |
Adds the same fields to SRequestObj |
source/client/src/clientEnv.c |
Initializes new fields; accidentally removes msgBuf allocation |
source/client/src/clientMain.c |
Sets phase=0 at query start, phase=1 at fetch start |
source/client/src/clientImpl.c |
Transitions phase to 2/3 in doRequestCallback |
source/client/src/clientHb.c |
Copies new fields into heartbeat descriptor |
source/common/src/msg/tmsg.c |
Encodes/decodes new fields in heartbeat (breaking wire change) |
source/common/src/systable.c |
Adds two new columns to querySchema |
source/dnode/mnode/impl/src/mndProfile.c |
Packs phase string and start time into SHOW QUERIES result block |
test/cases/24-Users/test_query_phase_tracking.py |
New test file for the feature |
Comments suppressed due to low confidence (1)
source/client/src/clientEnv.c:604
- The line
(*pRequest)->msgBuf = taosMemoryCalloc(1, ERROR_MSG_BUF_DEFAULT_SIZE);was accidentally removed fromcreateRequest(). Since*pRequestis zero-initialized viataosMemoryCalloc,msgBufwill always be NULL, causing the null-check on line 601 to always trigger andcreateRequestto always fail. This breaks all query requests, asmsgBufis used by the parse context in multiple places (e.g.,clientMain.c:1964,clientImpl.c:378,clientImpl.c:600).
if (NULL == (*pRequest)->msgBuf) {
code = terrno;
goto _return;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
source/common/src/msg/tmsg.c
Outdated
| code = tDecodeI32(pDecoder, &desc.currentPhase); | ||
| TAOS_CHECK_GOTO(code, &line, _error); | ||
|
|
||
| code = tDecodeI64(pDecoder, &desc.actionStartTime); | ||
| TAOS_CHECK_GOTO(code, &line, _error); |
There was a problem hiding this comment.
The heartbeat serialization format change is backward-incompatible. The new currentPhase and actionStartTime fields are encoded unconditionally in the middle of the per-query descriptor stream (after subPlanNum, before snum). There is no tDecodeIsEnd guard on the decoder side to handle messages from older clients. If a new server receives a heartbeat from an old client (which doesn't encode these fields), it will try to read 4+8 = 12 bytes that don't exist, causing a decode failure. This will break rolling upgrades where clients and servers may be at different versions. The new fields should be added at the end of the per-query descriptor and guarded with tDecodeIsEnd checks on deserialization, following the pattern used elsewhere in the same file (e.g., lines 463–476).
| code = tDecodeI32(pDecoder, &desc.currentPhase); | |
| TAOS_CHECK_GOTO(code, &line, _error); | |
| code = tDecodeI64(pDecoder, &desc.actionStartTime); | |
| TAOS_CHECK_GOTO(code, &line, _error); | |
| /* currentPhase and actionStartTime were added later; guard with tDecodeIsEnd | |
| * so we can still decode heartbeats from older clients that don't send them. | |
| */ | |
| if (!tDecodeIsEnd(pDecoder)) { | |
| code = tDecodeI32(pDecoder, &desc.currentPhase); | |
| TAOS_CHECK_GOTO(code, &line, _error); | |
| code = tDecodeI64(pDecoder, &desc.actionStartTime); | |
| TAOS_CHECK_GOTO(code, &line, _error); | |
| } |
| if tdSql.getRows() > 0: | ||
| col_names = [row[0] for row in tdSql.getColNames()] | ||
| time_idx = col_names.index("action_start_time") if "action_start_time" in col_names else -1 | ||
|
|
||
| if time_idx >= 0: | ||
| query_time = tdSql.getData(0, time_idx) | ||
| # Convert to milliseconds if in different unit | ||
| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | ||
| # The query time should be between before and after (with some tolerance) | ||
| # Note: The timestamp might be in different precision, so we just verify it's reasonable |
There was a problem hiding this comment.
The test_phase_timing_accuracy test collects the timing values and logs them but never performs any actual assertion on them. The docstring states it will "Verify action_start_time is within reasonable range of recorded time," but the test body only logs the values without checking anything. This means the test will always pass regardless of correctness, providing no real validation of the timing behavior.
| if tdSql.getRows() > 0: | |
| col_names = [row[0] for row in tdSql.getColNames()] | |
| time_idx = col_names.index("action_start_time") if "action_start_time" in col_names else -1 | |
| if time_idx >= 0: | |
| query_time = tdSql.getData(0, time_idx) | |
| # Convert to milliseconds if in different unit | |
| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | |
| # The query time should be between before and after (with some tolerance) | |
| # Note: The timestamp might be in different precision, so we just verify it's reasonable | |
| # Ensure we have at least one query to inspect | |
| assert tdSql.getRows() > 0, "show queries returned no rows to validate action_start_time" | |
| col_names = [row[0] for row in tdSql.getColNames()] | |
| assert "action_start_time" in col_names, "show queries output has no action_start_time column" | |
| time_idx = col_names.index("action_start_time") | |
| query_time = tdSql.getData(0, time_idx) | |
| # Log values for debugging | |
| tdLog.info(f"Before: {before_time}, Query: {query_time}, After: {after_time}") | |
| # Convert query_time to int if possible (some clients may return string/decimal) | |
| try: | |
| query_time_int = int(query_time) | |
| except (TypeError, ValueError): | |
| raise AssertionError(f"action_start_time has non-numeric value: {query_time!r}") | |
| # Allow some tolerance in case of slight clock/precision differences (e.g. ±60s) | |
| tolerance_ms = 60 * 1000 | |
| lower_bound = before_time - tolerance_ms | |
| upper_bound = after_time + tolerance_ms | |
| assert lower_bound <= query_time_int <= upper_bound, ( | |
| f"action_start_time {query_time_int} not within expected range " | |
| f"[{lower_bound}, {upper_bound}] based on before/after timestamps" | |
| ) |
| @@ -0,0 +1,221 @@ | |||
| import time | |||
| import platform | |||
There was a problem hiding this comment.
The platform module is imported but never used in this file. This is an unused import that adds unnecessary noise.
| import platform |
| @@ -0,0 +1,221 @@ | |||
| import time | |||
| import platform | |||
| from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom | |||
There was a problem hiding this comment.
The imports sc, clusterComCheck, and tdCom from new_test_framework.utils are never used in this file. These are unused imports.
| from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom | |
| from new_test_framework.utils import tdLog, tdSql |
Use EQueryExecPhase enum (none/parse/catalog/plan/schedule/execute/fetch/done) instead of raw integer phases. Fix field name mismatches, serialization order, and backward-compatible deserialization for SHOW QUERIES phase tracking. Made-with: Cursor
Extend SQuerySubDesc with startTs/endTs from scheduler task profile. Update sub_status format to tid:status:startMs:endMs for each sub-task. Backward-compatible serialization via tDecodeIsEnd guard. Made-with: Cursor
Add current_phase and action_start_time fields to track query execution stages:
This helps monitor what phase a query is in and how long each phase takes.
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.