Conversation
- Extended SQL grammar to support tag column references using the same syntax as non-tag column references (col_name FROM db.table.col and db.table.col positional form) - Updated SCreateVSubTableStmt with pSpecificTagRefs and pTagRefs fields - Modified parTranslater to validate and process tag references through checkAndReplaceTagRefs - Added JSON serialization/deserialization for new tag reference fields - Added test cases in test_vtable_tag_ref.py
…thout truncation
- parser/parTranslater.c: relax checkColRef to only check type (not bytes)
for variable-length types (binary/nchar/varchar/varbinary), allowing
virtual table columns to have different lengths from source columns.
- executor/executil.c: in initQueryTableDataCondWithColArray, use
TMAX(virtual_bytes, source_bytes) for variable-length types to ensure
buildBuf allocation is adequate for source data.
- executor/scanoperator.c:
- createVTableScanInfoFromParam: relax type+bytes check to type-only
for variable-length types; after createOneDataBlockWithColArray,
update pResBlock column info.bytes to source table bytes so
doCopyColVal length check passes.
- createVTableScanInfoFromBatchParam: populate SColIdPair.type from
source schema; add same pResBlock info.bytes update loop.
- test: add test_vtable_nchar_length.py covering three scenarios
(vtable col len > / = / < source col len) with projection, SQL
functions (length, char_length, lower, upper, concat, substring,
replace, ascii, position, cast, first, last), filters, and mixed
cross-table references.
- Add validateSrcTableColRef to verify source table/column existence - vtbRefValidateLocal: check local vnode meta first - vtbRefValidateRemote: fetch DB vgroup info from MNode, calculate vgId, fetch table schema from target vnode via RPC, validate column existence - Fix deadlock: release META_READER_LOCK before RPC calls in doOptimizeVTableNameFilter; skip self-RPC when localVgId matches - Fix double htonl bug in vtbRefFetchTableSchema (vgId encoding) - Fix rpcMallocCont/taosMemoryFree mismatch causing crash on cross-db refs - Fix sysTableFillOneVirtualTableRefImpl: correct src_column_name (col 6), err_code (col 8) and err_msg (col 9) population - Populate err_code with TSDB_CODE_PAR_TABLE_NOT_EXIST or TSDB_CODE_PAR_INVALID_REF_COLUMN when validation fails
- Introduce a new test suite for validating virtual table referencing in both same-db and cross-db scenarios. - Implement tests to ensure that references to normal and child tables are correctly validated, checking for successful error codes. - Set up necessary databases and tables for testing, including source tables and virtual tables for comprehensive coverage.
# Conflicts: # source/libs/parser/inc/sql.y
When creating a virtual table with tag column references, validate that the referenced column is actually a tag column, not a data column. - Add check using getNormalColSchema to detect data columns - Return distinct error messages: - "references column which is not a tag column" for data columns - "references non-existent tag" for missing columns - Add test cases in test_vtable_tag_ref.py
Fix compilation error where STableType was incorrectly used instead of ETableType in vtbRefCreateSchemaCache and vtbRefGetTableSchemaLocal. Also adds schema cache functionality for virtual table validation.
When validating virtual table column references, multiple columns may reference the same remote table. Previously, each column reference triggered a separate RPC call to fetch the table schema. This optimization: - Adds 'ownsSchema' flag to SVtbRefSchemaCache to support deep-copy for remote tables (vs shallow copy for local tables) - Creates vtbRefCreateSchemaCacheFromMetaRsp to build cache from RPC response - Adds vtbRefGetRemoteCacheEntry/vtbRefPutRemoteCacheEntry helpers - Modifies vtbRefValidateRemote to cache successful results - Checks cache before making RPC calls in validateSrcTableColRef Result: N columns referencing the same remote table = 1 RPC instead of N
When validating virtual table column references, the previous implementation used linear search (O(N)) to find column names in the cached schema. This becomes inefficient for tables with many columns. This optimization: - Adds pColNameIndex (SHashObj*) to SVtbRefSchemaCache - Creates vtbRefBuildColNameIndex to build hash index from schema array - Modifies vtbRefCreateSchemaCache to build index for local tables - Modifies vtbRefCreateSchemaCacheFromMetaRsp to build index for remote tables - Simplifies vtbRefCheckColumnInCache to use hash lookup (O(1)) - Updates vtbRefFreeSchemaCache to cleanup hash table Result: Column name lookup from O(N) to O(1), especially beneficial for wide tables with many columns.
- Add test_vtable_performance.py: 11 test cases for performance benchmarking - Large dataset queries (100K rows) - Wide table queries (100/500 columns) - Aggregation, filtering, and interval queries - Cross-database and mixed column references - Add test_vtable_stress.py: 8 test cases for stability testing - Memory leak detection with repeated operations - Large result set handling - Concurrent access simulation - Complex query patterns - Update CI config (cases.task) to include new tests
There was a problem hiding this comment.
Pull request overview
This PR extends virtual table functionality by adding richer tag-reference syntax (including positional and explicit tag_name FROM ... forms), supporting cross-table/cross-db references, and ensuring variable-length columns in virtual tables can return source values without truncation. It also introduces a new validation/show path around virtual-table referencing metadata and adds substantial new test coverage (including performance/stress suites).
Changes:
- Add support for tag reference metadata end-to-end (meta encode/decode, catalog, planner/executor handling) and new virtual-tag parsing forms.
- Adjust planning/execution to tolerate source-vs-virtual length differences for variable-length types (binary/nchar/varchar/varbinary).
- Add new system table +
SHOW VTABLE VALIDATE FOR ...plumbing and add multiple new virtual-table test suites, including stress/performance.
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_new/case_list_docs/query/hint.md | Removes a single case-list entry. |
| tests/army/vtable/test_vtable_tag_ref.py | Adds army-framework tests for vtable tag reference syntax. |
| test/ci/cases.task | Enables new vtable test suites in CI, including performance/stress. |
| test/cases/05-VirtualTables/test_vtable_tag_ref.py | Adds comprehensive new-framework tests for tag-ref syntax, cross-db refs, and negative cases. |
| test/cases/05-VirtualTables/test_vtable_stress.py | Adds stress scenarios (create/drop loops, many vtables, large scans). |
| test/cases/05-VirtualTables/test_vtable_performance.py | Adds performance-oriented scenarios (wide tables, 10k rows, repeated queries). |
| test/cases/05-VirtualTables/test_vtable_nchar_length_supplemental.py | Adds supplemental no-truncation tests for unicode/NULL/edge cases and super-table scenarios. |
| test/cases/05-VirtualTables/test_vtable_nchar_length.py | Adds core no-truncation tests across GT/EQ/LT vtable-length scenarios. |
| test/cases/05-VirtualTables/in/test_vtable_nchar_length.in | Adds SQL input coverage mirroring the nchar/binary length tests. |
| source/libs/transport/src/transSvr.c | Initializes libuv queue before use in async worker callback. |
| source/libs/qcom/src/querymsg.c | Allocates/copies tag-ref metadata into table meta structures. |
| source/libs/qcom/src/queryUtil.c | Clones tag-ref metadata in table meta cloning. |
| source/libs/planner/src/planPhysiCreater.c | Routes new system table scans to correct exec node. |
| source/libs/planner/src/planLogicCreater.c | Uses reference schema bytes for var-len columns; relaxes bytes mismatch for var-len types. |
| source/libs/parser/src/parUtil.c | Accepts placeholders for new virtual-table-related system-table columns. |
| source/libs/parser/src/parTokenizer.c | Adds VALIDATE keyword. |
| source/libs/parser/src/parAstParser.c | Collects metadata/auth for tag-ref tables; adds validation stmt metadata collection. |
| source/libs/parser/src/parAstCreater.c | Adds helpers to build column refs from db.table.col / table.col; creates validate-vtable stmt. |
| source/libs/parser/inc/sql.y | Adds SHOW VTABLE VALIDATE FOR ... grammar; adds vtags literal grammar for new tag-ref forms. |
| source/libs/parser/inc/parAst.h | Declares new AST helpers and validate stmt creator. |
| source/libs/nodes/src/nodesUtilFuncs.c | Adds node creation/destruction support for validate-vtable stmt and new lists. |
| source/libs/nodes/src/nodesCodeFuncs.c | Adds JSON encode/decode for validate-vtable stmt; persists new create-vsubtable fields. |
| source/libs/function/src/tudf.c | Initializes libuv queue before use; fixes statement/queue declaration ordering. |
| source/libs/executor/src/scanoperator.c | Propagates type/bytes through vtable scan maps; relaxes bytes check for var-len types; adjusts result block bytes. |
| source/libs/executor/src/executil.c | Uses max(source, vtable) bytes for var-len types when building scan conditions. |
| source/libs/catalog/src/ctgAsync.c | Extends catalog meta handling to include tag refs (but contains a critical pointer-arithmetic bug). |
| source/dnode/vnode/src/vnd/vnodeQuery.c | Adds filling/returning tag ref metadata in table meta responses (but misses freeing pTagRefs). |
| source/dnode/vnode/src/tsdb/tsdbRead2.c | Adds debug logging around var-length data length checks. |
| source/dnode/vnode/src/meta/metaTable.c | Adds tag-ref population and cleanup in vtable meta responses. |
| source/dnode/vnode/src/meta/metaEntry.c | Encodes/decodes tag-ref wrapper fields with backward compatibility. |
| source/dnode/mnode/impl/src/mndShow.c | Adds mapping for new system table show type. |
| source/dnode/mgmt/node_mgmt/src/dmTransport.c | Routes TDMT_VND_TABLE_META_RSP through the correct handler. |
| source/common/src/systable.c | Adds ins_virtual_tables_referencing system table schema. |
| source/common/src/msg/tmsg.c | Adds tag-ref fields to table-meta response encode/decode and frees them. |
| include/util/tdef.h | Adds constant for validate error message size. |
| include/libs/qcom/query.h | Extends meta structs with numOfTagRefs/tagRef. |
| include/libs/nodes/cmdnodes.h | Extends create-vsubtable stmt + adds validate-vtable stmt node struct. |
| include/common/tmsg.h | Extends message structs/wrappers with tag refs + init/free helpers. |
| include/common/systable.h | Declares new system table name constant. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pOut->tbMeta = taosMemoryRealloc(pOut->tbMeta, sizeof(STableMeta) + colRefSize + tagRefSize); | ||
| TAOS_MEMCPY(pOut->tbMeta, pOut->vctbMeta, sizeof(SVCTableMeta)); | ||
| TAOS_MEMCPY(pOut->tbMeta + sizeof(STableMeta), pOut->vctbMeta + sizeof(SVCTableMeta), colRefSize); | ||
| pOut->tbMeta->colRef = (SColRef *)((char *)pOut->tbMeta + sizeof(STableMeta)); | ||
| if (pOut->vctbMeta->tagRef && tagRefSize > 0) { | ||
| pOut->tbMeta->tagRef = (SColRef *)((char *)pOut->tbMeta + sizeof(STableMeta) + colRefSize); | ||
| TAOS_MEMCPY(pOut->tbMeta->tagRef, pOut->vctbMeta->tagRef, tagRefSize); |
There was a problem hiding this comment.
In the CTG_IS_META_VBOTH branch where pOut->tbMeta is NULL, the TAOS_MEMCPY offsets use pointer arithmetic on typed pointers (e.g., pOut->tbMeta + sizeof(STableMeta) and pOut->vctbMeta + sizeof(SVCTableMeta)), which advances by sizeof(...) * sizeof(*ptr) and will corrupt memory. Use (char*) (or uint8_t*) byte offsets when copying the colRef payload, and consider adding a NULL check for the realloc result before dereferencing.
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_tag_ref.py | ||
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_nchar_length.py | ||
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_validate_referencing.py | ||
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_query_comprehensive.py | ||
| # Performance and Stress tests | ||
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_performance.py | ||
| ,,y,.,./ci/pytest.sh pytest cases/05-VirtualTables/test_vtable_stress.py |
There was a problem hiding this comment.
These stress/performance test jobs are enabled in the default CI task list (,,y,). The new suites create many tables/vtables and run large scans/loops, which may significantly increase CI runtime and flakiness. Consider marking them disabled by default (,,n,) or gating them behind a dedicated performance/stress pipeline.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the functionality and robustness of virtual tables. It addresses a critical data integrity concern by ensuring that Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant enhancements for virtual tables, including support for tag references with a new unified syntax and a fix for handling variable-length data types. However, several security vulnerabilities related to integer overflows and lack of input validation in message decoding were identified, which could lead to heap buffer overflows or Denial of Service (DoS) when processing maliciously crafted network messages. It is highly recommended to validate the number of column and tag references against system limits (e.g., TSDB_MAX_TAGS, TSDB_MAX_COLUMNS) before performing memory allocations. Additionally, a critical memory corruption issue was found in the SHOW VTABLE VALIDATE command, and there's minor code duplication that could be refactored.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 45 out of 46 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| CTG_ERR_JRET(terrno); | ||
| } | ||
| TAOS_MEMCPY(pOut->tbMeta, pOut->vctbMeta, sizeof(SVCTableMeta)); | ||
| TAOS_MEMCPY(pOut->tbMeta + sizeof(STableMeta), pOut->vctbMeta + sizeof(SVCTableMeta), colRefSize); |
There was a problem hiding this comment.
In the CTG_IS_META_VBOTH branch where pOut->tbMeta is NULL, the TAOS_MEMCPY destination/source use pointer arithmetic on STableMeta*/SVCTableMeta* (e.g., pOut->tbMeta + sizeof(STableMeta)), which advances by sizeof(STableMeta) elements rather than bytes. This will write/read out-of-bounds and corrupt memory. Use byte-pointer arithmetic (cast to (char*)) for both destination and source offsets, consistent with the colRef pointer calculation just below.
| TAOS_MEMCPY(pOut->tbMeta + sizeof(STableMeta), pOut->vctbMeta + sizeof(SVCTableMeta), colRefSize); | |
| TAOS_MEMCPY((char *)pOut->tbMeta + sizeof(STableMeta), | |
| (char *)pOut->vctbMeta + sizeof(SVCTableMeta), | |
| colRefSize); |
| {"XNODES", TK_XNODES}, | ||
| {"DRAIN", TK_DRAIN}, | ||
| {"REBALANCE", TK_REBALANCE}, | ||
| {"VALIDATE", TK_VALIDATE}, |
There was a problem hiding this comment.
Minor formatting: the new keyword table entry has inconsistent alignment ({"VALIDATE", TK_VALIDATE}) compared to neighboring entries. Consider aligning spaces to keep the table uniformly formatted.
| {"VALIDATE", TK_VALIDATE}, | |
| {"VALIDATE", TK_VALIDATE}, |
…column length < actual data length When a virtual table has a column with length smaller than the source table column (e.g., vtable binary(8) referencing source binary(32)), the vectorConvertFromVarData function would allocate a buffer based on the vtable column length but copy the full actual data length, causing buffer overflow. This fix: 1. Calculates the actual data size before allocation 2. Reallocates buffer if actual size exceeds allocated buffer size 3. Uses TMAX to ensure buffer is large enough for actual data Also updated test case to verify data integrity using WHERE clause and length() function instead of expecting consistent display format across different column length definitions.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 46 out of 47 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int32_t bufSize = pCtx->pIn->columnData->info.bytes; | ||
| if (tmp == NULL) { | ||
| tmp = taosMemoryMalloc(bufSize); | ||
| int32_t actualSize = vton ? varDataTLen(data) : (varDataLen(data) + 1); | ||
|
|
||
| // Reallocate buffer if actual data size exceeds allocated buffer | ||
| if (tmp == NULL || actualSize > bufSize) { | ||
| if (tmp != NULL) { | ||
| taosMemoryFree(tmp); | ||
| } | ||
| tmp = taosMemoryMalloc(TMAX(bufSize, actualSize)); | ||
| if (tmp == NULL) { | ||
| sclError("out of memory in vectorConvertFromVarData"); | ||
| SCL_ERR_JRET(terrno); | ||
| } | ||
| bufSize = TMAX(bufSize, actualSize); | ||
| } |
There was a problem hiding this comment.
vectorConvertFromVarData recomputes bufSize from pCtx->pIn->columnData->info.bytes on every loop iteration, but tmp may already have been allocated larger in a previous iteration. As a result, actualSize > bufSize can become true repeatedly and trigger unnecessary free/malloc churn (and makes bufSize not reflect the actual allocated size). Track the allocated temp buffer size separately outside the loop (e.g., int32_t tmpCap) and only realloc when actualSize > tmpCap.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 46 out of 47 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SNodeList* pSpecificTagRefs; // tag_name FROM db.table.tag_col (same as specific_column_ref) | ||
| SNodeList* pTagRefs; // db.table.tag_col (same as column_ref, positional) |
There was a problem hiding this comment.
SCreateVSubTableStmt adds pSpecificTagRefs / pTagRefs, but they are not populated by the current grammar/actions (CREATE VTABLE ... TAGS uses pValsOfTags) and there are no reads of these fields outside JSON serialization/destruction. Either wire these lists through parsing/planning (so they carry meaning) or remove them to avoid dead/duplicated state.
| SNodeList* pSpecificTagRefs; // tag_name FROM db.table.tag_col (same as specific_column_ref) | |
| SNodeList* pTagRefs; // db.table.tag_col (same as column_ref, positional) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 47 out of 48 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # 3.2. change column length when child table still has column reference | ||
| tdSql.execute("alter stable vtb_virtual_stb modify column nchar_16_col nchar(32);") | ||
| tdSql.error("select nchar_16_col from vtb_virtual_ctb0;") | ||
| #tdSql.error("select nchar_16_col from vtb_virtual_ctb0;") |
There was a problem hiding this comment.
This negative assertion was commented out, which weakens the existing error-case coverage for altering a virtual stable while child vtables still reference the modified column. If the behavior has changed intentionally, the test should be updated to assert the new expected behavior (or check for the new error code/message) rather than disabling the check.
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.