-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[Enhancement] Support default values for complex types (ARRAY, MAP, STRUCT) #67181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Enhancement] Support default values for complex types (ARRAY, MAP, STRUCT) #67181
Conversation
a000a77 to
c58a69a
Compare
c58a69a to
1766b5b
Compare
[BE Incremental Coverage Report]❌ fail : 84 / 121 (69.42%) file detail
|
|
@cursor review |
| [[maybe_unused]] auto ret = dst->append_strings(slices); | ||
| } else { | ||
| dst->append_value_multiple_times(_mem_value, to_read); | ||
| _current_rowid = range.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Row counter not updated for certain column types
In the next_batch(const SparseRange<>& range, ...) function, _current_rowid is not updated when the column type is TYPE_OBJECT, TYPE_HLL, or TYPE_PERCENTILE. The change moved _current_rowid = range.end(); from outside the if-else block (where it was always executed) to inside only the else branch. This means the row counter won't advance for these column types, potentially causing incorrect iteration behavior during segment scans. The first next_batch(size_t* n, ...) overload correctly updates _current_rowid after the if-else for all types, showing this is inconsistent.
| if (!expr.getType().equals(targetColumn.getType())) { | ||
| expr = new CastExpr(targetColumn.getType(), expr); | ||
| } | ||
| scalarOperator = SqlToScalarOperatorTranslator.translate(expr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent null check leads to potential NPE
The hasExprObject() method in DefaultExpr returns true when either exprObject != null or serializedExpr != null. However, obtainExpr() only returns a non-null value when exprObject != null. If deserialization fails in gsonPostProcess() (which only logs a warning), exprObject remains null while serializedExpr is still set. This creates an inconsistent state where hasExprObject() returns true but obtainExpr() returns null, causing a NullPointerException at line 679 when calling expr.getType().
Additional Locations (1)
1766b5b to
cc8b0a8
Compare
Signed-off-by: stephen <[email protected]>
cc8b0a8 to
35f6f2d
Compare
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 178 / 214 (83.18%) file detail
|
|
@cursor review |
| LOG(WARNING) << "Failed to preprocess default_expr in OlapTableSchemaParam::init: " | ||
| << preprocess_status.to_string(); | ||
| columns_copy = t_index.column_param.columns; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling fallback code unreachable due to always-OK return
The preprocess_default_expr_for_tcolumns function always returns Status::OK() even when individual column conversions fail (it only logs errors internally). The call site in tablet_info.cpp checks for failure and has fallback logic to reset columns_copy to the original columns, but this fallback code is unreachable. When some columns fail preprocessing, the code proceeds with partially-preprocessed columns instead of falling back to the originals as apparently intended. This could lead to inconsistent column configurations where some columns have converted defaults and others don't.



Why I'm doing:
Currently, StarRocks only supports default values for primitive types (INT, VARCHAR, etc.). Complex types like ARRAY, MAP, and STRUCT do not support default values, which limits user flexibility when defining table schemas with nested data structures.
What I'm doing:
This PR adds support for default values for complex types (ARRAY, MAP, STRUCT) in StarRocks.
1. Syntax
StarRocks uses the following syntax for complex type default values:
ARRAY: Use square brackets
[]MAP: Use
map{}STRUCT: Use
row()CREATE TABLE Example:
ALTER TABLE Examples:
2. Limitations
Literal Values Only: Default expressions must contain only literal values. Functions (including
CAST) and expressions are NOT supported.Fast Schema Evolution Required: Complex type default values are only supported when table property
fast_schema_evolutionistrue(which is the default for new tables).3. Implementation Details
3.1 Frontend (FE)
[],map{},row()) into expression trees (ArrayExpr,MapExpr,FunctionCallExpr)analyze()phasefast_schema_evolutionproperty before allowing complex type default valuesColumn.defaultExpr: Stores the analyzed expression (may contain CAST)Expr.toSql()and persisted to FE metadata (EditLog/Image)[1, 2, 3]→ after analysis →[CAST(1 AS BIGINT), CAST(2 AS BIGINT), CAST(3 AS BIGINT)]gsonPostProcess()to reconstruct theExprobjectColumn.defaultValue: Transient field, NOT persisted in FESHOW CREATE TABLE/DESCusevalidateComplexTypeDefaultExpressionForm()to strip auto-added CAST expressions[CAST(1 AS BIGINT), ...]→ displayed as[1, 2, 3]TColumn.default_expr: The complete analyzed expression tree serialized asTExpr(Thrift format)TCreateTabletReq,TAlterTabletReqV2TOlapTableSchemaParaminOlapTableSink3.2 Backend (BE)
TExpr(Thrift expression tree) from FE viaTColumn.default_exprpreprocess_default_expr_for_tcolumns()function convertsTExprto JSON stringTExpr(ArrayExpr with [1,2,3]) → evaluates to JSON string"[1,2,3]"TColumn.default_valuefor later useTabletSchemaPB.ColumnPB.default_valueTExprDefaultValueColumnIterator::init()reads JSON string from_default_value(loaded from TabletSchema)JsonValue::parse_json_or_string()→vpack::SliceDatumusingVPackToDatumCasterDatumobjects:cast_primitive<TYPE>(),cast_decimal<DecimalType>()) eliminates code duplicationcast_array(),cast_map(),cast_struct())ObjectIterator(slice, true))Datumvalues during query execution:rowset_column_update_state.cpp) and row mode (tablet_updates.cpp)Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Enables DEFAULT values for complex types (ARRAY/MAP/STRUCT) across the stack.
DEFAULTcomplex literals via grammar; analyze/validate literal-only expressions (no funcs/CAST), requirefast_schema_evolution=true; persist analyzedExprinColumn.defaultExpr; serialize asTExpr(newTColumn.default_expr); InsertPlanner uses expr; SHOW CREATE strips analyzer-added CASTs; re-validation on load.TExprdefault expressions to JSON (convert_default_expr_to_json_string) and inject intoTColumn.default_valueviapreprocess_default_expr_for_tcolumns; integrate in create-table, schema change, scanners; extend JSON casting (cast_type_to_json_strwith ordered structs) and add robust VPack→Datum for arrays/maps/structs; DefaultValueColumnIterator now parses JSON defaults for complex types; minor JSON converter support for DATE/DATETIME.Written by Cursor Bugbot for commit 35f6f2d. This will update automatically on new commits. Configure here.