Commit e953931

oerling authored and meta-codesync[bot] committed
Track Value constraints through plan construction
Summary:
This diff implements end-to-end constraint propagation through the Axiom query plan construction process. Building on the filter selectivity estimation framework (D88164653), this change tracks refined Value constraints (min/max ranges, cardinalities, null fractions) from leaf scans through joins, filters, projects, and aggregations, enabling more accurate cardinality estimates at every level of the plan.

The core motivation is to improve cost-based optimization decisions by maintaining up-to-date Value metadata as constraints are derived from predicates and joins. For example, after a filter `WHERE age > 18 AND age < 65`, the optimizer now knows the age column has min=18, max=65, and reduced cardinality, which influences downstream join cost estimates. Similarly, join equalities like `customer.id = order.customer_id` update both columns' constraints to reflect the intersection of their ranges and cardinalities.

This constraint tracking is essential for:
- Accurate join cardinality estimation (especially for multi-way joins)
- Proper null handling in outer joins (tracking which columns become nullable)
- Aggregate cardinality estimation (result size of GROUP BY)
- Project cardinality propagation (computing distinct counts through expressions)
- Filter selectivity estimation (more accurate with refined input constraints)

Main changes:

1. **PlanState constraint tracking** (Plan.h, Plan.cpp):
   - Added `ConstraintMap constraints` field to PlanState to track Value constraints by expression ID during plan construction
   - Added `exprConstraint()` function that derives constraints for expressions based on their type:
     - Literals: use the literal's min/max (already set in expr->value())
     - Columns: look up in state.constraints, fall back to expr->value()
     - Field access: propagate cardinality from base expression
     - Aggregates: set cardinality from state.cost.cardinality (group count)
     - Function calls: use functionConstraint metadata if available, otherwise max of argument cardinalities
   - Modified Plan constructor to populate Plan::constraints from state.constraints using QueryGraphContext::registerAny() for lifetime management
   - Modified NextJoin to preserve constraints across join candidates
   - Added PlanStateSaver to save/restore constraints along with cost and placed sets

2. **Join constraint propagation** (RelationOp.cpp):
   - Added `addJoinConstraint()` function to update constraints for join key pairs:
     - For inner joins: both sides become non-nullable (nullFraction=0), ranges intersect via columnComparisonSelectivity
     - For outer joins: optional side becomes nullable, nullFraction set to (1 - innerFanout)
     - Non-key columns from optional side get nullFraction = (1 - innerFanout)
   - Added `addJoinConstraints()` to apply constraint updates for all key pairs
   - Modified Join constructor to:
     - Accept new `innerFanout` parameter (fanout if this were an inner join, used for null fraction calculation)
     - Call addJoinConstraints() with join type information
     - Handle filter expressions using conjunctsSelectivity() to get filter selectivity
     - For semi-joins and anti-joins, multiply fanout by filter selectivity
     - For semi-project (mark joins), update the mark column's trueFraction
     - Propagate non-key nullable constraints for outer joins

3. **Filter constraint integration** (Filters.cpp, Filters.h):
   - Modified `value()` function to first check state.constraints before falling back to expr->value()
   - Added `addConstraint()` helper that enforces type-based cardinality limits:
     - BOOLEAN: max 2 distinct values
     - TINYINT: max 256 distinct values
     - SMALLINT: max 65,536 distinct values
   - Modified `conjunctsSelectivity()` to call `exprConstraint()` for each conjunct before computing selectivity, ensuring constraints are computed and cached
   - Changed `exprSelectivity()` and `conjunctsSelectivity()` signatures to take non-const `PlanState&` to allow updating state.constraints
   - Added `constraintsString()` debugging helper to format ConstraintMap as readable string

4. **Project constraint propagation** (RelationOp.cpp):
   - Modified Project constructor to accept PlanState parameter
   - Added loop to derive and store constraints for each output column by calling exprConstraint() on projection expressions
   - This ensures that projected expressions (e.g., `col + 1`, `upper(name)`) have accurate cardinality estimates

5. **Aggregation constraint propagation** (RelationOp.cpp):
   - Modified Aggregation constructor to accept PlanState parameter
   - Modified `setCostWithGroups()` to accept PlanState
   - Grouping keys get cardinality from the aggregation's result cardinality (number of groups)
   - Aggregate functions get cardinality from exprConstraint() evaluation

6. **VeloxHistory integration** (VeloxHistory.cpp, VeloxHistory.h):
   - Added `setBaseTableValues()` function to update BaseTable column Values from ConstraintMap
   - Modified `findLeafSelectivity()` to always:
     1. Call conjunctsSelectivity() with updateConstraints=true to get constraints
     2. Update BaseTable column Values using setBaseTableValues()
     3. Optionally sample if sampling is enabled
   - This ensures filter-derived constraints (min/max, cardinality, null fractions) are applied to base table columns before planning

7. **Optimization.cpp updates**:
   - Updated all RelationOp construction sites to pass PlanState:
     - Project: added state parameter to constructor calls
     - Aggregation: added state parameter, including planSingleAggregation()
     - Join: added innerFanout parameter and state
     - Filter: already had state
   - Modified PrecomputeProjection::maybeProject() to accept PlanState parameter
   - Updated makeDistinct() to accept PlanState
   - Updated Join::makeCrossJoin() to accept PlanState
   - Ensured PlanStateSaver preserves constraints when exploring alternative join orders

8. **QueryGraphContext lifetime management** (QueryGraphContext.h):
   - Added `registerAny()` template function to take ownership of arbitrary objects (stored as shared_ptr<void>)
   - Added `ownedObjects_` set and `mutex_` for thread-safe lifetime management
   - Used to manage ConstraintMap pointers in Plan objects, ensuring they remain valid throughout optimization

9. **Schema.h/Schema.cpp updates**:
   - Changed Value::cardinality from const to non-const to allow constraint updates
   - Added Value assignment operator that validates type equality before assigning
   - Added Value::toString() for debugging

10. **ToGraph.cpp updates**:
    - Updated all plan construction to pass PlanState to constructors
    - Modified constant deduplication to set min/max for literals to the literal value

11. **FunctionRegistry.h extension**:
    - Added optional `functionConstraint` callback to FunctionMetadata
    - Allows functions to provide custom constraint derivation logic
    - Returns `std::optional<Value>` with refined constraints for the function result

12. **Comprehensive test coverage** (ConstraintsTest.cpp, 347 lines):
    - `scanEquality`: Tests join equality constraints (n_nationkey = r_regionkey), verifies min/max ranges intersect and cardinality is limited to intersection size
    - `aggregateConstraint`: Tests grouping key cardinality propagates to all output columns
    - `projectConstraint`: Tests projected expression cardinality (col+1, col+col) inherits from source columns
    - `outer`: Tests outer join null fraction propagation to optional side (left join, right side becomes nullable with nullFraction ≈ 0.8)
    - `bitwiseAnd`: Tests custom functionConstraint for bitwise_and, verifies min=0 and max=min(arg1.max, arg2.max)

The implementation maintains the invariant that state.constraints contains the most up-to-date Value metadata for all expressions in the current plan state. When exploring alternative join orders or plan structures, PlanStateSaver ensures constraints are properly saved and restored. This enables accurate "what-if" analysis during optimization without polluting the global expression Value metadata.

The constraint propagation integrates seamlessly with the filter selectivity estimation from D88164653:
- Filters call conjunctsSelectivity() which updates state.constraints
- Joins use columnComparisonSelectivity() which updates constraints for join keys
- All downstream operators see refined constraints via value(state, expr)
- Cost estimation at each operator uses the most accurate available cardinality

Future enhancements enabled by this infrastructure:
- Constraint-based partition pruning (skip partitions outside min/max range)
- Dynamic filter pushdown (propagate join-derived constraints to scans)
- Constraint-based empty result detection (min > max indicates zero rows)
- Correlation detection (tracking when columns have matching values)

Differential Revision: D89130357
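The join rules summarized in the commit message (range intersection and nullFraction=0 for inner-join keys, nullFraction = 1 - innerFanout for the optional side of an outer join) can be illustrated with a small standalone sketch. `SimpleValue`, `innerJoinKey` and `outerJoinOptionalSide` are hypothetical names invented for this illustration and are not the optimizer's `Value` or `addJoinConstraint()`. The cap of the distinct count by the integer range size and the clamping of the null fraction to [0, 1] are assumptions of the sketch, not behavior confirmed by the diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for the optimizer's Value metadata.
struct SimpleValue {
  double min;
  double max;
  double cardinality;  // estimated number of distinct values
  double nullFraction; // fraction of rows that are null, in [0, 1]
};

// Inner-join equality on integer keys: the surviving keys lie in the
// intersection of both ranges and are never null. The distinct count is
// bounded by both inputs and (assumption) by the size of the integer range.
inline SimpleValue innerJoinKey(
    const SimpleValue& left,
    const SimpleValue& right) {
  SimpleValue out;
  out.min = std::max(left.min, right.min);
  out.max = std::min(left.max, right.max);
  out.nullFraction = 0.0;
  if (out.min > out.max) {
    // Disjoint ranges: no key can match.
    out.cardinality = 0.0;
    return out;
  }
  const double rangeSize = out.max - out.min + 1.0;
  out.cardinality = std::min({left.cardinality, right.cardinality, rangeSize});
  return out;
}

// Optional side of an outer join: probe rows without a match are
// null-extended, so the null fraction becomes 1 - innerFanout.
// Clamping to [0, 1] is an assumption for fanouts above 1.
inline SimpleValue outerJoinOptionalSide(
    const SimpleValue& column,
    double innerFanout) {
  assert(innerFanout >= 0.0);
  SimpleValue out = column;
  out.nullFraction = std::max(0.0, std::min(1.0, 1.0 - innerFanout));
  return out;
}
```

With TPC-H-like inputs, a key ranging over 0..24 joined to a key ranging over 0..4 yields the range 0..4 with at most 5 distinct values, and an inner fanout of 0.2 gives the optional side a null fraction of about 0.8, which matches the magnitude mentioned for the `outer` test.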
1 parent a9e1ac0 commit e953931

23 files changed, +1159 -176 lines changed

axiom/optimizer/Filters.cpp

Lines changed: 46 additions & 2 deletions
@@ -127,9 +127,32 @@ float combineSelectivities(
 }

 const Value& value(const PlanState& state, ExprCP expr) {
+  auto it = state.constraints.find(expr->id());
+  if (it != state.constraints.end()) {
+    return it->second;
+  }
   return expr->value();
 }

+void addConstraint(int32_t exprId, Value value, ConstraintMap& constraints) {
+  // Limit cardinality based on type kind
+  auto typeKind = value.type->kind();
+
+  if (typeKind == velox::TypeKind::BOOLEAN) {
+    // Boolean can have at most 2 distinct values (true/false)
+    value.cardinality = std::min(value.cardinality, 2.0f);
+  } else if (typeKind == velox::TypeKind::TINYINT) {
+    // TINYINT (int8) can have at most 256 distinct values (-128 to 127)
+    value.cardinality = std::min(value.cardinality, 256.0f);
+  } else if (typeKind == velox::TypeKind::SMALLINT) {
+    // SMALLINT (int16) can have at most 65536 distinct values
+    value.cardinality = std::min(value.cardinality, 65536.0f);
+  }
+  // Other types (INTEGER, BIGINT, VARCHAR, etc.) have no practical limit
+
+  constraints.insert_or_assign(exprId, value);
+}
+
 Selectivity comparisonSelectivity(
     const PlanState& state,
     ExprCP expr,
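The two pieces of this hunk, the constraint lookup with fallback in `value()` and the type-based cardinality cap in `addConstraint()`, can be exercised in isolation with a simplified model. The sketch below is an illustration under stated assumptions: `Kind`, `SimpleValue`, `SimpleConstraintMap`, `maxDistinctValues` and `lookup` are hypothetical stand-ins for `velox::TypeKind`, `Value`, `ConstraintMap` and the real helpers, whose full definitions are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for velox::TypeKind, Value and ConstraintMap.
enum class Kind { kBoolean, kTinyint, kSmallint, kBigint };

struct SimpleValue {
  Kind kind;
  float cardinality; // estimated number of distinct values
};

using SimpleConstraintMap = std::unordered_map<int32_t, SimpleValue>;

// Upper bound on distinct values implied by the type; 0 means "no cap".
inline float maxDistinctValues(Kind kind) {
  switch (kind) {
    case Kind::kBoolean:
      return 2.0f;
    case Kind::kTinyint:
      return 256.0f;
    case Kind::kSmallint:
      return 65536.0f;
    default:
      return 0.0f;
  }
}

// Stores the constraint for exprId, replacing any earlier entry, after
// clamping the distinct count to what the type can represent.
inline void addConstraint(
    int32_t exprId,
    SimpleValue value,
    SimpleConstraintMap& constraints) {
  assert(value.cardinality >= 0.0f);
  const float cap = maxDistinctValues(value.kind);
  if (cap > 0.0f) {
    value.cardinality = std::min(value.cardinality, cap);
  }
  constraints.insert_or_assign(exprId, value);
}

// Prefers the refined constraint recorded during planning and falls back to
// the expression's own metadata when none has been recorded.
inline const SimpleValue& lookup(
    const SimpleConstraintMap& constraints,
    int32_t exprId,
    const SimpleValue& fallback) {
  auto it = constraints.find(exprId);
  return it != constraints.end() ? it->second : fallback;
}
```

Because the map is consulted first, a constraint recorded for one candidate plan shadows the expression's global metadata without modifying it, which is what lets the optimizer discard the refinement again when it backtracks.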
@@ -216,10 +239,15 @@ Selectivity functionSelectivity(
 }

 Selectivity conjunctsSelectivity(
-    const PlanState& state,
+    PlanState& state,
     std::span<const ExprCP> conjuncts,
     bool updateConstraints,
     ConstraintMap& newConstraints) {
+  // Update constraints for each conjunct before processing
+  for (auto* conjunct : conjuncts) {
+    exprConstraint(conjunct, state, true);
+  }
+
   std::vector<Selectivity> selectivities;
   selectivities.reserve(conjuncts.size());

@@ -271,7 +299,7 @@ Selectivity conjunctsSelectivity(
 }

 Selectivity exprSelectivity(
-    const PlanState& state,
+    PlanState& state,
     ExprCP expr,
     bool updateConstraints,
     ConstraintMap& newConstraints) {
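Since `exprSelectivity()` and `conjunctsSelectivity()` now take a mutable `PlanState&`, estimating a candidate plan can change `state.constraints`. The commit message says PlanStateSaver saves and restores constraints along with cost; its implementation is not part of the hunks shown here, so the following is only a minimal RAII sketch of that idea. `SimplePlanState`, `StateSaver` and `tryAlternative` are hypothetical names, and copying the whole state is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical, reduced model of a plan state: a running cost plus
// per-expression refined cardinalities.
struct SimplePlanState {
  float cost = 0.0f;
  std::unordered_map<int32_t, float> constraints; // expr id -> cardinality
};

// RAII guard: snapshots the state on construction and restores the snapshot
// on destruction, so speculative changes never leak out of the scope.
class StateSaver {
 public:
  explicit StateSaver(SimplePlanState& state) : state_(state), saved_(state) {}
  ~StateSaver() {
    state_ = std::move(saved_);
  }
  StateSaver(const StateSaver&) = delete;
  StateSaver& operator=(const StateSaver&) = delete;

 private:
  SimplePlanState& state_;
  SimplePlanState saved_;
};

// Hypothetical "what-if" probe: narrows one constraint, accumulates cost,
// and reports the speculative cost. The guard undoes both changes on return.
inline float tryAlternative(
    SimplePlanState& state,
    int32_t exprId,
    float narrowedCardinality) {
  assert(narrowedCardinality >= 0.0f);
  StateSaver saver(state);
  state.constraints[exprId] = narrowedCardinality;
  state.cost += narrowedCardinality;
  return state.cost;
}
```

The returned cost reflects the speculative state, while the caller's state is unchanged afterwards. A real implementation would more likely save only the fields that can change rather than copy the entire state.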
@@ -1313,4 +1341,20 @@ Selectivity rangeSelectivity(
   return {0.5 * (1.0 - nullFrac), nullFrac};
 }

+/// Declared in namespace to allow calling from debugger.
+std::string constraintsString(ConstraintMap& constraints) {
+  std::stringstream out;
+  for (const auto& pair : constraints) {
+    out << pair.first;
+    if (queryCtx() != nullptr) {
+      auto* expr = queryCtx()->objectAt(pair.first);
+      if (expr != nullptr) {
+        out << " (" << expr->toString() << ")";
+      }
+    }
+    out << " = " << pair.second.toString() << "\n";
+  }
+  return out.str();
+}
+
 } // namespace facebook::axiom::optimizer

axiom/optimizer/Filters.h

Lines changed: 6 additions & 2 deletions
@@ -43,6 +43,10 @@ float combineSelectivities(

 const Value& value(const PlanState& state, ExprCP expr);

+/// Adds a constraint to the constraint map with cardinality limited by type.
+/// For BOOLEAN: max 2, TINYINT: max 256, SMALLINT: max 65536.
+void addConstraint(int32_t exprId, Value value, ConstraintMap& constraints);
+
 Selectivity columnComparisonSelectivity(
     ExprCP left,
     ExprCP right,
@@ -53,13 +57,13 @@ Selectivity columnComparisonSelectivity(
     ConstraintMap& constraints);

 Selectivity exprSelectivity(
-    const PlanState& state,
+    PlanState& state,
     ExprCP expr,
     bool updateConstraints,
     ConstraintMap& newConstraints);

 Selectivity conjunctsSelectivity(
-    const PlanState& state,
+    PlanState& state,
     std::span<const ExprCP> conjuncts,
     bool updateConstraints,
     ConstraintMap& newConstraints);

axiom/optimizer/FunctionRegistry.h

Lines changed: 7 additions & 0 deletions
@@ -20,6 +20,9 @@

 namespace facebook::axiom::optimizer {

+struct PlanState;
+struct Value;
+
 /// A bit set that qualifies an Expr. Represents which functions/kinds
 /// of functions are found inside the children of an Expr.
 class FunctionSet {
@@ -182,6 +185,10 @@ struct FunctionMetadata {
           const logical_plan::CallExpr* call,
           std::vector<PathCP>& paths)>
       explode;
+
+  /// Function to compute derived constraints for function calls.
+  std::function<std::optional<Value>(ExprCP, PlanState& state)>
+      functionConstraint;
 };

 using FunctionMetadataCP = const FunctionMetadata*;
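To show how an optional per-function hook such as `functionConstraint` might be used, here is a standalone sketch of the bitwise_and rule named in the commit message (min=0, max=min(arg1.max, arg2.max)) together with the stated fallback (max of argument cardinalities). Everything in it is an assumption made for illustration: `SimpleValue`, `ConstraintFn`, `bitwiseAndConstraint` and `deriveCallConstraint` are hypothetical, and the hook here receives argument constraints directly instead of the real `(ExprCP, PlanState&)` parameters. The sketch applies the rule only to non-negative arguments, where it is sound, and caps the distinct count by the result range size, which the diff does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for the optimizer's Value metadata.
struct SimpleValue {
  int64_t min;
  int64_t max;
  float cardinality; // estimated number of distinct values
};

// Hypothetical hook type: returns a refined constraint for the call result,
// or std::nullopt to request the default rule.
using ConstraintFn =
    std::function<std::optional<SimpleValue>(const std::vector<SimpleValue>&)>;

// For non-negative a and b, 0 <= (a & b) <= min(a, b).
inline std::optional<SimpleValue> bitwiseAndConstraint(
    const std::vector<SimpleValue>& args) {
  if (args.size() != 2 || args[0].min < 0 || args[1].min < 0) {
    return std::nullopt; // Rule not applicable; let the caller fall back.
  }
  SimpleValue out;
  out.min = 0;
  out.max = std::min(args[0].max, args[1].max);
  // Assumption: the result cannot have more distinct values than its range.
  const float rangeSize = static_cast<float>(out.max) + 1.0f;
  out.cardinality = std::min(
      std::max(args[0].cardinality, args[1].cardinality), rangeSize);
  return out;
}

// Uses the hook when present and applicable; otherwise keeps an unbounded
// range and takes the maximum of the argument cardinalities.
inline SimpleValue deriveCallConstraint(
    const ConstraintFn& hook,
    const std::vector<SimpleValue>& args) {
  assert(!args.empty());
  if (hook) {
    if (auto refined = hook(args)) {
      return *refined;
    }
  }
  SimpleValue out{
      std::numeric_limits<int64_t>::min(),
      std::numeric_limits<int64_t>::max(),
      0.0f};
  for (const auto& arg : args) {
    out.cardinality = std::max(out.cardinality, arg.cardinality);
  }
  return out;
}
```

Returning `std::optional` lets a hook decline cases it cannot reason about, so the generic rule stays the single fallback path instead of being duplicated in every function's callback.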

axiom/optimizer/JoinSample.cpp

Lines changed: 9 additions & 4 deletions
@@ -142,15 +142,20 @@ std::shared_ptr<runner::Runner> prepareSampleRunner(
       make<Call>(toName(kHashMix), bigintValue(), hashes, FunctionSet{});

   ColumnCP hashColumn = make<Column>(toName("hash"), nullptr, hash->value());
+
+  // Create temporary PlanState for Project and Filter constructors
+  PlanState tempState(*queryCtx()->optimization(), nullptr);
+
   RelationOpPtr project = make<Project>(
-      scan, ExprVector{hash}, ColumnVector{hashColumn}, /*redundant=*/false);
+      scan,
+      ExprVector{hash},
+      ColumnVector{hashColumn},
+      /*redundant=*/false,
+      tempState);

   // (hash % mod) < lim
   ExprCP filterExpr = makeCall(
       kSample, velox::BOOLEAN(), hashColumn, bigintLit(mod), bigintLit(lim));
-
-  // Create temporary PlanState for Filter constructor
-  PlanState tempState(*queryCtx()->optimization(), nullptr);
   RelationOpPtr filter =
       make<Filter>(tempState, project, ExprVector{filterExpr});
