-
Notifications
You must be signed in to change notification settings - Fork 619
[15721] Support for Outer Join #1280
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did a first passthrough. will be doing another tomorrow
Statement stmt = conn.createStatement(); | ||
ResultSet resultSet = stmt.executeQuery("SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.d;");) { | ||
assertTrue(resultSet.next()); | ||
assertEquals(3, resultSet.getInt(4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This along with other assertEquals could probably be collapsed into a helper that takes in a map of expected values to the input into getInt to avoid writing assertEquals many times
} | ||
ResultSet resultSet = stmt.executeQuery("SELECT COUNT(*) FROM t1;"); | ||
resultSet.next(); | ||
assertEquals(3, resultSet.getInt(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably shouldn't be doing asserts in the Setup phase for tests. make any assertions about the data in a separately contained test. I think the purpose of Setup and Teardown methods should only be for creating and deleting your testing environment per test.
src/include/optimizer/operators.h
Outdated
//===--------------------------------------------------------------------===// | ||
class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> { | ||
public: | ||
static Operator make( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add documentation about any public methods
src/include/optimizer/operators.h
Outdated
class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> { | ||
public: | ||
static Operator make( | ||
JoinType _type, std::vector<AnnotatedExpression> conditions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like you are using a leading underscore to denote _type
from the member variable type. you should do the same for left_keys and right_keys if you are using that style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to use leading underscores for this. AFAIK, we don't use that anywhere else in the code.
@@ -87,6 +87,10 @@ class AbstractJoinPlan : public AbstractPlan { | |||
virtual void HandleSubplanBinding(bool from_left, | |||
const BindingContext &input) = 0; | |||
|
|||
const std::string GetPredicateInfo() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add documentation for public method
src/include/optimizer/operators.h
Outdated
class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> { | ||
public: | ||
static Operator make( | ||
JoinType _type, std::vector<AnnotatedExpression> conditions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should pass the conditions
by ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
src/optimizer/operators.cpp
Outdated
|
||
hash_t PhysicalNLJoin::Hash() const { | ||
hash_t hash = BaseOperatorNode::Hash(); | ||
for (auto &expr : left_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can these be for (cost auto &...
)? Generally good to prefer using const where possible. Based on an initial glance looks like it could be but i may have missed something
src/optimizer/operators.cpp
Outdated
|
||
hash_t PhysicalHashJoin::Hash() const { | ||
hash_t hash = BaseOperatorNode::Hash(); | ||
for (auto &expr : left_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above regarding constness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Consolidating joins like how you did was a good decision.
src/optimizer/cost_calculator.cpp
Outdated
memo_->GetGroupByID(gexpr_->GetChildGroupId(1))->GetNumRows(); | ||
|
||
output_cost_ = left_child_rows * right_child_rows * DEFAULT_TUPLE_COST; | ||
} | ||
void CostCalculator::Visit(UNUSED_ATTRIBUTE const PhysicalInnerNLJoin *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the cost functions below (PhysicalInnerNLJoin
, PhysicalLeftNLJoin
, etc) still needed with the addition of the two new cost functions (PhysicalNLJoin
, PhysicalHashJoin
)
src/optimizer/cost_calculator.cpp
Outdated
auto right_child_rows = | ||
memo_->GetGroupByID(gexpr_->GetChildGroupId(1))->GetNumRows(); | ||
|
||
output_cost_ = left_child_rows * right_child_rows * DEFAULT_TUPLE_COST; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cost is different from the original cost here. Is this change intended?
src/include/optimizer/rule_impls.h
Outdated
|
||
bool Check(std::shared_ptr<OperatorExpression> plan, | ||
OptimizeContext *context) const override; | ||
|
||
void Transform(std::shared_ptr<OperatorExpression> input, | ||
std::vector<std::shared_ptr<OperatorExpression>> &transformed, | ||
OptimizeContext *context) const override; | ||
private: | ||
bool StrongPredicate(std::shared_ptr<OperatorExpression> plan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be some documentation as to why a strong predicate check is needed for outer join associativity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GustavoAngulo Where do you want them to add that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really have a central documentation for the optimizer yet, so above the implementation of the StrongPredicate
function should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, can we add this in https://github.com/cmu-db/peloton/blob/master/src/include/optimizer/util.h? It makes more sense add it as a helper function instead of a member function of the rule.
src/optimizer/plan_generator.cpp
Outdated
join_plan->AddChild(move(hash_plan)); | ||
output_plan_ = move(join_plan); | ||
} | ||
|
||
void PlanGenerator::Visit(const PhysicalInnerNLJoin *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete the methods for PhysicalInnerNLJoin
and PhysicalInnerHashJoin
if we are consolidating Inner+Outer joins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
src/optimizer/rule_impls.cpp
Outdated
(void)context; | ||
(void)plan; | ||
return true; | ||
} | ||
|
||
void InnerJoinToInnerNLJoin::Transform( | ||
void JoinToNLJoin::Transform( | ||
std::shared_ptr<OperatorExpression> input, | ||
std::vector<std::shared_ptr<OperatorExpression>> &transformed, | ||
UNUSED_ATTRIBUTE OptimizeContext *context) const { | ||
// first build an expression representing hash join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment should be "representing nested loop join"
e.printStackTrace(); | ||
fail(); | ||
} | ||
try ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest splitting these into different tests. That way if it fails it will be quicker to identify where it went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
e.printStackTrace(); | ||
fail(); | ||
} | ||
try ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
src/optimizer/operators.cpp
Outdated
hash_t hash = BaseOperatorNode::Hash(); | ||
for (auto &expr : left_keys) | ||
hash = HashUtil::CombineHashes(hash, expr->Hash()); | ||
for (auto &expr : right_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
src/optimizer/operators.cpp
Outdated
return hash; | ||
} | ||
|
||
bool PhysicalHashJoin::operator==(const BaseOperatorNode &r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that this method shares a lot of code with the PhysicalNLJoin::operator==
. I'd recommend making the code reuasable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
src/optimizer/plan_generator.cpp
Outdated
vector<oid_t> left_keys; | ||
vector<oid_t> right_keys; | ||
for (auto &expr : op->left_keys) { | ||
PELOTON_ASSERT(children_expr_map_[0].find(expr.get()) != |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't immediately clear what is in children_expr_map_[0] vs [1]
It looks like this is how it is done in other parts of the code (not in your PR) but it may be worthwhile adding some documentation on why we only expect there to be two elements in this vector
src/optimizer/plan_generator.cpp
Outdated
|
||
vector<unique_ptr<const expression::AbstractExpression>> left_keys; | ||
vector<unique_ptr<const expression::AbstractExpression>> right_keys; | ||
vector<ExprMap> l_child_map{move(children_expr_map_[0])}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer left_child_map
instead of l_child_map
. generally good to use full words when possible (you wouldn't exceed the line length in this case)
Same can be applied to r_child_map
src/optimizer/plan_generator.cpp
Outdated
} | ||
// Evaluate Expr for hash plan | ||
vector<unique_ptr<const expression::AbstractExpression>> hash_keys; | ||
for (auto &expr : op->right_keys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd imagine these can be const auto &expr
based on what is going on here but could be wrong
@@ -70,6 +72,81 @@ TEST_F(OptimizerRuleTests, SimpleCommutativeRuleTest) { | |||
EXPECT_EQ(output_join->Children()[1], left_get); | |||
} | |||
|
|||
TEST_F(OptimizerRuleTests, LeftJoinCommutativeRuleTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment on what this test is trying to do
EXPECT_EQ(output_join->Op().As<LogicalJoin>()->type, JoinType::RIGHT); | ||
} | ||
|
||
TEST_F(OptimizerRuleTests, RightJoinCommutativeRuleTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
EXPECT_EQ(output_join->Op().As<LogicalJoin>()->type, JoinType::LEFT); | ||
} | ||
|
||
TEST_F(OptimizerRuleTests, OuterJoinCommutativeRuleTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same regarding comment
} | ||
|
||
TEST_F(OptimizerRuleTests, RightJoinCommutativeRuleTest) { | ||
// Build op plan node to match rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code here looks largely similar to the LeftJoin test from above. i'd recommend trying to share as much as you can
} | ||
|
||
TEST_F(OptimizerRuleTests, OuterJoinCommutativeRuleTest) { | ||
// Build op plan node to match rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same regarding code reuse
Good job. I am closing this PR. You can resubmit it at the end of the semester. @nappelson @GustavoAngulo @dasteere Nice work. |
I decided to open this back up |
5b7650c
to
e1a4797
Compare
… one operator with a JoinType to represent them all. * Clang-format. * Update the optimizer_rule_test and optimizer_test to use new operators.
…ed even if we actually push it to the stack.
…s executed even if we actually push it to the stack." This reverts commit 078c060.
1. Refacored code, added necessary comments. 2. Removed PhysicalInnerNLJoin and PhysicalInnerHashJoin (left/right/outer as well).
d201ca4
to
5412a11
Compare
…peloton into optimizer_project
Who is currently responsible for merging this in? Are code reviews here outdated or still being addressed? |
@saatviks is the PR czar this month. |
This PR adds support for [Left | Right | Outer] JOIN queries in peloton's query optimizer.
Major changes:
Current Status