Skip to content
This repository was archived by the owner on Sep 27, 2019. It is now read-only.

[15721] Support for Outer Join #1280

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

zhaoguoquan94
Copy link
Contributor

This PR adds support for [Left | Right | Outer] JOIN queries in peloton's query optimizer.

Major changes:

  • Replace InnerJoin* Rules with Join* Rules which support not only inner join transformation rules but also outer join rules. [rule_impls.h, rule.cpp]
  • Add new Join operators [query_to_operator_transformer.cpp, operators.h,operator_node.h, internal_types.h]
  • New physical plan generators[plan_generator.cpp].
  • Bug fixes.
  • Test cases.

Current Status

  • two test cases still fails because old bug in optimizer. [planner_equality_test, optimizer_sql_test]
  • Unknown bug in executor which generate incorrect results given correct plan.

@coveralls
Copy link

coveralls commented Apr 11, 2018

Coverage Status

Coverage decreased (-0.07%) to 77.381% when pulling bb50222 on zhaoguoquan94:optimizer_project into f6446ba on cmu-db:master.

@apavlo apavlo changed the title Support for Outer Join [15721] Support for Outer Join Apr 11, 2018
Copy link
Contributor

@nappelson nappelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a first passthrough. will be doing another tomorrow

Statement stmt = conn.createStatement();
ResultSet resultSet = stmt.executeQuery("SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.d;");) {
assertTrue(resultSet.next());
assertEquals(3, resultSet.getInt(4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This along with other assertEquals could probably be collapsed into a helper that takes in a map of expected values to the input into getInt to avoid writing assertEquals many times

}
ResultSet resultSet = stmt.executeQuery("SELECT COUNT(*) FROM t1;");
resultSet.next();
assertEquals(3, resultSet.getInt(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably shouldn't be doing asserts in the Setup phase for tests. make any assertions about the data in a separately contained test. I think the purpose of Setup and Teardown methods should only be for creating and deleting your testing environment per test.

//===--------------------------------------------------------------------===//
class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> {
public:
static Operator make(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add documentation about any public methods

class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> {
public:
static Operator make(
JoinType _type, std::vector<AnnotatedExpression> conditions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like you are using a leading underscore to denote _type from the member variable type. you should do the same for left_keys and right_keys if you are using that style

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to use leading underscores for this. AFAIK, we don't use that anywhere else in the code.

@@ -87,6 +87,10 @@ class AbstractJoinPlan : public AbstractPlan {
virtual void HandleSubplanBinding(bool from_left,
const BindingContext &input) = 0;

const std::string GetPredicateInfo() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add documentation for public method

class PhysicalNLJoin : public OperatorNode<PhysicalNLJoin> {
public:
static Operator make(
JoinType _type, std::vector<AnnotatedExpression> conditions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should pass the conditions by ref

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.


hash_t PhysicalNLJoin::Hash() const {
hash_t hash = BaseOperatorNode::Hash();
for (auto &expr : left_keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be for (cost auto &...)? Generally good to prefer using const where possible. Based on an initial glance looks like it could be but i may have missed something


hash_t PhysicalHashJoin::Hash() const {
hash_t hash = BaseOperatorNode::Hash();
for (auto &expr : left_keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above regarding constness

Copy link
Member

@GustavoAngulo GustavoAngulo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Consolidating joins like how you did was a good decision.

memo_->GetGroupByID(gexpr_->GetChildGroupId(1))->GetNumRows();

output_cost_ = left_child_rows * right_child_rows * DEFAULT_TUPLE_COST;
}
void CostCalculator::Visit(UNUSED_ATTRIBUTE const PhysicalInnerNLJoin *op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the cost functions below (PhysicalInnerNLJoin, PhysicalLeftNLJoin, etc) still needed with the addition of the two new cost functions (PhysicalNLJoin, PhysicalHashJoin)

auto right_child_rows =
memo_->GetGroupByID(gexpr_->GetChildGroupId(1))->GetNumRows();

output_cost_ = left_child_rows * right_child_rows * DEFAULT_TUPLE_COST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cost is different from the original cost here. Is this change intended?


bool Check(std::shared_ptr<OperatorExpression> plan,
OptimizeContext *context) const override;

void Transform(std::shared_ptr<OperatorExpression> input,
std::vector<std::shared_ptr<OperatorExpression>> &transformed,
OptimizeContext *context) const override;
private:
bool StrongPredicate(std::shared_ptr<OperatorExpression> plan,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some documentation as to why a strong predicate check is needed for outer join associativity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GustavoAngulo Where do you want them to add that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really have a central documentation for the optimizer yet, so above the implementation of the StrongPredicate function should be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, can we add this in https://github.com/cmu-db/peloton/blob/master/src/include/optimizer/util.h? It makes more sense add it as a helper function instead of a member function of the rule.

join_plan->AddChild(move(hash_plan));
output_plan_ = move(join_plan);
}

void PlanGenerator::Visit(const PhysicalInnerNLJoin *op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete the methods for PhysicalInnerNLJoin and PhysicalInnerHashJoin if we are consolidating Inner+Outer joins

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

(void)context;
(void)plan;
return true;
}

void InnerJoinToInnerNLJoin::Transform(
void JoinToNLJoin::Transform(
std::shared_ptr<OperatorExpression> input,
std::vector<std::shared_ptr<OperatorExpression>> &transformed,
UNUSED_ATTRIBUTE OptimizeContext *context) const {
// first build an expression representing hash join
Copy link
Member

@GustavoAngulo GustavoAngulo Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be "representing nested loop join"

e.printStackTrace();
fail();
}
try (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest splitting these into different tests. That way if it fails it will be quicker to identify where it went wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

e.printStackTrace();
fail();
}
try (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

hash_t hash = BaseOperatorNode::Hash();
for (auto &expr : left_keys)
hash = HashUtil::CombineHashes(hash, expr->Hash());
for (auto &expr : right_keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

return hash;
}

bool PhysicalHashJoin::operator==(const BaseOperatorNode &r) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that this method shares a lot of code with the PhysicalNLJoin::operator==. I'd recommend making the code reuasable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

vector<oid_t> left_keys;
vector<oid_t> right_keys;
for (auto &expr : op->left_keys) {
PELOTON_ASSERT(children_expr_map_[0].find(expr.get()) !=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't immediately clear what is in children_expr_map_[0] vs [1]

It looks like this is how it is done in other parts of the code (not in your PR) but it may be worthwhile adding some documentation on why we only expect there to be two elements in this vector


vector<unique_ptr<const expression::AbstractExpression>> left_keys;
vector<unique_ptr<const expression::AbstractExpression>> right_keys;
vector<ExprMap> l_child_map{move(children_expr_map_[0])};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer left_child_map instead of l_child_map. generally good to use full words when possible (you wouldn't exceed the line length in this case)

Same can be applied to r_child_map

}
// Evaluate Expr for hash plan
vector<unique_ptr<const expression::AbstractExpression>> hash_keys;
for (auto &expr : op->right_keys) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd imagine these can be const auto &expr based on what is going on here but could be wrong

@@ -70,6 +72,81 @@ TEST_F(OptimizerRuleTests, SimpleCommutativeRuleTest) {
EXPECT_EQ(output_join->Children()[1], left_get);
}

TEST_F(OptimizerRuleTests, LeftJoinCommutativeRuleTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment on what this test is trying to do

EXPECT_EQ(output_join->Op().As<LogicalJoin>()->type, JoinType::RIGHT);
}

TEST_F(OptimizerRuleTests, RightJoinCommutativeRuleTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

EXPECT_EQ(output_join->Op().As<LogicalJoin>()->type, JoinType::LEFT);
}

TEST_F(OptimizerRuleTests, OuterJoinCommutativeRuleTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same regarding comment

}

TEST_F(OptimizerRuleTests, RightJoinCommutativeRuleTest) {
// Build op plan node to match rule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code here looks largely similar to the LeftJoin test from above. i'd recommend trying to share as much as you can

}

TEST_F(OptimizerRuleTests, OuterJoinCommutativeRuleTest) {
// Build op plan node to match rule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same regarding code reuse

@apavlo
Copy link
Member

apavlo commented Apr 26, 2018

Good job. I am closing this PR. You can resubmit it at the end of the semester.

@nappelson @GustavoAngulo @dasteere Nice work.

@apavlo apavlo closed this Apr 26, 2018
@apavlo
Copy link
Member

apavlo commented Apr 27, 2018

I decided to open this back up

@apavlo apavlo reopened this Apr 27, 2018
@qwkai-zz qwkai-zz force-pushed the optimizer_project branch from 5b7650c to e1a4797 Compare April 30, 2018 04:04
qwkai-zz and others added 3 commits May 1, 2018 03:11
1. Refacored code, added necessary comments.
2. Removed PhysicalInnerNLJoin and PhysicalInnerHashJoin (left/right/outer as well).
@tli2
Copy link
Contributor

tli2 commented Jun 4, 2018

Who is currently responsible for merging this in? Are code reviews here outdated or still being addressed?

@apavlo
Copy link
Member

apavlo commented Jun 4, 2018

@saatviks is the PR czar this month.

@saatviks
Copy link
Contributor

saatviks commented Jun 4, 2018

@apavlo I was last month - For June its @tli2

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.