Skip to content
This repository was archived by the owner on Sep 27, 2019. It is now read-only.

[15721] Blocking Schema Changes #1341

Open
wants to merge 69 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 65 commits
Commits
Show all changes
69 commits
Select commit Hold shift + click to select a range
0d83528
add alter table catalog
DeanChensj Apr 2, 2018
f015e94
add change column catalog
DeanChensj Apr 5, 2018
583ec6a
Revert "Merge pull request #1 from DeanChensj/catlog"
Apr 5, 2018
469474b
This commit added parser support of alter table
Dingshilun Apr 5, 2018
7b2ae56
add rename column executor
sxzh93 Apr 7, 2018
f255f78
Catalog for change column name
DeanChensj Apr 6, 2018
fd0cb0b
resolve comments, add header, TODO, remove context in subclass
sxzh93 Apr 8, 2018
a3a7cfa
fix
sxzh93 Apr 8, 2018
9e07a95
add alter table catalog
DeanChensj Apr 2, 2018
e695ac2
add change column catalog
DeanChensj Apr 5, 2018
d843ae3
Revert "Merge pull request #1 from DeanChensj/catlog"
Apr 5, 2018
59e5982
This commit added parser support of alter table
Dingshilun Apr 5, 2018
dd42a3b
add rename column executor
sxzh93 Apr 7, 2018
a845135
Catalog for change column name
DeanChensj Apr 6, 2018
57e9a6e
resolve comments, add header, TODO, remove context in subclass
sxzh93 Apr 8, 2018
3575048
fix
sxzh93 Apr 8, 2018
6b5da72
AlterTable init test
DeanChensj Apr 8, 2018
b08fa9e
added headers, comments and TODO
Dingshilun Apr 10, 2018
0e46c90
remove comments
DeanChensj Apr 11, 2018
d07920e
add alter table catalog
DeanChensj Apr 2, 2018
88a397c
add change column catalog
DeanChensj Apr 5, 2018
f0ecc68
Revert "Merge pull request #1 from DeanChensj/catlog"
Apr 5, 2018
a2f73d9
add rename column executor
sxzh93 Apr 7, 2018
ed7a2a9
Catalog for change column name
DeanChensj Apr 6, 2018
397cf24
fix
sxzh93 Apr 8, 2018
e61451e
add alter table catalog
DeanChensj Apr 2, 2018
098a35a
add change column catalog
DeanChensj Apr 5, 2018
f344a55
Revert "Merge pull request #1 from DeanChensj/catlog"
Apr 5, 2018
e930b11
add rename column executor
sxzh93 Apr 7, 2018
c31ef3a
Catalog for change column name
DeanChensj Apr 6, 2018
74ca51b
fix
sxzh93 Apr 8, 2018
ed1a0de
AlterTable init test
DeanChensj Apr 8, 2018
ef6b9f4
Add transactional test
DeanChensj Apr 12, 2018
844e564
Fix mac compile error
DeanChensj Apr 28, 2018
8cdea77
Addressed comments in test and catalog
DeanChensj Apr 28, 2018
c70353e
add rename column executor
sxzh93 Apr 7, 2018
d3b1321
fix
sxzh93 Apr 8, 2018
6a294fc
This commit added parser support of alter table
Dingshilun Apr 5, 2018
f7af80f
added alter table drop column parser, planner, and catalog support, w…
Dingshilun Apr 16, 2018
d7465e3
add drop column executor
sxzh93 Apr 16, 2018
bf20b05
refacted the rename into alter (parser, planner)
Dingshilun Apr 28, 2018
d1c8437
Merge pull request #9 from sxzh93/refactor_rename_alter_parser
Dingshilun Apr 29, 2018
1ce6ed0
merge
DeanChensj Apr 29, 2018
66220b5
Merge pull request #10 from sxzh93/address-comment
DeanChensj Apr 29, 2018
32ec8fd
added parser support for change column type and add column
Dingshilun Apr 29, 2018
36f111a
Merge pull request #11 from sxzh93/add_column_parser_planner
DeanChensj Apr 29, 2018
253b471
new implement of rename
DeanChensj Apr 29, 2018
41039b8
Merge pull request #13 from sxzh93/new-rename
DeanChensj Apr 30, 2018
4df594b
alter table in catalog
DeanChensj Apr 30, 2018
cf0c5a6
Merge pull request #15 from sxzh93/alter-table
Dingshilun May 2, 2018
21ee68d
added change type into alter tabe
Dingshilun May 3, 2018
68f492f
add alter executor
sxzh93 May 1, 2018
9904788
resolve pr comments
sxzh93 May 3, 2018
a1e7265
Merge pull request #16 from sxzh93/alter_executor
Dingshilun May 3, 2018
fdce5cf
modified alter plan, fixed bugs
Dingshilun May 3, 2018
ae65f67
added change type, changed logic in alter_executor, varchar still not…
Dingshilun May 3, 2018
4e4dfdc
addressed comments and fix varchar
DeanChensj May 4, 2018
4d00de1
Merge pull request #17 from sxzh93/change_type
DeanChensj May 4, 2018
5e454d4
basic test for alterTable
DeanChensj May 4, 2018
bc26411
add alter table tests, change unsupported type, multiple operation in…
sxzh93 May 4, 2018
32a260f
Merge pull request #18 from sxzh93/alter_test
DeanChensj May 5, 2018
72c35ac
added alter varchar length support, changed the plan to use schema
Dingshilun May 5, 2018
733fa9b
Merge pull request #19 from sxzh93/varchar_length
Dingshilun May 5, 2018
a57a5df
rebase master
sxzh93 May 5, 2018
5b9818c
addressed the modification from cmu-db master
Dingshilun May 5, 2018
939df35
change type quick fix
DeanChensj May 13, 2018
20f5d27
add gc logic
DeanChensj May 13, 2018
dbf6d7d
added alter benchmark, leave all workload of benchmark to be empty
Dingshilun May 14, 2018
fc3e951
cache-only set to true when get db object
sxzh93 May 14, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
402 changes: 402 additions & 0 deletions script/testing/junit/AlterTableTest.java

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions src/binder/bind_node_visitor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,9 @@ void BindNodeVisitor::Visit(parser::DropStatement *node) {
void BindNodeVisitor::Visit(parser::PrepareStatement *) {}
void BindNodeVisitor::Visit(parser::ExecuteStatement *) {}
void BindNodeVisitor::Visit(parser::TransactionStatement *) {}
void BindNodeVisitor::Visit(parser::AlterTableStatement *node) {
node->TryBindDatabaseName(default_database_name_);
}
void BindNodeVisitor::Visit(parser::AnalyzeStatement *node) {
node->TryBindDatabaseName(default_database_name_);
}
Expand Down
286 changes: 286 additions & 0 deletions src/catalog/catalog.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,16 @@
#include "catalog/table_metrics_catalog.h"
#include "catalog/trigger_catalog.h"
#include "concurrency/transaction_manager_factory.h"
#include "executor/executor_context.h"
#include "executor/insert_executor.h"
#include "executor/seq_scan_executor.h"
#include "function/date_functions.h"
#include "function/decimal_functions.h"
#include "function/old_engine_string_functions.h"
#include "function/timestamp_functions.h"
#include "index/index_factory.h"
#include "planner/insert_plan.h"
#include "planner/seq_scan_plan.h"
#include "settings/settings_manager.h"
#include "storage/storage_manager.h"
#include "storage/table_factory.h"
Expand Down Expand Up @@ -930,6 +935,287 @@ std::shared_ptr<SystemCatalogs> Catalog::GetSystemCatalogs(
return catalog_map_[database_oid];
}

//===--------------------------------------------------------------------===//
// ALTER TABLE
//===--------------------------------------------------------------------===//

/**

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[good] Good comments!

* @brief Helper function for alter table, called internally
* @param database_oid database to which the table belongs to
* @param table_oid table to which the column belongs to
* @param new_schema the new table schema
* @param txn the transaction Context
* @return TransactionContext ResultType(SUCCESS or FAILURE)
*/
ResultType Catalog::AlterTable(oid_t database_oid, oid_t table_oid, const std::string &schema_name,
std::unique_ptr<catalog::Schema> &new_schema,
concurrency::TransactionContext *txn) {
LOG_TRACE("AlterTable in Catalog");

if (txn == nullptr)
throw CatalogException("Alter table requires transaction");
try {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Why double try catch block?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we'll fix this.

auto storage_manager = storage::StorageManager::GetInstance();
auto database = storage_manager->GetDatabaseWithOid(database_oid);
try {
auto old_table = database->GetTableWithOid(table_oid);
auto old_schema = old_table->GetSchema();
auto pg_index = catalog_map_[database_oid]->GetIndexCatalog();

// Step 1: build empty table with new schema

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[good] Informative comments

bool own_schema = true;
bool adapt_table = false;
auto new_table = storage::TableFactory::GetDataTable(
database_oid, table_oid,
catalog::Schema::CopySchema(new_schema.get()), old_table->GetName(),
DEFAULT_TUPLES_PER_TILEGROUP, own_schema, adapt_table);
// Step 2: Copy indexes
auto old_index_oids = pg_index->GetIndexObjects(table_oid, txn);
for (auto index_oid_pair : old_index_oids) {
oid_t index_oid = index_oid_pair.first;
// delete record in pg_index
pg_index->DeleteIndex(index_oid, txn);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not delete index in pg_index only if not all indexed columns still exists?

// Check if all indexed columns still exists
auto old_index = old_table->GetIndexWithOid(index_oid);
bool index_exist = true;
std::vector<oid_t> new_key_attrs;

for (oid_t column_id : old_index->GetMetadata()->GetKeyAttrs()) {
bool is_found = false;
std::string column_name = old_schema->GetColumn(column_id).GetName();
oid_t i = 0;
for (auto new_column : new_schema->GetColumns()) {
if (column_name == new_column.GetName()) {
is_found = true;
new_key_attrs.push_back(i);
break;
}
i++;
}
if (!is_found) {
index_exist = false;
break;
}
}
if (!index_exist) continue;

// construct index on new table
auto index_metadata = new index::IndexMetadata(
old_index->GetName(), index_oid, table_oid, database_oid,
old_index->GetMetadata()->GetIndexType(),
old_index->GetMetadata()->GetIndexConstraintType(),
new_schema.get(),
catalog::Schema::CopySchema(new_schema.get(), new_key_attrs),
new_key_attrs, old_index->GetMetadata()->HasUniqueKeys());

std::shared_ptr<index::Index> new_index(
index::IndexFactory::GetIndex(index_metadata));
new_table->AddIndex(new_index);

// reinsert record into pg_index
pg_index->InsertIndex(
Copy link

@star013 star013 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[race condition]
It is risky to manipulate index directly under multiple transactions, because it is not protected by any consistency control mechanism like locks. It is better to do such operations through an executor just like your step 4(use a SeqScanPlan). The index executor with lock mechanism will take care of consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually InsertIndex uses a InsertPlan to insert a new tuple into pg_attribute so we can get concurrency control for free.

index_oid, old_index->GetName(), table_oid, schema_name,
old_index->GetMetadata()->GetIndexType(),
old_index->GetMetadata()->GetIndexConstraintType(),
old_index->GetMetadata()->HasUniqueKeys(), new_key_attrs,
pool_.get(), txn);
}
std::unique_ptr<executor::ExecutorContext> context(
new executor::ExecutorContext(txn, {}));
// Step 3: build column mapping between old table and new table
// we're using column name as unique identifier
std::vector<oid_t> old_column_ids;
std::unordered_map<oid_t, oid_t> column_map;
for (oid_t old_column_id = 0;
old_column_id < old_schema->GetColumnCount(); old_column_id++) {
old_column_ids.push_back(old_column_id);
for (oid_t new_column_id = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[inefficient implementation]
It is not efficient to get a matched name in a for loop. It is better to use a hashmap to do such a thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks for your suggestion.

new_column_id < new_schema->GetColumnCount(); new_column_id++) {
if (old_schema->GetColumn(old_column_id).GetName() ==
new_schema->GetColumn(new_column_id).GetName()) {
column_map[new_column_id] = old_column_id;
}
}
}
// Step 4: Get tuples from old table with sequential scan
// TODO: Try to reuse Sequential scan function and insert function in
// abstract catalog
planner::SeqScanPlan seq_scan_node(old_table, nullptr, old_column_ids);
executor::SeqScanExecutor seq_scan_executor(&seq_scan_node,
context.get());
seq_scan_executor.Init();
while (seq_scan_executor.Execute()) {
std::unique_ptr<executor::LogicalTile> result_tile(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Curious, what if the result is empty? Is it handled here?

seq_scan_executor.GetOutput());
for (size_t i = 0; i < result_tile->GetTupleCount(); i++) {
// Transform tuple into new schema
std::unique_ptr<storage::Tuple> tuple(
new storage::Tuple(new_schema.get(), true));

for (oid_t new_column_id = 0;
new_column_id < new_schema->GetColumnCount(); new_column_id++) {
auto it = column_map.find(new_column_id);
type::Value val;
if (it == column_map.end()) {
// new column, set value to null
val = type::ValueFactory::GetNullValueByType(
new_schema->GetColumn(new_column_id).GetType());
} else {
// otherwise, copy value in old table
val = result_tile->GetValue(i, it->second);
if (new_schema->GetColumn(new_column_id).GetType() !=
old_schema->GetColumn(it->second).GetType()) {
// change the value's type
LOG_TRACE(
"CASTED: %s TO %s", val.GetInfo().c_str(),
new_schema->GetColumn(new_column_id).GetInfo().c_str());
auto casted_val =
val.CastAs(new_schema->GetColumn(new_column_id).GetType());
tuple->SetValue(new_column_id, casted_val, pool_.get());
} else {
tuple->SetValue(new_column_id, val, pool_.get());
}
}
}
// insert new tuple into new table
planner::InsertPlan node(new_table, std::move(tuple));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Is it efficient to call planner inside a loop?

executor::InsertExecutor executor(&node, context.get());
executor.Init();
executor.Execute();
}
}
// Step 5: delete all the column(attribute) records in pg_attribute
// and reinsert them using new schema(column offset needs to change
// accordingly)
auto pg_attributes =
catalog_map_[database_oid]->GetColumnCatalog();
pg_attributes->DeleteColumns(table_oid, txn);
oid_t column_offset = 0;
for (auto new_column : new_schema->GetColumns()) {
pg_attributes->InsertColumn(
table_oid, new_column.GetName(), column_offset,
new_column.GetOffset(), new_column.GetType(),
new_column.IsInlined(), new_column.GetConstraints(), pool_.get(),
txn);
column_offset++;
}
// TODO: Add gc logic
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you guys finished this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still working on this, sorry about that.

// txn->RecordDrop(database_oid, old_table->GetOid(), INVALID_OID);

// Final step of physical change should be moved to commit time
database->ReplaceTableWithOid(table_oid, new_table);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[memory leak]
old table object is not explicitly freed.
database->ReplaceTableWithOid just replace the table pointer to new_table but old_table still exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! We left a TODO here, and we are still looking for the correct way to handle this.


LOG_TRACE("Alter table with oid %d succeed.", table_oid);
} catch (CatalogException &e) {
LOG_TRACE("Alter table failed.");
return ResultType::FAILURE;
}
} catch (CatalogException &e) {
return ResultType::FAILURE;
}
return ResultType::SUCCESS;
}

/**
* @brief Add new columns to the table.
* @param database_name database to which the table belongs to
* @param table_name table to which the column belongs to
* @param columns the column to be added
* @param txn the transaction Context
* @return TransactionContext ResultType(SUCCESS or FAILURE)
*
*/
ResultType Catalog::AddColumn(
UNUSED_ATTRIBUTE const std::string &database_name,
UNUSED_ATTRIBUTE const std::string &table_name,
UNUSED_ATTRIBUTE const std::vector<std::string> &columns,
UNUSED_ATTRIBUTE concurrency::TransactionContext *txn) {
// TODO: perform ADD Operation
return ResultType::SUCCESS;
}

/**
* @brief Drop the column from the table.
* @param database_name database to which the table belongs to
* @param table_name table to which the columns belong to
* @param columns the columns to be dropped
* @param txn the transaction Context
* @return TransactionContext ResultType(SUCCESS or FAILURE)
*/

ResultType Catalog::DropColumn(UNUSED_ATTRIBUTE const std::string &database_name,
UNUSED_ATTRIBUTE const std::string &table_name,
UNUSED_ATTRIBUTE const std::vector<std::string> &columns,
UNUSED_ATTRIBUTE concurrency::TransactionContext *txn) {
return ResultType::SUCCESS;
}

/**
* @brief Change the column name in the catalog.
* @param database_name database to which the table belongs to
* @param table_name table to which the column belongs to
* @param columns the column to be dropped
* @param txn the transaction Context
* @return TransactionContext ResultType(SUCCESS or FAILURE)
*/
ResultType Catalog::RenameColumn(const std::string &database_name,
const std::string &table_name,
const std::string &old_name,
const std::string &new_name,
const std::string &schema_name,
concurrency::TransactionContext *txn) {
if (txn == nullptr) {
throw CatalogException("Change Column requires transaction.");
}

if (new_name.size() == 0) {
throw CatalogException("Name can not be empty string.");
}

LOG_TRACE("Change Column Name %s to %s", old_name.c_str(), new_name.c_str());

try {
// Get table from the name
auto table = Catalog::GetInstance()->GetTableWithName(database_name, schema_name,
table_name, txn);
auto schema = table->GetSchema();

// Currently we only support change the first column name!

// Check the validity of old name and the new name
oid_t columnId = schema->GetColumnID(new_name);
if (columnId != INVALID_OID) {
throw CatalogException("New column already exists in the table.");
}
columnId = schema->GetColumnID(old_name);
if (columnId == INVALID_OID) {
throw CatalogException("Old column does not exist in the table.");
}

// Change column name in the global schema
schema->ChangeColumnName(columnId, new_name);

// Modify the pg_table
oid_t table_oid = Catalog::GetInstance()
->GetTableObject(database_name, schema_name, table_name, txn)
->GetTableOid();
oid_t database_oid = Catalog::GetInstance()
->GetTableObject(database_name, schema_name, table_name, txn)
->GetDatabaseOid();
auto pg_attributes =
catalog_map_[database_oid]->GetColumnCatalog();
bool res = pg_attributes->RenameColumn(
database_oid, table_oid, old_name, new_name, txn);
if (!res) {
throw CatalogException("Change Column name failed.");
}

} catch (CatalogException &e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Maybe printing the stack trace here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will fix this!

return ResultType::FAILURE;
}
return ResultType::SUCCESS;
}

//===--------------------------------------------------------------------===//
// DEPRECATED
//===--------------------------------------------------------------------===//
Expand Down
1 change: 1 addition & 0 deletions src/catalog/column.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ void Column::SetInlined() {
switch (column_type) {
case type::TypeId::VARCHAR:
case type::TypeId::VARBINARY:
is_inlined = false;
break; // No change of inlined setting

default:
Expand Down
31 changes: 31 additions & 0 deletions src/catalog/column_catalog.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -252,5 +252,36 @@ ColumnCatalog::GetColumnObjects(oid_t table_oid,
return table_object->GetColumnObjects();
}

/*
* @brief Rename the column name to a new name.
* @ return whether the update succeed
*/
bool ColumnCatalog::RenameColumn(oid_t database_oid,
oid_t table_oid,
const std::string &column_name,
const std::string &new_name,
concurrency::TransactionContext *txn) {
std::vector<oid_t> update_columns({ColumnId::COLUMN_NAME});

std::vector<type::Value> update_values;
update_values.push_back(type::ValueFactory::GetVarcharValue(new_name).Copy());

// values to execute index scan
std::vector<type::Value> scan_values;
scan_values.push_back(type::ValueFactory::GetIntegerValue(table_oid).Copy());
scan_values.push_back(
type::ValueFactory::GetVarcharValue(column_name, nullptr).Copy());

// Index of table_oid & column_name
oid_t index_offset = IndexId::PRIMARY_KEY;

auto table_object =
Catalog::GetInstance()->GetTableObject(database_oid, table_oid, txn);
table_object->EvictColumnObject(column_name);

return UpdateWithIndexScan(update_columns, update_values, scan_values,
index_offset, txn);
}

} // namespace catalog
} // namespace peloton
Loading