Modified pg_column_stats initialization #1352
poojanilangekar wants to merge 8 commits into cmu-db:master
@poojanilangekar Nice! |
a625053 to
92d95fa
Compare
|
@mengranwo @GustavoAngulo We need to merge this in ASAP as it's blocking the performance benchmarks. Please take a look at it when you get the time. |
|
I think it might be good to keep As for the schema, I'm not sure which inconsistency you're referring to, @poojanilangekar. @chenboy, do you see an inconsistency? |
|
@GustavoAngulo I have modified the function to Regarding the inconsistency, the declaration creates two SKEY indexes and the table has no primary key, whereas the header file states that the table should contain only one index with the primary key. |
|
The peloton/src/catalog/column_stats_catalog.cpp Line 135 in d68ab71 peloton/src/catalog/column_stats_catalog.cpp Line 186 in d68ab71 So I think we should stick to the declaration, not the header. |
|
@GustavoAngulo @camellyx Can you please review these changes? |
|
@pervazea Can you take a look since these other people are out of town? |
|
This is an important PR that we should merge. We don't want to lose track of this. |
pervazea
left a comment
Looks good.
Please add function headers / documentation as requested (and elsewhere too if I've missed anything).
Also, needs merge conflicts resolved :-( due to recent re-factors.
| //===--------------------------------------------------------------------===// | ||
| bool InsertColumnStats(oid_t database_id, oid_t table_id, oid_t column_id, | ||
| int num_rows, double cardinality, double frac_null, | ||
| bool InsertColumnStats(oid_t table_id, oid_t column_id, int num_rows, |
Add a comment documenting the function's purpose and arguments.
| bool has_index, type::AbstractPool *pool, | ||
| concurrency::TransactionContext *txn); | ||
| bool DeleteColumnStats(oid_t database_id, oid_t table_id, oid_t column_id, | ||
| bool DeleteColumnStats(oid_t table_id, oid_t column_id, |
Add a comment documenting function purpose and arguments.
| std::unique_ptr<std::vector<type::Value>> GetColumnStats( | ||
| oid_t database_id, oid_t table_id, oid_t column_id, | ||
| concurrency::TransactionContext *txn); | ||
| oid_t table_id, oid_t column_id, concurrency::TransactionContext *txn); |
Add a comment documenting the function, its arguments, and its return value.
| } | ||
|
|
||
| ColumnStatsCatalog *GetColumnStatsCatalog() { | ||
| if (!pg_column_stats_) { |
Is this a recoverable error? If not, this should be a PELOTON_ASSERT.
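To make the two options in this comment concrete, here is a minimal sketch. It is not the PR's code: `DatabaseCatalogs`, `Bootstrap`, and `TryGetColumnStatsCatalog` are hypothetical names, the `ColumnStatsCatalog` struct is a stand-in, and the standard `assert` stands in for `PELOTON_ASSERT`.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the real catalog class; it only carries what the sketch needs.
struct ColumnStatsCatalog {
  int database_oid;
};

// Hypothetical owner of the per-database pg_column_stats catalog pointer.
class DatabaseCatalogs {
 public:
  // Simulates bootstrap: the catalog is created once, up front.
  void Bootstrap(int database_oid) {
    pg_column_stats_.reset(new ColumnStatsCatalog{database_oid});
  }

  // Option A: a missing catalog is a programming error, so assert.
  // (Plain assert is used here in place of PELOTON_ASSERT.)
  ColumnStatsCatalog *GetColumnStatsCatalog() const {
    assert(pg_column_stats_ != nullptr && "pg_column_stats not bootstrapped");
    return pg_column_stats_.get();
  }

  // Option B: a missing catalog is recoverable, so the caller must check
  // for nullptr and decide what to do.
  ColumnStatsCatalog *TryGetColumnStatsCatalog() const {
    return pg_column_stats_.get();
  }

 private:
  std::unique_ptr<ColumnStatsCatalog> pg_column_stats_;
};
```

If the catalog is always created during database bootstrap, Option A documents that invariant and avoids a null check at every call site.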
|
|
||
| ResultType AnalyzeStatsForAllTables( | ||
| concurrency::TransactionContext *txn = nullptr); | ||
| ResultType AnalyzeStatsForAllTablesWithDatabaseOid( |
Add function header describing this function, args, etc.
| database_id, table_id, column_id, num_rows, cardinality, frac_null, | ||
| most_common_vals, most_common_freqs, histogram_bounds, column_name, | ||
| has_index, pool_.get(), txn); | ||
| pg_column_stats->DeleteColumnStats(table_id, column_id, txn); |
DeleteColumnStats if they exist, I assume. May be helpful to add comment stating that, if correct.
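The delete-then-insert pattern assumed in this comment can be sketched as follows. This is an illustration only: `ColumnStatsTable`, `ColumnStatsRow`, `RefreshColumnStats`, `Get`, and `Size` are hypothetical, the `oid_t` alias is an assumption, and an in-memory map replaces the real catalog table and transaction context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

using oid_t = std::uint32_t;  // assumption: the real oid_t is defined elsewhere

// Hypothetical, trimmed-down stats row.
struct ColumnStatsRow {
  int num_rows;
  double cardinality;
  double frac_null;
};

// Hypothetical in-memory stand-in for the per-database pg_column_stats table.
class ColumnStatsTable {
 public:
  // Returns true if a row existed and was removed, false if nothing matched.
  bool DeleteColumnStats(oid_t table_id, oid_t column_id) {
    return rows_.erase(std::make_pair(table_id, column_id)) > 0;
  }

  // Returns false if a row with the same key is already present.
  bool InsertColumnStats(oid_t table_id, oid_t column_id,
                         const ColumnStatsRow &row) {
    return rows_.emplace(std::make_pair(table_id, column_id), row).second;
  }

  // Delete any existing stats for the column (a no-op if none exist), then
  // insert the fresh ones, so repeated analyze runs never leave duplicates.
  void RefreshColumnStats(oid_t table_id, oid_t column_id,
                          const ColumnStatsRow &row) {
    DeleteColumnStats(table_id, column_id);
    bool inserted = InsertColumnStats(table_id, column_id, row);
    assert(inserted);
    (void)inserted;  // silence unused-variable warnings when NDEBUG is set
  }

  // Returns the stored row, or nullptr if the column has no stats.
  const ColumnStatsRow *Get(oid_t table_id, oid_t column_id) const {
    auto it = rows_.find(std::make_pair(table_id, column_id));
    return it == rows_.end() ? nullptr : &it->second;
  }

  std::size_t Size() const { return rows_.size(); }

 private:
  std::map<std::pair<oid_t, oid_t>, ColumnStatsRow> rows_;
};
```

In this sketch the delete is deliberately allowed to fail silently, which is the behavior the requested code comment would need to state.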
| @@ -108,14 +110,18 @@ TEST_F(StatsStorageTests, InsertAndGetTableStatsTest) { | |||
| table_stats_collector.get()); | |||
|
|
|||
| VerifyAndPrintColumnStats(data_table.get(), 4); | |||
If I understand correctly, this function VerifyAndPrint... prints but verifies very little. I acknowledge that it wasn't modified in this PR, but it does mean some of the old tests are not particularly useful.
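As a rough idea of what a stricter check could look like, the sketch below validates the column count and a few basic invariants. It is not the existing test helper: `ColumnStats` and `VerifyColumnStats` are hypothetical, and the checks chosen are assumptions about what a useful test would assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down view of one column's collected statistics.
struct ColumnStats {
  int num_rows;
  double cardinality;
  double frac_null;
};

// Returns true only if stats exist for every expected column and each entry
// is internally consistent. The cardinality is only checked for sign because
// it may be an estimate.
bool VerifyColumnStats(const std::vector<ColumnStats> &stats,
                       std::size_t expected_columns, int expected_rows) {
  if (stats.size() != expected_columns) return false;
  for (const auto &s : stats) {
    if (s.num_rows != expected_rows) return false;
    if (s.cardinality < 0.0) return false;
    if (s.frac_null < 0.0 || s.frac_null > 1.0) return false;
  }
  return true;
}
```

In a gtest-based suite the boolean result would typically be wrapped in `EXPECT_TRUE`, or each condition turned into its own expectation so that a failure names the column at fault.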
This PR modifies the ColumnStatsCatalog constructor to use a predefined schema instead of DDL. It also creates the pg_column_stats table on a per-database basis. This was not done initially because StatsStorage depended on a view of "Global Stats".
It also changes AnalyzeStatsForAllTables to AnalyzeStatsForAllTablesWithDatabaseOid. Now that ColumnStats is maintained on a per-database basis, it makes sense to collect the stats per database as well.
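The per-database layout described above might look roughly like the following. This is a simplified model, not Peloton's implementation: the `Database` struct, the public `databases` map, and the `oid_t` alias are assumptions, and a set of keys stands in for real pg_column_stats rows.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

using oid_t = std::uint32_t;  // assumption: the real oid_t is defined elsewhere

enum class ResultType { SUCCESS, FAILURE };

// Hypothetical database model: each database owns its tables and its own
// pg_column_stats, keyed by (table_id, column_id) with no database_id column.
struct Database {
  std::map<oid_t, std::vector<oid_t>> tables;  // table oid -> column oids
  std::set<std::pair<oid_t, oid_t>> pg_column_stats;
};

class StatsStorage {
 public:
  std::map<oid_t, Database> databases;

  // Collects stats for every table of one database and stores them in that
  // database's own catalog. Other databases are left untouched.
  ResultType AnalyzeStatsForAllTablesWithDatabaseOid(oid_t database_id) {
    auto it = databases.find(database_id);
    if (it == databases.end()) return ResultType::FAILURE;
    Database &db = it->second;
    for (const auto &table : db.tables) {
      for (oid_t column_id : table.second) {
        db.pg_column_stats.insert(std::make_pair(table.first, column_id));
      }
    }
    return ResultType::SUCCESS;
  }
};
```

Because each catalog belongs to exactly one database, the key no longer needs a database_id, which matches the narrower InsertColumnStats, DeleteColumnStats, and GetColumnStats signatures in the diff.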