Skip to content

[Improvement] Optimize fetching table by name identifier #6638

Closed
@yuqi1129

Description

@yuqi1129

What would you like to be improved?

According to the CPU profiler,

Image

The following code is very time-consuming; we need to merge the logic of fetching the table as much as possible

public TableEntity getTableByIdentifier(NameIdentifier identifier) {
NameIdentifierUtil.checkTable(identifier);
Long schemaId =
CommonMetaService.getInstance().getParentEntityIdByNamespace(identifier.namespace());
TablePO tablePO = getTablePOBySchemaIdAndName(schemaId, identifier.name());
List<ColumnPO> columnPOs =
TableColumnMetaService.getInstance()
.getColumnsByTableIdAndVersion(tablePO.getTableId(), tablePO.getCurrentVersion());
return POConverters.fromTableAndColumnPOs(tablePO, columnPOs, identifier.namespace());
}

The SQL corresponding to getColumnsByTableIdAndVersion can be optimized:

https://github.com/apache/gravitino/blob/1297713992dfd376fc2a6fba805a6cdee61c4373/core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableColumnBaseSQLProvider.java#L28C17-L48

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version;
8 rows in set (0.28 sec)

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version
    ->   and table_id = 2716478369449788787;
8 rows in set (0.00 sec)

If we add a condition like table_id = xxxx in the end, it will be more efficient and only take a few milliseconds compared to 200+ milliseconds(see above).

How should we improve?

No response

Metadata

Metadata

Assignees

Labels

0.9.0Release v0.9.0improvementImprovements on everything

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions