Skip to content

[Improvement] Optimize fetching table by name identifier #6638

Open
@yuqi1129

Description

@yuqi1129

What would you like to be improved?

According to the CPU profiler,

Image

The following code is very time-consuming; we need to merge the logic of fetching the table as much as possible

public TableEntity getTableByIdentifier(NameIdentifier identifier) {
NameIdentifierUtil.checkTable(identifier);
Long schemaId =
CommonMetaService.getInstance().getParentEntityIdByNamespace(identifier.namespace());
TablePO tablePO = getTablePOBySchemaIdAndName(schemaId, identifier.name());
List<ColumnPO> columnPOs =
TableColumnMetaService.getInstance()
.getColumnsByTableIdAndVersion(tablePO.getTableId(), tablePO.getCurrentVersion());
return POConverters.fromTableAndColumnPOs(tablePO, columnPOs, identifier.namespace());
}

The SQL corresponding to getColumnsByTableIdAndVersion can be optimized:

https://github.com/apache/gravitino/blob/1297713992dfd376fc2a6fba805a6cdee61c4373/core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableColumnBaseSQLProvider.java#L28C17-L48

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version;
8 rows in set (0.28 sec)

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version
    ->   and table_id = 2716478369449788787;
8 rows in set (0.00 sec)

If we add a condition like table_id = xxxx in the end, It will more efficient.

How should we improve?

No response

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    improvementImprovements on everything

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions