
[#10828] feat(catalog-glue): Implement schema and table CRUD operations#10829

Open
diqiu50 wants to merge 17 commits into main from glue-pr04


Conversation

@diqiu50
Contributor

diqiu50 commented Apr 21, 2026

What changes were proposed in this pull request?

Implement schema CRUD, table CRUD, partition support, and exception mapping for the catalog-glue module:

  • GlueExceptionConverter — maps EntityNotFoundException/AlreadyExistsException/InvalidInputException to Gravitino exceptions
  • GlueCatalogOperations — implements SupportsSchemas and TableCatalog backed by AWS Glue API
  • GlueTableOperations — implements SupportsPartitions backed by Glue partition APIs
  • GlueTable.newOps() wired to return GlueTableOperations
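A minimal sketch of the kind of mapping GlueExceptionConverter performs. The nested exception classes are local stand-ins, not the real AWS SDK or Gravitino types, so the example compiles on its own. The table-scoped targets, the choice of IllegalArgumentException for InvalidInputException, and the helper name callGlue are assumptions rather than the PR's actual code.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of Glue-to-Gravitino exception mapping (stand-in types only). */
final class GlueExceptionMappingSketch {

  // Stand-ins for software.amazon.awssdk.services.glue.model exceptions.
  static class EntityNotFoundException extends RuntimeException {
    EntityNotFoundException(String msg) { super(msg); }
  }

  static class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String msg) { super(msg); }
  }

  static class InvalidInputException extends RuntimeException {
    InvalidInputException(String msg) { super(msg); }
  }

  // Stand-ins for org.apache.gravitino.exceptions types.
  static class NoSuchTableException extends RuntimeException {
    NoSuchTableException(String msg, Throwable cause) { super(msg, cause); }
  }

  static class TableAlreadyExistsException extends RuntimeException {
    TableAlreadyExistsException(String msg, Throwable cause) { super(msg, cause); }
  }

  private GlueExceptionMappingSketch() {}

  /** Translates a Glue exception raised while operating on the given table. */
  static RuntimeException toGravitino(RuntimeException e, String tableName) {
    if (e instanceof EntityNotFoundException) {
      return new NoSuchTableException("Table " + tableName + " does not exist", e);
    }
    if (e instanceof AlreadyExistsException) {
      return new TableAlreadyExistsException("Table " + tableName + " already exists", e);
    }
    if (e instanceof InvalidInputException) {
      return new IllegalArgumentException(e.getMessage(), e);
    }
    return e; // Anything unrecognized propagates unchanged.
  }

  /** Runs a Glue call and rethrows failures as their mapped exception. */
  static <T> T callGlue(Supplier<T> call, String tableName) {
    try {
      return call.get();
    } catch (RuntimeException e) {
      throw toGravitino(e, tableName);
    }
  }
}
```

Funnelling every Glue call through one helper like this keeps the catch logic in a single place instead of repeating it in each CRUD method.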

Why are the changes needed?

These classes implement the core metadata operations required to make the catalog-glue module functional.

Fix: #10828

Does this PR introduce any user-facing change?

No. This is an internal implementation of the catalog-glue module.

How was this patch tested?

Unit tests covering schema/table/partition CRUD with mocked GlueClient.

Copilot AI review requested due to automatic review settings April 21, 2026 08:13
@diqiu50 diqiu50 self-assigned this Apr 21, 2026

Copilot AI left a comment


Pull request overview

Implements the core AWS Glue-backed catalog functionality for the catalog-glue module, adding schema/table CRUD, partition operations, and Glue→Gravitino exception/type conversions, with unit tests using mocked Glue clients and synthetic SDK model objects.

Changes:

  • Add GlueCatalogOperations implementing SupportsSchemas and TableCatalog using the AWS Glue API (including pagination and optional catalogId).
  • Add table/partition conversion and operations (GlueTable, GlueTableOperations, GlueSchema, GlueColumn, GlueTypeConverter, GlueExceptionConverter, constants/properties metadata).
  • Add unit tests for schema/table/partition CRUD and conversion logic; add optional AWS-backed tests gated by env vars.
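Glue list operations such as GetDatabases and GetTables return results in pages together with a NextToken. The loop below is a hedged sketch of draining such a call by following that token until it runs out. PageFetcher and Page are hypothetical stand-ins for the SDK request/response types, and passing a null catalogId through (so the request omits CatalogId and Glue falls back to the caller's account) follows the commit notes rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: drain a paginated Glue list call by following NextToken. */
final class GluePaginationSketch {

  /** Stand-in for one page of a Glue list response. */
  static final class Page {
    final List<String> items;
    final String nextToken; // null (or empty) on the last page

    Page(List<String> items, String nextToken) {
      this.items = items;
      this.nextToken = nextToken;
    }
  }

  /** Stand-in for a single Glue list request, e.g. GetDatabases or GetTables. */
  interface PageFetcher {
    Page fetch(String catalogId, String nextToken);
  }

  private GluePaginationSketch() {}

  /**
   * Collects every item across all pages. A null catalogId is passed through unchanged;
   * the fetcher is expected to omit CatalogId so Glue uses the caller's account.
   */
  static List<String> listAll(PageFetcher fetcher, String catalogId) {
    List<String> all = new ArrayList<>();
    String token = null; // the first request carries no token
    do {
      Page page = fetcher.fetch(catalogId, token);
      all.addAll(page.items);
      token = page.nextToken;
    } while (token != null && !token.isEmpty());
    return all;
  }
}
```

Treating an empty token the same as a missing one guards against a service returning "" on the final page.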

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueCatalogOperations.java: Implements schema + table CRUD on top of Glue API, pagination, format filtering, and change application.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueTableOperations.java: Implements Glue-backed partition listing/get/add/drop.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueTable.java: Converts Glue Table→Gravitino and wires table ops to partition operations.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueSchema.java: Converts Glue Database→Gravitino schema.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueColumn.java: Converts Glue Column→Gravitino column using a type converter.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueTypeConverter.java: Converts between Glue/Hive type strings and Gravitino Type.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueExceptionConverter.java: Maps common Glue SDK exceptions to Gravitino exceptions.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueConstants.java: Adds connector config keys and StorageDescriptor-derived table property keys.
  • catalogs/catalog-glue/src/main/java/org/apache/gravitino/catalog/glue/GlueTablePropertiesMetadata.java: Updates property description for table format values.
  • catalogs/catalog-glue/src/test/java/org/apache/gravitino/catalog/glue/*.java: Adds mocked CRUD tests plus shared conversion test scenarios; adds optional AWS endpoint tests gated by env vars.
  • catalogs/catalog-glue/build.gradle.kts: Adjusts test task behavior (removes -PskipITs conditional).
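Format filtering is only summarized above. One possible reading, sketched under stated assumptions: Glue records the table format in the table_type parameter (Iceberg tables registered by Iceberg's Glue catalog carry the uppercase value ICEBERG), filter values are lowercase, the value all accepts every table, and a table without table_type is treated as Hive. The class and constant names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of filtering Glue tables by format. */
final class TableFormatFilterSketch {
  static final String FILTER_ALL = "all";            // assumed "accept everything" filter value
  static final String TABLE_TYPE_KEY = "table_type"; // Glue table parameter holding the format
  static final String FALLBACK_FORMAT = "hive";      // assumption: no table_type means Hive

  private TableFormatFilterSketch() {}

  /** Returns the lowercase format of a Glue table, given its parameters map. */
  static String formatOf(Map<String, String> glueParameters) {
    String type = glueParameters == null ? null : glueParameters.get(TABLE_TYPE_KEY);
    if (type == null || type.isBlank()) {
      return FALLBACK_FORMAT;
    }
    return type.trim().toLowerCase(Locale.ROOT); // Glue stores e.g. "ICEBERG"
  }

  /** True if a table with these parameters passes the (case-insensitive) filter. */
  static boolean accepts(String filter, Map<String, String> glueParameters) {
    String normalized = filter == null ? FILTER_ALL : filter.trim().toLowerCase(Locale.ROOT);
    return FILTER_ALL.equals(normalized) || normalized.equals(formatOf(glueParameters));
  }
}
```

Normalizing both sides with Locale.ROOT avoids locale-sensitive case mapping, which is the usual reason uppercase-vs-lowercase comparisons go wrong.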

Comment thread on catalogs/catalog-glue/build.gradle.kts (outdated)
@github-actions

github-actions Bot commented Apr 21, 2026

Code Coverage Report

Overall Project 65.44% +0.11% 🟢
Files changed 79.49% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.27% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 82.69% -8.59% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 43.93% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.27% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.63% 🟢
common 48.67% 🟢
core 81.49% 🟢
filesystem-hadoop3 76.97% 🟢
flink 40.55% 🟢
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.14% 🟢
iceberg-common 55.2% 🟢
iceberg-rest-server 67.25% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.62% 🟢
server-common 69.76% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 34.27% 🔴
Files
Module File Coverage
catalog-glue GlueTableOperations.java 90.35% 🟢
GlueTable.java 89.36% 🟢
GlueCatalogOperations.java 74.38% 🟢
GlueExceptionConverter.java 57.14% 🔴

diqiu50 added 12 commits April 23, 2026 18:21
…etadata

- Add GlueConstants for all catalog and table property keys
- Add GlueClientProvider: static creds / DefaultCredentialChain selection,
  region, and endpoint override (for VPC endpoints / LocalStack)
- Implement GlueCatalogPropertiesMetadata: required aws-region + aws-glue-catalog-id,
  optional credentials (hidden), endpoint, default-table-format, table-type-filter
- Implement GlueCatalogCapability: case-insensitive names, no NOT NULL, no DEFAULT
- Implement GlueTablePropertiesMetadata: table_type, metadata_location, location
- Add TestGlueClientProvider unit tests
- Fix stale JavaDoc: DEFAULT_TABLE_FORMAT "Defaults to iceberg" -> "Defaults to hive"
- GlueClientProvider: fail-fast on partial credentials (one key without the other)
- GlueClientProvider: wrap URI.create with property-context error message
- GlueCatalogCapability: remove COLUMN from case-insensitive scope (no AWS docs backing)
- GlueTablePropertiesMetadata: remove ephemeral PR-05 forward reference from comment
- TestGlueClientProvider: use try-with-resources; update partial-cred test to expect exception;
  add tests for secret-only and invalid endpoint cases
- Add TestGlueCatalogCapability: covers all capability method contracts
- Add TestGlueCatalogPropertiesMetadata: covers required/hidden/immutable flags and defaults
- Use StringUtils.isNotBlank() to reject blank region and credential values
- Make aws-glue-catalog-id optional (Glue defaults to caller's account ID)
- Add casing note to TABLE_FORMAT_TYPE JavaDoc distinguishing Glue uppercase from filter lowercase
- Clarify deferred validation for default-table-format and table-type-filter
- Rename TABLE_TYPE_FILTER_ALL to DEFAULT_TABLE_TYPE_FILTER
- Add default credential chain order comment in GlueClientProvider
- Remove try-catch wrapping on URI.create for endpoint validation
- Make aws-glue-catalog-id optional
- Add casing note to TABLE_FORMAT_TYPE JavaDoc
- Clarify deferred validation for default-table-format and table-type-filter
- Add StringUtils.isNotBlank() for blank value checks
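The credential rules in these commit notes (static keys when both are supplied, the default credential chain when neither is, fail-fast when only one is, and blank treated as missing) can be sketched as a pure decision function. The names below are hypothetical; real code would then build the matching AWS SDK v2 provider, such as StaticCredentialsProvider wrapping AwsBasicCredentials, or DefaultCredentialsProvider.

```java
/** Hypothetical sketch of choosing how the Glue client obtains AWS credentials. */
final class CredentialSelectionSketch {

  enum Mode { STATIC_KEYS, DEFAULT_CHAIN }

  private CredentialSelectionSketch() {}

  static Mode select(String accessKeyId, String secretAccessKey) {
    boolean hasKey = isNotBlank(accessKeyId);
    boolean hasSecret = isNotBlank(secretAccessKey);
    if (hasKey != hasSecret) {
      // Fail fast: exactly one half of the pair is set, which is a misconfiguration.
      throw new IllegalArgumentException(
          "Set both the access key ID and the secret access key, or neither");
    }
    return hasKey ? Mode.STATIC_KEYS : Mode.DEFAULT_CHAIN;
  }

  // Same semantics as commons-lang3 StringUtils.isNotBlank: null, empty, whitespace-only are blank.
  private static boolean isNotBlank(String s) {
    return s != null && !s.isBlank();
  }
}
```

Rejecting the partial case instead of silently falling back to the default chain means a typo in one property surfaces at catalog creation rather than as an unexpected identity at request time.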
…ests

Add model layer for catalog-glue (PR-03):
- GlueConstants: storage descriptor and table-format constants
- GlueTypeConverter: Glue/Hive type string to Gravitino Type mapping
- GlueSchema: maps AWS Glue Database -> Gravitino BaseSchema
- GlueColumn: maps AWS Glue Column -> Gravitino BaseColumn
- GlueTable: maps AWS Glue Table -> Gravitino BaseTable
  (columns, partitioning, distribution, sort orders, properties)

Test architecture: abstract base class + two implementations:
- SyntheticGlueXxxTest: SDK builder, no network, always runs
- AwsGlueXxxIT: real AWS Glue API, tagged gravitino-aws-test, skipped by default
…Table

- Rename test classes to follow TestXxx naming convention
- Add error handling for malformed type strings in GlueTypeConverter
- Fix NPE in GlueTable distribution and sort order null-checks
- Add null-safe handling for GlueSchema parameters
- Use GlueException for AWS cleanup errors
- Add aws.glue test dependency for SdkException class
- Fix TABLE_FORMAT description to use uppercase values
Glue/Hive timestamps are timezoneless. fromGravitino now throws
IllegalArgumentException for TimestampType.withTimeZone(), consistent
with HiveDataTypeConverter.

Ref: https://cwiki.apache.org/confluence/display/hive/languagemanual+types
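A hedged illustration of the fromGravitino rule above, using a hypothetical Kind enum in place of Gravitino's Type hierarchy. The primitive mappings use standard Hive type names; the timezone-aware timestamp has no Hive/Glue counterpart, so it is rejected instead of being silently converted to a plain timestamp.

```java
/** Hypothetical sketch of Gravitino-to-Glue type mapping for a few primitives. */
final class TimestampTypeMappingSketch {

  /** Stand-in for a subset of Gravitino types (not the real Types API). */
  enum Kind { BOOLEAN, INTEGER, LONG, DOUBLE, STRING, DATE, TIMESTAMP, TIMESTAMP_WITH_TZ }

  private TimestampTypeMappingSketch() {}

  static String toGlueTypeString(Kind kind) {
    switch (kind) {
      case BOOLEAN:
        return "boolean";
      case INTEGER:
        return "int";
      case LONG:
        return "bigint";
      case DOUBLE:
        return "double";
      case STRING:
        return "string";
      case DATE:
        return "date";
      case TIMESTAMP:
        return "timestamp"; // Hive timestamps are timezoneless
      case TIMESTAMP_WITH_TZ:
        // No faithful Glue/Hive type exists, so fail loudly rather than drop the zone.
        throw new IllegalArgumentException("Glue does not support timestamp with time zone");
      default:
        throw new IllegalArgumentException("Unsupported type: " + kind);
    }
  }
}
```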
…ests

Replace @Tag("gravitino-aws-test") + Gradle excludeTags with
@EnabledIfEnvironmentVariable(named = "AWS_ACCESS_KEY_ID"). Tests now
auto-skip when the env var is absent, no Gradle flag change needed.
diqiu50 added 5 commits April 23, 2026 18:33
…t during rebase

Restore GlueTypeConverter to the full DataTypeConverter implementation from
glue-pr03 (now in main), including complex type support (array, map, struct,
uniontype) and the splitTopLevel helper. Update GlueColumn.fromGlueColumn and
GlueTable.fromGlueTable to accept a GlueTypeConverter instance. Add a typeConverter
field to GlueCatalogOperations and make toGlueColumn non-static to use it.
Restore AbstractGlueTableTest and TestGlueTypeConverter from main.
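Complex type strings such as struct<a:int,b:map<string,array<int>>> cannot be split with a plain String.split(","), because commas and colons also appear inside nested types and inside decimal(p,s). Below is a minimal sketch of a depth-tracking splitter in the spirit of the splitTopLevel helper mentioned above; its signature and error handling are assumptions, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting a type string only at nesting depth zero. */
final class SplitTopLevelSketch {

  private SplitTopLevelSketch() {}

  /** Splits s on delimiter wherever it is not nested inside <...> or (...). */
  static List<String> splitTopLevel(String s, char delimiter) {
    List<String> parts = new ArrayList<>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '<' || c == '(') {
        depth++;
      } else if (c == '>' || c == ')') {
        depth--;
        if (depth < 0) {
          throw new IllegalArgumentException("Unbalanced brackets in type string: " + s);
        }
      } else if (c == delimiter && depth == 0) {
        parts.add(s.substring(start, i).trim());
        start = i + 1;
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("Unbalanced brackets in type string: " + s);
    }
    parts.add(s.substring(start).trim());
    return parts;
  }
}
```

Splitting a struct body with ',' yields the field declarations, and splitting each field with ':' then separates the name from a possibly nested type, since inner colons sit at depth one or more.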

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] catalog-glue: Implement schema and table CRUD operations

2 participants