@yzeng1618
Contributor

Purpose of this pull request

During hive2hbase synchronization, if the upstream table contains DECIMAL/DATE/TIME/TIMESTAMP fields, the HBase sink fails at write time with HbaseConnectorException(COMMON-07 UNSUPPORTED_DATA_TYPE). A short-term workaround is to use transform/cast (e.g., cast DECIMAL/DATE/TIMESTAMP to STRING or BIGINT).

However, this leaves the connector inconsistent. The read side (HBaseDeserializationFormat) already attempts to support DATE/TIME/TIMESTAMP/DECIMAL, although its previous DECIMAL-as-float behavior was itself unreasonable, while the write side supports none of these types. This PR therefore completes sink-side support and unifies the read/write encoding rules, so that values written by the sink can be read back by the source with the same semantics and common hive2hbase scenarios work out of the box.

Does this PR introduce any user-facing change?

  • Adds DECIMAL/DATE/TIME/TIMESTAMP/BYTES support to HBase sink serialization with consistent encoding rules (string bytes for precision/safety).

  • Fixes DECIMAL deserialization to parse BigDecimal from string first, with a backward-compatible float fallback for legacy data.

  • Adds unit and e2e coverage for these types.
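
The encoding rules listed above can be illustrated with a minimal sketch. It is not the connector's actual code: the class and method names are hypothetical, and the 4-byte big-endian layout assumed for legacy float cells is an assumption about how older data was written. It shows DECIMAL and DATE values stored as UTF-8 string bytes, and a DECIMAL read that tries the string form first before falling back to a float.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

// Hypothetical sketch; names and the legacy float layout are assumptions.
public class StringBytesCodecSketch {

    // Write side: store the decimal as its plain string form so no precision is lost.
    static byte[] encodeDecimal(BigDecimal value) {
        return value.toPlainString().getBytes(StandardCharsets.UTF_8);
    }

    // Write side: store the date as an ISO-8601 string (e.g. 2024-01-31).
    static byte[] encodeDate(LocalDate value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Read side: parse the string form first; if that fails and the cell is
    // exactly 4 bytes, treat it as a legacy big-endian float (assumed layout).
    static BigDecimal decodeDecimal(byte[] cell) {
        try {
            return new BigDecimal(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            if (cell.length == Float.BYTES) {
                float legacy = ByteBuffer.wrap(cell).getFloat();
                return new BigDecimal(Float.toString(legacy));
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("123.4500");
        byte[] cell = encodeDecimal(original);
        System.out.println(decodeDecimal(cell));

        byte[] legacyCell = ByteBuffer.allocate(Float.BYTES).putFloat(1.5f).array();
        System.out.println(decodeDecimal(legacyCell));

        System.out.println(new String(encodeDate(LocalDate.of(2024, 1, 31)), StandardCharsets.UTF_8));
    }
}
```

One caveat of a string-first strategy like this: a legacy 4-byte cell whose bytes happen to form a valid numeric string (for example the ASCII bytes of `1234`) would be read as that string rather than as a float, so the fallback is a best-effort heuristic.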

How was this patch tested?

  • Unit: HbaseSinkWriterTypeConvertTest.java

  • E2E: fake-to-hbase-with-date-time-decimal.conf + hbase-to-assert-with-date-time-decimal.conf and IT method testHbaseSinkWithDateTimeDecimal

Check list

@zhangshenghang
Member

How about updating the relevant documentation?
