Hive: $row_id is not unique on non-transactional tables but is declared as the table's unique column

TL;DR The synthesized hidden `$row_id` column is **not unique per row** on non-transactional Hive tables, yet the Hive connector reports `$row_id` as the table's unique column (`ConnectorTableLayout.getUniqueColumn()`), which is consumed by `utilize-unique-property-in-query-planning` (enabled by default). Planning optimizations that rely on `$row_id` uniqueness can therefore produce **incorrect results**.

Observed on **both `TEXTFILE` and `ORC`** (non-transactional) tables.

## Reproduction

```sql
-- non-transactional table; reproduces on both ORC and TEXTFILE
CREATE TABLE t_rowid (col bigint, val varchar, ds varchar)
WITH (format = 'ORC', partitioned_by = ARRAY['ds'], bucketed_by = ARRAY['col'], bucket_count = 8);

INSERT INTO t_rowid
SELECT col, 'v' || CAST(col AS varchar), ds
FROM UNNEST(sequence(1, 100)) AS x(col)
CROSS JOIN (VALUES '2024-01-01','2024-01-02','2024-01-03') AS d(ds);  -- 300 rows

SELECT count(*), count(DISTINCT "$row_id") FROM t_rowid;
-- observed: [300, 4]      expected: [300, 300]

-- self-join on $row_id should return 300 (each row matches itself) but is many-to-many:
SELECT count(*) FROM t_rowid a JOIN t_rowid b ON a."$row_id" = b."$row_id";
-- observed: 22500 (many-to-many) instead of 300
```

The same is observed with `format = 'TEXTFILE'`. (We have not confirmed the behavior on **transactional/ACID** tables, where `$row_id` encodes `{writeId, bucketId, rowId}` and is presumably unique — maintainers please advise whether non-transactional `$row_id` is intended to be unique at all.)

## Expected

`$row_id` should be unique per row. If it cannot be unique for non-transactional tables, the Hive connector should **not** advertise it as the table's unique column (so uniqueness-based planning does not kick in) for those tables.

## Impact

- `utilize-unique-property-in-query-planning` (default ON) treats `$row_id` as unique; on non-transactional tables that assumption is false, so optimizations built on it can return incorrect results.
- Any feature relying on `$row_id` identity/uniqueness (row-level operations, joins/grouping on `$row_id`) is unsound on non-transactional tables.

## Environment

- Presto master (0.29x), Hive connector, non-transactional tables (`TEXTFILE` and `ORC`).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hive: $row_id is not unique on non-transactional tables but is declared as the table's unique column #28062

Reproduction

Expected

Impact

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Hive: $row_id is not unique on non-transactional tables but is declared as the table's unique column #28062

Description

Reproduction

Expected

Impact

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions