Skip to content

Unexpected collisions in checksum aggregate function over maps #24878

Open
@mbasmanova

Description

@mbasmanova

Checksum algorithm doesn't distinguish between map keys and values. Checksum({1: 10, 2: 20}) is the same as checksum({10: 1, 20: 2}).

presto:di> select map(array[1, 2], array[10, 20]);
    _col0
--------------
 {1=10, 2=20}

Query 20250406_205840_36247_w9a6u, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
[Latency: client-side: 161ms, server-side: 56ms] [0 rows, 0B] [0 rows/s, 0B/s]

presto:di> select checksum(map(array[1, 2], array[10, 20]));
          _col0
-------------------------
 75 f8 a5 51 6e 26 d1 6c

presto:di> select map(array[10, 20], array[1, 2]);
    _col0
--------------
 {20=2, 10=1}

presto:di> select checksum(map(array[10, 20], array[1, 2]));
          _col0
-------------------------
 75 f8 a5 51 6e 26 d1 6c

This is because MapType.hash uses XOR to combine hashes of the key and value.

com/facebook/presto/common/type/MapType.java

    @Override
    public long hash(Block block, int position)
    {
        Block mapBlock = getObject(block, position);
        long result = 0;

        for (int i = 0; i < mapBlock.getPositionCount(); i += 2) {
            result += hashPosition(keyType, mapBlock, i) ^ hashPosition(valueType, mapBlock, i + 1);
        }
        return result;
    }

A more robust implementation would be the same as ArrayType.hash

    @Override
    public long hash(Block block, int position)
    {
        Block array = getObject(block, position);
        long hash = 0;
        for (int i = 0; i < array.getPositionCount(); i++) {
            hash = 31 * hash + hashPosition(elementType, array, i);
        }
        return hash;
    }

Activity

mbasmanova

mbasmanova commented on Apr 6, 2025

@mbasmanova
ContributorAuthor
changed the title [-]Too many collisions in checksum aggregate function over maps[/-] [+]Unexpected collisions in checksum aggregate function over maps[/+] on Apr 6, 2025
amitkdutta

amitkdutta commented on Apr 7, 2025

@amitkdutta
Contributor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    🆕 Unprioritized

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Unexpected collisions in checksum aggregate function over maps · Issue #24878 · prestodb/presto