DISTINCT collision when string values contain commas #965

Reporter: ting668

Environment

  • OS: CentOS Linux 7 (Core), inside Docker container
  • CPU/Architecture: x86_64
  • Docker image: tugraph/tugraph-runtime-centos7:latest
  • TuGraph-DB Version: 4.5.2 (tugraph-4.5.2-1.x86_64)

Description

While testing TuGraph with a method based on attribute-constraint analysis, I found that RETURN DISTINCT can merge rows that are actually different when their string values contain commas.

Different tuples appear to be serialized into the same internal distinct key.

How to Reproduce and Expected Behavior

Example query:

UNWIND [{x:'a,b', y:'c'}, {x:'a', y:'b,c'}] AS row
RETURN DISTINCT row.x AS x, row.y AS y;

Expected behavior: TuGraph should return 2 rows:

('a,b', 'c')
('a', 'b,c')

Actual behavior: TuGraph returns only 1 row:

('a,b', 'c')

The two different tuples appear to collide because both can be represented as the comma-joined string a,b,c.
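
To make the suspected mechanism concrete, here is a minimal Python sketch. It is not TuGraph's code; I have not confirmed how the DISTINCT key is actually built. The function names `naive_key` and `length_prefixed_key` are hypothetical and only illustrate the difference between an unescaped comma join and an unambiguous encoding.

```python
def naive_key(values):
    # Hypothetical key builder: join column values with a comma, no escaping.
    # This is an assumption about the cause, not TuGraph's real implementation.
    return ",".join(values)


def length_prefixed_key(values):
    # Prefix each value with its length so field boundaries stay unambiguous
    # even when a value itself contains the separator.
    return "".join(f"{len(v)}:{v}" for v in values)


# The two rows from the reproduction query above.
rows = [("a,b", "c"), ("a", "b,c")]

naive_distinct = {naive_key(r) for r in rows}
safe_distinct = {length_prefixed_key(r) for r in rows}

print(naive_distinct)       # both rows collapse to the single key 'a,b,c'
print(len(naive_distinct))  # 1
print(len(safe_distinct))   # 2
```

With the naive join, both rows produce the key `a,b,c`, which matches the single-row result I observe. Any encoding that preserves field boundaries (length prefixes, escaping the separator, or comparing the values as a tuple) keeps the two rows apart.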