bug: BIGQUERY backend generates invalid query when calling .distinct on subset on table with array column #10553

Open
@greg-offerfit

Description

What happened?

Given a GBQ table with the following schema:

id: int
int_ids: array<!int64>  (int64 REPEATED)

Calling table.distinct(on=["id"]).execute() fails with the following error:

google.api_core.exceptions.BadRequest: 400 The argument to ARRAY_AGG must not be an array type but was ARRAY<INT64> at [7:5]; reason: invalidQuery, location: query, message: The argument to ARRAY_AGG must not be an array type but was ARRAY<INT64> at [7:5]

I expected this to generate a valid query and return a DataFrame.
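
For reference, the result I would expect can be sketched in plain Python. This is only an illustration of the semantics, assuming the default keep="first" behaviour of distinct; the sample rows and the distinct_on helper are hypothetical and are not part of ibis or of the actual table.

```python
# Plain-Python sketch of the result expected from table.distinct(on=["id"]).
# The sample rows are hypothetical; this is not how ibis implements it.

def distinct_on(rows, key):
    """Keep the first row seen for each value of `key` (assumed keep="first")."""
    seen = {}
    for row in rows:
        # setdefault stores a row only the first time its key value appears
        seen.setdefault(row[key], row)
    return list(seen.values())


rows = [
    {"id": 1, "int_ids": [10, 20]},
    {"id": 1, "int_ids": [30]},
    {"id": 2, "int_ids": []},
]

result = distinct_on(rows, "id")
print(result)
```

The array column is carried through unchanged for the surviving row, which is the behaviour the generated BigQuery SQL would need to reproduce without passing an array to ARRAY_AGG.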

See the comment below for a minimal reproduction.

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

google.api_core.exceptions.BadRequest: 400 The argument to ARRAY_AGG must not be an array type but was ARRAY<INT64> at [7:5]; reason: invalidQuery, location: query, message: The argument to ARRAY_AGG must not be an array type but was ARRAY<INT64> at [7:5]

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    bug: Incorrect behavior inside of ibis

    Type

    No type

    Projects

    • Status

      backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
