bug: Inconsistent I/O behavior in BigQuery backend regarding dataset specification #10547

Open
@everdark

Description

What happened?

Ibis behaves inconsistently when reading tables from and writing tables to a BigQuery dataset.

When reading a table, ibis does not require the connection to specify dataset_id in ibis.bigquery.connect. The dataset can instead be given either as a namespaced table name such as dataset_name.table_name or through the database argument of the read call. Ibis raises if both are specified. It also raises if the connection has no dataset_id and the read call provides neither a namespace nor a database argument.

For example, the following code will raise IbisInputError: Cannot specify database both in the table name and as an argument:

conn = ibis.bigquery.connect("project", location="region")
conn.table("test1.test", database="test2")

and the following calls are fine; both read the table test from the dataset test1:

conn = ibis.bigquery.connect("project", location="region")
conn.table("test1.test")
conn.table("test", database="test1")
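The read-side rules described above can be summarised as a small pure-Python model. This is only a sketch of the observed behavior, not ibis code: the function `resolve_read_dataset`, the exception `DatasetResolutionError`, and the wording of the "no dataset" message are hypothetical, and the name is assumed to be either `table` or `dataset.table`.

```python
class DatasetResolutionError(ValueError):
    """Hypothetical stand-in for the errors ibis raises; not an ibis class."""


def resolve_read_dataset(name, database=None, dataset_id=None):
    """Return (dataset, table) following the read-side rules described above.

    `dataset_id` models the optional connection-level default.
    Assumes `name` is either `table` or `dataset.table`.
    """
    # Split off an optional namespace; it is "" when the name has no dot.
    namespace, _, table = name.rpartition(".")

    # Rule 1: a namespace and a database argument together are an error.
    if namespace and database is not None:
        raise DatasetResolutionError(
            "Cannot specify database both in the table name and as an argument"
        )

    # Rule 2: namespace or database argument, else the connection default.
    dataset = namespace or database or dataset_id

    # Rule 3: with nothing to go on, the dataset cannot be determined.
    if not dataset:
        raise DatasetResolutionError("no dataset specified for table " + repr(name))

    return dataset, table
```

Under this model, `"test1.test"` and `"test"` with `database="test1"` resolve to the same dataset and table, which matches the two working calls above.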

So far so good. However, things are very different when writing a table.

When writing a table, a namespaced table name alone does NOT work: the dataset must be specified either as a connection argument (dataset_id) or as an argument to the write call (database), and the latter overrides the former. Surprisingly, when BOTH a namespaced table name and a database argument are specified, the dataset in the namespace overrides the argument.

For example, the following will raise ValueError: Unable to determine BigQuery dataset.:

conn = ibis.bigquery.connect("project", location="region")
conn.create_table("test1.test", data)

and this will (surprisingly) save the table test to the dataset test1, not test2:

conn = ibis.bigquery.connect("project", location="region")
conn.create_table("test1.test", data, database="test2")

which is rather confusing.

The expected behavior (for consistency) is that a table can be saved either with a namespaced table name and no database argument, or with a plain table name and a database argument, and that ibis raises when both are specified.
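Until the backend behaves this way, the expected rule could be enforced from user code with a thin wrapper around `create_table`. The sketch below is an assumption-laden workaround, not a proposed patch: `create_table_strict` is a hypothetical helper, it assumes a two-part `dataset.table` name, and it relies only on `create_table` accepting a `database` keyword, as in the examples above.

```python
def create_table_strict(conn, name, obj, database=None, **kwargs):
    """Write `obj` as a table, applying the same dataset rules as reads.

    Hypothetical helper, not part of ibis. `conn` is any object exposing
    `create_table(name, obj, database=...)`.
    """
    # Split off an optional dataset namespace; "" when the name has no dot.
    namespace, _, table = name.rpartition(".")

    # Refuse ambiguous input instead of silently preferring the namespace.
    if namespace and database is not None:
        raise ValueError(
            "Cannot specify database both in the table name and as an argument"
        )

    # A namespaced name becomes an explicit database argument, which is the
    # form the write path is reported to accept.
    if namespace:
        database = namespace

    # With neither given, database stays None and the backend falls back to
    # the connection's dataset_id (or raises if there is none).
    return conn.create_table(table, obj, database=database, **kwargs)
```

With this wrapper, `create_table_strict(conn, "test1.test", data)` is forwarded as `conn.create_table("test", data, database="test1")`, while passing both `"test1.test"` and `database="test2"` raises instead of writing to test1.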

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

No response

Labels

bigquery (The BigQuery backend), bug (Incorrect behavior inside of ibis)
