feat: Support for DuckDB’s RANGE function with two timestamps for BigQuery Compatibility #10487

Open
@BugaM

Description

Is your feature request related to a problem?

In DuckDB, the RANGE function requires a third argument (an interval) when called with timestamps. BigQuery's RANGE function accepts two timestamp arguments without an interval. Because of this difference, the following query, which works in BigQuery, fails in DuckDB:

import ibis
import pandas as pd
from datetime import datetime

conn = ibis.duckdb.connect()

data = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'value': [10, 20, 30, 40],
    'timestamp': [
        datetime(2023, 1, 1, 12, 0),
        datetime(2023, 1, 2, 12, 0),
        datetime(2023, 1, 3, 12, 0),
        datetime(2023, 1, 4, 12, 0)
    ]
})

conn.create_table('test_table', data)

query = conn.sql("""
    SELECT id, value, RANGE(timestamp, LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp ASC)) as timestamp_range
    FROM test_table 
""", dialect="bigquery")

Error:

duckdb.duckdb.BinderException: Binder Error: No function matches the given name and argument types 'range(TIMESTAMP, TIMESTAMP)'. You might need to add explicit type casts.
        Candidate functions:
        range(BIGINT) -> BIGINT[]
        range(BIGINT, BIGINT) -> BIGINT[]
        range(BIGINT, BIGINT, BIGINT) -> BIGINT[]
        range(TIMESTAMP, TIMESTAMP, INTERVAL) -> TIMESTAMP[]
        range(TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH TIME ZONE, INTERVAL) -> TIMESTAMP WITH TIME ZONE[]

LINE 1: DESCRIBE SELECT id, value, RANGE(timestamp, LEAD(timestamp) OVER (...

The same query works in BigQuery:

WITH test_table AS (
    SELECT 1 AS id, 10 AS value, TIMESTAMP("2023-01-01 12:00:00") AS timestamp UNION ALL
    SELECT 2 AS id, 20 AS value, TIMESTAMP("2023-01-02 12:00:00") AS timestamp UNION ALL
    SELECT 1 AS id, 30 AS value, TIMESTAMP("2023-01-03 12:00:00") AS timestamp UNION ALL
    SELECT 2 AS id, 40 AS value, TIMESTAMP("2023-01-04 12:00:00") AS timestamp
)

SELECT id, value, RANGE(timestamp, LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp ASC)) AS timestamp_range
FROM test_table

Result:

[image: BigQuery query output]

What is the motivation behind your request?

I’m using Ibis to test BigQuery queries locally. This feature is needed to make sure BigQuery operators behave the same way in Ibis. Specifically, it would support as-of joins as suggested by the documentation. Enabling it also involves adding the RANGE_CONTAINS function.

Describe the solution you'd like

Add support for BigQuery's RANGE and RANGE_CONTAINS functions when the BigQuery dialect is selected.
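
For reference, here is a minimal pure-Python sketch of the semantics such an emulation would probably need to reproduce. It is not Ibis or DuckDB code. It assumes, based on my reading of the BigQuery docs, that a range is half-open (lower bound included, upper bound excluded) and that a NULL bound means unbounded on that side. The names TimestampRange, range_contains and ranges_by_id are hypothetical helpers made up for this illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class TimestampRange(NamedTuple):
    """Half-open interval [lower, upper); None means unbounded (assumption)."""
    lower: Optional[datetime]
    upper: Optional[datetime]


def range_contains(r: TimestampRange, value: datetime) -> bool:
    """Return True when value falls inside the half-open range r."""
    after_lower = r.lower is None or value >= r.lower
    before_upper = r.upper is None or value < r.upper
    return after_lower and before_upper


def ranges_by_id(rows):
    """Pair each row's timestamp with the next timestamp for the same id.

    Mirrors RANGE(timestamp, LEAD(timestamp) OVER
    (PARTITION BY id ORDER BY timestamp ASC)) from the query above.
    """
    by_id = {}
    for row_id, value, ts in rows:
        by_id.setdefault(row_id, []).append((ts, value))

    result = []
    for row_id, items in by_id.items():
        items.sort(key=lambda item: item[0])  # ORDER BY timestamp ASC
        for i, (ts, value) in enumerate(items):
            # LEAD(timestamp): next timestamp in the partition, else None
            upper = items[i + 1][0] if i + 1 < len(items) else None
            result.append((row_id, value, TimestampRange(ts, upper)))
    return result


# Same rows as test_table in the example above
rows = [
    (1, 10, datetime(2023, 1, 1, 12, 0)),
    (2, 20, datetime(2023, 1, 2, 12, 0)),
    (1, 30, datetime(2023, 1, 3, 12, 0)),
    (2, 40, datetime(2023, 1, 4, 12, 0)),
]
ranges = ranges_by_id(rows)
```

With this, an as-of lookup for id 1 at 2023-01-02 would match the row with value 10, since range_contains is true only for the range that starts at 2023-01-01 12:00 and ends just before 2023-01-03 12:00.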

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

DuckDB, BigQuery

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    feature: Features or general enhancements

    Type

    No type

    Projects

    • Status

      backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests