Skip to content

feat(csharp/src/Drivers/BigQuery): support evaluation kind and statement type setting #2698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

qifanzhang-ms
Copy link
Contributor

Support evaluation kind and statement type setting for GBQ driver, and improve code at BigQueryStatement.cs.

@davidhcoe
Copy link
Contributor

Is there a specific multi-statement test that should be added (presumably from both the DriverTests and the ClientTests)?

@qifanzhang-ms
Copy link
Contributor Author

Is there a specific multi-statement test that should be added (presumably from both the DriverTests and the ClientTests)?
I tried to add a test. Since our current test framework requires developers to write query statements in json files themselves, the added test would be a bit abrupt.

@davidhcoe
Copy link
Contributor

davidhcoe commented Apr 14, 2025

Is there a specific multi-statement test that should be added (presumably from both the DriverTests and the ClientTests)?
I tried to add a test. Since our current test framework requires developers to write query statements in json files themselves, the added test would be a bit abrupt.

The way I would do something like this is for any environment that can execute a query. For example, something like:

 [SkippableFact, Order(6)]
 public void CanExecuteMultiStatementQuery()
 {
     foreach (BigQueryTestEnvironment environment in _environments)
     {
         AdbcConnection adbcConnection = GetAdbcConnection(environment.Name);
         AdbcStatement statement = adbcConnection.CreateStatement();
         
          string query1 = "SELECT " +
                  "CAST(1 as INT64) as id, " +
                  "CAST(1.23 as FLOAT64) as number, " +
                  "PARSE_NUMERIC(\"4.56\") as decimal, " +
                  "PARSE_BIGNUMERIC(\"7.89000000000000000000000000000000000000\") as big_decimal, " +
                  "CAST(True as BOOL) as is_active, " +
                  "'John Doe' as name, " +
                  "FROM_BASE64('YWJjMTIz') as data, " +
                  "CAST('2023-09-08' as DATE) as date, " +
                  "CAST('12:34:56' as TIME) as time, " +
                  "CAST('2023-09-08 12:34:56' as DATETIME) as datetime, " +
                  "CAST('2023-09-08 12:34:56+00:00' as TIMESTAMP) as timestamp, " +
                  "ST_GEOGPOINT(1, 2) as point, " +
                  "ARRAY[1, 2, 3] as numbers, " +
                  "STRUCT('John Doe' as name, 30 as age) as person," +
                  "PARSE_JSON('{\"name\":\"Jane Doe\",\"age\":29}') as json";

         string query2 = "SELECT " +
                   "CAST(1.7976931348623157e+308 as FLOAT64) as number, " +
                   "PARSE_NUMERIC(\"9.99999999999999999999999999999999E+28\") as decimal, " +
                 
       "PARSE_BIGNUMERIC(\"5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+37\")

         string combinedQuery = query1 + ";" + query2 + ";"
         statement.SqlQuery = combinedQuery;

         QueryResult queryResult = statement.ExecuteQuery();

       // TODO: Assert the expected results from the two queries
     }
 }

This will work without any tables being created and you can validate the result(s) of the multi-statement evaluation.

An alternative way could be to use the BigQuery public datasets to retrieve data as well.

@qifanzhang-ms
Copy link
Contributor Author

Is there a specific multi-statement test that should be added (presumably from both the DriverTests and the ClientTests)?
I tried to add a test. Since our current test framework requires developers to write query statements in json files themselves, the added test would be a bit abrupt.

The way I would do something like this is for any environment that can execute a query. For example, something like:

 [SkippableFact, Order(6)]
 public void CanExecuteMultiStatementQuery()
 {
     foreach (BigQueryTestEnvironment environment in _environments)
     {
         AdbcConnection adbcConnection = GetAdbcConnection(environment.Name);
         AdbcStatement statement = adbcConnection.CreateStatement();
         
          string query1 = "SELECT " +
                  "CAST(1 as INT64) as id, " +
                  "CAST(1.23 as FLOAT64) as number, " +
                  "PARSE_NUMERIC(\"4.56\") as decimal, " +
                  "PARSE_BIGNUMERIC(\"7.89000000000000000000000000000000000000\") as big_decimal, " +
                  "CAST(True as BOOL) as is_active, " +
                  "'John Doe' as name, " +
                  "FROM_BASE64('YWJjMTIz') as data, " +
                  "CAST('2023-09-08' as DATE) as date, " +
                  "CAST('12:34:56' as TIME) as time, " +
                  "CAST('2023-09-08 12:34:56' as DATETIME) as datetime, " +
                  "CAST('2023-09-08 12:34:56+00:00' as TIMESTAMP) as timestamp, " +
                  "ST_GEOGPOINT(1, 2) as point, " +
                  "ARRAY[1, 2, 3] as numbers, " +
                  "STRUCT('John Doe' as name, 30 as age) as person," +
                  "PARSE_JSON('{\"name\":\"Jane Doe\",\"age\":29}') as json";

         string query2 = "SELECT " +
                   "CAST(1.7976931348623157e+308 as FLOAT64) as number, " +
                   "PARSE_NUMERIC(\"9.99999999999999999999999999999999E+28\") as decimal, " +
                 
       "PARSE_BIGNUMERIC(\"5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+37\")

         string combinedQuery = query1 + ";" + query2 + ";"
         statement.SqlQuery = combinedQuery;

         QueryResult queryResult = statement.ExecuteQuery();

       // TODO: Assert the expected results from the two queries
     }
 }

This will work without any tables being created and you can validate the result(s) of the multi-statement evaluation.

An alternative way could be to use the BigQuery public datasets to retrieve data as well.

Thanks, have added.

@CurtHagenlocher
Copy link
Contributor

I feel like I don't entirely understand this change, so it would be nice to get a little more explanation.

Today, there's a limitation in ADBC which prevents a single execution batch from returning multiple results. Now obviously, multiple statements doesn't have to mean multiple results because a statement could be e.g. DDL or BEGIN TRAN or some other thing which impacts session state. But accepting multiple statements implies that they could each be returning results, and that's where things feel a little sketchy. It looks like the two added parameters are intended to filter down the set of results in order to pick just one. But they do so in a way that doesn't (to me) obviously ensure that only one result will match the criteria. What if there are two results with the same statement type and evaluation kind? Picking the first one feels a little arbitrary.

What I'd naively expect if I passed multiple statements and needed to indicate which one's results I wanted would be to supply the index of the statement. So for "statement1; statement2; statement3", if I wanted the results from statement2 I might pass either 1 or 2 depending on how I feel about zero-indexing vs one-indexing.

Is the current design something that users of the e.g. BigQuery ODBC driver would already be familiar with?
Do certain statement types and evaluation kinds never come with real result data and can therefore be omitted by default if e.g. an index wasn't specified?

@CurtHagenlocher
Copy link
Contributor

(Also, it would be nice if we tried to maintain some alignment between the C# BigQuery driver and the Go BigQuery driver -- though it may already be a little late for that :(. CC: @lidavidm for possible additional feedback.)

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks mechanically fine; I just have questions around the API.

@davidhcoe
Copy link
Contributor

(Also, it would be nice if we tried to maintain some alignment between the C# BigQuery driver and the Go BigQuery driver -- though it may already be a little late for that :(. CC: @lidavidm for possible additional feedback.)

We started the C# one before the Go one was available and I haven’t followed the Go one too closely.

@qifanzhang-ms
Copy link
Contributor Author

qifanzhang-ms commented Apr 18, 2025

I feel like I don't entirely understand this change, so it would be nice to get a little more explanation.

Today, there's a limitation in ADBC which prevents a single execution batch from returning multiple results. Now obviously, multiple statements doesn't have to mean multiple results because a statement could be e.g. DDL or BEGIN TRAN or some other thing which impacts session state. But accepting multiple statements implies that they could each be returning results, and that's where things feel a little sketchy. It looks like the two added parameters are intended to filter down the set of results in order to pick just one. But they do so in a way that doesn't (to me) obviously ensure that only one result will match the criteria. What if there are two results with the same statement type and evaluation kind? Picking the first one feels a little arbitrary.

What I'd naively expect if I passed multiple statements and needed to indicate which one's results I wanted would be to supply the index of the statement. So for "statement1; statement2; statement3", if I wanted the results from statement2 I might pass either 1 or 2 depending on how I feel about zero-indexing vs one-indexing.

Is the current design something that users of the e.g. BigQuery ODBC driver would already be familiar with? Do certain statement types and evaluation kinds never come with real result data and can therefore be omitted by default if e.g. an index wasn't specified?

Your understanding is very accurate. The behavior of picking the first one statement is designed because the Connector implemented by the previous BigQuery ODBC driver has such behavior, which is also the behavior that customers want.

Supplying the index of the statement is indeed a reasonable feature, but there is no specific customer demand at present. I can add it, you can take a look first.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CurtHagenlocher CurtHagenlocher merged commit 6027c11 into apache:main Apr 18, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants