
Conversation

@prmoore77 (Contributor)

Addresses #1107

github-actions bot modified the milestone: ADBC Libraries 22 (Dec 15, 2025)
@prmoore77 changed the title from "Implement support for Bulk Ingest for ADBC Flight SQL driver" to "feat: Implement support for Bulk Ingest for ADBC Flight SQL driver" (Dec 15, 2025)

@lidavidm (Member)

We currently use docker-compose to manage this and I'd rather keep it consistent...

@prmoore77 (Contributor, Author)

Hi @lidavidm, I've changed the code to use docker-compose, per your feedback.

@prmoore77 requested a review from lidavidm (December 15, 2025 21:42)
@prmoore77 (Contributor, Author)

FWIW, I tested the locally built driver wheel from Python against a remote (Azure) GizmoSQL server, and it seems to work well:

import os
import time

import duckdb
from adbc_driver_flightsql import dbapi as gizmosql
from codetiming import Timer
from dotenv import load_dotenv

from config import get_logger

# Timer logging setup
TIMER_TEXT = "{name}: Elapsed time: {:.4f} seconds"


def main():
    load_dotenv()

    logger = get_logger()
    timer_logger = logger.info
    with Timer(name=f"Overall program",
               text=TIMER_TEXT,
               initial_text=True,
               logger=timer_logger
               ):
        with Timer(name=f"  Generate TPCH data and load into DuckDB (1GB)",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            # Connect to DuckDB (memory only)
            duckdb_conn = duckdb.connect()
            duckdb_conn.install_extension("tpch")
            duckdb_conn.load_extension("tpch")
            duckdb_conn.execute(query="CALL dbgen(sf=1.0)")

        with Timer(name=f"  Get RecordBatch reader for the DuckDB lineitem table",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            lineitem_arrow_reader = duckdb_conn.table("lineitem").fetch_arrow_reader(batch_size=10_000)

        with Timer(name=f"  Bulk ingest the data into GizmoSQL",
                   text=TIMER_TEXT,
                   initial_text=True,
                   logger=timer_logger
                   ):
            with gizmosql.connect(
                    uri="grpc+tls://try-gizmosql-adbc.gizmodata.com:31337",
                    db_kwargs={"username": os.environ["GIZMOSQL_USERNAME"],
                               "password": os.environ["GIZMOSQL_PASSWORD"]
                               },
                    autocommit=True
            ).cursor() as cursor:
                ingest_start = time.perf_counter()
                rows_loaded = cursor.adbc_ingest(
                    table_name="bulk_ingest_lineitem",
                    data=lineitem_arrow_reader,
                    mode="replace"
                )
                ingest_seconds = time.perf_counter() - ingest_start

                rows_per_sec = (rows_loaded / ingest_seconds) if ingest_seconds > 0 else float("inf")
                logger.info(msg=f"Loaded rows: {rows_loaded:,}")
                logger.info(msg=f"Ingest time: {ingest_seconds:.4f} s")
                logger.info(msg=f"Rows/sec: {rows_per_sec:,.2f}")


if __name__ == "__main__":
    main()

Result:

2025-12-16 13:39:36,290 - INFO     Timer Overall program started
2025-12-16 13:39:36,290 - INFO     Timer   Generate TPCH data and load into DuckDB (1GB) started
2025-12-16 13:39:38,723 - INFO       Generate TPCH data and load into DuckDB (1GB): Elapsed time: 2.4328 seconds
2025-12-16 13:39:38,723 - INFO     Timer   Get RecordBatch reader for the DuckDB lineitem table started
2025-12-16 13:39:38,726 - INFO       Get RecordBatch reader for the DuckDB lineitem table: Elapsed time: 0.0029 seconds
2025-12-16 13:39:38,726 - INFO     Timer   Bulk ingest the data into GizmoSQL started
2025-12-16 13:40:11,055 - INFO     Loaded rows: 6,001,215
2025-12-16 13:40:11,056 - INFO     Ingest time: 33.3162 s
2025-12-16 13:40:11,056 - INFO     Rows/sec: 180,129.18
2025-12-16 13:40:11,058 - INFO       Bulk ingest the data into GizmoSQL: Elapsed time: 33.9063 seconds
2025-12-16 13:40:11,058 - INFO     Overall program: Elapsed time: 36.3427 seconds

@prmoore77 (Contributor, Author)

Well, the integration tests are failing in the pipeline. They "worked on my laptop", but I'll investigate what is happening.

@prmoore77 (Contributor, Author)

> Well, the integration tests are failing in the pipeline. They "worked on my laptop", but I'll investigate what is happening.

I believe the integration tests related to bulk ingestion are working now.

@prmoore77 (Contributor, Author)

Hi @zeroshade, I think the integration tests are working well now. I do see some CI failures, but I believe they are unrelated to my changes.

@lidavidm changed the title from "feat: Implement support for Bulk Ingest for ADBC Flight SQL driver" to "feat(go/adbc/driver/flightsql): support bulk ingest" (Dec 29, 2025)
Comment on lines 92 to 101
// executeIngestWithReader performs the bulk ingest operation with the given record reader.
func executeIngestWithReader(
	ctx context.Context,
	client *flightsql.Client,
	rdr array.RecordReader,
	opts *flightsql.ExecuteIngestOpts,
	callOpts ...grpc.CallOption,
) (int64, error) {
	return client.ExecuteIngest(ctx, rdr, opts, callOpts...)
}

@lidavidm (Member)

This is a function with one line that is only ever used once; just inline it?
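
A minimal sketch of what the inlining might look like, assuming the wrapper's single call site already has client, ctx, rdr, opts, and callOpts in scope (the nRecords name is illustrative, not the PR's actual variable):

// Call the Flight SQL client directly at the single call site
// instead of routing through a one-line wrapper:
nRecords, err := client.ExecuteIngest(ctx, rdr, opts, callOpts...)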

rdr, err := array.NewRecordReader(batch.Schema(), []arrow.RecordBatch{batch})
if err != nil {
	return nil, adbc.Error{
		Msg: fmt.Sprintf("[Flight SQL Statement] failed to create record reader: %s", err.Error()),

@lidavidm (Member)

nit: I think there's no need to explicitly call err.Error()
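
For illustration: fmt's %s verb calls Error() on any value implementing the error interface, so the quoted line could simply pass err directly:

Msg: fmt.Sprintf("[Flight SQL Statement] failed to create record reader: %s", err),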

Comment on lines 383 to 386
case adbc.OptionKeyIngestTargetTable:
	s.query.sqlQuery = ""
	s.query.substraitPlan = nil
	s.targetTable = val

@lidavidm (Member)

We need to clear s.prepared too?

@lidavidm (Member)

setSqlQuery needs to clear targetTable
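
Taken together with the s.prepared comment above, both remarks ask for symmetric state clearing. A hedged sketch, reusing the field names from the quoted snippets; the helper functions here are hypothetical, not the driver's actual methods:

// Hypothetical helpers illustrating the symmetric invalidation:
func (s *statement) setTargetTable(val string) {
	// Entering ingest mode invalidates any previously set query
	// and any previously prepared statement.
	s.query.sqlQuery = ""
	s.query.substraitPlan = nil
	s.prepared = nil
	s.targetTable = val
}

func (s *statement) setSqlQuery(query string) {
	// Setting a SQL query must likewise leave ingest mode.
	s.targetTable = ""
	s.query.sqlQuery = query
}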

Comment on lines 29 to 30
class TestBulkIngest:
    """Test bulk ingest functionality."""

@lidavidm (Member)

nit: is there a need for the class? It's not required in pytest.

        result = reader.read_all()
        assert result.column("cnt")[0].as_py() == 5

    def test_ingest_various_types(self, gizmosql):

@lidavidm (Member)

Is this super necessary? It's more a test of gizmosql than the driver

Comment on lines 298 to 299
class TestBulkIngestDBAPI:
    """Test bulk ingest using the DBAPI interface."""

@lidavidm (Member)

Same question here. The test names all encode the difference anyway, so why add the extra layer?

@prmoore77 (Contributor, Author)

Hi @lidavidm, I've made edits per your feedback. Thanks for your help.

@prmoore77 requested a review from lidavidm (December 29, 2025 17:50)
	}
}
case adbc.OptionKeyIngestTargetTable:
	s.prepared = nil

@lidavidm (Member)

Use s.closePreparedStatement
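
A hedged sketch of the suggested change; the helper's exact signature is assumed here (it may take a context and return an error), not taken from the driver:

case adbc.OptionKeyIngestTargetTable:
	// Close the server-side prepared statement so the server can
	// release its resources, rather than just dropping the handle.
	if s.prepared != nil {
		if err := s.closePreparedStatement(ctx); err != nil {
			return err
		}
		s.prepared = nil
	}
	s.targetTable = val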

Comment on lines +2733 to +2737
func (srv *BulkIngestTestServer) GetIngestedData() []arrow.RecordBatch {
	srv.mu.Lock()
	defer srv.mu.Unlock()
	return srv.ingestedData
}

@lidavidm (Member)

The lock doesn't really do anything here unless you also copy the slice
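
A hedged sketch of the fix: copy the slice while the lock is held, so callers cannot race with later appends through the shared backing array. The same pattern would apply to GetIngestRequests below.

func (srv *BulkIngestTestServer) GetIngestedData() []arrow.RecordBatch {
	srv.mu.Lock()
	defer srv.mu.Unlock()
	// The lock only protects this read; returning srv.ingestedData
	// directly would still alias the slice the ingest handler appends to.
	out := make([]arrow.RecordBatch, len(srv.ingestedData))
	copy(out, srv.ingestedData)
	return out
}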

Comment on lines +2739 to +2743
func (srv *BulkIngestTestServer) GetIngestRequests() []flightsql.StatementIngest {
	srv.mu.Lock()
	defer srv.mu.Unlock()
	return srv.ingestRequests
}

@lidavidm (Member)

Ditto here
