Skip to content

duckdb/odbc-scanner

Repository files navigation

DuckDB ODBC Scanner

ODBC Scanner extension allows to connect to other databases (using their ODBC drivers) and run queries with odbc_query table function.

Outline:

Installation

odbc_scanner is built on DuckDB C API and can be installed on DuckDB version 1.2.0 or any newer version the following way (use the URL with 1.2.0 version in it even if you are running later version of DuckDB):

INSTALL 'http://nightly-extensions.duckdb.org/v1.2.0/<platform>/odbc_scanner.duckdb_extension.gz';

Where the <platform> is one of:

  • linux_amd64
  • linux_arm64
  • linux_amd64_musl
  • osx_amd64
  • osx_arm64
  • windows_amd64
  • windows_arm64

To update installed extension to the latest version run:

FORCE INSTALL 'http://nightly-extensions.duckdb.org/v1.2.0/<platform>/odbc_scanner.duckdb_extension.gz';

Installed version (commit ID) can be checked using the following query:

SELECT * FROM duckdb_extensions() WHERE extension_name = 'odbc_scanner';

Usage example

-- load extension
LOAD odbc_scanner;

-- open ODBC connection to a remote DB
SET VARIABLE conn = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger;');

-- simple query
SELECT * FROM odbc_query(getvariable('conn'), 'SELECT SYSTIMESTAMP FROM dual');

-- query with parameters
SELECT * FROM odbc_query(getvariable('conn'), 
  'SELECT CAST(? AS NVARCHAR2(2)) || CAST(? AS VARCHAR2(5)) FROM dual',
  params=row('🦆', 'quack')
);

-- close connection
SELECT odbc_close(getvariable('conn'));

DBMS-specific types support status

Functions

odbc_bind_params

odbc_bind_params(conn_handle BIGINT, params_handle BIGINT, params STRUCT) -> BIGINT

Binds specified parameter values to the specified parameters handle. Only necessary with 2-step parameters binding, see Query parameters for details.

Parameters:

  • conn_handle (BIGINT): ODBC connection handle created with odbc_connect
  • params_handle (BIGINT): parameters handle created with odbc_create_params
  • params (STRUCT): parameters values

Returns:

Parameters handle (BIGINT), the same one that was passed as a second argument.

Example:

SELECT odbc_bind_params(getvariable('conn'), getvariable('params1'), row(42, 'foo'))

odbc_close

odbc_close(conn_handle BIGINT) -> VARCHAR

Closes specified ODBC connection to a remote DBMS. Does not throw errors if the connection is already closed.

Parameters:

  • conn_handle (BIGINT): ODBC connection handle created with odbc_connect

Returns:

Always returns NULL (VARCHAR).

Example:

SELECT odbc_close(getvariable('conn'))

odbc_connect

odbc_connect(conn_string VARCHAR) -> BIGINT

Opens an ODBC connection to a remote DBMS.

Parameters:

  • conn_string (VARCHAR): ODBC connection string, passed to the Driver Manager.

Returns:

Connection handle that can be placed into a VARIABLE. Connection is not closed automatically, must be closed with odbc_close.

Example:

SET VARIABLE conn = odbc_connect('Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=system;PWD=tiger;')

odbc_create_params

odbc_create_params() -> BIGINT

Creates a parameters handle. Only necessary with 2-step parameters binding, see Query parameters for details.

Parameters:

None.

Returns:

Parameters handle (BIGINT). When the handle is passed to odbc_query it gets tied to the underlying prepared statement and is closed automatically when the statement is closed.

Example:

SET VARIABLE params1 = odbc_create_params()

odbc_list_data_sources

odbc_list_data_sources() -> TABLE(name VARCHAR, description VARCHAR, type VARCHAR)

Returns the list of ODBC data sources registered in the OS. Uses driver manager call SQLDataSources.

Parameters:

None.

Returns:

A table with the following columns:

  • name (VARCHAR): data source name
  • description (VARCHAR): data source description
  • type (VARCHAR): data source type, USER or SYSTEM

Example:

SELECT * FROM odbc_list_data_sources()

odbc_list_drivers

odbc_list_drivers() -> TABLE(description VARCHAR, attributes MAP(VARCHAR, VARCHAR))

Returns the list of ODBC drivers registered in the OS. Uses driver manager call SQLDrivers.

Parameters:

None.

Returns:

A table with the following columns:

  • description (VARCHAR): driver description
  • attributes (MAP(VARCHAR, VARCHAR)): driver attributes as a name->value map

Example:

SELECT * FROM odbc_list_drivers()

odbc_query

odbc_query(conn_handle BIGINT, query VARCHAR[, <optional named parameters>]) -> TABLE

Runs specified query in a remote DBMS and returns the query results table.

Parameters:

  • conn_handle (BIGINT): ODBC connection handle created with odbc_connect
  • query (VARCHAR): SQL query, passed to the remote DBMS

Optional named parameters that can be used to pass query parameters:

  • params (STRUCT): query parameters to pass to remote DBMS
  • params_handle (BIGINT): parameters handle created with odbc_create_params. Only used with 2-step parameters binding, see Query parameters for details.

Optional named parameters that can change types mapping:

odbc_scanner supports a number of options that can be used to change how the query parameters are passed and how the resulting data is handled. For known DBMSes these options are set automatically. They also can be passed as named parameters to odbc_query function to override the autoconfiguration:

  • decimal_columns_as_chars (BOOLEAN, default: false): read DECIMAL values as VARCHARs that are parsed back into DECIMALs before returning them to client
  • decimal_columns_precision_through_ard (BOOLEAN, default: false): when reading a DECIMAL specify its precision and scale through "Application Row Descriptor"
  • decimal_params_as_chars (BOOLEAN, default: false): pass DECIMAL parameters as VARCHARs
  • reset_stmt_before_execute (BOOLEAN, default: false): reset the prepared statement (using SQLFreeStmt(h, SQL_CLOSE)) before executing it
  • time_params_as_ss_time2 (BOOLEAN, default: false): pass TIME parameters as SQL Server's TIME2 values
  • timestamp_columns_as_timestamp_ns (BOOLEAN, default: false): read TIMESTAMP-like (TIMESTAMP WITH LOCAL TIME ZONE, DATETIME2, TIMESTAMP_NTZ etc) columns with nanosecond precision (with nine fractional digits)
  • timestamp_columns_with_typename_date_as_date (BOOLEAN, default: false): read TIMESTAMP columns that have a type name DATE as DuckDB DATEs
  • timestamp_max_fraction_precision (UTINYINT, default: 9): maximum number of fractional digits to use when reading a TIMESTAMP column with nanosecond precision
  • timestamp_params_as_sf_timestamp_ntz (BOOLEAN, default: false): pass TIMESTAMP parameters as Snowflake's TIMESTAMP_NTZ
  • timestamptz_params_as_ss_timestampoffset (BOOLEAN, default: false): pass TIMESTAMP_TZ parameters as SQL Server's DATETIMEOFFSET
  • var_len_data_single_part (BOOLEAN, default: false): read long VARCHAR or VARBINARY values as a single read (used when a driver does not support Retrieving Variable-Length Data in Parts)
  • var_len_params_long_threshold_bytes (UINTEGER, default: 4000): a length threshold after that SQL_WVARCHAR parameters are passed as SQL_WLONGVARCHAR

Returns:

A table with the query result.

Example:

SELECT * FROM odbc_query(getvariable('conn'), 
  'SELECT CAST(? AS NVARCHAR2(2)) || CAST(? AS VARCHAR2(5)) FROM dual',
  params=row('🦆', 'quack')
)

Connection string examples

Oracle:

Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger;

SQL Server:

Driver={MSSQL Driver};Server=tcp:127.0.0.1,1433;UID=sa;PWD=pwd;TrustServerCertificate=Yes;Database=test_db;

Snowflake:

Driver={Snowflake Driver};Server=foobar-ab12345.snowflakecomputing.com;Database=SNOWFLAKE_SAMPLE_DATA;UID=username;PWD=pwd;

DB2:

Driver={DB2 Driver};HostName=127.0.0.1;Port=50000;Database=testdb;UID=db2inst1;PWD=pwd;

PostgreSQL:

Driver={PostgreSQL Driver};Server=127.0.0.1;Port=5432;Username=postgres;Password=postgres;Database=test_db;

MySQL/MariaDB:

Driver={MariaDB Driver};SERVER=127.0.0.1;PORT=3306;USER=root;PASSWORD=root;DATABASE=test_db;

ClickHouse:

Driver={ClickHouse Driver};Server=127.0.0.1;Port=8123;

Spark:

Driver={Spark Driver};Host=127.0.0.1;Port=10000;

Arrow Flight SQL (Dremio ODBC + GizmoSQL):

Driver={FlightSQL Driver};Host=127.0.0.1;Port=31337;UID=gizmosql_username;PWD=gizmosql_password;useEncryption=true;

Note: the Driver parameter depends on how the driver is registered in odbcinst.ini (or Windows registry) and may be different. odbc_list_drivers can be used to get drivers names.

Query parameters

When a DuckDB query is run using prepared statement, it is possible to pass input parameters from the client code. odbc_scanner allows to forward such input parameters over ODBC API to the queries to remote databases.

odbc_scanner supports 2 methods of passing query parameters, using either params or params_handle named argument to odbc_query function.

params argument takes a STRUCT value as an input. Struct field names are ignored, so the row() function can be used to create a STRUCT value inline:

SELECT * FROM odbc_query(
  getvariable('conn'),
  '
    SELECT CAST(? AS VARCHAR2(3)) || CAST(? AS VARCHAR2(3)) FROM dual
  ', 
  params=row(?, ?))

If we prepare this query with duckdb_prepare(), bind foo and bar VARCHAR values to it with duckdb_bind_value() and execute it with duckdb_execute_prepared() - the input parameters foo and bar will be forwarded to the ODBC query in the remote DB.

The problem with this approach, is that DuckDB is unable to resolve parameter types (specified in the outer query) before duckdb_execute_prepared() is called - such types may be different in subsequent invocations of duckdb_execute_prepared() and there is no way to specify these types explicitly.

This will result in re-preparing the inner query in remote DB every time duckdb_execute_prepared() is called.

To avoid this problem is it possible to use 2-step parameter binding with params_handle named argument to odbc_query:

-- create parameters handle
SET VARIABLE params = odbc_create_params();

-- when 'duckdb_prepare()' is called, the inner query will be prepared in the remote DB
SELECT * FROM odbc_query(
  getvariable('conn'),
  '
    SELECT CAST(? AS VARCHAR2(3)) || CAST(? AS VARCHAR2(3)) FROM dual
  ', 
  params_handle=getvariable('params'));

-- now we can repeatedly bind new parameters to the handle using 'odbc_bind_params()'
-- and call 'duckdb_execute_prepared()' to run the prepared query with
-- these new parameters in remote DB
SELECT odbc_bind_params(getvariable('conn'), getvariable('params'), row(?, ?));

Parameter handle is tied to the prepared statement and will be freed when the statement is destroyed.

Connections and concurrency

DuckDB uses a multi-threaded execution engine to run parts of the query in parallel. ODBC drivers may or may not support using the same connection from different threads concurrently. To prevent possible concurrency problems odbc_scanner does not allow to use the same connection from multiple threads. For example, the following query:

SELECT * FROM odbc_query(getvariable('conn'), 'SELECT ''foo'' col1 FROM dual')
UNION ALL
SELECT * FROM odbc_query(getvariable('conn'), 'SELECT ''bar'' col1 FROM dual')

will fail with:

Invalid Input Error:
'odbc_query' error: ODBC connection not found on global init, id: 139760181976192

This can be avoided by using multiple ODBC connections:

SELECT * FROM odbc_query(getvariable('conn1'), 'SELECT ''foo'' col1 FROM dual')
UNION ALL
SELECT * FROM odbc_query(getvariable('conn2'), 'SELECT ''bar'' col1 FROM dual')

Or by disabling mutli-threaded execution setting threads DuckDB option to 1.

Performance

ODBC is not a high-performance API, odbc_scanner uses multiple API calls per-row and performs UCS-2 to UTF-8 conversion for every VARCHAR value. Besides that, odbc_query processing is strictly single-threaded.

When submitting issues related only to performance please check the performance in comparable scenarios, for example with pyodbc.

Building

Dependencies:

  • Python3
  • Python3-venv
  • GNU Make
  • CMake
  • unixODBC

Building is a two-step process. Firstly run:

make configure

This will ensure a Python venv is set up with DuckDB and DuckDB's test runner installed. Additionally, depending on configuration, DuckDB will be used to determine the correct platform for which you are compiling.

Then, to build the extension run:

make debug

This produces a shared library in target/debug/<shared_lib_name>. After this step, a script is run to transform the shared library into a loadable extension by appending a binary footer. The resulting extension is written to the build/debug directory.

To create optimized release binaries, simply run make release instead.

About

DuckDB ODBC Scanner extension

Resources

License

Stars

Watchers

Forks

Contributors 3

  •  
  •  
  •