DbType2 by Jolanrensen · Pull Request #1632 · Kotlin/dataframe

Jolanrensen · 2025-12-11T15:47:20Z

Fixes #1273
Fixes #1587
Fixes #461
Fixes #537

Helps #387

Takes care of #462, but only for DuckDB.

Follows up on #1266 and #462

`DbType` was overhauled greatly:

- Removed `convertSqlTypeToColumnSchemaValue` and `convertSqlTypeToKType` in favor of `getExpectedJdbcType`
- Removed `extractValueFromResultSet` in favor of `getValueFromResultSet`
- Introduced a new streamlined structure for reading from a `ResultSet` in `DbType`:
  - `getValueFromResultSet()` retrieves a value from a `ResultSet`, given a column and row, with the type given by `getExpectedJdbcType()`
  - Next, the value is optionally preprocessed by `preprocessValue()`, producing the type given by `getPreprocessedValueType()`
  - Finally, the values are turned into a DataFrame column by `buildDataColumn()`, with the schema given by `getTargetColumnSchema()`
- In debug mode, the schema is checked at runtime.
- Column names might change, in line with #387 (Use ColumnNameGenerator for consistent column name repair across IO readers).
- By default, we now preprocess Java's `LocalDateTime` to Kotlin's `LocalDateTime`, `java.sql.Timestamp` to Kotlin `Instant`, and UUIDs to Kotlin `Uuid`. More preprocessing might follow later.
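The three-stage flow above can be pictured with a small, self-contained Kotlin sketch. Everything in it is hypothetical and simplified: `SketchColumn`, `SketchDbType` and `readColumn` are illustrative names, a list of rows stands in for a `java.sql.ResultSet`, and `java.time.Instant` stands in for the Kotlin `Instant` the PR actually converts to. It only shows the order of the stages, not the real `DbType` signatures.

```kotlin
import java.sql.Timestamp
import kotlin.reflect.KType

// Hypothetical stand-in for a DataFrame column: a name, a declared type and its values.
data class SketchColumn(val name: String, val type: KType, val values: List<Any?>)

// Hypothetical, simplified DbType with the three stages described above.
object SketchDbType {
    // Stage 1: read the raw JDBC value of one cell (rows stand in for a ResultSet).
    fun getValueFromResultSet(rows: List<List<Any?>>, rowIndex: Int, columnIndex: Int): Any? =
        rows[rowIndex][columnIndex]

    // Stage 2: optionally convert a Java value into the type DataFrame should hold.
    fun preprocessValue(value: Any?): Any? =
        when (value) {
            is Timestamp -> value.toInstant() // java.time.Instant as a stand-in
            else -> value
        }

    // Stage 3: turn all preprocessed values of a column into a column object.
    fun buildDataColumn(name: String, type: KType, values: List<Any?>): SketchColumn =
        SketchColumn(name, type, values)
}

// Runs the three stages, in order, for one column of the given rows.
fun readColumn(rows: List<List<Any?>>, columnIndex: Int, name: String, type: KType): SketchColumn {
    val values = rows.indices.map { rowIndex ->
        val raw = SketchDbType.getValueFromResultSet(rows, rowIndex, columnIndex)
        SketchDbType.preprocessValue(raw)
    }
    return SketchDbType.buildDataColumn(name, type, values)
}
```

For example, reading column 1 of `listOf(listOf(1, Timestamp(0L)))` yields a column whose single value is `Instant.EPOCH`.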

AdvancedDbType

- Introduced `AdvancedDbType` + `JdbcToDataFrameConverter` for easier type-conversion logic
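A rough sketch of the idea behind `AdvancedDbType`: rather than several independent type functions, each column gets one converter object carrying everything needed to read and convert it, created once per column and cached (the PR later adds a `CacheKey` for this). `ColumnMeta`, `Converter`, `SketchAdvancedDbType` and `SketchDuckDb` are hypothetical names; the real `JdbcToDataFrameConverter` API differs.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical, reduced column metadata (the real TableColumnMetadata has more fields).
data class ColumnMeta(val name: String, val sqlTypeName: String, val isNullable: Boolean)

// Hypothetical converter: the JDBC value type J, the DataFrame value type D,
// and the function converting one into the other.
class Converter<J, D>(
    val jdbcType: KType,
    val dataFrameType: KType,
    val convert: (J?) -> D?,
)

// Sketch of an "advanced" db type: one converter per column, created lazily and cached.
abstract class SketchAdvancedDbType {
    private val cache = HashMap<ColumnMeta, Converter<*, *>>()

    // Counts how often createConverter() actually ran, to make the caching visible.
    var created = 0
        private set

    protected abstract fun createConverter(meta: ColumnMeta): Converter<*, *>

    fun converterFor(meta: ColumnMeta): Converter<*, *> =
        cache.getOrPut(meta) {
            created++
            createConverter(meta)
        }
}

// Hypothetical DuckDB-like implementation choosing a converter by SQL type name.
object SketchDuckDb : SketchAdvancedDbType() {
    override fun createConverter(meta: ColumnMeta): Converter<*, *> =
        when (meta.sqlTypeName.uppercase()) {
            "INTEGER" -> Converter<Int, Int>(typeOf<Int?>(), typeOf<Int?>()) { it }
            "VARCHAR" -> Converter<String, String>(typeOf<String?>(), typeOf<String?>()) { it }
            else -> Converter<Any, Any>(typeOf<Any?>(), typeOf<Any?>()) { it }
        }
}
```

Bundling the types and the conversion in one object keeps them from drifting apart, which is the main appeal of the "advanced" variant.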

DuckDB

- Completely rewritten as an `AdvancedDbType`
- Supports all DuckDB types, including Structs, Maps, Lists/Arrays (with recursive conversion!) and JSON, see #462 (Define how JSON database types should be mapped to DataFrame: String vs ColumnGroup)
- This makes DuckDB the most completely supported SQL database we have!
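Recursive conversion of nested values can be illustrated in plain Kotlin. This hypothetical sketch assumes STRUCTs and MAPs have already been read as Kotlin `Map`s and LISTs/ARRAYs as `List`s or arrays; leaves get a per-value conversion (here `java.sql.Timestamp` to `java.time.Instant`, as a stand-in), and containers are rebuilt with their children converted the same way. In the PR itself, structs end up as column groups and lists of structs as frame columns, which this sketch does not model.

```kotlin
import java.sql.Timestamp

// Hypothetical recursive preprocessing: convert leaves, rebuild containers.
fun preprocessNested(value: Any?): Any? =
    when (value) {
        null -> null
        // Example leaf conversion; other leaves are returned unchanged.
        is Timestamp -> value.toInstant()
        // STRUCT/MAP-like values: convert keys and values recursively, keeping entry order.
        is Map<*, *> -> value.entries.associate { (k, v) -> preprocessNested(k) to preprocessNested(v) }
        // LIST/ARRAY-like values: convert every element recursively.
        is List<*> -> value.map(::preprocessNested)
        is Array<*> -> value.map(::preprocessNested)
        else -> value
    }
```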

Other:

- BuildConfig now includes the module name so it doesn't clash with other modules.
- Added support for all PostgreSQL extension types, see #537 (Add support for PostgreSQL geometry types)
- Ran all local DB tests

zaleslaw · 2026-01-27T10:09:07Z

-    val columnKTypes = buildColumnKTypes(tableColumns, dbType)
-    val columnData = readAllRowsFromResultSet(rs, tableColumns, columnKTypes, dbType, limit)
-    val dataFrame = buildDataFrameFromColumnData(columnData, tableColumns, columnKTypes, dbType, inferNullability)
+    val expectedJdbcTypes = getExpectedJdbcTypes(


ExpectedJdbcTypes could be a more complex structure, but for debugging it's better to keep the column order based on indices rather than names. It could be an object with fields such as index, name and KType, for example.

That's certainly possible! Though that would look a bit more like AdvancedDbType, which, for each column, requires you to provide an AnyJdbcToDataFrameConverter containing all information needed to read and convert that column. Maybe the two concepts could be merged.
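The index-based structure suggested in this thread could look roughly like the following. `ExpectedJdbcColumn` and `describeColumns` are hypothetical names; the point is that columns keep their 1-based JDBC order, so duplicate or missing names cannot collide, while the name and `KType` remain available for debugging.

```kotlin
import kotlin.reflect.KType

// Hypothetical descriptor: the 1-based JDBC column index, the raw name and the expected KType.
data class ExpectedJdbcColumn(val index: Int, val name: String, val type: KType)

// Pairs names and types by position; JDBC column indices start at 1.
fun describeColumns(names: List<String>, types: List<KType>): List<ExpectedJdbcColumn> {
    require(names.size == types.size) { "names and types must have the same size" }
    return names.indices.map { i -> ExpectedJdbcColumn(index = i + 1, name = names[i], type = types[i]) }
}
```

Because lookups go by index, two columns both named `id` stay distinct entries.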

zaleslaw · 2026-01-27T10:10:24Z

+        dbType = dbType,
+        tableColumns = tableColumns,
+    )
+    val preprocessedValueTypes = getPreprocessedValueTypes(


Please describe the processes somewhere:

Expected -> Preprocessed: what's the difference, why do we need this step, and what does it give us?
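One way to picture the difference: the expected JDBC type is what `ResultSet.getObject()` is expected to return for a column, and the preprocessed value type is what the value becomes after `preprocessValue()`, i.e. the type DataFrame actually stores. When no preprocessing applies, the two are identical. The mapping below is a hypothetical illustration; `preprocessedTypeOf` is not a library function, the `java.sql.Date` entry is only an example, and `java.time.Instant` stands in for the Kotlin `Instant` used in the PR.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical mapping from expected JDBC types to preprocessed value types.
// Only the nullable variants are listed, to keep the sketch short.
val preprocessedTypes: Map<KType, KType> = mapOf(
    typeOf<java.sql.Timestamp?>() to typeOf<java.time.Instant?>(),
    typeOf<java.sql.Date?>() to typeOf<java.time.LocalDate?>(),
)

// Returns the preprocessed type, or the expected type unchanged if no preprocessing applies.
fun preprocessedTypeOf(expectedJdbcType: KType): KType =
    preprocessedTypes[expectedJdbcType] ?: expectedJdbcType
```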

zaleslaw · 2026-01-27T10:21:25Z

 ): DataFrameSchema {
    val determinedDbType = dbType ?: extractDBTypeFromConnection(connection)

+    // TODO don't need to read 1 row, take it just from TableColumnMetadatas


It's a very safe and cheap approach for any database. I moved away from taking this info from TableColumnMetadata; this approach is also less error-prone in our codebase and more flexible.

Except for empty databases that just have a schema and no data. When building just the schema, we shouldn't have to look at the actual database contents. It's up to the DbType implementor to provide a watertight getExpectedJdbcType(), getPreprocessedValueType(), and getTargetColumnSchema() from just TableColumnMetadata. So if we manage to construct TableColumnMetadatas from connection, tableName and dbType, we can simply call those functions to get a schema without accessing any data.

...and for databases you may have no query access to
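Deriving a schema from metadata alone, without fetching a row, can be sketched like this. `ColumnMeta`, `nullableOr`, `kTypeFor` and `schemaFromMetadata` are hypothetical; `ColumnMeta` stands in for `TableColumnMetadata`, and only a few `java.sql.Types` codes are mapped. Since nothing is read from a result set, the same code works for empty tables.

```kotlin
import java.sql.Types
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for TableColumnMetadata: only what this sketch needs.
data class ColumnMeta(val name: String, val jdbcType: Int, val isNullable: Boolean)

// Picks the nullable or non-nullable KType for T based on the column's nullability.
inline fun <reified T : Any> nullableOr(nullable: Boolean): KType =
    if (nullable) typeOf<T?>() else typeOf<T>()

// Maps a column's JDBC type code to a KType without reading any data.
fun kTypeFor(meta: ColumnMeta): KType =
    when (meta.jdbcType) {
        Types.INTEGER -> nullableOr<Int>(meta.isNullable)
        Types.BIGINT -> nullableOr<Long>(meta.isNullable)
        Types.VARCHAR -> nullableOr<String>(meta.isNullable)
        else -> nullableOr<Any>(meta.isNullable)
    }

// Builds a name -> type schema purely from column metadata.
fun schemaFromMetadata(columns: List<ColumnMeta>): Map<String, KType> =
    columns.associate { it.name to kTypeFor(it) }
```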

zaleslaw · 2026-01-27T10:25:20Z

+        columnIndex: Int,
+        tableColumnMetadata: TableColumnMetadata,
+        expectedJdbcType: KType,
+    ): J? =


J-DBC. It's clearer in the context of AdvancedDbType:

J-DBC type

D-ataFrame type

P-ost processed type

Jolanrensen · 2026-02-12T14:51:27Z

@zaleslaw I'm not sure what to do about java.time.OffsetDateTime and java.time.OffsetTime. Do you know if there's a Kotlin equivalent?

Copilot

Pull request overview

This PR represents a major refactoring of the JDBC module's type inference and conversion system, introducing a new three-stage pipeline architecture for converting database types to DataFrame columns. The changes address several long-standing issues related to type conversions, nested types (STRUCT, ARRAY, MAP), and date/time handling.

Changes:

- Introduced JdbcToDataFrameConverter to encapsulate the three-stage type conversion pipeline (JDBC type → preprocessed value → final DataColumn)
- Added AdvancedDbType abstract class for databases with complex type systems (like DuckDB)
- Refactored DbType API: removed convertSqlTypeToKType/convertSqlTypeToColumnSchemaValue in favor of getExpectedJdbcType/getPreprocessedValueType/getTargetColumnSchema/buildDataColumn
- Updated DuckDB implementation to support STRUCT, MAP, ARRAY, and JSON types with proper conversions
- Changed column name deduplication from underscore-separated ("name_1") to no separator ("name1")
- Updated BuildConfig package naming to include project name
- Updated DuckDB version from 1.3.1.0 to 1.4.2.0
- Converted Java types to Kotlin equivalents (java.sql.Timestamp → kotlin.time.Instant, java.util.UUID → kotlin.uuid.Uuid, etc.)
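The new deduplication style (`name`, `name1`, `name2`, ...) and the handling of missing names mentioned in the commits can be sketched as follows. `dataFrameCompatibleNames` is a hypothetical helper, not the actual `ColumnNameGenerator` or `getDataFrameCompatibleColumnNames()` implementation, and the `untitled` fallback name is an assumption.

```kotlin
// Hypothetical helper: repairs missing/blank names and deduplicates with numeric suffixes
// appended directly to the name, without a separator.
fun dataFrameCompatibleNames(rawNames: List<String?>, fallback: String = "untitled"): List<String> {
    val used = HashSet<String>()
    return rawNames.map { raw ->
        // Missing or blank names get the (assumed) fallback name.
        val base = raw?.takeIf { it.isNotBlank() } ?: fallback
        var candidate = base
        var counter = 1
        // HashSet.add returns false if the name is already taken; then try base1, base2, ...
        while (!used.add(candidate)) {
            candidate = "$base$counter"
            counter++
        }
        candidate
    }
}
```

For instance, `listOf("id", "id", "id")` becomes `listOf("id", "id1", "id2")`.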

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DbType.kt` | Core refactoring: new API methods for type conversion pipeline, removed old methods |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/AdvancedDbType.kt` | New abstract class for databases with complex type mappings using JdbcToDataFrameConverter |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/JdbcToDataFrameConverter.kt` | New converter infrastructure with type-safe preprocessing and column building |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt` | Major refactoring to extend AdvancedDbType, add support for STRUCT/MAP/ARRAY/JSON types |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt` | Refactored to use new three-stage pipeline, added schema validation in DEBUG mode |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readDataFrameSchema.kt` | Updated to use new API methods |
| `dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/*.kt` | Updated all database implementations (PostgreSql, MySql, MariaDb, MsSql, H2, Sqlite) to new API |
| `dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/local/duckDbTest.kt` | Updated tests for new type conversions and nested types support |
| `dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/h2/*.kt` | Updated expected types to match Kotlin conversions (Timestamp → Instant) |
| `build-logic/src/main/kotlin/dfbuild.buildConfig.gradle.kts` | Changed BuildConfig package naming to include project name |
| `gradle/libs.versions.toml` | Updated DuckDB version to 1.4.2.0 |
| `core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/ColumnNameGenerator.kt` | Column name deduplication now uses numeric suffixes without separators |

zaleslaw · 2026-02-17T14:54:07Z

        }

-        schema.compare(Person.expected.schema()).isSuperOrMatches() shouldBe true
+        withClue({


It looks like we have code in commonTestScenarios with the same intention. Could we move this schema validation there and unify the two somehow?

Jolanrensen · 2026-02-19T14:41:13Z

@zaleslaw I reverted DbType.getValueFromResultSet() back to its original implementation. It turns out rs.getObject(i, javaClass) fails when javaClass is a supertype of the actual type of the column in question, which produces a whole bunch of unnecessary exceptions.

Instead, I fixed #537 and its "money" type by only calling rs.getObject(i, javaClass) inside PostgreSql : DbType for the PGobject extension types. This passes all tests consistently without causing potential issues with other SQL types.

All local tests pass now.
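The approach described here, the untyped `getObject(index)` by default and the typed `getObject(index, Class)` only for selected types, might look like this sketch. `readCell` and `typedFetchTypes` are hypothetical, and the type names in the set are illustrative rather than the exact list used in `PostgreSql`.

```kotlin
import java.sql.ResultSet

// Hypothetical, illustrative SQL type names that need the typed getObject(index, Class) call.
// In the PR this applies to PostgreSQL's PGobject extension types.
val typedFetchTypes = setOf("money", "geometry")

// Reads one cell: the untyped getObject(index) by default, which does not depend on
// picking a compatible Java class, and the typed variant only for the listed types.
fun <T> readCell(rs: ResultSet, columnIndex: Int, sqlTypeName: String, targetClass: Class<T>): Any? =
    if (sqlTypeName.lowercase() in typedFetchTypes) {
        rs.getObject(columnIndex, targetClass)
    } else {
        rs.getObject(columnIndex)
    }
```

Restricting the typed call to a known set keeps class-mismatch exceptions from affecting every other SQL type.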

Jolanrensen added 6 commits December 9, 2025 15:00

attempt at converting duckdb to column conversions. WIP

67a1ef8

Refactored DbType to use DbColumnTypeInformation, generated from `TableColumnMetadata` in `generateTypeInformation()`. This contains potential pre- and post-processing logic for any type

210af02

converted DuckDb to new preprocessing DbType. Turns out I might need recursive preprocessing

d21dc79

wip DuckDb nested preprocessing

0ee7024

added memoization for TableColumnMetadata -> AnyDbColumnTypeInformation

f96234d

renaming, added extra constructors for TypeInformation, restricting types

d1b655b

Jolanrensen added the enhancement and databases labels Dec 11, 2025

fixed duckdb tests

4d8acfb

Jolanrensen force-pushed the DbType2 branch from c4f7549 to 4d8acfb December 11, 2025 15:49

koperagen reviewed Dec 11, 2025

Comment thread on dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt (outdated)

added jdbc source type parameter

6539830

Jolanrensen force-pushed the DbType2 branch 2 times, most recently from 6b39cc6 to 9c9a699 December 15, 2025 13:27

struct parsing for duckdb working!

e114fca

Jolanrensen force-pushed the DbType2 branch from 9c9a699 to e114fca December 15, 2025 13:29

Jolanrensen added 6 commits December 16, 2025 12:21

merging column creation and post processing dbType

1a9f581

created AdvancedDbType so we can have "simple" and "advanced" db types, that use JdbcTypeMapping

6bf27fb

added resultSetReader option to JdbcToDataFrameConverter, converting some types by default

d678032

exploring struct/composite types for postgresql

736fe79

added duckDb STRUCT[] column to FrameColumn conversion

be83cc5

Merge branch 'master' into DbType2

2a6b620

zaleslaw requested changes Jan 27, 2026

to support runtime json parsing for duckdb, allow targetSchema = null.

7b1c5af

Jolanrensen force-pushed the DbType2 branch from 31a4914 to 7b1c5af January 28, 2026 12:34

Jolanrensen added 5 commits February 10, 2026 21:35

reverted changes from name-based jdbc columns back to order/index based. Added `getDataFrameCompatibleColumnNames()` function to handle missing or duplicate names, as they can apparently appear from sql but will break DF

1d2494f

Merge branch 'master' into DbType2

df9dd90

Merge branch 'master' into DbType2

0aa0c15

made checkSchema run only in debug builds

a9aee70

simplified DbType and JdbcToDataFrameConverter typing situation, added lots of kdocs

5c8ccdc

simplified api for creating JdbcToDataFrameConverter instances to builder-like pattern

05539d9

Jolanrensen force-pushed the DbType2 branch from 0e7523b to 05539d9 February 12, 2026 14:48

Jolanrensen marked this pull request as ready for review February 12, 2026 14:49

Jolanrensen requested review from koperagen and zaleslaw February 12, 2026 14:49

Jolanrensen added 2 commits February 12, 2026 15:54

enabled buildConfig for dataframe-jupyter too

bf00824

Added some more kdocs to fetchAndConvertDataFromResultSet() explaining the process

6e04da6

Jolanrensen force-pushed the DbType2 branch from 3e34c41 to 6e04da6 February 12, 2026 15:42

changed buildConfig convention plugin to generate unique BuildConfig classes for each module

cc64e3b

Jolanrensen force-pushed the DbType2 branch from 53c7d7e to cc64e3b February 12, 2026 20:43

Jolanrensen requested a review from Copilot February 12, 2026 22:19

Copilot started reviewing on behalf of Jolanrensen February 12, 2026 22:20

Copilot AI reviewed Feb 12, 2026

Jolanrensen added 2 commits February 13, 2026 13:37

added CacheKey for AdvancedDbType as recommended by copilot

0ef8f55

fixed parsing Struct types for DuckDb as recommended by copilot. Added tests

d26940a

Jolanrensen force-pushed the DbType2 branch from 53d134a to d26940a February 13, 2026 12:42

Jolanrensen changed the title ~~Work in progress: DbType2~~ DbType2 Feb 13, 2026

zaleslaw requested changes Feb 17, 2026

Merge branch 'master' into DbType2

53e6efa

Jolanrensen force-pushed the DbType2 branch from ed38da8 to 6b398b4 February 18, 2026 13:45

fixing types for local db tests

f8359a6

Jolanrensen force-pushed the DbType2 branch 2 times, most recently from 9159c44 to 6b1802e February 18, 2026 16:52

fixing postgres types and enabling all extension types #537

3ab6bd0

Jolanrensen force-pushed the DbType2 branch from 6b1802e to 3ab6bd0 February 19, 2026 14:35

Jolanrensen added this to the 1.0.0-Beta5 milestone Feb 19, 2026

Jolanrensen merged commit d29de66 into master Feb 19, 2026
8 of 9 checks passed

Jolanrensen mentioned this pull request Feb 19, 2026

Update JDBC documentation for Beta5 #1703

Closed

4 participants
