
DbType2 #1632

Merged: Jolanrensen merged 33 commits into master from DbType2 on Feb 19, 2026



Jolanrensen (Collaborator) commented Dec 11, 2025

Fixes #1273
Fixes #1587
Fixes #461
Fixes #537

Helps #387

Takes care of #462, but only for DuckDB.

Follows up on #1266 and #462

DbType was heavily overhauled:

  • Removed convertSqlTypeToColumnSchemaValue and convertSqlTypeToKType in favor of getExpectedJdbcType
  • Removed extractValueFromResultSet in favor of getValueFromResultSet
  • Introduced a new, streamlined structure for reading from a ResultSet in DbType:
    • First, getValueFromResultSet() retrieves the value at a given column and row of a ResultSet, typed as given by getExpectedJdbcType()
    • Next, the value is optionally preprocessed by preprocessValue(), yielding the type given by getPreprocessedValueType()
    • Finally, the values are turned into a DataFrame column by buildDataColumn(), with the schema given by getTargetColumnSchema()
  • In debug mode, the schema is checked at runtime.
  • Column names might change, following #387 (Use ColumnNameGenerator for consistent column name repair across IO readers)
  • By default, we now preprocess Java's LocalDateTime to Kotlin's LocalDateTime, java.sql.Timestamp to Kotlin's Instant, and java.util.UUID to Kotlin's Uuid. More preprocessing might follow later.
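The three stages above can be sketched as follows. This is a simplified, hypothetical model rather than the library's code: MiniDbType, ExampleDbType and readColumn are made-up names, a list of rows stands in for a ResultSet, and java.time.Instant stands in for the Kotlin Instant the PR targets so that the sketch needs only the JDK.

```kotlin
import java.sql.Timestamp
import java.time.Instant
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for DbType's three-stage read pipeline.
// A "result set" is modelled as a list of rows, each row a list of cells.
interface MiniDbType {
    // Stage 1 (type level): what the driver is expected to return for a column.
    fun getExpectedJdbcType(sqlTypeName: String): KType

    // Stage 1 (value level): read the raw value at (row, column).
    fun getValueFromResultSet(rows: List<List<Any?>>, row: Int, column: Int): Any? = rows[row][column]

    // Stage 2: the type after optional preprocessing, and the value-level conversion.
    fun getPreprocessedValueType(expected: KType): KType
    fun preprocessValue(value: Any?, expected: KType): Any?
}

object ExampleDbType : MiniDbType {
    override fun getExpectedJdbcType(sqlTypeName: String): KType = when (sqlTypeName) {
        "TIMESTAMP" -> typeOf<Timestamp?>()
        "INTEGER" -> typeOf<Int?>()
        else -> typeOf<String?>()
    }

    // Timestamps are converted; every other type passes through unchanged.
    override fun getPreprocessedValueType(expected: KType): KType =
        if (expected == typeOf<Timestamp?>()) typeOf<Instant?>() else expected

    override fun preprocessValue(value: Any?, expected: KType): Any? =
        if (value is Timestamp) value.toInstant() else value
}

// Stage 3 (simplified): gather the preprocessed values of one column together with
// the column's final type. The real buildDataColumn() produces a DataColumn instead.
fun readColumn(dbType: MiniDbType, rows: List<List<Any?>>, column: Int, sqlTypeName: String): Pair<KType, List<Any?>> {
    val expected = dbType.getExpectedJdbcType(sqlTypeName)
    val targetType = dbType.getPreprocessedValueType(expected)
    val values = rows.indices.map { row ->
        dbType.preprocessValue(dbType.getValueFromResultSet(rows, row, column), expected)
    }
    return targetType to values
}
```

Keeping the type functions separate from the value functions is what allows the final column type to be known before any row is read.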

AdvancedDbType

  • Introduced AdvancedDbType + JdbcToDataFrameConverter for easier type-conversion logic
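The per-column idea behind AdvancedDbType (one converter object carrying everything needed to read and convert a column) can be sketched like this. Converter, MiniAdvancedDbType and ExampleAdvanced are hypothetical names; the real JdbcToDataFrameConverter is not reproduced here, and String stands in for kotlin.uuid.Uuid to avoid an experimental opt-in.

```kotlin
import java.util.UUID
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical per-column converter: expected JDBC type, target type and the
// conversion function travel together in one object.
class Converter<J : Any, D : Any>(
    val expectedJdbcType: KType,
    val targetType: KType,
    private val convert: (J) -> D,
) {
    // Nulls pass through; non-null values are converted.
    @Suppress("UNCHECKED_CAST")
    fun preprocess(value: Any?): D? = (value as J?)?.let(convert)
}

// Hypothetical "advanced" db type: instead of several separate overrides,
// it hands out one converter per column, chosen from the SQL type name.
abstract class MiniAdvancedDbType {
    abstract fun converterFor(sqlTypeName: String): Converter<*, *>

    fun convertColumn(sqlTypeName: String, raw: List<Any?>): List<Any?> {
        val converter = converterFor(sqlTypeName)
        return raw.map { converter.preprocess(it) }
    }
}

object ExampleAdvanced : MiniAdvancedDbType() {
    private val uuidAsString = Converter<UUID, String>(typeOf<UUID>(), typeOf<String>()) { it.toString() }
    private val passThrough = Converter<Any, Any>(typeOf<Any>(), typeOf<Any>()) { it }

    override fun converterFor(sqlTypeName: String): Converter<*, *> =
        if (sqlTypeName == "UUID") uuidAsString else passThrough
}
```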

DuckDB

Other:

Jolanrensen added the enhancement (New feature or request) and databases (JDBC related issues) labels Dec 11, 2025
Comment thread dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt Outdated
Jolanrensen force-pushed the DbType2 branch 2 times, most recently from 6b39cc6 to 9c9a699, December 15, 2025 13:27
Comment thread dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt Outdated
val columnKTypes = buildColumnKTypes(tableColumns, dbType)
val columnData = readAllRowsFromResultSet(rs, tableColumns, columnKTypes, dbType, limit)
val dataFrame = buildDataFrameFromColumnData(columnData, tableColumns, columnKTypes, dbType, inferNullability)
val expectedJdbcTypes = getExpectedJdbcTypes(

Collaborator

ExpectedJdbcTypes could be a more complex structure. For debugging, though, it's better to keep the columns ordered by index rather than by name. It could be, for example, an object with the fields index, name, and KType.

Collaborator Author

That's certainly possible! Though that would look a bit more like AdvancedDbType, which, for each column, requires you to provide an AnyJdbcToDataFrameConverter containing all the information needed to read and convert that column. Maybe the two concepts could be merged.
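A hypothetical sketch of the reviewer's suggestion, with index, name and type kept together and ordered by column index. ExpectedColumnType and expectedColumnTypes are illustrative names, not part of the PR.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical record for one column: index-based, so debug output follows the
// ResultSet's column order, and duplicate names remain distinguishable.
data class ExpectedColumnType(
    val index: Int, // 1-based, like JDBC column indices
    val name: String,
    val type: KType,
)

// Builds the records in column order from (name, type) pairs.
fun expectedColumnTypes(columns: List<Pair<String, KType>>): List<ExpectedColumnType> =
    columns.mapIndexed { i, (name, type) -> ExpectedColumnType(index = i + 1, name = name, type = type) }
```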

dbType = dbType,
tableColumns = tableColumns,
)
val preprocessedValueTypes = getPreprocessedValueTypes(

Collaborator

Please describe the processes somewhere:

Expected -> Preprocessed: what's the difference, why do we need this step, and what does it give us?
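One way to picture the distinction asked about here: the expected type is what the JDBC driver returns for a column, while the preprocessed type is what the column holds after preprocessValue(). Because both are declared up front, values can be checked against them at runtime, similar in spirit to the PR's debug-mode schema check. A minimal sketch under hypothetical names (ColumnTypes, preprocessTimestamp, checkValues), with java.time.Instant standing in for Kotlin's Instant:

```kotlin
import java.sql.Timestamp
import java.time.Instant
import kotlin.reflect.KClass

// Hypothetical declared types for one column:
// expected = what the driver returns, preprocessed = what the column will hold.
data class ColumnTypes(val expected: KClass<*>, val preprocessed: KClass<*>, val nullable: Boolean)

// Value-level preprocessing for a TIMESTAMP column (Instant stands in for Kotlin's Instant).
fun preprocessTimestamp(value: Any?): Any? = (value as Timestamp?)?.toInstant()

// Debug-style check: every preprocessed value must match the declared preprocessed type.
fun checkValues(types: ColumnTypes, values: List<Any?>) {
    for (v in values) {
        if (v == null) {
            require(types.nullable) { "null found in a non-nullable column" }
        } else {
            require(types.preprocessed.java.isInstance(v)) {
                "expected ${types.preprocessed.java.name}, got ${v.javaClass.name}"
            }
        }
    }
}
```

Forgetting to preprocess (for example, storing a raw Timestamp in a column declared as Instant) makes checkValues throw an IllegalArgumentException.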

): DataFrameSchema {
val determinedDbType = dbType ?: extractDBTypeFromConnection(connection)

// TODO don't need to read 1 row, take it just from TableColumnMetadatas

Collaborator

It's a very safe and cheap approach for any database. I moved away from taking the info from TableColumnMetadata; reading one row also makes our codebase less error-prone and more flexible.

Collaborator Author

Except for empty databases that just have a schema and no data. When building just the schema, we shouldn't have to look at the actual database contents. It's up to the DbType implementor to provide a watertight getExpectedJdbcType(), getPreprocessedValueType(), and getTargetColumnSchema() from just TableColumnMetadata. So if we manage to construct TableColumnMetadatas from connection, tableName and dbType, we can simply call those functions to get a schema without accessing any data.

Collaborator Author

...and for databases you may have no query access to
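The author's argument can be sketched as a purely type-level mapping from column metadata to a schema, with no ResultSet and no rows involved. ColumnMeta, targetType and schemaFromMetadata are hypothetical and far simpler than the real TableColumnMetadata and DbType functions; the SQL-type-to-KType table is illustrative only.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical, trimmed-down column metadata (the real TableColumnMetadata has more fields).
data class ColumnMeta(val name: String, val sqlTypeName: String, val isNullable: Boolean)

// Type-level mapping only: no query, no data access.
fun targetType(meta: ColumnMeta): KType = when (meta.sqlTypeName.uppercase()) {
    "INTEGER" -> if (meta.isNullable) typeOf<Int?>() else typeOf<Int>()
    "BIGINT" -> if (meta.isNullable) typeOf<Long?>() else typeOf<Long>()
    "BOOLEAN" -> if (meta.isNullable) typeOf<Boolean?>() else typeOf<Boolean>()
    else -> if (meta.isNullable) typeOf<String?>() else typeOf<String>()
}

// A "schema" here is simply column name -> type, preserving column order.
fun schemaFromMetadata(columns: List<ColumnMeta>): LinkedHashMap<String, KType> =
    columns.associateTo(LinkedHashMap()) { it.name to targetType(it) }
```

This works the same for an empty table, since only metadata is consulted.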


columnIndex: Int,
tableColumnMetadata: TableColumnMetadata,
expectedJdbcType: KType,
): J? =

Collaborator

J?

Collaborator Author

J-DBC. It's clearer in the context of AdvancedDbType:

  • J-DBC type
  • D-ataFrame type
  • P-ost processed type

Jolanrensen marked this pull request as ready for review February 12, 2026 14:49
Jolanrensen (Collaborator Author)

@zaleslaw I'm not sure what to do about java.time.OffsetDateTime and java.time.OffsetTime. Do you know if there's a Kotlin equivalent?

Copilot AI left a comment

Contributor

Pull request overview

This PR represents a major refactoring of the JDBC module's type inference and conversion system, introducing a new three-stage pipeline architecture for converting database types to DataFrame columns. The changes address several long-standing issues related to type conversions, nested types (STRUCT, ARRAY, MAP), and date/time handling.

Changes:

  • Introduced JdbcToDataFrameConverter to encapsulate the three-stage type conversion pipeline (JDBC type → preprocessed value → final DataColumn)
  • Added AdvancedDbType abstract class for databases with complex type systems (like DuckDB)
  • Refactored DbType API: removed convertSqlTypeToKType/convertSqlTypeToColumnSchemaValue in favor of getExpectedJdbcType/getPreprocessedValueType/getTargetColumnSchema/buildDataColumn
  • Updated DuckDB implementation to support STRUCT, MAP, ARRAY, and JSON types with proper conversions
  • Changed column name deduplication from underscore-separated ("name_1") to no separator ("name1")
  • Updated BuildConfig package naming to include project name
  • Updated DuckDB version from 1.3.1.0 to 1.4.2.0
  • Converted Java types to Kotlin equivalents (java.sql.Timestamp → kotlin.time.Instant, java.util.UUID → kotlin.uuid.Uuid, etc.)
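The summary above mentions that deduplicated column names now get a numeric suffix without a separator ("name1" instead of "name_1"). The exact rules of ColumnNameGenerator are not shown on this page; the following is a hypothetical approximation of that scheme, with deduplicate as an illustrative name.

```kotlin
// Hypothetical approximation: a repeated name gets the smallest numeric suffix
// (starting at 1) that is not already taken, appended without a separator.
fun deduplicate(names: List<String>): List<String> {
    val used = mutableSetOf<String>()
    return names.map { name ->
        var candidate = name
        var suffix = 1
        while (candidate in used) {
            candidate = "$name$suffix"
            suffix++
        }
        used += candidate
        candidate
    }
}
```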

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.

Summary per file:

  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DbType.kt: Core refactoring (new API methods for type conversion pipeline, removed old methods)
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/AdvancedDbType.kt: New abstract class for databases with complex type mappings using JdbcToDataFrameConverter
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/JdbcToDataFrameConverter.kt: New converter infrastructure with type-safe preprocessing and column building
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt: Major refactoring to extend AdvancedDbType, add support for STRUCT/MAP/ARRAY/JSON types
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt: Refactored to use new three-stage pipeline, added schema validation in DEBUG mode
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readDataFrameSchema.kt: Updated to use new API methods
  • dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/*.kt: Updated all database implementations (PostgreSql, MySql, MariaDb, MsSql, H2, Sqlite) to new API
  • dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/local/duckDbTest.kt: Updated tests for new type conversions and nested types support
  • dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/h2/*.kt: Updated expected types to match Kotlin conversions (Timestamp → Instant)
  • build-logic/src/main/kotlin/dfbuild.buildConfig.gradle.kts: Changed BuildConfig package naming to include project name
  • gradle/libs.versions.toml: Updated DuckDB version to 1.4.2.0
  • core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/ColumnNameGenerator.kt: Column name deduplication now uses numeric suffixes without separators


Comment thread dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt Outdated
Comment thread dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DuckDb.kt Outdated
Comment thread build-logic/src/main/kotlin/dfbuild.buildConfig.gradle.kts
Comment thread gradle/libs.versions.toml
Comment thread dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/DbType.kt Outdated
Jolanrensen changed the title from "Work in progress: DbType2" to "DbType2" Feb 13, 2026
}

schema.compare(Person.expected.schema()).isSuperOrMatches() shouldBe true
withClue({

Collaborator

It looks like we have code in commonTestScenarios with the same intention. Could we move this schema validation there and unify the two somehow?

Jolanrensen force-pushed the DbType2 branch 2 times, most recently from 9159c44 to 6b1802e, February 18, 2026 16:52
Jolanrensen (Collaborator Author) commented Feb 19, 2026

@zaleslaw I reverted DbType.getValueFromResultSet() back to its original implementation. It turns out rs.getObject(i, javaClass) fails when javaClass is a supertype of the actual type of the column in question, which produces a whole bunch of unnecessary exceptions.

Instead, I fixed #537 and its "money" type by only calling rs.getObject(i, javaClass) inside PostgreSql : DbType for the PGobject extension types. This satisfies all tests consistently without causing potential issues with other SQL types.

All local tests pass now
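The approach described in this comment (generic getObject(i) by default, typed getObject(i, javaClass) only for specific extension types) can be sketched as below. readValue and typedExtensionTypes are hypothetical names, and apart from "money" (mentioned above) the set of types that need the typed call is an assumption; in the PR this logic lives inside the PostgreSql DbType.

```kotlin
import java.sql.ResultSet

// Assumption: illustrative set of extension type names that need the typed getter.
// Only "money" comes from the discussion above.
val typedExtensionTypes = setOf("money")

// Reads one value. Ordinary columns use the generic getObject(i), so a supertype
// Class is never passed to getObject(i, Class) for them.
fun <T> readValue(rs: ResultSet, columnIndex: Int, sqlTypeName: String, javaClass: Class<T>): Any? =
    if (sqlTypeName.lowercase() in typedExtensionTypes) {
        rs.getObject(columnIndex, javaClass)
    } else {
        rs.getObject(columnIndex)
    }
```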

Jolanrensen added this to the 1.0.0-Beta5 milestone Feb 19, 2026
Jolanrensen merged commit d29de66 into master Feb 19, 2026
8 of 9 checks passed

Labels

databases (JDBC related issues), enhancement (New feature or request)


4 participants