Skip to content

Conversation

@mach-kernel
Copy link

Greetings!

We're writing a large amount of binary data and wanted to (for performance) eliminate a copy/serialization step that took our Java byte[] and made them in to a hex code escaped string. To achieve this with the JDBC driver, we:

  • Added a DuckDBAppender#append method that accepts byte[]
  • Made the corresponding JNI call treat the value as BLOB_RAW
    • In Java: without getBytes and an encoding
    • In CXX: without going from std::string -> cstr
  • Added a test to ensure byte[] can be round-tripped losslessly.

To better illustrate the issue, a modified version of this test fails on trunk (using a deprecated String constructor that AFAICT does not apply any encoding) :

    public static void test_appender_roundtrip_blob() throws Exception {
        DuckDBConnection conn = DriverManager.getConnection(JDBC_URL).unwrap(DuckDBConnection.class);
        Statement stmt = conn.createStatement();

        stmt.execute("CREATE TABLE data (a BLOB)");

        DuckDBAppender appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "data");
        SecureRandom random = SecureRandom.getInstanceStrong();
        byte[] data = new byte[512];
        random.nextBytes(data);

        appender.beginRow();
        // This does not apply a charset encoding
        appender.append(new String(data, 0));
        appender.endRow();
        appender.flush();
        appender.close();

        ResultSet results = stmt.executeQuery("SELECT * FROM data");
        assertTrue(results.next());

        byte[] resultBlob = results.getBlob(1).getBinaryStream().readAllBytes();
        assertTrue(Arrays.equals(resultBlob, data), "byte[] data is round tripped untouched");

        results.close();
        stmt.close();
        conn.close();
    }
TestDuckDBJDBC#test_appender_roundtrip_blob failed with java.sql.SQLException: Invalid Input Error: Failed to cast value: Invalid byte encountered in STRING örÇoÀùuãÅUïϾ1è
java.sql.SQLException: Invalid Input Error: Failed to cast value: Invalid byte encountered in STRING -> BLOB conversion of string "#íiæþb­ð}´éA¡a
 þ;·p°g ,Çüó[,@Ä9hMÑú×zÇ       ý2?)8Lâ¦PôÞIõ!»C\         Y". All non-ascii characters must be escaped with hex codes (e.g. \xAA)
        at org.duckdb.DuckDBNative.duckdb_jdbc_appender_append_string(Native Method)
        at org.duckdb.DuckDBAppender.append(DuckDBAppender.java:84)
        at org.duckdb.TestDuckDBJDBC.test_appender_roundtrip_blob(TestDuckDBJDBC.java:2775)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.duckdb.test.Runner.runTests(Runner.java:45)
        at org.duckdb.TestDuckDBJDBC.main(TestDuckDBJDBC.java:4637)

David Stancu and others added 2 commits December 5, 2024 13:28
@Mause
Copy link
Member

Mause commented Dec 10, 2024

Heya, thanks for submitting this! Are you able to look into the build failure? Looks like you're using a method that isn't available in Java 8

@mach-kernel
Copy link
Author

Sure!! Just pushed up a fix, let me know if it doesn't work. 😸

@Mause Mause merged commit 998d0e4 into duckdb:main Dec 13, 2024
7 checks passed
@Mause
Copy link
Member

Mause commented Dec 13, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants