Skip to content

tidbcloud/lake-jdbc

Repository files navigation

TiDB Cloud Lake JDBC

Apache License 2.0 lake-jdbc

Highlights

  • Lake-specific interfaces to stream files into tables or stages with loadStreamToTable, uploadStream, and downloadStream.
  • Temporal APIs use session timezone to avoid depending on JVM default zone and support modern java.time.

Prerequisites

The TiDB Cloud Lake JDBC driver requires Java 8 or later. If the minimum required version of Java is not installed on the client machines where the JDBC driver is installed, you must install either Oracle Java or OpenJDK.

Installation

Maven

Add following code block as a dependency

<dependency>
    <groupId>com.tidbcloud</groupId>
    <artifactId>lake-jdbc</artifactId>
    <version>0.4.6</version>
</dependency>

Build from source

Note: build from source requires Java 11+, Maven 3.6.3+

cd lake-jdbc
mvn clean install -DskipTests

Testing

Start the local integration test environment from tests/Makefile:

cd tests
make up
make test

The default make test command runs lake-jdbc tests only.

Run integration tests with Arrow

To run tests with Arrow result pages, set LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow:

cd tests
make test LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow TEST_MVN_ARGS='-Dgroups=IT -DexcludedGroups=FLAKY'

When Arrow mode is enabled through make test, the required JVM options are added automatically:

--add-opens=java.base/java.nio=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true

If you run Maven directly instead of make test, you must set both the Arrow test environment variable and the JVM options yourself:

JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow \
mvn -pl lake-jdbc test -Dgroups=IT -DexcludedGroups=FLAKY

CI note:

  • Standalone Test runs the regular suite and an extra Arrow IT pass.
  • Cluster Tests runs the regular suite and an extra Arrow IT pass for each cluster matrix entry.

Download jar from maven central

You can download the latest version of the lake-jdbc driver here.

How to use

Set the connection environment variables before running the examples:

export LAKE_JDBC_URL='jdbc:lake://host:443/default?warehouse=your-warehouse&ssl=true&sslmode=enable'
export LAKE_JDBC_USER='user'
export LAKE_JDBC_PASSWORD='password'
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Main {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL");
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        try ( Connection conn = openConnection();
              Statement statement = conn.createStatement()
            ) {
            statement.execute("SELECT number from numbers(200000) order by number");
            try(ResultSet rs = statement.getResultSet()){
                // ** We must call `rs.next()` otherwise the query may be canceled **
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }
}

Important Notes

  1. Close Connection/Statement/ResultSet to release resources faster.
  2. Because the select, copy into, merge into are query type SQL, they will return a ResultSet object, you must call rs.next() before accessing the data. Otherwise, the query may be canceled. If you do not want get the result, you can call while(r.next(){}) to iterate over the result set.
  3. For other SQL such as create/drop table non-query type SQL, you can call statement.execute() directly.

Connection Parameters

For detailed references, please take a look at the following Links:

  1. Connection Parameters : detailed documentation about how to use connection parameters in a jdbc connection

JDBC Java type mapping

The Lake type is mapped to Java type as follows:

Lake Type Java Type
TINYINT Byte
SMALLINT Short
INT Integer
BIGINT Long
UInt8 Short
UInt16 Integer
UInt32 Long
UInt64 BigInteger
Float32 Float
Float64 Double
Decimal BigDecimal
String String
Date LocalDate
Timestamp ZonedDateTime
Timestamp_TZ OffsetDateTime
Interval Duration
Geometry byte[]
Bitmap byte[]
Array String
Tuple String
Map String
VARIANT String

Temporal types

we recommend using java.time to avoid ambiguity and set/get values via these APIs:

void setObject(int parameterIndex, Object x)
<T> T getObject(int columnIndex, Class<T> type)
  • TIMESTAMP_TZ and TIMESTAMP map to OffsetDateTime, ZonedDateTime, Instant and LocalDateTime (TIMESTAMP_TZ can return OffsetDateTime but not ZonedDateTime).
  • Date maps to LocalDate
  • When parameters do not contain a timezone, Lake uses the session timezone (not the JVM zone) when storing/returning dates on lake-jdbc ≥ 0.4.3 AND Lake server ≥ 1.2.844.
  • Interval map to java.time.Duration.

old Timestamp/Date are also supported, note that:

  • getTimestamp(int, Calendar cal) is equivalent to getTimestamp(int) (the cal is omitted) and getObject(int, Instant.classes).toTimestamp()
  • setTimestamp(int, Calendar cal) is diff with setTimestamp(int), the epoch is adjusted according to timezone in cal
  • setDate/getDate still use the JVM timezone, getDate(1) is equivalent to Date.valueOf(getObject(1, LocalDate.class)), setDate(1, date) is equivalent to setObject(1, date.toLocalDate()).

Unwrapping to Lake-specific interfaces

interface LakeConnection

The following code shows how to unwrap a JDBC Connection object to expose the methods of the LakeConnection interface.

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.SQLException;
import com.tidbcloud.jdbc.LakeConnection;

public class UnwrapExample {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL");
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = openConnection()) {
            LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
        }
    }
}

method loadStreamToTable

int loadStreamToTable(String sql, InputStream inputStream, long fileSize, LoadMethod loadMethod) throws SQLException;

Load data from a stream directly into a table, using either a staging or streaming approach.

Available with lake-jdbc >= 0.4.1 AND Lake server >= 1.2.791.

Parameters:

  • sql: SQL statement with specific syntax for data loading, use special stage _databend_load
  • inputStream: The input stream of the file data to load
  • fileSize: The size of the file being loaded
  • loadMethod: LoadMethod.STREAMING or LoadMethod.STAGE
    • STAGE: first upload file to a special path in user stage, then load the file in stage in to table, Limited by the max object size of storage of the stage.
      • the upload method is determined by connection parameter presigned_url_disabled.
    • STREAMING load data to while transforming data in one http request. Limited by server memory when load large Parquet/Orc file, whose meta is at the file end.

Returns: Number of rows successfully loaded

example:

import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import com.tidbcloud.jdbc.LakeConnection;

public class LoadStreamExample {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL") + "&presigned_url_disabled=true";
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        byte[] csv = "1,hello\n2,world\n".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        String tableName = "readme_load_example";
        try (Connection conn = openConnection();
             Statement stmt = conn.createStatement();
             ByteArrayInputStream fileStream = new ByteArrayInputStream(csv)) {
            LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
            stmt.execute("create or replace table " + tableName + " (id int, value string)");

            // use special stage `_databend_load`
            String sql = "insert into " + tableName + " from @_databend_load file_format=(type=csv)";

            lakeConnection.loadStreamToTable(
                    sql,
                    fileStream,
                    csv.length,
                    LakeConnection.LoadMethod.STAGE);
            stmt.executeQuery("select count(*) from " + tableName);
        }
    }
}

Use Arrow Result Format

By default, the driver fetches query results in JSON format. To enable Arrow over HTTP, add query_result_format=arrow to the JDBC URL:

String url = System.getenv("LAKE_JDBC_URL");
if (!url.contains("query_result_format=")) {
    url = url + (url.contains("?") ? "&" : "?") + "query_result_format=arrow";
}
Connection conn = DriverManager.getConnection(
    url,
    System.getenv("LAKE_JDBC_USER"),
    System.getenv("LAKE_JDBC_PASSWORD"));

Arrow mode is intended for query result fetching. Internal control queries still use JSON when needed.

Requirements:

  1. Lake server must support Arrow result pages.
  2. The JVM must allow Arrow to access java.nio internals.

Before starting your application, set:

export JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true'

If you do not want to set JAVA_TOOL_OPTIONS globally, pass the same options directly to java:

java --add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -jar your-app.jar

If query_result_format is not specified, the driver uses JSON.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages