TiDB Cloud Lake JDBC

Highlights

Lake-specific interfaces to stream files into tables or stages with loadStreamToTable, uploadStream, and downloadStream.
Temporal APIs use the session timezone instead of the JVM default zone and support the modern java.time types.

Prerequisites

The TiDB Cloud Lake JDBC driver requires Java 8 or later. If the minimum required version of Java is not installed on the client machines where the JDBC driver is installed, you must install either Oracle Java or OpenJDK.

Installation

Maven

Add the following dependency to your pom.xml:

<dependency>
    <groupId>com.tidbcloud</groupId>
    <artifactId>lake-jdbc</artifactId>
    <version>0.4.6</version>
</dependency>

Build from source

Note: building from source requires Java 11+ and Maven 3.6.3+.

cd lake-jdbc
mvn clean install -DskipTests

Testing

Start the local integration test environment from tests/Makefile:

cd tests
make up
make test

The default make test command runs lake-jdbc tests only.

Run integration tests with Arrow

To run tests with Arrow result pages, set LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow:

cd tests
make test LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow TEST_MVN_ARGS='-Dgroups=IT -DexcludedGroups=FLAKY'

When Arrow mode is enabled through make test, the required JVM options are added automatically:

--add-opens=java.base/java.nio=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true

If you run Maven directly instead of make test, you must set both the Arrow test environment variable and the JVM options yourself:

JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow \
mvn -pl lake-jdbc test -Dgroups=IT -DexcludedGroups=FLAKY

CI note:

Standalone Test runs the regular suite and an extra Arrow IT pass.
Cluster Tests runs the regular suite and an extra Arrow IT pass for each cluster matrix entry.

Download jar from maven central

You can download the latest version of the lake-jdbc driver from Maven Central.

How to use

Set the connection environment variables before running the examples:

export LAKE_JDBC_URL='jdbc:lake://host:443/default?warehouse=your-warehouse&ssl=true&sslmode=enable'
export LAKE_JDBC_USER='user'
export LAKE_JDBC_PASSWORD='password'

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Main {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL");
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = openConnection();
             Statement statement = conn.createStatement()) {
            statement.execute("SELECT number FROM numbers(200000) ORDER BY number");
            try (ResultSet rs = statement.getResultSet()) {
                // ** We must call `rs.next()` otherwise the query may be canceled **
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }
}

Important Notes

Close Connection, Statement, and ResultSet objects promptly to release resources sooner.
SELECT, COPY INTO, and MERGE INTO are query-type SQL statements and return a ResultSet. You must call rs.next() before accessing the data; otherwise, the query may be canceled. If you do not need the result, call while (rs.next()) {} to iterate over the result set.
For non-query SQL such as CREATE TABLE or DROP TABLE, you can call statement.execute() directly.
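
The draining loop mentioned above can be wrapped in a small helper. The following is a minimal sketch that uses only the standard java.sql API; the class and method names (ResultSets, drain) are hypothetical and not part of lake-jdbc.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of lake-jdbc. */
public final class ResultSets {
    private ResultSets() {}

    /**
     * Iterates to the end of the result set without reading any column,
     * then closes it. Returns the number of rows that were skipped.
     */
    public static long drain(ResultSet rs) throws SQLException {
        long rows = 0;
        // try-with-resources closes the result set even if next() throws
        try (ResultSet closing = rs) {
            while (closing.next()) {
                rows++;
            }
        }
        return rows;
    }
}
```

For example, when statement.execute(sql) returns true for a query-type statement whose rows you do not need, ResultSets.drain(statement.getResultSet()) consumes and closes the result.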

Connection Parameters

For details, see the following link:

Connection Parameters: detailed documentation on how to use connection parameters in a JDBC connection.

JDBC Java type mapping

Lake types map to Java types as follows:

Lake Type	Java Type
TINYINT	Byte
SMALLINT	Short
INT	Integer
BIGINT	Long
UInt8	Short
UInt16	Integer
UInt32	Long
UInt64	BigInteger
Float32	Float
Float64	Double
Decimal	BigDecimal
String	String
Date	LocalDate
Timestamp	ZonedDateTime
Timestamp_TZ	OffsetDateTime
Interval	Duration
Geometry	byte[]
Bitmap	byte[]
Array	String
Tuple	String
Map	String
VARIANT	String
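
The unsigned rows in the table map to the next wider Java type, presumably because Java has no unsigned integer types: the maximum of each UIntN exceeds the range of the signed Java type of the same width. The sketch below uses only the JDK to show the ranges involved; the class name UnsignedRanges is hypothetical and the code does not touch the driver.

```java
import java.math.BigInteger;

/** Hypothetical illustration of unsigned ranges; not part of lake-jdbc. */
public final class UnsignedRanges {
    private UnsignedRanges() {}

    /** Largest value of an unsigned integer with the given bit width: 2^bits - 1. */
    public static BigInteger maxUnsigned(int bits) {
        return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
    }

    /** True if the value can be stored in a signed 64-bit Java long. */
    public static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        System.out.println(maxUnsigned(8));   // 255, larger than Byte.MAX_VALUE (127)
        System.out.println(maxUnsigned(16));  // 65535, larger than Short.MAX_VALUE (32767)
        System.out.println(maxUnsigned(32));  // 4294967295, larger than Integer.MAX_VALUE
        System.out.println(maxUnsigned(64));  // 18446744073709551615, larger than Long.MAX_VALUE
        System.out.println(fitsInLong(maxUnsigned(64))); // false, hence BigInteger
    }
}
```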

Temporal types

We recommend using java.time to avoid ambiguity, and setting and getting values through these APIs:

void setObject(int parameterIndex, Object x)
<T> T getObject(int columnIndex, Class<T> type)

TIMESTAMP_TZ and TIMESTAMP can be set and read as OffsetDateTime, ZonedDateTime, Instant, or LocalDateTime, with one exception: TIMESTAMP_TZ can be read as OffsetDateTime but not as ZonedDateTime.
DATE maps to LocalDate.
When a parameter does not carry a timezone, Lake uses the session timezone (not the JVM default zone) when storing and returning dates. This requires lake-jdbc ≥ 0.4.3 and Lake server ≥ 1.2.844.
INTERVAL maps to java.time.Duration.
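
To make the session-timezone rule concrete, the sketch below shows how the same zone-less LocalDateTime resolves to different instants depending on which zone interprets it. It uses only java.time and illustrates the concept, not the driver's internals; the class name SessionZoneSketch and the choice of Asia/Shanghai as the session timezone are assumptions for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Hypothetical illustration; not part of lake-jdbc. */
public final class SessionZoneSketch {
    private SessionZoneSketch() {}

    /** Resolves a zone-less timestamp to an absolute instant in the given zone. */
    public static Instant resolve(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2024, 1, 15, 12, 0);
        ZoneId sessionZone = ZoneId.of("Asia/Shanghai"); // assumed session timezone (UTC+8)

        // Interpreted in the session timezone: always the same instant
        System.out.println(resolve(local, sessionZone)); // 2024-01-15T04:00:00Z

        // Interpreted in the JVM default zone: varies from machine to machine
        System.out.println(resolve(local, ZoneId.systemDefault()));
    }
}
```

Because the first result does not depend on where the client JVM runs, two clients in different zones that share a session timezone store the same instant.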

The legacy java.sql.Timestamp and java.sql.Date types are also supported. Note that:

getTimestamp(int, Calendar cal) is equivalent to getTimestamp(int) (cal is ignored) and to Timestamp.from(getObject(int, Instant.class)).
setTimestamp(int, Timestamp, Calendar cal) differs from setTimestamp(int, Timestamp): the epoch is adjusted according to the timezone in cal.
setDate/getDate still use the JVM timezone: getDate(1) is equivalent to Date.valueOf(getObject(1, LocalDate.class)), and setDate(1, date) is equivalent to setObject(1, date.toLocalDate()).
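
The equivalences above rely on the standard JDK bridges between the legacy java.sql types and java.time. The following sketch shows those bridges in isolation, without a connection; the class name LegacyTemporalSketch is hypothetical.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical illustration of the JDK conversions; not part of lake-jdbc. */
public final class LegacyTemporalSketch {
    private LegacyTemporalSketch() {}

    /** Instant to legacy Timestamp; both denote the same point in time. */
    public static Timestamp toTimestamp(Instant instant) {
        return Timestamp.from(instant);
    }

    /** LocalDate to legacy Date; midnight is taken in the JVM default zone. */
    public static Date toSqlDate(LocalDate date) {
        return Date.valueOf(date);
    }

    /** Legacy Date back to LocalDate, again using the JVM default zone. */
    public static LocalDate toLocalDate(Date date) {
        return date.toLocalDate();
    }
}
```

Date.valueOf and Date.toLocalDate both go through the JVM default zone, so a LocalDate survives the round trip unchanged.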

Unwrapping to Lake-specific interfaces

interface LakeConnection

The following code shows how to unwrap a JDBC Connection object to expose the methods of the LakeConnection interface.

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.SQLException;
import com.tidbcloud.jdbc.LakeConnection;

public class UnwrapExample {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL");
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = openConnection()) {
            LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
        }
    }
}

method `loadStreamToTable`

int loadStreamToTable(String sql, InputStream inputStream, long fileSize, LoadMethod loadMethod) throws SQLException;

Load data from a stream directly into a table, using either a staging or streaming approach.

Available with lake-jdbc >= 0.4.1 AND Lake server >= 1.2.791.

Parameters:

sql: SQL statement using the data-loading syntax; it must reference the special stage _databend_load.
inputStream: the input stream of the file data to load.
fileSize: the size of the file being loaded.
loadMethod: LoadMethod.STREAMING or LoadMethod.STAGE
- STAGE: first uploads the file to a special path in the user stage, then loads the staged file into the table. Limited by the maximum object size of the stage's storage.
  - The upload method is determined by the connection parameter presigned_url_disabled.
- STREAMING: loads the data while transforming it, in a single HTTP request. Limited by server memory when loading large Parquet/ORC files, whose metadata is at the end of the file.

Returns: Number of rows successfully loaded

example:

import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import com.tidbcloud.jdbc.LakeConnection;

public class LoadStreamExample {
    private static Connection openConnection() throws SQLException {
        String url = System.getenv("LAKE_JDBC_URL") + "&presigned_url_disabled=true";
        String user = System.getenv("LAKE_JDBC_USER");
        String password = System.getenv("LAKE_JDBC_PASSWORD");
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) throws SQLException {
        byte[] csv = "1,hello\n2,world\n".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        String tableName = "readme_load_example";
        try (Connection conn = openConnection();
             Statement stmt = conn.createStatement();
             ByteArrayInputStream fileStream = new ByteArrayInputStream(csv)) {
            LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
            stmt.execute("create or replace table " + tableName + " (id int, value string)");

            // use special stage `_databend_load`
            String sql = "insert into " + tableName + " from @_databend_load file_format=(type=csv)";

            lakeConnection.loadStreamToTable(
                    sql,
                    fileStream,
                    csv.length,
                    LakeConnection.LoadMethod.STAGE);
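            // Iterate the result set so that the query is not canceled
            try (java.sql.ResultSet rs = stmt.executeQuery("select count(*) from " + tableName)) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }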
        }
    }
}

Use Arrow Result Format

By default, the driver fetches query results in JSON format. To enable Arrow over HTTP, add query_result_format=arrow to the JDBC URL:

String url = System.getenv("LAKE_JDBC_URL");
if (!url.contains("query_result_format=")) {
    url = url + (url.contains("?") ? "&" : "?") + "query_result_format=arrow";
}
Connection conn = DriverManager.getConnection(
    url,
    System.getenv("LAKE_JDBC_USER"),
    System.getenv("LAKE_JDBC_PASSWORD"));

Arrow mode is intended for query result fetching. Internal control queries still use JSON when needed.

Requirements:

Lake server must support Arrow result pages.
The JVM must allow Arrow to access java.nio internals.

Before starting your application, set:

export JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true'

If you do not want to set JAVA_TOOL_OPTIONS globally, pass the same options directly to java:

java --add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -jar your-app.jar

If query_result_format is not specified, the driver uses JSON.
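
To check at startup whether the JVM was launched with the required --add-opens option, an application can query the module system. This is a minimal sketch that assumes Java 9 or later (the Module API does not exist on Java 8) and an application running from the classpath; the class name ArrowPreflight is hypothetical and not part of lake-jdbc.

```java
/** Hypothetical startup check; not part of lake-jdbc. Requires Java 9+. */
public final class ArrowPreflight {
    private ArrowPreflight() {}

    /**
     * Returns true if java.base opens the java.nio package to the module
     * containing this class (the unnamed module for classpath applications).
     */
    public static boolean javaNioIsOpen() {
        Module javaBase = Object.class.getModule();
        return javaBase.isOpen("java.nio", ArrowPreflight.class.getModule());
    }

    public static void main(String[] args) {
        if (!javaNioIsOpen()) {
            System.err.println("java.nio is not opened; start the JVM with "
                    + "--add-opens=java.base/java.nio=ALL-UNNAMED");
        }
    }
}
```

Calling ArrowPreflight.javaNioIsOpen() before opening a connection with query_result_format=arrow lets the application report the missing option early instead of failing on the first query.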
