- Lake-specific interfaces to stream files into tables or stages with
loadStreamToTable,uploadStream, anddownloadStream. - Temporal APIs use session timezone to avoid depending on JVM default zone and support modern
java.time.
The TiDB Cloud Lake JDBC driver requires Java 8 or later. If the minimum required version of Java is not installed on the client machines where the JDBC driver is installed, you must install either Oracle Java or OpenJDK.
Add following code block as a dependency
<dependency>
<groupId>com.tidbcloud</groupId>
<artifactId>lake-jdbc</artifactId>
<version>0.4.6</version>
</dependency>Note: build from source requires Java 11+, Maven 3.6.3+
cd lake-jdbc
mvn clean install -DskipTestsStart the local integration test environment from tests/Makefile:
cd tests
make up
make testThe default make test command runs lake-jdbc tests only.
To run tests with Arrow result pages, set LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow:
cd tests
make test LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow TEST_MVN_ARGS='-Dgroups=IT -DexcludedGroups=FLAKY'When Arrow mode is enabled through make test, the required JVM options are added automatically:
--add-opens=java.base/java.nio=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=trueIf you run Maven directly instead of make test, you must set both the Arrow test environment variable and the JVM options yourself:
JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
LAKE_JDBC_TEST_QUERY_RESULT_FORMAT=arrow \
mvn -pl lake-jdbc test -Dgroups=IT -DexcludedGroups=FLAKYCI note:
Standalone Testruns the regular suite and an extra Arrow IT pass.Cluster Testsruns the regular suite and an extra Arrow IT pass for each cluster matrix entry.
You can download the latest version of the lake-jdbc driver here.
Set the connection environment variables before running the examples:
export LAKE_JDBC_URL='jdbc:lake://host:443/default?warehouse=your-warehouse&ssl=true&sslmode=enable'
export LAKE_JDBC_USER='user'
export LAKE_JDBC_PASSWORD='password'import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class Main {
private static Connection openConnection() throws SQLException {
String url = System.getenv("LAKE_JDBC_URL");
String user = System.getenv("LAKE_JDBC_USER");
String password = System.getenv("LAKE_JDBC_PASSWORD");
return DriverManager.getConnection(url, user, password);
}
public static void main(String[] args) throws SQLException {
try ( Connection conn = openConnection();
Statement statement = conn.createStatement()
) {
statement.execute("SELECT number from numbers(200000) order by number");
try(ResultSet rs = statement.getResultSet()){
// ** We must call `rs.next()` otherwise the query may be canceled **
while (rs.next()) {
System.out.println(rs.getInt(1));
}
}
}
}
}- Close Connection/Statement/ResultSet to release resources faster.
- Because the
select,copy into,merge intoare query type SQL, they will return aResultSetobject, you must callrs.next()before accessing the data. Otherwise, the query may be canceled. If you do not want get the result, you can callwhile(r.next(){})to iterate over the result set. - For other SQL such as
create/drop tablenon-query type SQL, you can callstatement.execute()directly.
For detailed references, please take a look at the following Links:
- Connection Parameters : detailed documentation about how to use connection parameters in a jdbc connection
The Lake type is mapped to Java type as follows:
| Lake Type | Java Type |
|---|---|
| TINYINT | Byte |
| SMALLINT | Short |
| INT | Integer |
| BIGINT | Long |
| UInt8 | Short |
| UInt16 | Integer |
| UInt32 | Long |
| UInt64 | BigInteger |
| Float32 | Float |
| Float64 | Double |
| Decimal | BigDecimal |
| String | String |
| Date | LocalDate |
| Timestamp | ZonedDateTime |
| Timestamp_TZ | OffsetDateTime |
| Interval | Duration |
| Geometry | byte[] |
| Bitmap | byte[] |
| Array | String |
| Tuple | String |
| Map | String |
| VARIANT | String |
we recommend using java.time to avoid ambiguity and set/get values via these APIs:
void setObject(int parameterIndex, Object x)
<T> T getObject(int columnIndex, Class<T> type)
- TIMESTAMP_TZ and TIMESTAMP map to
OffsetDateTime,ZonedDateTime,InstantandLocalDateTime(TIMESTAMP_TZ can returnOffsetDateTimebut notZonedDateTime). - Date maps to
LocalDate - When parameters do not contain a timezone, Lake uses the session timezone (not the JVM zone) when storing/returning dates on lake-jdbc ≥ 0.4.3 AND Lake server ≥ 1.2.844.
- Interval map to
java.time.Duration.
old Timestamp/Date are also supported, note that:
getTimestamp(int, Calendar cal)is equivalent togetTimestamp(int)(the cal is omitted) andgetObject(int, Instant.classes).toTimestamp()setTimestamp(int, Calendar cal)is diff withsetTimestamp(int), the epoch is adjusted according to timezone in calsetDate/getDatestill use the JVM timezone,getDate(1)is equivalent toDate.valueOf(getObject(1, LocalDate.class)),setDate(1, date)is equivalent tosetObject(1, date.toLocalDate()).
The following code shows how to unwrap a JDBC Connection object to expose the methods of the LakeConnection interface.
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.SQLException;
import com.tidbcloud.jdbc.LakeConnection;
public class UnwrapExample {
private static Connection openConnection() throws SQLException {
String url = System.getenv("LAKE_JDBC_URL");
String user = System.getenv("LAKE_JDBC_USER");
String password = System.getenv("LAKE_JDBC_PASSWORD");
return DriverManager.getConnection(url, user, password);
}
public static void main(String[] args) throws SQLException {
try (Connection conn = openConnection()) {
LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
}
}
}int loadStreamToTable(String sql, InputStream inputStream, long fileSize, LoadMethod loadMethod) throws SQLException;Load data from a stream directly into a table, using either a staging or streaming approach.
Available with lake-jdbc >= 0.4.1 AND Lake server >= 1.2.791.
Parameters:
sql: SQL statement with specific syntax for data loading, use special stage_databend_loadinputStream: The input stream of the file data to loadfileSize: The size of the file being loadedloadMethod: LoadMethod.STREAMING or LoadMethod.STAGESTAGE: first upload file to a special path in user stage, then load the file in stage in to table, Limited by the max object size of storage of the stage.- the upload method is determined by connection parameter
presigned_url_disabled.
- the upload method is determined by connection parameter
STREAMINGload data to while transforming data in one http request. Limited by server memory when load large Parquet/Orc file, whose meta is at the file end.
Returns: Number of rows successfully loaded
example:
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import com.tidbcloud.jdbc.LakeConnection;
public class LoadStreamExample {
private static Connection openConnection() throws SQLException {
String url = System.getenv("LAKE_JDBC_URL") + "&presigned_url_disabled=true";
String user = System.getenv("LAKE_JDBC_USER");
String password = System.getenv("LAKE_JDBC_PASSWORD");
return DriverManager.getConnection(url, user, password);
}
public static void main(String[] args) throws SQLException {
byte[] csv = "1,hello\n2,world\n".getBytes(java.nio.charset.StandardCharsets.UTF_8);
String tableName = "readme_load_example";
try (Connection conn = openConnection();
Statement stmt = conn.createStatement();
ByteArrayInputStream fileStream = new ByteArrayInputStream(csv)) {
LakeConnection lakeConnection = conn.unwrap(LakeConnection.class);
stmt.execute("create or replace table " + tableName + " (id int, value string)");
// use special stage `_databend_load`
String sql = "insert into " + tableName + " from @_databend_load file_format=(type=csv)";
lakeConnection.loadStreamToTable(
sql,
fileStream,
csv.length,
LakeConnection.LoadMethod.STAGE);
stmt.executeQuery("select count(*) from " + tableName);
}
}
}By default, the driver fetches query results in JSON format. To enable Arrow over HTTP, add
query_result_format=arrow to the JDBC URL:
String url = System.getenv("LAKE_JDBC_URL");
if (!url.contains("query_result_format=")) {
url = url + (url.contains("?") ? "&" : "?") + "query_result_format=arrow";
}
Connection conn = DriverManager.getConnection(
url,
System.getenv("LAKE_JDBC_USER"),
System.getenv("LAKE_JDBC_PASSWORD"));Arrow mode is intended for query result fetching. Internal control queries still use JSON when needed.
Requirements:
- Lake server must support Arrow result pages.
- The JVM must allow Arrow to access
java.niointernals.
Before starting your application, set:
export JAVA_TOOL_OPTIONS='--add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true'If you do not want to set JAVA_TOOL_OPTIONS globally, pass the same options directly to java:
java --add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -jar your-app.jarIf query_result_format is not specified, the driver uses JSON.