Java SDK for Spice.ai

For full documentation, visit the Spice.ai OSS Docs.

Installation

Maven

Add the following dependency to your Maven project:

<dependency>
  <groupId>ai.spice</groupId>
  <artifactId>spiceai</artifactId>
  <version>0.6.0</version>
  <scope>compile</scope>
</dependency>

Gradle

Add the following dependency to your Gradle project:

implementation 'ai.spice:spiceai:0.6.0'

Manual installation

Pre-built jars are available from a public Maven repository. To build the .jar yourself, run the command below from the repository root:

mvn package -Dmaven.test.skip=true

Supported Java Versions

This library supports the following Java implementations:

Distribution               Versions
OpenJDK (Microsoft Build)  11, 17, 21 (LTS)
OpenJDK (Eclipse Temurin)  21 (LTS), 23, 24
Oracle JDK                 17, 21 (LTS), 23, 24, 25 (LTS)

Usage

With locally running Spice.ai OSS

Follow the quickstart guide to install and run Spice locally, then query it with a default client:

import org.apache.arrow.flight.FlightStream;
import org.apache.arrow.vector.VectorSchemaRoot;
import ai.spice.SpiceClient;

public class Example {

    public static void main(String[] args) {
        try (SpiceClient client = SpiceClient.builder()
                .build()) {

            FlightStream stream = client.query("SELECT * FROM taxi_trips LIMIT 10;");

            while (stream.next()) {
                try (VectorSchemaRoot batches = stream.getRoot()) {
                    System.out.println(batches.contentToTSVString());
                }
            }
        } catch (Exception e) {
            System.err.println("An unexpected error occurred: " + e.getMessage());
        }
    }
}

To connect to Spice.ai Cloud, create a free Spice.ai account to obtain an API key, then pass the key to the client builder:

import org.apache.arrow.flight.FlightStream;
import org.apache.arrow.vector.VectorSchemaRoot;
import ai.spice.SpiceClient;

public class Example {
    final static String API_KEY = "api-key";

    public static void main(String[] args) {
        try (SpiceClient client = SpiceClient.builder()
                .withApiKey(API_KEY)
                .withSpiceCloud()
                .build()) {

            FlightStream stream = client.query("SELECT * FROM eth.recent_blocks LIMIT 10;");

            while (stream.next()) {
                try (VectorSchemaRoot batches = stream.getRoot()) {
                    System.out.println(batches.contentToTSVString());
                }
            }
        } catch (Exception e) {
            System.err.println("An unexpected error occurred: " + e.getMessage());
        }
    }
}

Connection retry

The SpiceClient implements a connection retry mechanism with 3 attempts by default. The number of attempts can be configured with withMaxRetries:

SpiceClient client = SpiceClient.builder()
    .withMaxRetries(5) // Setting to 0 will disable retries
    .build();

Retries are performed for connection errors and internal system errors. Handling other errors, such as RESOURCE_EXHAUSTED (HTTP 429), is the SDK user's responsibility.
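Errors that the built-in retry does not cover can be retried at the application level. The following is a minimal sketch, not part of the SDK: BackoffRetry, call, and looksRateLimited are hypothetical names, and detecting rate limiting by looking for "RESOURCE_EXHAUSTED" in exception messages is an assumption about how the error surfaces in your application.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper (not part of the SDK): retries a call with exponential backoff. */
final class BackoffRetry {

    private BackoffRetry() {}

    /**
     * Runs the action up to maxAttempts times. After each retryable failure it
     * sleeps, then doubles the delay before the next attempt.
     */
    static <T> T call(Callable<T> action,
                      Predicate<Exception> isRetryable,
                      int maxAttempts,
                      long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on the last attempt, or on errors the caller does not want retried.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    /** Assumption: a rate-limit error mentions RESOURCE_EXHAUSTED somewhere in its cause chain. */
    static boolean looksRateLimited(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains("RESOURCE_EXHAUSTED")) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as BackoffRetry.call(() -> client.query(sql), BackoffRetry::looksRateLimited, 4, 500) would then wait 500 ms, 1 s, and 2 s between attempts before giving up.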

Parameterized Queries (Recommended)

The SDK supports parameterized queries using ADBC (Arrow Database Connectivity), which is the recommended approach for queries with user input to prevent SQL injection:

import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import ai.spice.SpiceClient;
import ai.spice.Param;

public class Example {
    public static void main(String[] args) {
        try (SpiceClient client = SpiceClient.builder().build()) {

            // Query with automatic type inference; try-with-resources closes the reader
            try (ArrowReader reader = client.queryWithParams(
                    "SELECT * FROM taxi_trips WHERE trip_distance > $1 LIMIT 10",
                    5.0)) {  // Double is inferred as Float64

                while (reader.loadNextBatch()) {
                    VectorSchemaRoot root = reader.getVectorSchemaRoot();
                    System.out.println(root.contentToTSVString());
                }
            }

        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

Multiple Parameters

Use positional placeholders ($1, $2, etc.) for multiple parameters:

ArrowReader reader = client.queryWithParams(
    "SELECT * FROM taxi_trips WHERE trip_distance > $1 AND fare_amount > $2 LIMIT 10",
    5.0, 20.0);
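A mismatch between the placeholders and the supplied values is easy to introduce as queries grow. The sketch below shows one way to check this before sending a query. PlaceholderCheck is a hypothetical helper, not part of the SDK, and its scan is deliberately naive: it does not skip placeholders that appear inside SQL string literals or comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the SDK): checks $n placeholders against the argument count. */
final class PlaceholderCheck {

    // Matches $1, $2, ...; the digit count is capped so Integer.parseInt cannot overflow.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d{1,9})");

    private PlaceholderCheck() {}

    /** Returns the highest placeholder index found in the SQL text, or 0 if there is none. */
    static int highestPlaceholder(String sql) {
        int max = 0;
        Matcher matcher = PLACEHOLDER.matcher(sql);
        while (matcher.find()) {
            max = Math.max(max, Integer.parseInt(matcher.group(1)));
        }
        return max;
    }

    /** Throws if the number of parameters differs from the highest placeholder index. */
    static void requireMatchingArity(String sql, Object... params) {
        int expected = highestPlaceholder(sql);
        if (expected != params.length) {
            throw new IllegalArgumentException(
                "Query expects " + expected + " parameter(s) but " + params.length + " were supplied");
        }
    }
}
```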

Explicit Type Control

For precise control over Arrow types, use the Param factory methods:

import java.math.BigDecimal;
import ai.spice.Param;

// Explicit type specification
ArrowReader reader = client.queryWithParams(
    "SELECT * FROM orders WHERE order_id = $1 AND amount >= $2",
    Param.int64(12345),
    Param.decimal128(new BigDecimal("99.99"), 10, 2));

Available typed parameter constructors:

  • Integers: int8, int16, int32, int64, uint8, uint16, uint32, uint64
  • Floating point: float16, float32, float64
  • Strings: string, largeString
  • Binary: binary, largeBinary, fixedSizeBinary
  • Boolean: bool
  • Date/Time: date32, date64, time32, time64, timestamp, duration
  • Decimals: decimal128, decimal256
  • Null: nullValue

Or use the generic constructors:

  • Param.of(value) - Creates a parameter with automatic type inference
  • Param.of(value, arrowType) - Creates a parameter with explicit Arrow type

Supported parameter types with automatic type inference:

  • Integers: int, byte, short, long
  • Floating point: float, double
  • String: String
  • Boolean: boolean
  • Binary: byte[]
  • Temporal: LocalDate, LocalTime, LocalDateTime, Duration
  • Decimal: BigDecimal
  • Null: null

Memory Configuration

The SpiceClient uses an Arrow RootAllocator to manage off-heap memory. By default, it can use all available memory. To set a limit, specify it in megabytes:

SpiceClient client = SpiceClient.builder()
    .withArrowMemoryLimitMB(1024) // 1GB limit
    .build();

Long-lived Clients and Transport Resilience

The SpiceClient is designed for long-lived reuse. The underlying gRPC channel uses dns:/// resolution, which periodically re-resolves hostnames so clients automatically recover from load-balancer IP rotation (e.g. AWS NLB). HTTP/2 keep-alive is enabled by default (30s interval, 10s timeout) to detect dead connections quickly.

For the rare case where the transport becomes permanently stuck (e.g. TLS handshake to a wrong backend, persistent UNAVAILABLE after retries), use reset() to discard the bad connection and immediately establish a fresh one:

SpiceClient client = SpiceClient.builder()
    .withApiKey(API_KEY)
    .withSpiceCloud()
    .build();

// Long-lived usage with transport recovery.
// isTransportFailure() is application-defined; check for
// io.grpc.StatusRuntimeException with Status.UNAVAILABLE,
// SSLHandshakeException, or similar transport-level errors.
try {
    try (FlightStream stream = client.query(sql)) {
        // process results...
    }
} catch (ExecutionException e) {
    if (isTransportFailure(e.getCause())) {
        client.reset();                     // discard bad transport, reconnect immediately
        try (FlightStream stream = client.query(sql)) {
            // process results with fresh connection...
        }
    } else {
        throw e;
    }
}
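Because isTransportFailure() is application-defined, the following is only one possible sketch of it. It uses JDK types alone and recognizes the gRPC exception by class name, so it compiles without gRPC on the classpath. The assumption that an UNAVAILABLE status shows up as a message starting with "UNAVAILABLE" is a heuristic; with gRPC available, comparing the status code directly is more robust. TransportFailures is a hypothetical class name.

```java
import java.net.ConnectException;
import javax.net.ssl.SSLHandshakeException;

/** Hypothetical helper (not part of the SDK): heuristic transport-failure detection. */
final class TransportFailures {

    // Upper bound on how far the cause chain is followed, as a guard against cyclic chains.
    private static final int MAX_DEPTH = 16;

    private TransportFailures() {}

    /** Returns true if the error or any of its causes looks like a transport-level failure. */
    static boolean isTransportFailure(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof SSLHandshakeException || t instanceof ConnectException) {
                return true;
            }
            // Assumption: the gRPC exception message begins with the status code name.
            String message = t.getMessage();
            if ("io.grpc.StatusRuntimeException".equals(t.getClass().getName())
                    && message != null
                    && message.startsWith("UNAVAILABLE")) {
                return true;
            }
        }
        return false;
    }
}
```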

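DNS cache TTL: The gRPC DnsNameResolver respects the JVM's DNS cache TTL. For more aggressive DNS refresh (recommended for cloud-deployed clients), lower the TTL. Note that networkaddress.cache.ttl is a Java security property, not a system property, so it is set in the java.security file or via java.security.Security.setProperty rather than with -D. The corresponding system property for the command line is:

-Dsun.net.inetaddr.ttl=30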
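The TTL can also be set from code. A minimal sketch, assuming it runs at application startup before the JVM performs its first hostname lookup, since the JDK reads the caching policy once. The JDK documents networkaddress.cache.ttl as a security property; DnsCacheConfig is a hypothetical helper name.

```java
import java.security.Security;

/** Hypothetical helper (not part of the SDK): sets the JVM's positive DNS cache TTL. */
final class DnsCacheConfig {

    private DnsCacheConfig() {}

    /** Sets how long successful DNS lookups are cached, in seconds. */
    static void setDnsCacheTtlSeconds(int seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("seconds must not be negative");
        }
        // A security property, so it goes through Security rather than System.setProperty.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }
}
```

Calling DnsCacheConfig.setDnsCacheTtlSeconds(30) as the first statement of main, before building the SpiceClient, keeps the setting next to the rest of the client configuration.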

Iterating Through Results

For more control over query results, you can iterate through rows and access individual field values:

import org.apache.arrow.flight.FlightStream;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Field;

try (SpiceClient client = SpiceClient.builder().build()) {
    FlightStream stream = client.query("SELECT * FROM taxi_trips LIMIT 10;");

    while (stream.next()) {
        try (VectorSchemaRoot root = stream.getRoot()) {
            int rowCount = root.getRowCount();

            // Print column names and types
            for (Field field : root.getSchema().getFields()) {
                System.out.printf("Column: %s, Type: %s%n", field.getName(), field.getType());
            }

            // Iterate through rows generically
            for (int row = 0; row < rowCount; row++) {
                for (FieldVector vector : root.getFieldVectors()) {
                    String columnName = vector.getName();
                    Object value = vector.isNull(row) ? null : vector.getObject(row);
                    System.out.printf("%s = %s%n", columnName, value);
                }
            }

            // Access specific columns with type safety
            FieldVector fareVector = root.getVector("fare_amount");
            if (fareVector instanceof Float8Vector) {
                Float8Vector fareVec = (Float8Vector) fareVector;
                for (int row = 0; row < rowCount; row++) {
                    if (!fareVec.isNull(row)) {
                        double fare = fareVec.get(row);
                        System.out.printf("Fare: $%.2f%n", fare);
                    }
                }
            }

            // Access string columns
            FieldVector vendorVector = root.getVector("vendor_id");
            if (vendorVector instanceof VarCharVector) {
                VarCharVector strVec = (VarCharVector) vendorVector;
                for (int row = 0; row < rowCount; row++) {
                    if (!strVec.isNull(row)) {
                        String vendorId = new String(strVec.get(row), java.nio.charset.StandardCharsets.UTF_8);
                        System.out.printf("Vendor: %s%n", vendorId);
                    }
                }
            }
        }
    }
}

See ExampleIteratingResults.java for a comprehensive example.

Spice.ai Runtime commands

Accelerated dataset refresh

Use the refresh method to refresh an accelerated dataset. See the full dataset refresh example.

SpiceClient client = SpiceClient.builder()
    ..
    .build();

client.refresh("taxi_trips");

Logging

The SDK uses SLF4J for logging, allowing you to plug in your preferred logging implementation (Logback, Log4j2, java.util.logging, etc.).

Adding a logging implementation (Maven):

<!-- Using Logback -->
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.5.18</version>
</dependency>

<!-- Or using SLF4J Simple (console output) -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>2.0.17</version>
</dependency>

Log levels used:

  • DEBUG - Client initialization, query execution, connection lifecycle
  • WARN - Recoverable errors during resource cleanup
  • ERROR - Query failures, connection errors

To enable debug logging with slf4j-simple, set the system property:

-Dorg.slf4j.simpleLogger.defaultLogLevel=debug
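The same setting can be applied from code when changing JVM flags is inconvenient. This is a sketch under two assumptions: it must run before the first SLF4J logger is created, because slf4j-simple reads its configuration once at initialization, and the ai.spice logger name is inferred from the SDK's package name rather than confirmed. SimpleLoggerConfig is a hypothetical class name.

```java
/** Hypothetical helper (not part of the SDK): configures slf4j-simple via system properties. */
final class SimpleLoggerConfig {

    private SimpleLoggerConfig() {}

    /** Keeps other libraries at "info" and raises only the SDK's loggers to "debug". */
    static void enableSdkDebugLogging() {
        System.setProperty("org.slf4j.simpleLogger.defaultLogLevel", "info");
        // Per-logger override; assumes the SDK logs under the ai.spice package name.
        System.setProperty("org.slf4j.simpleLogger.log.ai.spice", "debug");
    }
}
```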

🤝 Connect with us

Use issues, hey@spice.ai, or Slack to send us feedback or suggestions, or to get help installing or using the library.