
Commit c5c4b31

User Guide updates and enhancements in ack handling (#37)
1 parent 0bd3306 commit c5c4b31

18 files changed: +689 additions, -186 deletions

pom.xml

Lines changed: 10 additions & 3 deletions
@@ -68,6 +68,13 @@
     </developer>
   </developers>

+  <distributionManagement>
+    <site>
+      <id>solace-pubsubplus-spark-connector-site</id>
+      <url>https://solace.com/integration-hub/apache-spark</url>
+    </site>
+  </distributionManagement>
+
   <dependencyManagement>
     <dependencies>
       <dependency>
@@ -92,17 +99,17 @@
       <dependency>
         <groupId>org.apache.spark</groupId>
         <artifactId>spark-streaming_2.12</artifactId>
-        <version>3.5.1</version>
+        <version>3.5.2</version>
       </dependency>
       <dependency>
         <groupId>org.apache.spark</groupId>
         <artifactId>spark-core_2.12</artifactId>
-        <version>3.5.1</version>
+        <version>3.5.2</version>
       </dependency>
       <dependency>
         <groupId>org.apache.spark</groupId>
         <artifactId>spark-sql_2.12</artifactId>
-        <version>3.5.1</version>
+        <version>3.5.2</version>
       </dependency>
       <dependency>
         <groupId>org.apache.logging.log4j</groupId>

src/docs/asciidoc/User-Guide.adoc

Lines changed: 68 additions & 7 deletions
@@ -26,19 +26,25 @@ endif::[]

 == Getting Started

-This guide assumes you are familiar with Spark set up and Spark Structured Streaming concepts. In the following sections we will show how to set up Solace Spark Connector to stream data from Solace to Spark.
+This guide assumes you are familiar with Spark setup and Spark Structured Streaming concepts. The following sections show how to set up the Solace Spark Connector to stream data from Solace to Spark and to publish events from Spark to Solace.

 === Prerequisites

 * https://solace.com/products/event-broker/[Solace PubSub+ Event Broker]
-* Apache Spark 3.5.1, Scala 2.12
+* Apache Spark 3.5.2 and Scala 2.12
+
+=== Supported Platforms
+
+The connector is built on the Spark Structured Streaming API and has been tested on Azure Databricks (15.4 LTS, which includes Apache Spark 3.5.0 and Scala 2.12, with Photon acceleration disabled). Since the Databricks runtime is consistent across all supported cloud platforms (AWS and Google Cloud), the connector is expected to behave similarly in other Databricks environments. Additionally, the connector has been validated on vanilla Apache Spark, ensuring compatibility with any platform that supports standard Spark deployments.

 === Quick Start common steps

 include::{docdir}/../sections/general/quick-start/quick-start.adoc[leveloffset=+2]

 NOTE: Above sample code used parquet as example data source. You can configure your required data source to write data.

+NOTE: For Databricks deployments, it is recommended to store and retrieve sensitive credentials from Databricks secrets. Refer to <<Using Databricks Secret Management>> for how to configure secrets and use them in a notebook.
+
 === Databricks Considerations

 In case if you are using Shared compute cluster, make sure your cluster has https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html[appropriate permissions] to install connector from maven central and access the jars. Please contact your Databricks administrator for required permissions.
@@ -49,15 +55,15 @@ Solace Spark connector relies on Spark Checkpointing mechanism to resume from la

 === Checkpoint Handling

-Starting from version 3.1.0 connector, solace connection is now executed on worker node instead of driver node. This give us the ability to utilize cluster resource efficiently and also improves processing performance. The connector uses Solace LVQ to communicate checkpoint information from worker nodes to driver node(commit to checkpoint location) as they run on different JVM's.
+Starting with connector version 3.1.0, the Solace connection is executed on a worker node instead of the driver node. This allows cluster resources to be used efficiently and also improves processing performance. The connector uses a Solace LVQ to store the checkpoint along with the Spark checkpoint.

 NOTE: In case of recovery, connector uses offset state from LVQ to identify last successfully processed messages. Hence, it is recommended not to delete or modify offset state in LVQ.

 In some cases, there might be checkpoint failures as spark may fail to write to checkpoint during instance crash or unavailability or other reasons. Though the connector will handle duplicates in most cases, we recommend to keep your downstream systems idempotent.

 === User Authentication

-Solace Spark Connector supports Basic and OAuth authentication to Solace. Client Credentials flow is supported when connecting using OAuth.
+The Solace Spark Connector supports Basic, Client Certificate, and OAuth authentication to Solace. The Client Credentials flow is supported when connecting using OAuth.

 If OAuth server is available use below options to fetch access token from endpoint. For property description please refer to <<Configuration>> section.

@@ -81,18 +87,75 @@ If rotating access token is present in file accessible by connector use below op
 [source,scala]
 ----
 spark.readStream.format("solace").option("host", "")
-.option("vpn", "default")
+.option("vpn", "")
 .option("solace.apiProperties.AUTHENTICATION_SCHEME", "AUTHENTICATION_SCHEME_OAUTH2")
 .option("solace.oauth.client.access-token", "<absolute-path-to-token-file>")
 .option("solace.oauth.client.token.refresh.interval", 110)
 ----

 NOTE: When access token is read from file, it may lose some of it's expiry time by the time it is accessed by connector. It is recommended to have minimal time difference between writing to file and access by the connector so that a valid new token is updated in solace session before expiry of old token.

+Below is an example of how to use client certificate authentication when connecting to Solace.
+
+[source,scala]
+----
+sparkSession.readStream().format("solace")
+.option("host", "")
+.option("vpn", "default")
+.option("username", "")
+.option("solace.apiProperties.AUTHENTICATION_SCHEME", "AUTHENTICATION_SCHEME_CLIENT_CERTIFICATE")
+.option("solace.apiProperties.SSL_TRUST_STORE", "<path-to-jks-file>")
+.option("solace.apiProperties.SSL_TRUST_STORE_FORMAT", "jks")
+.option("solace.apiProperties.SSL_TRUST_STORE_PASSWORD", "")
+.option("solace.apiProperties.SSL_KEY_STORE", "<path-to-jks-file>")
+.option("solace.apiProperties.SSL_KEY_STORE_FORMAT", "jks")
+.option("solace.apiProperties.SSL_KEY_STORE_PASSWORD", "")
+----
+
+For more properties, please refer to the https://docs.solace.com/API-Developer-Online-Ref-Documentation/java/constant-values.html#com.solacesystems.jcsmp.JCSMPProperties[Solace Java API documentation for com.solacesystems.jcsmp.JCSMPProperties].
+
+==== Using Databricks Secret Management
+
+If the Solace Spark Connector is deployed in Databricks, it is recommended to use Databricks secrets to store sensitive credentials.
+
+To configure secrets, refer to the https://docs.databricks.com/aws/en/security/secrets/[Databricks documentation].
+
+You can reference those secrets in your Spark cluster using the same Spark config options.
+
+Below is an example of how to retrieve the username and password from Databricks secrets and connect to Solace.
+[source,scala]
+----
+spark.readStream.format("solace").option("host", dbutils.secrets.get(scope = "solace-dev-credentials", key = "host"))
+.option("vpn", "default")
+.option("username", dbutils.secrets.get(scope = "solace-dev-credentials", key = "username"))
+.option("password", dbutils.secrets.get(scope = "solace-dev-credentials", key = "password"))
+----
+
+The following example shows OAuth-based authentication to Solace using Databricks secrets. The certificates can be stored in cloud object storage, and you can restrict access to the certificates to only the clusters that can access Solace. See https://docs.databricks.com/aws/en/data-governance/[Data governance with Unity Catalog].
+
+[source,scala]
+----
+spark.readStream.format("solace").option("host", dbutils.secrets.get(scope = "solace-dev-credentials", key = "host"))
+.option("vpn", "default")
+.option("solace.apiProperties.AUTHENTICATION_SCHEME", "AUTHENTICATION_SCHEME_OAUTH2")
+.option("solace.oauth.client.auth-server-url", "")
+.option("solace.oauth.client.client-id", dbutils.secrets.get(scope = "solace-dev-credentials", key = "client-id"))
+.option("solace.oauth.client.credentials.client-secret", dbutils.secrets.get(scope = "solace-dev-credentials", key = "client-secret"))
+.option("solace.oauth.client.auth-server.client-certificate.file", "")
+.option("solace.oauth.client.auth-server.truststore.file", "")
+.option("solace.oauth.client.auth-server.truststore.password", dbutils.secrets.get(scope = "solace-dev-credentials", key = "truststore-password"))
+.option("solace.oauth.client.auth-server.ssl.validate-certificate", false)
+.option("solace.oauth.client.token.refresh.interval", 110)
+----
+
 === Message Replay

 Solace Spark Connector can replay messages using Solace Replay Log. Connector can replay all messages or after specific replication group message id or after specific timestamp. Please refer to https://docs.solace.com/Features/Replay/Msg-Replay-Concepts-Config.htm[Message Replay Configuration] to enable replay log in Solace PubSub+ broker.

+=== Parallel Processing
+
+The Solace Spark Connector supports automatic scaling of consumers based on the number of worker nodes, or it can be configured to use a fixed number of consumers. To control this behavior, use the partition property in the Solace Spark Connector source configuration. Setting this property to 0 enables automatic scaling, where the number of consumers matches the number of worker nodes.
+
 === Solace Spark Streaming Source Schema Structure

 Solace Spark Connector transforms the incoming message to Spark row with below schema definition.
@@ -169,8 +232,6 @@ include::{docdir}/../sections/general/configuration/solace-spark-source-config.a

 include::{docdir}/../sections/general/configuration/solace-spark-sink-config.adoc[leveloffset=+2]

-NOTE: This connector is tested on Databricks environment with Cluster Version 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
-
 == License

 This project is licensed under the Solace Community License, Version 1.0. - See the `LICENSE` file for details.
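
The Checkpoint Handling, User Authentication, and Parallel Processing sections added above describe individual options, but the guide stops short of a complete read-to-sink pipeline. Below is a minimal sketch that ties them together, assuming basic authentication. The host, vpn, username, and password options appear in the guide's own examples; the queue, batchSize, and partitions option names are assumptions made here for illustration and should be confirmed against the source configuration table.

[source,scala]
----
// Minimal sketch of an end-to-end pipeline (not part of this commit).
// "queue", "batchSize", and "partitions" are assumed option names; verify them
// against the Solace Spark Connector source configuration reference.
val input = spark.readStream
  .format("solace")
  .option("host", "tcps://<broker-host>:55443")
  .option("vpn", "default")
  .option("username", "<client-username>")
  .option("password", "<client-password>")
  .option("queue", "<queue-name>")   // assumed option name
  .option("batchSize", 500)          // assumed option name
  .option("partitions", 0)           // assumed option name; 0 = one consumer per worker node
  .load()

val query = input.writeStream
  .format("parquet")
  .option("path", "/data/solace-events")
  // Spark's checkpoint; the connector additionally keeps offset state in a Solace LVQ,
  // so neither the checkpoint directory nor the LVQ contents should be deleted or modified.
  .option("checkpointLocation", "/data/solace-checkpoint")
  .start()

query.awaitTermination()
----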

src/docs/sections/general/configuration/solace-spark-source-config.adoc

Lines changed: 2 additions & 2 deletions
@@ -152,7 +152,7 @@ solace.apiProperties.client_channel_properties.keepAliveIntervalInMillis=3000
 | int
 | any
 | 1
-| Set number of messages to be processed in batch. The connector can stream data in batches to Spark based on configured size.
+| Sets the number of messages to be processed in a batch. The connector can stream data in batches to Spark based on the configured size. For optimal throughput, configure the Solace queue's 'Maximum Delivered Unacknowledged Messages per Flow' property to a value equal to twice the batch size.

 | replayStrategy
 | string
@@ -210,7 +210,7 @@ Note: Default value uses replication group message ID property as offset indicat
 | int
 | any
 | 1
-| Sets the number of consumers for configured queue. If more the one worker node is present, consumers are split across worker nodes for efficient processing.
+| Sets the number of consumers for the configured queue. If more than one worker node is present, consumers are split across worker nodes for efficient processing. If set to 0, the connector creates consumers equal to the number of worker nodes and scales if more worker nodes are added.

 | createFlowsOnSameSession(deprecated)
 | boolean
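
To make the batch-size guidance above concrete: with a batch size of 500, the queue's 'Maximum Delivered Unacknowledged Messages per Flow' would be set to 1000 on the broker. A minimal sketch follows, assuming the batch-size setting is exposed as a batchSize option (a hypothetical name here; use the exact option name from the table above).

[source,scala]
----
// Sketch only: "batchSize" is an assumed name for the batch-size option documented above.
// The 'Maximum Delivered Unacknowledged Messages per Flow' value is configured on the
// Solace queue itself, not through the connector.
val reader = spark.readStream
  .format("solace")
  .option("host", "tcps://<broker-host>:55443")
  .option("vpn", "default")
  .option("batchSize", 500) // broker-side max unacked per flow should be 2 x 500 = 1000
----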

src/docs/sections/general/quick-start/quick-start.adoc

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ NOTE: Before installing latest version of connector make sure earlier versions o

 query.awaitTermination()
 ----
+TIP: For optimal throughput, configure the Solace queue's 'Maximum Delivered Unacknowledged Messages per Flow' property to a value equal to twice the batch size.
 .. Finally, let's read data from the parquet file from the location configured above
 +
 [source,scala]

src/main/java/com/solacecoe/connectors/spark/SolaceScan.java

Lines changed: 1 addition & 1 deletion
@@ -41,6 +41,6 @@ public Batch toBatch() {

     @Override
     public MicroBatchStream toMicroBatchStream(String checkpointLocation) {
-        return new SolaceMicroBatch(properties);
+        return new SolaceMicroBatch(properties, checkpointLocation);
     }
 }
