src/docs/asciidoc/User-Guide.adoc (68 additions, 7 deletions)
@@ -26,19 +26,25 @@ endif::[]
== Getting Started
This guide assumes you are familiar with Spark setup and Spark Structured Streaming concepts. The following sections show how to set up the Solace Spark Connector to stream data from Solace to Spark and to publish events from Spark to Solace.
The connector is built on the Spark Structured Streaming API and has been tested on Azure Databricks (15.4 LTS, which includes Apache Spark 3.5.0 and Scala 2.12, with Photon acceleration disabled). Since the Databricks runtime is consistent across all supported cloud platforms (AWS and Google Cloud), it is expected to behave similarly in other Databricks environments. Additionally, the connector has been validated on vanilla Apache Spark, ensuring compatibility with any platform that supports standard Spark deployments.
NOTE: The sample code above uses Parquet as an example data source. You can configure whichever data source you need to write data.
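For reference, a minimal Solace-to-Parquet pipeline might look like the sketch below. The `solace` format name and the option keys are illustrative assumptions; refer to the <<Configuration>> section for the exact property names.

[source,scala]
----
// A minimal sketch, assuming the connector is registered under the
// "solace" format name. Host, VPN, credentials, and queue are placeholders.
// In a Databricks notebook, `spark` is the provided SparkSession.
val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("username", "solace-user")
  .option("password", "solace-password")
  .option("queue", "spark-ingest-queue")
  .option("batchSize", "1000")
  .load()

// Write the stream to Parquet; any supported sink can be used instead.
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/solace-stream")
  .option("path", "/tmp/data/solace-stream")
  .start()
----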
NOTE: For Databricks deployments, it is recommended to store and retrieve sensitive credentials from Databricks secrets. Refer to <<Using Databricks Secret Management>> for how to configure secrets and use them in a notebook.
=== Databricks Considerations
If you are using a Shared compute cluster, make sure your cluster has https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html[appropriate permissions] to install the connector from Maven Central and access the jars. Contact your Databricks administrator for the required permissions.
@@ -49,15 +55,15 @@ Solace Spark connector relies on Spark Checkpointing mechanism to resume from la
=== Checkpoint Handling
Starting with connector version 3.1.0, the Solace connection is executed on a worker node instead of the driver node. This allows the connector to utilize cluster resources efficiently and also improves processing performance. The connector uses a Solace LVQ (Last Value Queue) to store checkpoint state alongside the Spark checkpoint.
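The Spark side of checkpointing is configured on the write stream as usual; a minimal sketch is shown below. The paths are placeholders, and the LVQ holding the connector's offset state is managed by the connector itself.

[source,scala]
----
// Standard Structured Streaming checkpointing. In addition to this Spark
// checkpoint, the connector persists its own offset state in a Solace LVQ,
// which should not be deleted or modified manually.
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/mnt/checkpoints/solace-stream") // Spark checkpoint location
  .option("path", "/mnt/data/solace-stream")
  .start()
----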
NOTE: During recovery, the connector uses the offset state from the LVQ to identify the last successfully processed messages. Hence, it is recommended not to delete or modify the offset state in the LVQ.
In some cases there may be checkpoint failures, since Spark can fail to write to the checkpoint during an instance crash, a period of unavailability, or for other reasons. Though the connector handles duplicates in most cases, we recommend keeping your downstream systems idempotent.
=== User Authentication
The Solace Spark Connector supports Basic, Client Certificate, and OAuth authentication to Solace. The Client Credentials flow is supported when connecting using OAuth.
If an OAuth server is available, use the options below to fetch an access token from its endpoint. For property descriptions, refer to the <<Configuration>> section.
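A sketch of such a configuration is shown below. The OAuth option keys are illustrative assumptions; check the <<Configuration>> section for the exact names.

[source,scala]
----
// Illustrative OAuth (Client Credentials) options -- the option keys here
// are assumptions, not confirmed connector properties.
val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("solace.oauth.client.auth-server-url", "https://auth.example.com/oauth/token")
  .option("solace.oauth.client.client-id", "spark-connector")
  .option("solace.oauth.client.credentials.client-secret", "<client-secret>")
  .option("queue", "spark-ingest-queue")
  .load()
----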
@@ -81,18 +87,75 @@ If rotating access token is present in file accessible by connector use below op
NOTE: When the access token is read from a file, it may lose some of its remaining validity by the time the connector accesses it. Keep the time between writing the file and the connector reading it minimal, so that a valid new token is updated in the Solace session before the old token expires.
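If the token is refreshed by an external process and written to a file, the setup can look like the sketch below. The token-file option key is a placeholder assumption; see the <<Configuration>> section for the real property name.

[source,scala]
----
// Hypothetical token-file option: an external process periodically writes a
// fresh access token to this file, and the connector re-reads it before the
// old token expires. The option key is an assumption for illustration only.
val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("solace.oauth.client.access-token", "/dbfs/secure/solace-access-token") // path to token file
  .option("queue", "spark-ingest-queue")
  .load()
----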
Below is an example of how to use client certificate authentication when connecting to Solace.
For more properties, refer to the https://docs.solace.com/API-Developer-Online-Ref-Documentation/java/constant-values.html#com.solacesystems.jcsmp.JCSMPProperties[Solace Java API documentation for com.solacesystems.jcsmp.JCSMPProperties].
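The following is a minimal sketch, under the assumption that JCSMP properties can be passed through as connector options; keystore paths and passwords are placeholders.

[source,scala]
----
// A sketch assuming JCSMP property names map directly to connector options.
// The authentication scheme and TLS property names follow JCSMPProperties
// (see the link above); the exact option-key mapping is an assumption.
val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("authentication_scheme", "AUTHENTICATION_SCHEME_CLIENT_CERTIFICATE")
  .option("ssl_key_store", "/dbfs/secure/client-keystore.jks")       // client keystore (placeholder path)
  .option("ssl_key_store_password", "<keystore-password>")
  .option("ssl_trust_store", "/dbfs/secure/truststore.jks")          // broker CA truststore
  .option("queue", "spark-ingest-queue")
  .load()
----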
==== Using Databricks Secret Management
If the Solace Spark Connector is deployed in Databricks, it is recommended to use Databricks secrets to store sensitive credentials.
To configure secrets, refer to the https://docs.databricks.com/aws/en/security/secrets/[Databricks documentation].
You can reference those secrets in your Spark cluster using the same Spark configuration options:
Below is an example of how to retrieve a username and password from Databricks secrets and connect to Solace.
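This sketch assumes a secret scope named `solace-creds` has already been created; `dbutils.secrets.get` is the standard Databricks API, while the connector option keys are illustrative.

[source,scala]
----
// Retrieve credentials from a Databricks secret scope (assumed name:
// "solace-creds") instead of hard-coding them in the notebook.
val username = dbutils.secrets.get(scope = "solace-creds", key = "solace-username")
val password = dbutils.secrets.get(scope = "solace-creds", key = "solace-password")

val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("username", username)
  .option("password", password)
  .option("queue", "spark-ingest-queue")
  .load()
----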
OAuth-based authentication to Solace can also use Databricks secrets. The certificates can be stored in cloud object storage, and you can restrict access to the certificates to only the clusters that can access Solace. See https://docs.databricks.com/aws/en/data-governance/[Data governance with Unity Catalog].
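A sketch combining the two: the OAuth client secret comes from a secret scope, while the truststore lives in governed object storage. As above, the OAuth and TLS option keys are assumptions.

[source,scala]
----
// OAuth client secret from Databricks secrets; the truststore sits in
// cloud object storage restricted to this cluster. Option keys and the
// storage path are illustrative assumptions.
val clientSecret = dbutils.secrets.get(scope = "solace-creds", key = "oauth-client-secret")

val df = spark.readStream
  .format("solace")
  .option("host", "tcps://mr-broker.messaging.solace.cloud:55443")
  .option("vpn", "my-vpn")
  .option("solace.oauth.client.auth-server-url", "https://auth.example.com/oauth/token")
  .option("solace.oauth.client.client-id", "spark-connector")
  .option("solace.oauth.client.credentials.client-secret", clientSecret)
  .option("ssl_trust_store", "/Volumes/secure/solace/truststore.jks") // governed storage path (placeholder)
  .option("queue", "spark-ingest-queue")
  .load()
----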
The Solace Spark Connector can replay messages using the Solace Replay Log. The connector can replay all messages, messages after a specific replication group message ID, or messages after a specific timestamp. Refer to https://docs.solace.com/Features/Replay/Msg-Replay-Concepts-Config.htm[Message Replay Configuration] to enable the replay log on the Solace PubSub+ broker.
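A sketch of the replay configuration is shown below; the strategy values and the companion option names are assumptions, so check the <<Configuration>> table (see the `replayStrategy` entry below) for the accepted values.

[source,scala]
----
// Replay everything in the replay log (strategy value is illustrative):
val fromBeginning = spark.readStream
  .format("solace")
  .option("replayStrategy", "BEGINNING")
  // ...connection options as in the earlier examples...
  .load()

// Or replay only messages after a given timestamp (option names assumed):
val fromTimestamp = spark.readStream
  .format("solace")
  .option("replayStrategy", "TIMEBASED")
  .option("replayStrategyStartTime", "2024-06-01T00:00:00")
  // ...connection options as in the earlier examples...
  .load()
----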
=== Parallel Processing
The Solace Spark Connector supports automatic scaling of consumers based on the number of worker nodes, or it can be configured to use a fixed number of consumers. To control this behavior, use the partition property in the Solace Spark Connector source configuration. Setting this property to 0 enables automatic scaling, where the number of consumers matches the number of worker nodes; a sketch follows.
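For example (the `partitions` option key is taken from the configuration table below and should be verified against your connector version):

[source,scala]
----
// partitions = 0 -> one consumer per worker node (automatic scaling);
// any positive value -> a fixed number of consumers split across workers.
val df = spark.readStream
  .format("solace")
  .option("partitions", "0") // option key per the Configuration table; verify for your version
  // ...connection options as in the earlier examples...
  .load()
----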
| Sets the number of messages to be processed in a batch. The connector can stream data in batches to Spark based on the configured size. For optimal throughput, configure the Solace queue's 'Maximum Delivered Unacknowledged Messages per Flow' property to a value equal to twice the batch size.
| replayStrategy
| string
@@ -210,7 +210,7 @@ Note: Default value uses replication group message ID property as offset indicat
| int
| any
| 1
| Sets the number of consumers for the configured queue. If more than one worker node is present, consumers are split across worker nodes for efficient processing. If set to 0, the connector creates consumers equal to the number of worker nodes and scales up if more worker nodes are added.
src/docs/sections/general/quick-start/quick-start.adoc (1 addition, 0 deletions)
@@ -58,6 +58,7 @@ NOTE: Before installing latest version of connector make sure earlier versions o
query.awaitTermination()
----
TIP: For optimal throughput, configure the Solace queue's 'Maximum Delivered Unacknowledged Messages per Flow' property to a value equal to twice the batch size.
.. Finally, let's read the data from the parquet file at the location configured above