This library contains the source code for Azure Data Explorer Data Source and Data Sink Connector for Apache Spark.
Azure Data Explorer (A.K.A. [Kusto](https://azure.microsoft.com/services/data-explorer/)) is a lightning-fast indexing and querying service.
[Spark](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.
Making Azure Data Explorer and Spark work together enables building fast and scalable applications, targeting a variety of Machine Learning, Extract-Transform-Load, Log Analytics and other data-driven scenarios.
This connector works with the following Spark environments:
[Azure Synapse Data Explorer](https://docs.microsoft.com/azure/synapse-analytics/data-explorer/data-explorer-overview) and
[Real-Time Analytics in Fabric](https://learn.microsoft.com/fabric/real-time-analytics/overview)
## Changelog
**Breaking changes in versions 5.2.x** - From these versions on, the published packages are shaded and packaged as a self-contained jar. This avoids issues with common OSS libraries, Spark runtimes and/or application dependencies.
For major changes from previous releases, please refer to [Releases](https://github.com/Azure/azure-kusto-spark/releases).
For known or new issues, please refer to the [issues](https://github.com/Azure/azure-kusto-spark/issues) section.
> Note: Use the 4.x series only if you are using JDK 11. Versions 3.x and 5.x work with JDK 8 and all later versions.
From version 5.2.0 onwards, the connector is packaged as an uber jar to avoid conflicts with other jars that are added as part of the Spark job definitions.
## Usage
Link your application with the artifact below to use the Azure Data Explorer Connector for Apache Spark.
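As an illustrative sketch, the connector can be attached to a Spark session via the `--packages` option of `spark-shell` or `spark-submit`. The Maven coordinates and version placeholder below are assumptions; check Maven Central or the project releases for the artifact that matches your Spark and Scala versions.

```shell
# Illustrative only: resolve the connector from Maven when launching Spark.
# The group/artifact IDs and the version placeholder are assumptions - verify before use.
spark-shell --packages com.microsoft.azure.kusto:kusto-spark_3.0_2.12:<version>
```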
To use the connector, you need:
> Note: When working with Spark version 2.3 or lower, build the jar locally from the "2.4" branch and simply change the Spark version in the pom file.
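As a rough sketch of that local build (the branch name comes from the note above; the exact Maven property that holds the Spark version is an assumption, so edit whichever property your checkout defines):

```shell
# Sketch: build the connector from the "2.4" branch after adjusting the Spark version in pom.xml.
git clone https://github.com/Azure/azure-kusto-spark.git
cd azure-kusto-spark
git checkout 2.4
# Edit the Spark version property in pom.xml to match your cluster, then build without tests.
mvn clean package -DskipTests
```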
## Local Run - Build Setup
The newer options in the connector come with tests that exercise Blob storage, covering user-impersonation-based data export as well as supplying a custom blob storage account for ingestion.
These are already set up on CI. To run them on a local machine, some one-time setup is required. The following commands use the Azure CLI; the same setup can also be done through the Azure portal.
```shell
# Sign in and look up the object ID of the signed-in user
az login
az ad signed-in-user show --query "id" --output json
```
This will output a GUID, for example:
```
"10ac405f-8d3f-4f95-a012-201801b257d2"
```
This ID can then be used to grant access to storage as follows:
```shell
# Allow the signed-in user to create user delegation SAS tokens on the storage account
az role assignment create --assignee 10ac405f-8d3f-4f95-a012-201801b257d2 --role "Storage Blob Delegator" --scope /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.Storage/storageAccounts/<storageacc>

# Grant read/write access to blobs in the test container
az role assignment create --assignee 10ac405f-8d3f-4f95-a012-201801b257d2 --role "Storage Blob Data Contributor" --scope /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.Storage/storageAccounts/<storageacc>/blobServices/default/containers/<container-name>
```
These role assignments grant the access to the test storage account that the tests require.
Once this is set up, you can use the following commands to build and run the tests.
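As a sketch, assuming the standard Maven lifecycle already referenced in this README, a typical local build and test run would look roughly like:

```shell
# Assumed standard Maven goals - adjust flags/profiles to match the project's CI setup.
mvn clean package -DskipTests   # build the connector
mvn test                        # run the test suite
```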
* Versions 5.2.0 and up of the library publish uber jars to Maven. This avoids conflicts with custom jars that are added as part of the job and removes the exclude/include process that would otherwise have to be followed.
## Dependencies
The Spark Azure Data Explorer connector depends on the [Azure Data Explorer Data Client Library](https://mvnrepository.com/artifact/com.microsoft.azure.kusto/kusto-data).