
Commit 8425407

Merge pull request #283 from marklogic/release/2.3.1: merge release/2.3.1 into master.

2 parents: 0eaf6b1 + b7c585d

File tree: 231 files changed (+964 additions, −988 deletions)


.gitignore

Lines changed: 3 additions & 1 deletion
```diff
@@ -16,5 +16,7 @@ logs
 .ipynb_checkpoints
 venv
 .venv
-docker
+docker/marklogic
+docker/sonarqube/data
+docker/sonarqube/logs
 export
```

CONTRIBUTING.md

Lines changed: 11 additions & 4 deletions
```diff
@@ -83,6 +83,13 @@ you've introduced on the feature branch you're working on. You can then click on
 Note that if you only need results on code smells and vulnerabilities, you can repeatedly run `./gradlew sonar`
 without having to re-run the tests.
 
+Our Sonar instance is also configured to scan for dependency vulnerabilities
+[via the dependency-check plugin](https://github.com/dependency-check/dependency-check-sonar-plugin). For more
+information, see the `dependencyCheck` block in this project's `build.gradle` file. To include dependency check results,
+just run the following (it's not included by default when running the `sonar` task):
+
+    ./gradlew dependencyCheckAnalyze sonar
+
 ## Accessing MarkLogic logs in Grafana
 
 This project's `docker-compose-3nodes.yaml` file includes
@@ -116,7 +123,7 @@ This will produce a single jar file for the connector in the `./build/libs` dire
 
 You can then launch PySpark with the connector available via:
 
-    pyspark --jars build/libs/marklogic-spark-connector-2.3.0.jar
+    pyspark --jars build/libs/marklogic-spark-connector-2.3-SNAPSHOT.jar
 
 The below command is an example of loading data from the test application deployed via the instructions at the top of
 this page.
@@ -192,7 +199,7 @@ The Spark master GUI is at <http://localhost:8080>. You can use this to view det
 
 Now that you have a Spark cluster running, you just need to tell PySpark to connect to it:
 
-    pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.jar
+    pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3-SNAPSHOT.jar
 
 You can then run the same commands as shown in the PySpark section above. The Spark master GUI will allow you to
 examine details of each of the commands that you run.
@@ -211,12 +218,12 @@ You will need the connector jar available, so run `./gradlew clean shadowJar` if
 You can then run a test Python program in this repository via the following (again, change the master address as
 needed); note that you run this outside of PySpark, and `spark-submit` is available after having installed PySpark:
 
-    spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.jar src/test/python/test_program.py
+    spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3-SNAPSHOT.jar src/test/python/test_program.py
 
 You can also test a Java program. To do so, first move the `com.marklogic.spark.TestProgram` class from `src/test/java`
 to `src/main/java`. Then run `./gradlew clean shadowJar` to rebuild the connector jar. Then run the following:
 
-    spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.3.0.jar
+    spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.3-SNAPSHOT.jar
 
 Be sure to move `TestProgram` back to `src/test/java` when you are done.
```
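The dependency-check workflow added to CONTRIBUTING.md feeds Sonar via the JSON report that `./gradlew dependencyCheckAnalyze` writes to `build/reports/dependency-check-report.json`. A quick way to sanity-check a run is to tally the flagged CVEs by severity. This is a hedged sketch: the report shape assumed here (a top-level `dependencies` array whose entries may carry a `vulnerabilities` list) reflects my understanding of the dependency-check JSON format, and the sample data is invented for illustration.

```python
def summarize_report(report: dict) -> dict:
    """Count vulnerabilities per severity in a dependency-check JSON report."""
    counts = {}
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulnerabilities", []) or []:
            severity = vuln.get("severity", "UNKNOWN").upper()
            counts[severity] = counts.get(severity, 0) + 1
    return counts

# Invented sample mimicking the report's shape; a real run would instead
# json.load() the file at build/reports/dependency-check-report.json.
sample = {
    "dependencies": [
        {"fileName": "commons-compress-1.24.0.jar",
         "vulnerabilities": [{"name": "CVE-2024-25710", "severity": "MEDIUM"},
                             {"name": "CVE-2024-26308", "severity": "MEDIUM"}]},
        {"fileName": "jdom2-2.0.6.1.jar"},
    ],
}
print(summarize_report(sample))  # {'MEDIUM': 2}
```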

LICENSE.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-Copyright © 2024 MarkLogic Corporation.
+Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
 
```
NOTICE.txt

Lines changed: 30 additions & 34 deletions
```diff
@@ -1,16 +1,12 @@
 MarkLogic® Connector for Spark
 
-Copyright © 2023 MarkLogic Corporation. MarkLogic and MarkLogic logo are trademarks or registered trademarks of MarkLogic Corporation in the United States and other countries. All other trademarks are the property of their respective owners.
-
-This project is licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at
+Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
 
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-
-To the extent required by the applicable open-source license, a complete machine-readable copy of the source code corresponding to such code is available upon request. This offer is valid to anyone in receipt of this information and shall expire three years following the date of the final distribution of this product version by MarkLogic Corporation. To obtain such source code, send an email to [email protected]. Please specify the product and version for which you are requesting source code.
-
-The following software may be included in this project (last updated October 3, 2023):
+To the extent required by the applicable open-source license, a complete machine-readable copy of the source code
+corresponding to such code is available upon request. This offer is valid to anyone in receipt of this information and
+shall expire three years following the date of the final distribution of this product version by
+Progress Software Corporation. To obtain such source code, send an email to [email protected].
+Please specify the product and version for which you are requesting source code.
 
 -------------------------------------------------------------------------
 Third Party Components
@@ -43,8 +39,8 @@ Apache License
 
    1. Definitions.
 
-      "License" shall mean the terms and conditions for use,
-      reproduction, and distribution as defined by Sections 1 through 9
+      "License" shall mean the terms and conditions for use,
+      reproduction, and distribution as defined by Sections 1 through 9
       of this document.
 
       "Licensor" shall mean the copyright owner or entity authorized by
@@ -75,20 +71,20 @@ Apache License
 
      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
-     editorial revisions, annotations, elaborations, or other
-     modifications represent, as a whole, an original work of
-     authorship. For the purposes of this License, Derivative Works
-     shall not include works that remain separable from, or merely link
-     (or bind by name) to the interfaces of, the Work and Derivative
+     editorial revisions, annotations, elaborations, or other
+     modifications represent, as a whole, an original work of
+     authorship. For the purposes of this License, Derivative Works
+     shall not include works that remain separable from, or merely link
+     (or bind by name) to the interfaces of, the Work and Derivative
      Works thereof.
 
      "Contribution" shall mean any work of authorship, including the
     original version of the Work and any modifications or additions
     to that Work or Derivative Works thereof, that is intentionally
 submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
 
-     "Contributor" shall mean Licensor and any individual or Legal
-     Entity on behalf of whom a Contribution has been received by
+     "Contributor" shall mean Licensor and any individual or Legal
+     Entity on behalf of whom a Contribution has been received by
      Licensor and subsequently incorporated within the Work.
 
   2. Grant of Copyright License. Subject to the terms and conditions of
@@ -102,16 +98,16 @@ submitted to Licensor for inclusion in the Work by the copyright owner or by an
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have
-     made, use, offer to sell, sell, import, and otherwise transfer the
-     Work, where such license applies only to those patent claims
-     licensable by such Contributor that are necessarily infringed by
-     their Contribution(s) alone or by combination of their
-     Contribution(s) with the Work to which such Contribution(s) was
-     submitted. If You institute patent litigation against any entity
-     (including a cross-claim or counterclaim in a lawsuit) alleging
-     that the Work or a Contribution incorporated within the Work
-     constitutes direct or contributory patent infringement, then any
-     patent licenses granted to You under this License for that Work
+     made, use, offer to sell, sell, import, and otherwise transfer the
+     Work, where such license applies only to those patent claims
+     licensable by such Contributor that are necessarily infringed by
+     their Contribution(s) alone or by combination of their
+     Contribution(s) with the Work to which such Contribution(s) was
+     submitted. If You institute patent litigation against any entity
+     (including a cross-claim or counterclaim in a lawsuit) alleging
+     that the Work or a Contribution incorporated within the Work
+     constitutes direct or contributory patent infringement, then any
+     patent licenses granted to You under this License for that Work
      shall terminate as of the date such litigation is filed.
 
   4. Redistribution. You may reproduce and distribute copies of the
@@ -173,11 +169,11 @@ submitted to Licensor for inclusion in the Work by the copyright owner or by an
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-     implied, including, without limitation, any warranties or
-     conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS
-     FOR A PARTICULAR PURPOSE. You are solely responsible for
-     determining the appropriateness of using or redistributing the Work
-     and assume any risks associated with Your exercise of permissions
+     implied, including, without limitation, any warranties or
+     conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS
+     FOR A PARTICULAR PURPOSE. You are solely responsible for
+     determining the appropriateness of using or redistributing the Work
+     and assume any risks associated with Your exercise of permissions
      under this License.
 
   8. Limitation of Liability. In no event and under no legal theory,
```

build.gradle

Lines changed: 27 additions & 3 deletions
```diff
@@ -7,10 +7,11 @@ plugins {
     id 'signing'
     id "jacoco"
     id "org.sonarqube" version "4.4.1.3373"
+    id "org.owasp.dependencycheck" version "10.0.3"
 }
 
 group 'com.marklogic'
-version '2.3.0'
+version '2.3.1'
 
 java {
     // To support reading RDF files, Apache Jena is used - but that requires Java 11. If we want to do a 2.2.0 release
@@ -21,6 +22,10 @@ java {
 
 repositories {
     mavenCentral()
+    mavenLocal()
+    maven {
+        url "https://bed-artifactory.bedford.progress.com:443/artifactory/ml-maven-snapshots/"
+    }
 }
 
 configurations {
@@ -42,7 +47,7 @@ dependencies {
         exclude module: "rocksdbjni"
     }
 
-    shadowDependencies ("com.marklogic:marklogic-client-api:6.6.1") {
+    shadowDependencies ("com.marklogic:marklogic-client-api:7.0.0") {
         // The Java Client uses Jackson 2.15.2; Scala 3.4.x does not yet support that and will throw the following error:
         // Scala module 2.14.2 requires Jackson Databind version >= 2.14.0 and < 2.15.0 - Found jackson-databind version 2.15.2
         // So the 4 Jackson modules are excluded to allow for Spark's to be used.
@@ -63,20 +68,37 @@ dependencies {
 
     shadowDependencies "org.jdom:jdom2:2.0.6.1"
 
-    testImplementation ('com.marklogic:ml-app-deployer:4.7.0') {
+    testImplementation ('com.marklogic:ml-app-deployer:4.8.0') {
         exclude group: "com.fasterxml.jackson.core"
         exclude group: "com.fasterxml.jackson.dataformat"
+
+        // Use the Java Client declared above.
+        exclude module: "marklogic-client-api"
     }
+
     testImplementation ('com.marklogic:marklogic-junit5:1.4.0') {
         exclude group: "com.fasterxml.jackson.core"
         exclude group: "com.fasterxml.jackson.dataformat"
+
+        // Use the Java Client declared above.
+        exclude module: "marklogic-client-api"
     }
 
     testImplementation "ch.qos.logback:logback-classic:1.3.14"
     testImplementation "org.slf4j:jcl-over-slf4j:1.7.36"
     testImplementation "org.skyscreamer:jsonassert:1.5.1"
 }
 
+// See https://jeremylong.github.io/DependencyCheck/dependency-check-gradle/configuration.html for more information.
+dependencyCheck {
+    // Need a JSON report to integrate with Sonar. And HTML is easier for humans to read.
+    formats = ["HTML", "JSON"]
+    // We don't include compileOnly since that includes Spark, and Spark and its dependencies are not actual dependencies
+    // of our connector.
+    scanConfigurations = ["shadowDependencies"]
+    suppressionFile = "config/dependency-check-suppressions.xml"
+}
+
 test {
     useJUnitPlatform()
     finalizedBy jacocoTestReport
@@ -95,6 +117,8 @@ sonar {
     properties {
         property "sonar.projectKey", "marklogic-spark"
         property "sonar.host.url", "http://localhost:9000"
+        // See https://github.com/dependency-check/dependency-check-sonar-plugin for more information.
+        property "sonar.dependencyCheck.jsonReportPath", "build/reports/dependency-check-report.json"
     }
 }
 
```
config/dependency-check-suppressions.xml

Lines changed: 29 additions & 0 deletions

```diff
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<suppressions xmlns="https://jeremylong.github.io/DependencyCheck/dependency-suppression.1.3.xsd">
+    <suppress>
+        <notes><![CDATA[
+        file name: jackson-databind-2.14.3.jar
+
+        See https://nvd.nist.gov/vuln/detail/CVE-2023-35116 and https://github.com/FasterXML/jackson-databind/issues/3972 .
+        The Jackson team heartily refutes that this is a vulnerability, and we agree.
+        ]]></notes>
+        <packageUrl regex="true">^pkg:maven/com\.fasterxml\.jackson\.core/jackson-databind@.*$</packageUrl>
+        <cve>CVE-2023-35116</cve>
+    </suppress>
+    <suppress>
+        <notes><![CDATA[
+        file name: commons-compress-1.24.0.jar
+        This is brought in by Jena 4.10. It's a medium, and we don't want to interfere with Jena dependencies.
+        ]]></notes>
+        <packageUrl regex="true">^pkg:maven/org\.apache\.commons/commons-compress@.*$</packageUrl>
+        <cve>CVE-2024-25710</cve>
+    </suppress>
+    <suppress>
+        <notes><![CDATA[
+        file name: commons-compress-1.24.0.jar
+        This is brought in by Jena 4.10. It's a medium, and we don't want to interfere with Jena dependencies.
+        ]]></notes>
+        <packageUrl regex="true">^pkg:maven/org\.apache\.commons/commons-compress@.*$</packageUrl>
+        <cve>CVE-2024-26308</cve>
+    </suppress>
+</suppressions>
```
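The suppression file is plain XML, so it is easy to lint before it reaches dependency-check. A sketch using only the standard library; the schema namespace is taken verbatim from the diff above, and the XML is embedded here as a trimmed-down string for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace declared by the suppression file's xmlns attribute.
NS = {"dc": "https://jeremylong.github.io/DependencyCheck/dependency-suppression.1.3.xsd"}

# Trimmed-down copy of the suppression file; a real check would read the
# file at config/dependency-check-suppressions.xml instead.
suppression_xml = """<?xml version="1.0" encoding="UTF-8"?>
<suppressions xmlns="https://jeremylong.github.io/DependencyCheck/dependency-suppression.1.3.xsd">
    <suppress>
        <packageUrl regex="true">^pkg:maven/org\\.apache\\.commons/commons-compress@.*$</packageUrl>
        <cve>CVE-2024-25710</cve>
    </suppress>
    <suppress>
        <packageUrl regex="true">^pkg:maven/org\\.apache\\.commons/commons-compress@.*$</packageUrl>
        <cve>CVE-2024-26308</cve>
    </suppress>
</suppressions>"""

def suppressed_cves(xml_text: str) -> list:
    """Return every CVE id that the suppression file silences, in document order."""
    root = ET.fromstring(xml_text)
    return [cve.text for cve in root.findall(".//dc:cve", NS)]

print(suppressed_cves(suppression_xml))  # ['CVE-2024-25710', 'CVE-2024-26308']
```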

docker-compose.yaml

Lines changed: 4 additions & 3 deletions
```diff
@@ -28,9 +28,10 @@ services:
       SONAR_JDBC_USERNAME: sonar
       SONAR_JDBC_PASSWORD: sonar
     volumes:
-      - sonarqube_data:/opt/sonarqube/data
-      - sonarqube_extensions:/opt/sonarqube/extensions
-      - sonarqube_logs:/opt/sonarqube/logs
+      - ./docker/sonarqube/data:/opt/sonarqube/data
+      - ./docker/sonarqube/logs:/opt/sonarqube/logs
+      # Allows for Sonar plugins to be installed by including plugin jar files in this directory.
+      - ./docker/sonarqube/extensions:/opt/sonarqube/extensions
     ports:
       - "9000:9000"
 
```

Binary file not shown.

docs/getting-started/jupyter.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -32,15 +32,15 @@ connector and also to initialize Spark:
 
 ```
 import os
-os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "/path/to/marklogic-spark-connector-2.2.0.jar" pyspark-shell'
+os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "/path/to/marklogic-spark-connector-2.3.1.jar" pyspark-shell'
 
 from pyspark.sql import SparkSession
 spark = SparkSession.builder.master("local[*]").appName('My Notebook').getOrCreate()
 spark.sparkContext.setLogLevel("WARN")
 spark
 ```
 
-The path of `/path/to/marklogic-spark-connector-2.2.0.jar` should be changed to match the location of the connector
+The path of `/path/to/marklogic-spark-connector-2.3.1.jar` should be changed to match the location of the connector
 jar on your filesystem. You are free to customize the `spark` variable in any manner you see fit as well.
 
 Now that you have an initialized Spark session, you can run any of the examples found in the
````
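The only thing that changes in the notebook snippet above from release to release is the jar filename, so one option is to compute `PYSPARK_SUBMIT_ARGS` from the jar path rather than hard-coding the whole string. A small sketch; the helper name is mine, not part of the project:

```python
import os

def pyspark_submit_args(jar_path: str) -> str:
    """Build the PYSPARK_SUBMIT_ARGS value that makes a connector jar visible to PySpark."""
    return f'--jars "{jar_path}" pyspark-shell'

# Must be set before pyspark.sql.SparkSession is first created.
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(
    "/path/to/marklogic-spark-connector-2.3.1.jar"
)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```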

docs/getting-started/pyspark.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@ shell by pressing `ctrl-D`.
 
 Run PySpark from the directory that you downloaded the connector to per the [setup instructions](setup.md):
 
-    pyspark --jars marklogic-spark-connector-2.2.0.jar
+    pyspark --jars marklogic-spark-connector-2.3.1.jar
 
 The `--jars` command line option is PySpark's method for utilizing Spark connectors. Each Spark environment should have
 a similar mechanism for including third party connectors; please see the documentation for your particular Spark
```

docs/getting-started/setup.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -31,10 +31,10 @@ have an instance of MarkLogic running, you can skip step 4 below, but ensure tha
 extracted directory contains valid connection properties for your instance of MarkLogic.
 
 1. From [this repository's Releases page](https://github.com/marklogic/marklogic-spark-connector/releases), select
-   the latest release and download the `marklogic-spark-getting-started-2.2.0.zip` file.
+   the latest release and download the `marklogic-spark-getting-started-2.3.1.zip` file.
 2. Extract the contents of the downloaded zip file.
 3. Open a terminal window and go to the directory created by extracting the zip file; the directory should have a
-   name of "marklogic-spark-getting-started-2.2.0".
+   name of "marklogic-spark-getting-started-2.3.1".
 4. Run `docker-compose up -d` to start an instance of MarkLogic
 5. Ensure that the `./gradlew` file is executable; depending on your operating system, you may need to run
    `chmod 755 gradlew` to make the file executable.
```

examples/entity-aggregation/docker-compose.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ name: entity_aggregation
 services:
 
   marklogic:
-    image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
+    image: "marklogicdb/marklogic-db:latest-11"
     platform: linux/amd64
     environment:
       - MARKLOGIC_INIT=true
```

examples/getting-started/docker-compose.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ name: marklogic_spark_getting_started
 services:
 
   marklogic:
-    image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
+    image: "marklogicdb/marklogic-db:latest-11"
     platform: linux/amd64
     environment:
       - MARKLOGIC_INIT=true
```

examples/getting-started/marklogic-spark-getting-started.ipynb

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@
    "source": [
     "# Make the MarkLogic connector available to the underlying PySpark application.\n",
     "import os\n",
-    "os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars \"marklogic-spark-connector-2.2.0.jar\" pyspark-shell'\n",
+    "os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars \"marklogic-spark-connector-2.3.1.jar\" pyspark-shell'\n",
     "\n",
     "# Define the connection details for the getting-started example application.\n",
     "client_uri = \"spark-example-user:password@localhost:8003\"\n",
```

src/main/java/com/marklogic/spark/ConnectionString.java

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,3 +1,6 @@
+/*
+ * Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
+ */
 package com.marklogic.spark;
 
 import java.io.UnsupportedEncodingException;
```
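The `ConnectionString` class itself is not shown in this diff, but the `client_uri` values used elsewhere in the commit (e.g. `spark-example-user:password@localhost:8003` in the getting-started notebook) follow a `user:password@host:port` shape. A hedged Python sketch of parsing that shape — this mirrors the format only, not the actual Java implementation, which may support additional components:

```python
def parse_connection_string(value: str) -> dict:
    """Split a 'user:password@host:port' connection string into its parts."""
    # rpartition on '@' tolerates an '@' inside the password.
    creds, _, address = value.rpartition("@")
    user, _, password = creds.partition(":")
    host, _, port = address.partition(":")
    return {"user": user, "password": password, "host": host, "port": int(port)}

print(parse_connection_string("spark-example-user:password@localhost:8003"))
# {'user': 'spark-example-user', 'password': 'password', 'host': 'localhost', 'port': 8003}
```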

src/main/java/com/marklogic/spark/ConnectorException.java

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,3 +1,6 @@
+/*
+ * Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
+ */
 package com.marklogic.spark;
 
 public class ConnectorException extends RuntimeException {
```
