Commit 75fa27e
Merge pull request #273 from marklogic/release/2.3.0.rc1
Merge 2.3.0.rc1
2 parents 1a907f8 + 7ad73ac commit 75fa27e

File tree

252 files changed: +9925 −1399 lines


.env

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 # Defines environment variables for docker-compose.
 # Can be overridden via e.g. `MARKLOGIC_TAG=latest-10.0 docker-compose up -d --build`.
-MARKLOGIC_TAG=11.1.0-centos-1.1.0
+MARKLOGIC_TAG=11.2.0-centos-1.1.2

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -17,3 +17,4 @@ logs
 venv
 .venv
 docker
+export

CONTRIBUTING.md

Lines changed: 39 additions & 40 deletions
@@ -24,26 +24,6 @@ The above will result in a new MarkLogic instance with a single node.
 Alternatively, if you would like to test against a 3-node MarkLogic cluster with a load balancer in front of it,
 run `docker-compose -f docker-compose-3nodes.yaml up -d --build`.
 
-## Accessing MarkLogic logs in Grafana
-
-This project's `docker-compose-3nodes.yaml` file includes
-[Grafana, Loki, and promtail services](https://grafana.com/docs/loki/latest/clients/promtail/) for the primary reason of
-collecting MarkLogic log files and allowing them to be viewed and searched via Grafana.
-
-Once you have run `docker-compose`, you can access Grafana at http://localhost:3000 . Follow these instructions to
-access MarkLogic logging data:
-
-1. Click on the hamburger in the upper left hand corner and select "Explore", or simply go to
-http://localhost:3000/explore .
-2. Verify that "Loki" is the default data source - you should see it selected in the upper left hand corner below
-the "Home" link.
-3. Click on the "Select label" dropdown and choose `job`. Click on the "Select value" label for this filter and
-select `marklogic` as the value.
-4. Click on the blue "Run query" button in the upper right hand corner.
-
-You should now see logs from all 3 nodes in the MarkLogic cluster.
-
-
 ## Deploying the test application
 
 To deploy the test application, first create `./gradle-local.properties` and add the following to it:
@@ -63,20 +43,6 @@ To run the tests against the test application, run the following Gradle task:
 
     ./gradlew test
 
-If you installed MarkLogic using this project's `docker-compose.yaml` file, you can also run the tests from within the
-Docker environment by first running the following task:
-
-    ./gradlew dockerBuildCache
-
-The above task is a mostly one-time step to build a Docker image that contains all of this project's Gradle
-dependencies. This will allow the next step to run much more quickly. You'll only need to run this again when the
-project's Gradle dependencies change.
-
-You can then run the tests from within the Docker environment via the following task:
-
-    ./gradlew dockerTest
-
-
 ## Generating code quality reports with SonarQube
 
 In order to use SonarQube, you must have used Docker to run this project's `docker-compose.yml` file and you must
@@ -117,6 +83,25 @@ you've introduced on the feature branch you're working on. You can then click on
 Note that if you only need results on code smells and vulnerabilities, you can repeatedly run `./gradlew sonar`
 without having to re-run the tests.
 
+## Accessing MarkLogic logs in Grafana
+
+This project's `docker-compose-3nodes.yaml` file includes
+[Grafana, Loki, and promtail services](https://grafana.com/docs/loki/latest/clients/promtail/) for the primary reason of
+collecting MarkLogic log files and allowing them to be viewed and searched via Grafana.
+
+Once you have run `docker-compose`, you can access Grafana at http://localhost:3000 . Follow these instructions to
+access MarkLogic logging data:
+
+1. Click on the hamburger in the upper left hand corner and select "Explore", or simply go to
+http://localhost:3000/explore .
+2. Verify that "Loki" is the default data source - you should see it selected in the upper left hand corner below
+the "Home" link.
+3. Click on the "Select label" dropdown and choose `job`. Click on the "Select value" label for this filter and
+select `marklogic` as the value.
+4. Click on the blue "Run query" button in the upper right hand corner.
+
+You should now see logs from all 3 nodes in the MarkLogic cluster.
+
 # Testing with PySpark
 
 The documentation for this project
@@ -131,7 +116,7 @@ This will produce a single jar file for the connector in the `./build/libs` directory.
 
 You can then launch PySpark with the connector available via:
 
-    pyspark --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+    pyspark --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar
 
 The below command is an example of loading data from the test application deployed via the instructions at the top of
 this page.
@@ -171,14 +156,28 @@ df2.head()
 json.loads(df2.head()['content'])
 ```
 
+For a quick test of writing documents, use the following:
+
+```
+
+spark.read.option("header", True).csv("src/test/resources/data.csv")\
+    .repartition(2)\
+    .write.format("marklogic")\
+    .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8000")\
+    .option("spark.marklogic.write.permissions", "spark-user-role,read,spark-user-role,update")\
+    .option("spark.marklogic.write.logProgress", 50)\
+    .option("spark.marklogic.write.batchSize", 10)\
+    .mode("append")\
+    .save()
+```
 
 # Testing against a local Spark cluster
 
 When you run PySpark, it will create its own Spark cluster. If you'd like to try against a separate Spark cluster
 that still runs on your local machine, perform the following steps:
 
-1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.4.1` since we are currently
-building against Spark 3.4.1.
+1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.4.3` since we are currently
+building against Spark 3.4.3.
 2. `cd ~/.sdkman/candidates/spark/current/sbin`, which is where sdkman will install Spark.
 3. Run `./start-master.sh` to start a master Spark node.
 4. `cd ../logs` and open the master log file that was created to find the address for the master node. It will be in a
@@ -193,7 +192,7 @@ The Spark master GUI is at <http://localhost:8080>. You can use this to view details
 
 Now that you have a Spark cluster running, you just need to tell PySpark to connect to it:
 
-    pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+    pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar
 
 You can then run the same commands as shown in the PySpark section above. The Spark master GUI will allow you to
 examine details of each of the commands that you run.
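
The write snippet added in this commit pairs naturally with a read-back check. The following is a sketch, not part of this commit or the repository: it assumes the PySpark shell from this guide (so `spark` already exists), the connector jar supplied via `--jars`, and a TDE view over the written data — the `example`/`data` schema and view names are hypothetical placeholders that must match a view deployed in your MarkLogic instance.

```
# Sketch only: read rows back through the connector to verify the write.
# 'example' / 'data' are hypothetical TDE schema/view names.
df = spark.read.format("marklogic") \
    .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8000") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'data')") \
    .load()
df.count()   # number of rows visible to the Optic query
df.show(5)   # peek at a few rows
```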
@@ -212,12 +211,12 @@ You will need the connector jar available, so run `./gradlew clean shadowJar` if
 You can then run a test Python program in this repository via the following (again, change the master address as
 needed); note that you run this outside of PySpark, and `spark-submit` is available after having installed PySpark:
 
-    spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar src/test/python/test_program.py
+    spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar src/test/python/test_program.py
 
 You can also test a Java program. To do so, first move the `com.marklogic.spark.TestProgram` class from `src/test/java`
 to `src/main/java`. Then run `./gradlew clean shadowJar` to rebuild the connector jar. Then run the following:
 
-    spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+    spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.3.0.rc1.jar
 
 Be sure to move `TestProgram` back to `src/test/java` when you are done.
 
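
As a point of reference for the `spark-submit` commands above, a minimal program in the same spirit might look like the sketch below. This is illustrative only — it is not the repository's actual `src/test/python/test_program.py`; the connection URI and Optic query are assumptions carried over from the earlier examples, and the connector jar is supplied externally via `--jars`.

```
# Hypothetical stand-in for a spark-submit test program; not the repo's
# actual test_program.py. Assumes the connector jar is passed via --jars
# and the test application is deployed.
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("marklogic-smoke-test").getOrCreate()

    df = spark.read.format("marklogic") \
        .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8000") \
        .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'data')") \
        .load()

    print("Rows read from MarkLogic: {}".format(df.count()))
    spark.stop()
```

It would be submitted the same way as shown above, e.g. `spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar your_program.py`.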

Jenkinsfile

Lines changed: 0 additions & 1 deletion

@@ -40,7 +40,6 @@ pipeline{
     buildDiscarder logRotator(artifactDaysToKeepStr: '7', artifactNumToKeepStr: '', daysToKeepStr: '30', numToKeepStr: '')
   }
   environment{
-    JAVA8_HOME_DIR="/home/builder/java/openjdk-1.8.0-262"
     JAVA11_HOME_DIR="/home/builder/java/jdk-11.0.2"
     GRADLE_DIR =".gradle"
     DMC_USER = credentials('MLBUILD_USER')

LICENSE.txt

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-Copyright © 2023 MarkLogic Corporation.
+Copyright © 2024 MarkLogic Corporation.
 
 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
 
build.gradle

Lines changed: 45 additions & 60 deletions

@@ -2,61 +2,74 @@ plugins {
     id 'java-library'
     id 'net.saliman.properties' version '1.5.2'
     id 'com.github.johnrengelman.shadow' version '8.1.1'
-    id "com.marklogic.ml-gradle" version "4.6.0"
+    id "com.marklogic.ml-gradle" version "4.7.0"
     id 'maven-publish'
     id 'signing'
     id "jacoco"
     id "org.sonarqube" version "4.4.1.3373"
 }
 
 group 'com.marklogic'
-version '2.2.0'
+version '2.3.0.rc1'
 
 java {
-    sourceCompatibility = 1.8
-    targetCompatibility = 1.8
+    // To support reading RDF files, Apache Jena is used - but that requires Java 11. If we want to do a 2.2.0 release
+    // without requiring Java 11, we'll remove the support for reading RDF files along with the Jena dependency.
+    sourceCompatibility = 11
+    targetCompatibility = 11
 }
 
 repositories {
     mavenCentral()
 }
 
+configurations {
+    // Defines all the implementation dependencies, but in such a way that they are not included as dependencies in the
+    // library's pom.xml file. This is due to the shadow jar being published instead of a jar only containing this
+    // project's classes. The shadow jar is published due to the need to relocate several packages to avoid conflicts
+    // with Spark.
+    shadowDependencies
+
+    // This approach allows for all of the dependencies to be available for compilation and for running tests.
+    compileOnly.extendsFrom(shadowDependencies)
+    testImplementation.extendsFrom(compileOnly)
+}
+
 dependencies {
-    compileOnly 'org.apache.spark:spark-sql_2.12:' + sparkVersion
-    implementation ("com.marklogic:marklogic-client-api:6.5.0") {
+    // This is compileOnly as any environment this is used in will provide the Spark dependencies itself.
+    compileOnly ('org.apache.spark:spark-sql_2.12:' + sparkVersion) {
+        // Excluded from our ETL tool for size reasons, so excluded here as well to ensure we don't need it.
+        exclude module: "rocksdbjni"
+    }
+
+    shadowDependencies ("com.marklogic:marklogic-client-api:6.6.1") {
         // The Java Client uses Jackson 2.15.2; Scala 3.4.x does not yet support that and will throw the following error:
         // Scala module 2.14.2 requires Jackson Databind version >= 2.14.0 and < 2.15.0 - Found jackson-databind version 2.15.2
         // So the 4 Jackson modules are excluded to allow for Spark's to be used.
-        exclude module: 'jackson-core'
-        exclude module: 'jackson-databind'
-        exclude module: 'jackson-annotations'
-        exclude module: 'jackson-dataformat-csv'
+        exclude group: "com.fasterxml.jackson.core"
+        exclude group: "com.fasterxml.jackson.dataformat"
     }
 
+    // Required for converting JSON to XML. Using 2.14.2 to align with Spark 3.4.1.
+    shadowDependencies "com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.14.2"
+
     // Need this so that an OkHttpClientConfigurator can be created.
-    implementation 'com.squareup.okhttp3:okhttp:4.12.0'
+    shadowDependencies 'com.squareup.okhttp3:okhttp:4.12.0'
 
-    // Makes it possible to use lambdas in Java 8 to implement Spark's Function1 and Function2 interfaces
-    // See https://github.com/scala/scala-java8-compat for more information
-    implementation("org.scala-lang.modules:scala-java8-compat_2.12:1.0.2") {
-        // Prefer the Scala libraries used within the user's Spark runtime.
-        exclude module: "scala-library"
+    shadowDependencies ("org.apache.jena:jena-arq:4.10.0") {
+        exclude group: "com.fasterxml.jackson.core"
+        exclude group: "com.fasterxml.jackson.dataformat"
     }
 
-    testImplementation 'org.apache.spark:spark-sql_2.12:' + sparkVersion
+    shadowDependencies "org.jdom:jdom2:2.0.6.1"
 
-    // The exclusions in these two modules ensure that we use the Jackson libraries from spark-sql when running the tests.
-    testImplementation ('com.marklogic:ml-app-deployer:4.6.0') {
-        exclude module: 'jackson-core'
-        exclude module: 'jackson-databind'
-        exclude module: 'jackson-annotations'
-        exclude module: 'jackson-dataformat-csv'
+    testImplementation ('com.marklogic:ml-app-deployer:4.7.0') {
+        exclude group: "com.fasterxml.jackson.core"
+        exclude group: "com.fasterxml.jackson.dataformat"
    }
     testImplementation ('com.marklogic:marklogic-junit5:1.4.0') {
-        exclude module: 'jackson-core'
-        exclude module: 'jackson-databind'
-        exclude module: 'jackson-annotations'
-        exclude module: 'jackson-dataformat-csv'
+        exclude group: "com.fasterxml.jackson.core"
+        exclude group: "com.fasterxml.jackson.dataformat"
     }
 
     testImplementation "ch.qos.logback:logback-classic:1.3.14"

@@ -105,7 +118,11 @@ if (JavaVersion.current().isCompatibleWith(JavaVersion.VERSION_17)) {
 }
 
 shadowJar {
-    // "all" is the default; no need for that in the connector filename.
+    configurations = [project.configurations.shadowDependencies]
+
+    // "all" is the default; no need for that in the connector filename. This also results in this becoming the library
+    // artifact that is published as a dependency. That is desirable as it includes the relocated packages listed below,
+    // which a dependent would otherwise have to manage themselves.
     archiveClassifier.set("")
 
     // Spark uses an older version of OkHttp; see

@@ -121,38 +138,6 @@ task perfTest(type: JavaExec) {
     args mlHost
 }
 
-task dockerBuildCache(type: Exec) {
-    description = "Creates an image named 'marklogic-spark-cache' containing a cache of the Gradle dependencies."
-    commandLine 'docker', 'build', '--no-cache', '-t', 'marklogic-spark-cache', '.'
-}
-
-task dockerTest(type: Exec) {
-    description = "Run all of the tests within a Docker environment."
-    commandLine 'docker', 'run',
-        // Allows for communicating with the MarkLogic cluster that is setup via docker-compose.yaml.
-        '--network=marklogic_spark_external_net',
-        // Map the project directory into the Docker container.
-        '-v', getProjectDir().getAbsolutePath() + ':/root/project',
-        // Working directory for the Gradle tasks below.
-        '-w', '/root/project',
-        // Remove the container after it finishes running.
-        '--rm',
-        // Use the output of dockerBuildCache to avoid downloading all the Gradle dependencies.
-        'marklogic-spark-cache:latest',
-        'gradle', '-i', '-PmlHost=bootstrap_3n.local', 'test'
-}
-
-task dockerPerfTest(type: Exec) {
-    description = "Run PerformanceTester a Docker environment."
-    commandLine 'docker', 'run',
-        '--network=marklogic_spark_external_net',
-        '-v', getProjectDir().getAbsolutePath() + ':/root/project',
-        '-w', '/root/project',
-        '--rm',
-        'marklogic-spark-cache:latest',
-        'gradle', '-i', '-PmlHost=bootstrap_3n.local', 'perfTest'
-}
-
 task sourcesJar(type: Jar, dependsOn: classes) {
     archiveClassifier = "sources"
     from sourceSets.main.allSource

docker-compose-3nodes.yaml

Lines changed: 3 additions & 3 deletions

@@ -30,7 +30,7 @@ services:
   # by this host. Note that each MarkLogic host has its 8000-8002 ports exposed externally so that the apps on those
   # ports can each be accessed if needed.
   bootstrap_3n:
-    image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
+    image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
     platform: linux/amd64
     container_name: bootstrap_3n
     hostname: bootstrap_3n.local

@@ -50,7 +50,7 @@ services:
       - internal_net
 
   node2:
-    image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
+    image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
     platform: linux/amd64
     container_name: node2
     hostname: node2.local

@@ -74,7 +74,7 @@ services:
       - internal_net
 
   node3:
-    image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
+    image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
     platform: linux/amd64
     container_name: node3
     hostname: node3.local

docker-compose.yaml

Lines changed: 2 additions & 1 deletion

@@ -19,7 +19,8 @@ services:
 
   # Copied from https://docs.sonarsource.com/sonarqube/latest/setup-and-upgrade/install-the-server/#example-docker-compose-configuration .
   sonarqube:
-    image: sonarqube:community
+    # Using 10.2 to avoid requiring Java 17 for now.
+    image: sonarqube:10.2.1-community
     depends_on:
       - postgres
     environment:

docs/Gemfile.lock

Lines changed: 2 additions & 2 deletions

@@ -224,8 +224,8 @@ GEM
     rb-fsevent (0.11.2)
     rb-inotify (0.10.1)
       ffi (~> 1.0)
-    rexml (3.2.8)
-      strscan (>= 3.0.9)
+    rexml (3.3.2)
+      strscan
     rouge (3.26.0)
     ruby2_keywords (0.0.5)
     rubyzip (2.3.2)
