
Commit f1bbf9c

Merge pull request #174 from marklogic/release/2.2.0
Release/2.2.0
2 parents 885d300 + 8f639d7 commit f1bbf9c

File tree

186 files changed: +6981 additions, -1291 deletions


.env

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Defines environment variables for docker-compose.
+# Can be overridden via e.g. `MARKLOGIC_TAG=latest-10.0 docker-compose up -d --build`.
+MARKLOGIC_TAG=11.1.0-centos-1.1.0

CONTRIBUTING.md

Lines changed: 123 additions & 16 deletions
@@ -1,12 +1,14 @@
-This is an evolving guide for developers interested in developing and testing this project. This guide assumes that you
-have cloned this repository to your local workstation.
+This guide covers how to develop and test this project. It assumes that you have cloned this repository to your local
+workstation.
 
-# Do this first!
+Due to the use of the Sonar plugin for Gradle, you must use Java 11 or higher for developing and testing the project.
+The `build.gradle` file for this project ensures that the connector is built to run on Java 8 or higher.
 
-In order to develop and/or test the connector, or to try out the PySpark instructions below, you first
-need to deploy the test application in this project to MarkLogic. You can do so either on your own installation of
-MarkLogic, or you can use `docker-compose` to install MarkLogic, optionally as a 3-node cluster with a load balancer
-in front of it.
+# Setup
+
+To begin, you need to deploy the test application in this project to MarkLogic. You can do so either on your own
+installation of MarkLogic, or you can use `docker-compose` to install MarkLogic, optionally as a 3-node cluster with
+a load balancer in front of it.
 
 ## Installing MarkLogic with docker-compose
 
@@ -22,9 +24,9 @@ The above will result in a new MarkLogic instance with a single node.
 Alternatively, if you would like to test against a 3-node MarkLogic cluster with a load balancer in front of it,
 run `docker-compose -f docker-compose-3nodes.yaml up -d --build`.
 
-### Accessing MarkLogic logs in Grafana
+## Accessing MarkLogic logs in Grafana
 
-This project's `docker-compose.yaml` file includes
+This project's `docker-compose-3nodes.yaml` file includes
 [Grafana, Loki, and promtail services](https://grafana.com/docs/loki/latest/clients/promtail/) for the primary reason of
 collecting MarkLogic log files and allowing them to be viewed and searched via Grafana.
 
@@ -75,6 +77,46 @@ You can then run the tests from within the Docker environment via the following
     ./gradlew dockerTest
 
 
+## Generating code quality reports with SonarQube
+
+In order to use SonarQube, you must have used Docker to run this project's `docker-compose.yml` file and you must
+have the services in that file running.
+
+To configure the SonarQube service, perform the following steps:
+
+1. Go to http://localhost:9000 .
+2. Login as admin/admin. SonarQube will ask you to change this password; you can choose whatever you want ("password" works).
+3. Click on "Create project manually".
+4. Enter "marklogic-spark" for the Project Name; use that as the Project Key too.
+5. Enter "develop" as the main branch name.
+6. Click on "Next".
+7. Click on "Use the global setting" and then "Create project".
+8. On the "Analysis Method" page, click on "Locally".
+9. In the "Provide a token" panel, click on "Generate". Copy the token.
+10. Add `systemProp.sonar.token=your token pasted here` to `gradle-local.properties` in the root of your project, creating
+that file if it does not exist yet.
+
+To run SonarQube, run the following Gradle tasks, which will run all the tests with code coverage and then generate
+a quality report with SonarQube:
+
+    ./gradlew test sonar
+
+If you do not add `systemProp.sonar.token` to your `gradle-local.properties` file, you can specify the token via the
+following:
+
+    ./gradlew test sonar -Dsonar.token=paste your token here
+
+When that completes, you will see a line like this near the end of the logging:
+
+    ANALYSIS SUCCESSFUL, you can find the results at: http://localhost:9000/dashboard?id=marklogic-spark
+
+Click on that link. If it's the first time you've run the report, you'll see all issues. If you've run the report
+before, then SonarQube will show "New Code" by default. That's handy, as you can use that to quickly see any issues
+you've introduced on the feature branch you're working on. You can then click on "Overall Code" to see all issues.
+
+Note that if you only need results on code smells and vulnerabilities, you can repeatedly run `./gradlew sonar`
+without having to re-run the tests.
+
 # Testing with PySpark
 
 The documentation for this project
@@ -89,19 +131,16 @@ This will produce a single jar file for the connector in the `./build/libs` directory.
 
 You can then launch PySpark with the connector available via:
 
-    pyspark --jars build/libs/marklogic-spark-connector-2.1.0.jar
+    pyspark --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
 
 The below command is an example of loading data from the test application deployed via the instructions at the top of
 this page.
 
 ```
-df = spark.read.format("com.marklogic.spark")\
-    .option("spark.marklogic.client.host", "localhost")\
-    .option("spark.marklogic.client.port", "8016")\
-    .option("spark.marklogic.client.username", "admin")\
-    .option("spark.marklogic.client.password", "admin")\
-    .option("spark.marklogic.client.authType", "digest")\
+df = spark.read.format("marklogic")\
+    .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8016")\
     .option("spark.marklogic.read.opticQuery", "op.fromView('Medical', 'Authors')")\
+    .option("spark.marklogic.read.numPartitions", 8)\
     .load()
 ```
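A side note on the `spark.marklogic.client.uri` option used in the diff above: it packs the username, password, host, and port into a single `user:password@host:port` string. The helper below is a hypothetical illustration of assembling that value — it is not part of the connector's API.

```python
def client_uri(username: str, password: str, host: str, port: int) -> str:
    """Assemble a spark.marklogic.client.uri value of the form user:password@host:port.

    Hypothetical helper for illustration only; the connector simply accepts the string.
    """
    return f"{username}:{password}@{host}:{port}"

# Reproduces the option value from the example above.
uri = client_uri("spark-test-user", "spark", "localhost", 8016)
print(uri)  # spark-test-user:spark@localhost:8016
```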

@@ -114,6 +153,74 @@ You now have a Spark dataframe - try some commands out on it:
 Check out the [PySpark docs](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_df.html) for
 more commands you can try out.
 
+You can query for documents as well - the following shows a simple example along with a technique for converting the
+binary content of each document into a string of JSON.
+
+```
+import json
+from pyspark.sql import functions as F
+
+df = spark.read.format("marklogic")\
+    .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8016")\
+    .option("spark.marklogic.read.documents.collections", "author")\
+    .load()
+df.show()
+
+df2 = df.select(F.col("content").cast("string"))
+df2.head()
+json.loads(df2.head()['content'])
+```
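The `cast("string")` plus `json.loads` technique in the diff above amounts to decoding a document's binary content and then parsing it as JSON. A standalone sketch of just that conversion, using a made-up payload in place of a real MarkLogic document:

```python
import json

# Made-up payload standing in for the binary "content" column of one row.
content = b'{"CitationID": 1, "LastName": "Smith"}'

# Equivalent of F.col("content").cast("string"): decode the bytes to a string.
content_str = content.decode("utf-8")

# Equivalent of json.loads(row['content']): parse the JSON string into a dict.
doc = json.loads(content_str)
print(doc["LastName"])  # Smith
```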
+
+
+# Testing against a local Spark cluster
+
+When you run PySpark, it will create its own Spark cluster. If you'd like to try against a separate Spark cluster
+that still runs on your local machine, perform the following steps:
+
+1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.4.1` since we are currently
+building against Spark 3.4.1.
+2. `cd ~/.sdkman/candidates/spark/current/sbin`, which is where sdkman will install Spark.
+3. Run `./start-master.sh` to start a master Spark node.
+4. `cd ../logs` and open the master log file that was created to find the address for the master node. It will be in a
+log message similar to `Starting Spark master at spark://NYWHYC3G0W:7077` - copy the address at the end of the message.
+5. `cd ../sbin`.
+6. Run `./start-worker.sh spark://NYWHYC3G0W:7077`, changing that address as necessary.
+
+You can of course simplify the above steps by adding `SPARK_HOME` to your environment and adding `$SPARK_HOME/sbin` to
+your path, which avoids having to change directories. The log files in `./logs` are useful to tail as well.
+
+The Spark master GUI is at <http://localhost:8080>. You can use this to view details about jobs running in the cluster.
+
+Now that you have a Spark cluster running, you just need to tell PySpark to connect to it:
+
+    pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+
+You can then run the same commands as shown in the PySpark section above. The Spark master GUI will allow you to
+examine details of each of the commands that you run.
+
+The above approach is ultimately a sanity check to ensure that the connector works properly with a separate cluster
+process.
+
+## Testing spark-submit
+
+Once you have the above Spark cluster running, you can test out
+[spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html), which enables submitting a program
+and an optional set of jars to a Spark cluster for execution.
+
+You will need the connector jar available, so run `./gradlew clean shadowJar` if you have not already.
+
+You can then run a test Python program in this repository via the following (again, change the master address as
+needed); note that you run this outside of PySpark, and `spark-submit` is available after having installed PySpark:
+
+    spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar src/test/python/test_program.py
+
+You can also test a Java program. To do so, first move the `com.marklogic.spark.TestProgram` class from `src/test/java`
+to `src/main/java`. Then run `./gradlew clean shadowJar` to rebuild the connector jar. Then run the following:
+
+    spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+
+Be sure to move `TestProgram` back to `src/test/java` when you are done.
+
 # Testing the documentation locally
 
 See the section with the same name in the

Jenkinsfile

Lines changed: 61 additions & 46 deletions
@@ -1,20 +1,32 @@
 @Library('shared-libraries') _
 
-def runtests(String mlVersionType, String mlVersion, String javaVersion){
-    copyRPM mlVersionType,mlVersion
-    setUpML '$WORKSPACE/xdmp/src/Mark*.rpm'
+def runtests(String javaVersion){
     sh label:'test', script: '''#!/bin/bash
     export JAVA_HOME=$'''+javaVersion+'''
     export GRADLE_USER_HOME=$WORKSPACE/$GRADLE_DIR
    export PATH=$GRADLE_USER_HOME:$JAVA_HOME/bin:$PATH
     cd marklogic-spark-connector
     echo "mlPassword=admin" > gradle-local.properties
+    echo "Waiting for MarkLogic server to initialize."
+    sleep 30s
     ./gradlew -i mlDeploy
+    echo "Loading data a second time to try to avoid Optic bug with duplicate rows being returned."
+    ./gradlew -i mlLoadData
     ./gradlew test || true
     '''
     junit '**/build/**/*.xml'
 }
 
+def runSonarScan(String javaVersion){
+    sh label:'test', script: '''#!/bin/bash
+    export JAVA_HOME=$'''+javaVersion+'''
+    export GRADLE_USER_HOME=$WORKSPACE/$GRADLE_DIR
+    export PATH=$GRADLE_USER_HOME:$JAVA_HOME/bin:$PATH
+    cd marklogic-spark-connector
+    ./gradlew sonar -Dsonar.projectKey='marklogic_marklogic-spark-connector_AY1bXn6J_50_odbCDKMX' -Dsonar.projectName='ML-DevExp-marklogic-spark-connector' || true
+    '''
+}
+
 pipeline{
     agent none
     triggers{
@@ -30,16 +42,39 @@ pipeline{
     environment{
         JAVA8_HOME_DIR="/home/builder/java/openjdk-1.8.0-262"
         JAVA11_HOME_DIR="/home/builder/java/jdk-11.0.2"
-        JAVA17_HOME_DIR="/home/builder/java/jdk-17.0.2"
         GRADLE_DIR =".gradle"
         DMC_USER = credentials('MLBUILD_USER')
         DMC_PASSWORD = credentials('MLBUILD_PASSWORD')
     }
     stages{
         stage('tests'){
+            environment{
+                scannerHome = tool 'SONAR_Progress'
+            }
             agent {label 'devExpLinuxPool'}
             steps{
-                runtests('Latest','11','JAVA8_HOME_DIR')
+                sh label:'mlsetup', script: '''#!/bin/bash
+                echo "Removing any running MarkLogic server and clean up MarkLogic data directory"
+                sudo /usr/local/sbin/mladmin remove
+                sudo /usr/local/sbin/mladmin cleandata
+                cd marklogic-spark-connector
+                mkdir -p docker/marklogic/logs
+                docker-compose down -v || true
+                docker-compose up -d --build
+                '''
+                runtests('JAVA11_HOME_DIR')
+                withSonarQubeEnv('SONAR_Progress') {
+                    runSonarScan('JAVA11_HOME_DIR')
+                }
+            }
+            post{
+                always{
+                    sh label:'mlcleanup', script: '''#!/bin/bash
+                    cd marklogic-spark-connector
+                    docker-compose down -v || true
+                    sudo /usr/local/sbin/mladmin delete $WORKSPACE/marklogic-spark-connector/docker/marklogic/logs/
+                    '''
+                }
             }
         }
         stage('publish'){
@@ -49,7 +84,7 @@ pipeline{
             }
             steps{
                 sh label:'publish', script: '''#!/bin/bash
-                export JAVA_HOME=$JAVA_HOME_DIR
+                export JAVA_HOME=$JAVA11_HOME_DIR
                 export GRADLE_USER_HOME=$WORKSPACE/$GRADLE_DIR
                 export PATH=$GRADLE_USER_HOME:$JAVA_HOME/bin:$PATH
                 cp ~/.gradle/gradle.properties $GRADLE_USER_HOME;
@@ -59,55 +94,35 @@ pipeline{
             }
         }
         stage('regressions'){
+            agent {label 'devExpLinuxPool'}
             when{
                 allOf{
                     branch 'develop'
                     expression {return params.regressions}
                 }
             }
-            parallel{
-                stage('11-nightly-java11'){
-                    agent {label 'devExpLinuxPool'}
-                    steps{
-                        runtests('Latest','11','JAVA11_HOME_DIR')
-                    }
-                }
-                stage('11-nightly-java17'){
-                    agent {label 'devExpLinuxPool'}
-                    steps{
-                        runtests('Latest','11','JAVA17_HOME_DIR')
-                    }
-                }
-                stage('10.0-9.5-java11'){
-                    agent {label 'devExpLinuxPool'}
-                    steps{
-                        runtests('Release','10.0-9.5','JAVA11_HOME_DIR')
-                    }
-                }
-                stage('10.0-9.5-nightly-java17'){
-                    agent {label 'devExpLinuxPool'}
-                    steps{
-                        runtests('Release','10.0-9.5','JAVA17_HOME_DIR')
-                    }
-                }
-                stage('11.0.2-java8-spark3.4'){
-                    agent {label 'devExpLinuxPool'}
-                    steps{
-                        copyRPM 'Release','11.0.2'
-                        setUpML '$WORKSPACE/xdmp/src/Mark*.rpm'
-                        sh label:'test', script: '''#!/bin/bash
-                        export JAVA_HOME=$JAVA8_HOME_DIR
-                        export GRADLE_USER_HOME=$WORKSPACE/$GRADLE_DIR
-                        export PATH=$GRADLE_USER_HOME:$JAVA_HOME/bin:$PATH
-                        cd marklogic-spark-connector
-                        echo "mlPassword=admin" > gradle-local.properties
-                        ./gradlew -i mlDeploy
-                        ./gradlew test -PsparkVersion="3.4.0" || true
+            steps{
+                sh label:'mlsetup', script: '''#!/bin/bash
+                echo "Removing any running MarkLogic server and clean up MarkLogic data directory"
+                sudo /usr/local/sbin/mladmin remove
+                sudo /usr/local/sbin/mladmin cleandata
+                cd marklogic-spark-connector
+                mkdir -p docker/marklogic/logs
+                docker-compose down -v || true
+                MARKLOGIC_TAG=latest-10.0 docker-compose up -d --build
                 '''
-                        junit '**/build/**/*.xml'
-                    }
+                runtests('JAVA11_HOME_DIR')
+            }
+            post{
+                always{
+                    sh label:'mlcleanup', script: '''#!/bin/bash
+                    cd marklogic-spark-connector
+                    docker-compose down -v || true
+                    sudo /usr/local/sbin/mladmin delete $WORKSPACE/marklogic-spark-connector/docker/marklogic/logs/
+                    '''
                 }
             }
+
         }
     }
 }
