This is a Spark job that collects spans from storage, analyzes links between services, and stores them for later presentation in the UI. Note that it is needed for production deployments; the all-in-one distribution does not need this job.
This job parses all traces on a given day, based on UTC. By default, it processes the current day, but other days can be explicitly specified.
The Spark job can be run as a Docker container or as a Java executable:
Starting with version 0.6.x, Docker images are published with variant-specific tags. Each variant automatically uses the appropriate storage backend, so the STORAGE environment variable is no longer needed.
The images are named ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:{VERSION}-{VARIANT}:
- VERSION-cassandra: For Cassandra storage (uses CassandraDependenciesJob directly)
- VERSION-elasticsearch7: For Elasticsearch 7.12-7.16 (uses ElasticsearchDependenciesJob with ES connector 7.17.29)
- VERSION-elasticsearch8: For Elasticsearch 7.17+ and 8.x (uses ElasticsearchDependenciesJob with ES connector 8.13.4)
- VERSION-elasticsearch9: For Elasticsearch 9.x (uses ElasticsearchDependenciesJob with ES connector 9.1.3); also tagged as :latest
Example for Cassandra:
$ docker run \
--env CASSANDRA_CONTACT_POINTS=host1,host2 \
ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:v0.5.3-cassandra

Example for Elasticsearch 8.x:
$ docker run \
--env ES_NODES=http://elasticsearch:9200 \
ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:v0.5.3-elasticsearch8

Use --env JAVA_OPTS=-Djavax.net.ssl. to set trust store and other Java properties.
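For example, a trust store could be passed like this sketch (the mounted path, store file, and password are placeholders, not values from this repository):

$ docker run \
  --env ES_NODES=https://elasticsearch:9200 \
  --env JAVA_OPTS="-Djavax.net.ssl.trustStore=/ssl/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
  --volume /path/to/ssl:/ssl \
  ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:v0.5.3-elasticsearch8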
Note: the latest versions are hosted on ghcr.io, not on Docker Hub.
As a jar file:

STORAGE=cassandra java -jar jaeger-spark-dependencies.jar

By default, this job parses all traces since midnight UTC. You can parse traces for a different day by passing an argument in YYYY-mm-dd format, such as 2016-07-16, or by specifying the date via an environment variable.
# e.g., to run the job for yesterday's traces on macOS
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -uv-1d +%F`
# or on Linux
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -u -d '1 day ago' +%F`

jaeger-spark-dependencies applies configuration parameters through environment variables.
The following variables are common to all storage layers:
- SPARK_MASTER: Spark master to submit the job to. Defaults to local[*].
- DATE: Date in YYYY-mm-dd format. Denotes the day for which dependency links will be created.
- PEER_SERVICE_TAG: Tag name used to identify the peer service in spans. Defaults to peer.service.
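For example, to submit the job to a standalone Spark master for a specific day (the master URL below is a placeholder):

$ STORAGE=cassandra DATE=2016-07-16 SPARK_MASTER=spark://spark-master:7077 \
  java -jar jaeger-spark-dependencies.jar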
Cassandra is used when STORAGE=cassandra.
- CASSANDRA_KEYSPACE: The keyspace to use. Defaults to "jaeger_v1_dc1".
- CASSANDRA_CONTACT_POINTS: Comma-separated list of hosts/IP addresses that are part of the Cassandra cluster. Defaults to localhost.
- CASSANDRA_LOCAL_DC: The local DC to connect to (other nodes will be ignored).
- CASSANDRA_USERNAME and CASSANDRA_PASSWORD: Cassandra authentication. Will throw an exception on startup if authentication fails.
- CASSANDRA_USE_SSL: Requires javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword. Defaults to false.
- CASSANDRA_CLIENT_AUTH_ENABLED: If set, enables client authentication on SSL connections. Requires javax.net.ssl.keyStore and javax.net.ssl.keyStorePassword. Defaults to false.
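A run against a cluster that requires authentication and SSL might look like the following sketch (credentials and the trust-store path are placeholders); the basic example follows below:

$ STORAGE=cassandra \
  CASSANDRA_CONTACT_POINTS=host1,host2 \
  CASSANDRA_USERNAME=jaeger CASSANDRA_PASSWORD=secret \
  CASSANDRA_USE_SSL=true \
  java -Djavax.net.ssl.trustStore=/path/to/truststore.jks \
       -Djavax.net.ssl.trustStorePassword=changeit \
       -jar jaeger-spark-dependencies.jar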
Example usage:
$ STORAGE=cassandra CASSANDRA_CONTACT_POINTS=localhost:9042 java -jar jaeger-spark-dependencies.jar

Elasticsearch is used when STORAGE=elasticsearch.
Important: Use the appropriate Docker image variant for your Elasticsearch version:
- ES 7.12-7.16: Use the :VERSION-elasticsearch7 tag
- ES 7.17-8.x: Use the :VERSION-elasticsearch8 tag
- ES 9.x: Use the :VERSION-elasticsearch9 tag (or :latest)
- ES_NODES: A comma-separated list of Elasticsearch hosts advertising HTTP. Defaults to 127.0.0.1. Add a port section if not listening on port 9200. Only one of these hosts needs to be available to fetch the remaining nodes in the cluster. It is recommended to set this to all the master nodes of the cluster. Use URL format for SSL, for example "https://yourhost:8888".
- ES_NODES_WAN_ONLY: Set to true to only use the values set in ES_NODES, for example if your Elasticsearch cluster is in Docker. If you're using a cloud provider such as AWS Elasticsearch, set this to true. Defaults to false.
- ES_USERNAME and ES_PASSWORD: Elasticsearch basic authentication. Use when X-Pack security (formerly Shield) is in place. By default no username or password is provided to Elasticsearch.
- ES_CLIENT_NODE_ONLY: Set to true to disable Elasticsearch cluster nodes.discovery and enable nodes.client.only. If your Elasticsearch cluster's data nodes only listen on the loopback IP, set this to true. Defaults to false.
- ES_INDEX_PREFIX: Index prefix of Jaeger indices. By default unset.
- ES_INDEX_DATE_SEPARATOR: Index date separator of Jaeger indices. The default value is "-". For example, "." will find the index "jaeger-span-2020.11.25".
- ES_TIME_RANGE: How far in the past the job should look for spans; the maximum and default is 24h. Any value accepted by date-math can be used here, but the anchor is always now.
- ES_USE_ALIASES: Set to true to use index alias names to read from and write to. Usually required when using rollover indices.
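A run that combines several of these options, such as basic authentication, an index prefix, and a shorter time range, might look like the following sketch (the endpoint, credentials, and prefix are placeholders); the minimal example follows below:

$ STORAGE=elasticsearch \
  ES_NODES=https://es.example.com:9200 \
  ES_USERNAME=jaeger ES_PASSWORD=secret \
  ES_INDEX_PREFIX=myprefix \
  ES_TIME_RANGE=6h \
  java -jar jaeger-spark-dependencies.jar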
Example usage:
$ STORAGE=elasticsearch ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies.jar

At a high level, this job does the following:
- read lots of spans from a time period
- group them by traceId
- construct a graph using parent-child relationships expressed in span references
- for each edge (parent span, child span), output (parent service, child service, count)
- write the results to the database (e.g. the dependencies_v2 table in Cassandra)
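With the default Cassandra keyspace, the aggregated links can be spot-checked with a quick query like this (purely illustrative):

$ cqlsh -e "SELECT * FROM jaeger_v1_dc1.dependencies_v2 LIMIT 5;"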
To build the job locally and run tests:
./mvnw clean install # if it fails, add SPARK_LOCAL_IP=127.0.0.1

To run the unified jar (includes both Cassandra and Elasticsearch):
STORAGE=cassandra java -jar jaeger-spark-dependencies/target/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar
# or
STORAGE=elasticsearch ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies/target/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar

To run storage-specific jars directly (without the STORAGE variable):
# Cassandra
java -jar jaeger-spark-dependencies-cassandra/target/jaeger-spark-dependencies-cassandra-0.0.1-SNAPSHOT.jar
# Elasticsearch
ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies-elasticsearch/target/jaeger-spark-dependencies-elasticsearch-0.0.1-SNAPSHOT.jar

To build the Docker image:
docker build -t jaegertracing/spark-dependencies:latest .

In tests it is possible to specify the version of the Jaeger images via the environment variable JAEGER_VERSION or the system property jaeger.version. By default, tests use the latest images.
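For example (the version shown is only illustrative):

JAEGER_VERSION=2.14.0 ./mvnw test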
The integration tests validate the Spark dependencies job against different storage backends:
- Cassandra 4.x
- Elasticsearch 7
- Elasticsearch 8
- Elasticsearch 9
Before running integration tests, ensure you have the following installed:
- Java 11 (Temurin distribution recommended)
- Docker (for building images and running testcontainers)
- Maven (included via the ./mvnw wrapper)
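A quick way to verify these prerequisites (output varies by platform):

java -version
docker --version
./mvnw -v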
Use the following make targets to run integration tests:
make e2e-cassandra # Run Cassandra integration tests
make e2e-es7 # Run Elasticsearch 7 integration tests
make e2e-es8 # Run Elasticsearch 8 integration tests
make e2e-es9 # Run Elasticsearch 9 integration tests

Each test suite performs two steps:
- Builds a Docker image with the appropriate storage variant
- Runs tests using testcontainers against that variant
The following environment variables are used in integration tests:
- SPARK_DEPENDENCIES_JOB_TAG: Specifies the Docker image tag to use in tests (e.g., test-cassandra, test-es7, test-es8, test-es9).
- ELASTICSEARCH_VERSION: Specifies the Elasticsearch version for testcontainers to use.
- JAEGER_VERSION: (Optional) Specifies the version of the Jaeger images to use in tests. Defaults to latest.
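The make targets above set these for you; a manual run might look like the following sketch (the tag and version values are placeholders):

SPARK_DEPENDENCIES_JOB_TAG=test-es8 ELASTICSEARCH_VERSION=8.6.0 ./mvnw test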
The Jaeger version can also be set as a system property:

./mvnw test -Djaeger.version=2.14.0

If you encounter Docker permission issues, ensure your user is in the docker group:
sudo usermod -aG docker $USER

Then log out and log back in.
If testcontainers fail to start, ensure:
- Docker is running and accessible
- The Ryuk image is pulled: docker pull testcontainersofficial/ryuk:latest
- You have sufficient disk space for Docker images
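Two quick checks that cover the first and last points:

docker info      # fails if the daemon is not running or not accessible
docker system df # shows how much disk space Docker images are using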
If you encounter build failures:
- Ensure you have Java 11 installed
- Clean the Maven cache: ./mvnw clean
- Try running with the -U flag to force update dependencies: ./mvnw -U clean install
If tests fail due to port conflicts, ensure no other services are running on the ports used by testcontainers (typically ephemeral ports, but sometimes standard ports like 9042 for Cassandra or 9200 for Elasticsearch).
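On Linux, a quick way to see whether those standard ports are already in use (flags differ slightly on macOS):

ss -ltn | grep -E ':(9042|9200)'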