Commit 455e338

docs: FE Feedback fixes (#210)
Major update for the component overview page. Minor update for the monitoring guide. Added a tfvars example. The class-not-found error has been resolved with the latest release - no docs update needed.
1 parent 1c03ef9 commit 455e338

4 files changed

Lines changed: 178 additions & 66 deletions


docs/.custom_wordlist.txt

Lines changed: 2 additions & 0 deletions
@@ -96,6 +96,8 @@ xindy
9696
xml
9797
yaml
9898
YouTube
99+
walkthrough
100+
Alertmanager
99101
Ory
100102
middleware
101103
Entra

docs/explanation/component-overview.md

Lines changed: 142 additions & 59 deletions
@@ -7,68 +7,151 @@ myst:
77
(explanation-component-overview)=
88
# Components overview
99

10-
The Charmed Apache Spark solution bundles the following components:
11-
12-
* [spark8t](https://github.com/canonical/spark-k8s-toolkit-py), which is a Python package to enhance
13-
Apache Spark capabilities allowing to manage Spark jobs and service accounts, with hierarchical
14-
level of configuration
15-
* [Charmed Apache Spark
16-
Rock](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) OCI-compliant
17-
Image, that bundles Apache Spark binaries together with Canonical tooling to be used to start your
18-
Apache Spark workload on Kubernetes, to use Charmed Apache Spark CLI tooling or derive your own
19-
images from secured and supported bases;
20-
* [Apache Spark Client Snap](https://snapcraft.io/spark-client), to simplify Apache Spark
21-
installation on edge nodes or local machines, by leveraging on confined
22-
[snaps](https://snapcraft.io/) and exposing simple Snap commands to run and manage Spark jobs
23-
* [Charmed Bundle](https://charmhub.io/spark-k8s-bundle) to deploy, manage and operate Charmed
24-
Apache Spark using [Juju](https://juju.is/). This includes:
25-
* [Spark History Server](https://charmhub.io/spark-history-server-k8s) to expose a web UI for
26-
analysing the logs of previous Spark jobs
27-
* [Charmed Apache Kyuubi](https://charmhub.io/kyuubi-k8s) to provide a JDBC/ODBC endpoint for
28-
running Hive powered by Apache Spark engines
29-
* [Integration Hub for Apache Spark](https://charmhub.io/spark-integration-hub-k8s) to enable easy
30-
configuration of Apache Spark service accounts, providing a native Juju integration with [S3
31-
Integrator](https://charmhub.io/s3-integrator) and [Azure Storage
32-
Integrator](https://charmhub.io/azure-storage-integrator) for enabling object-storage
33-
persistence and with the [Canonical Observability Stack (COS)](https://charmhub.io/cos-lite) for
34-
enabling resource usage monitoring and alerting.
35-
36-
The following image shows how the different artifacts interacts with each other:
10+
Charmed Apache Spark is composed of foundational software artifacts and a set of Juju operators (charms) that together provide a fully managed Apache Spark platform on Kubernetes. All charms are available individually on [Charmhub](https://charmhub.io/) and can also be deployed together via the [Charmed Apache Spark bundle](https://charmhub.io/spark-k8s-bundle) or [Terraform modules](https://github.com/canonical/spark-k8s-bundle/tree/main/releases/3.4/terraform).
11+
12+
## Software artifacts
13+
14+
Three foundational components can be used independently of Juju: [spark8t](explanation-component-overview-spark8t), the [Charmed Apache Spark Rock](explanation-component-overview-rock), and the [spark-client snap](explanation-component-overview-snap).
15+
16+
(explanation-component-overview-spark8t)=
17+
### spark8t
18+
19+
[spark8t](https://github.com/canonical/spark-k8s-toolkit-py) is a Python library that extends Apache Spark with tooling to manage Spark jobs and service accounts through hierarchical configuration. It is the foundation shared by the `spark-client` snap, the OCI image, and the Juju charms.
20+
21+
(explanation-component-overview-rock)=
22+
### Charmed Apache Spark Rock
23+
24+
The [Charmed Apache Spark Rock](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) is an OCI-compliant container image that bundles Apache Spark binaries together with Canonical tooling. It is used as the base image for Spark driver and executor pods on Kubernetes, and as the foundation for the `spark-client` snap.
25+
26+
(explanation-component-overview-snap)=
27+
### spark-client snap
28+
29+
The [spark-client snap](https://snapcraft.io/spark-client) provides CLI tools for working with Charmed Apache Spark from a workstation or edge node. It communicates with the Kubernetes API to submit jobs and manage service accounts — it does not connect to any Juju charm directly.
30+
31+
| Command | Description |
32+
|---|---|
33+
| `spark-client.spark-submit` | Submit Spark applications to a Kubernetes cluster |
34+
| `spark-client.pyspark` | Start an interactive PySpark shell |
35+
| `spark-client.service-account-registry` | Create, configure, and manage Spark service accounts |
36+
| `spark-client.beeline` | JDBC client for connecting to Apache Kyuubi endpoints |
37+
| `spark-client.import-certificate` | Import TLS certificates for encrypted Kyuubi connections |
38+
39+
## Juju operators (charms)
40+
41+
Each subsection below groups charms by function. All charms can be deployed individually or together via the [bundle](https://charmhub.io/spark-k8s-bundle).
42+
43+
### Core components
44+
45+
The following charms form the foundation of any Charmed Apache Spark deployment, connecting Spark service accounts to external services and providing a UI for completed job logs:
46+
47+
| Charm | Description |
48+
|---|---|
49+
| [`spark-integration-hub-k8s`](https://charmhub.io/spark-integration-hub-k8s) | Central hub that manages Spark service account configurations and writes them into Kubernetes Secrets. It allows high-level configuration of Spark properties and seamless integration with external services, such as object storage backends and COS deployments. |
50+
| [`s3-integrator`](https://charmhub.io/s3-integrator) | Supplies S3-compatible object storage credentials (endpoint, bucket, access key) to the Integration Hub and History Server. Supports MinIO, AWS S3, and any S3-compatible backend. |
51+
| [`azure-storage-integrator`](https://charmhub.io/azure-storage-integrator) | Alternative to `s3-integrator` for deployments using Azure Blob Storage. |
52+
| [`spark-history-server-k8s`](https://charmhub.io/spark-history-server-k8s) | Exposes a web UI for browsing and analyzing event logs of completed Spark jobs stored in object storage. Receives credentials from `s3-integrator` or `azure-storage-integrator`. |
53+
| [`traefik-k8s`](https://charmhub.io/traefik-k8s) | Kubernetes ingress proxy. Exposes the History Server web UI at a stable URL outside the cluster. |
54+
55+
### Apache Kyuubi (SQL / JDBC)
56+
57+
[Charmed Apache Kyuubi](https://charmhub.io/kyuubi-k8s) provides a JDBC/ODBC endpoint for running SQL queries against data in object storage, powered by Apache Spark engines.
58+
59+
| Charm | Description |
60+
|---|---|
61+
| [`kyuubi-k8s`](https://charmhub.io/kyuubi-k8s) | Provides a JDBC/ODBC endpoint for SQL queries. Integrates with the Integration Hub to obtain Spark service account configuration. Supports horizontal scaling and external metastore. |
62+
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `auth-db`) | Required authentication database for Kyuubi. Kyuubi will remain blocked without this integration. |
63+
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `metastore`) | External Hive metastore providing persistent metadata storage. Without it, metadata is stored on pod-local storage and lost on pod restarts. |
64+
| [`zookeeper-k8s`](https://charmhub.io/zookeeper-k8s) | Required for multi-node Kyuubi deployments. Coordinates distributed Kyuubi instances. |
65+
| [`self-signed-certificates`](https://charmhub.io/self-signed-certificates) | Provides TLS certificates to Kyuubi for encrypted JDBC connections. For production environments, use a CA-backed certificates operator instead. |
66+
| [`data-integrator`](https://charmhub.io/data-integrator) | Retrieves JDBC credentials (endpoint, username, password, TLS certificate) from Kyuubi via the `get-credentials` action, making them available to external clients. |
67+
68+
### Observability (COS integration)
69+
70+
Charmed Apache Spark integrates natively with the [Canonical Observability Stack (COS)](https://charmhub.io/cos-lite), which is deployed in a separate Juju model and includes Grafana, Prometheus, Loki, and Alertmanager.
71+
72+
The [Tutorial](tutorial-introduction) demonstrates a simplified COS setup using only `prometheus-pushgateway-k8s` and `cos-configuration-k8s`, integrated directly with the COS model charms. The full bundle overlay additionally deploys `grafana-agent-k8s` and `prometheus-scrape-config-k8s` as a cross-model observability bridge:
73+
74+
| Charm | Tutorial | Bundle | Description |
75+
|---|---|---|---|
76+
| [`prometheus-pushgateway-k8s`](https://charmhub.io/prometheus-pushgateway-k8s) | Yes | Yes | Accepts metrics pushed by ephemeral Spark jobs (which are too short-lived for pull-based scraping) and exposes them to Prometheus. In the full bundle, integrates with the Integration Hub to automatically configure service accounts with the pushgateway address. |
77+
| [`cos-configuration-k8s`](https://charmhub.io/cos-configuration-k8s) | Yes | Yes | Syncs Grafana dashboard definitions from a git repository into Grafana. Pre-configured in the bundle to use the dashboards from this repository. |
78+
| [`grafana-agent-k8s`](https://charmhub.io/grafana-agent-k8s) | No | Yes | Cross-model bridge that ships metrics, log streams, and dashboard definitions from the Spark model to the COS model via remote-write and the Loki push API. |
79+
| [`prometheus-scrape-config-k8s`](https://charmhub.io/prometheus-scrape-config-k8s) | No | Yes | Configures the Prometheus scrape interval for the Pushgateway metrics endpoint. |
80+
81+
## Architecture
82+
83+
The following diagram shows how the components relate in a full deployment:
3784

3885
```{mermaid}
39-
flowchart TD
40-
spark8t["`**spark8t**
41-
(*python package*)
42-
exposes functionalities to create, configure and manage Apache Spark users via a Python SDK`"]
43-
44-
spark-rock["`**Charmed Apache Spark Rock**
45-
(*OCI Image*)
46-
provides a reliable Apache Spark image to run Apache Spark applications and Apache Spark CLI tooling`"]
47-
48-
spark-client["`**Spark Client Snap**
49-
(*snap*)
50-
simplify client integration with an Apache Spark Kubernetes cluster via a snap package to be installed in edge nodes or locally`"]
51-
52-
spark-k8s-bundle["`**Charmed Apache Spark**
53-
(*Charmed Operator*)
54-
manages the entire lifecycle of Spark jobs`"]
55-
56-
spark8t --> spark-rock
57-
spark8t --> spark-client
58-
spark-rock --> spark-client
59-
spark-rock --> spark-k8s-bundle
86+
flowchart TB
87+
client["<b>spark-client snap</b><br>spark-submit · pyspark<br>beeline · service-account-registry"]
88+
backend[("<b>Object Storage</b><br>MinIO · AWS S3 · Azure Blob")]
89+
90+
subgraph spark-model["Spark Juju Model"]
91+
direction TB
92+
93+
hub["<b>spark-integration-hub-k8s</b>"]
94+
95+
subgraph integrators["Storage Integrators"]
96+
direction LR
97+
s3int["s3-integrator"]
98+
azint["azure-storage-integrator"]
99+
end
100+
101+
traefik["traefik-k8s"]
102+
hs["spark-history-server-k8s"]
103+
104+
subgraph kyu-grp["Apache Kyuubi"]
105+
direction TB
106+
kyuubi["kyuubi-k8s"]
107+
authdb["postgresql-k8s<br>(auth-db)"]
108+
metadb["postgresql-k8s<br>(metastore)"]
109+
zk["zookeeper-k8s"]
110+
tls["self-signed-certificates"]
111+
di["data-integrator"]
112+
end
113+
114+
subgraph cos-bridge["COS Bridge"]
115+
direction LR
116+
pgw["prometheus-pushgateway-k8s"]
117+
scrape["prometheus-scrape-config-k8s"]
118+
agent["grafana-agent-k8s"]
119+
cosconf["cos-configuration-k8s"]
120+
end
121+
end
122+
123+
subgraph cos-model["COS Juju Model (cos-lite)"]
124+
direction LR
125+
prom["Prometheus"]
126+
graf["Grafana"]
127+
loki_n["Loki"]
128+
end
129+
130+
client -->|"K8s API<br>(spark-submit / pyspark)"| hub
131+
client -->|"JDBC"| kyuubi
132+
backend <-->|credentials| s3int
133+
backend <-->|credentials| azint
134+
s3int -->|s3-credentials| hub
135+
azint -->|azure-storage-credentials| hub
136+
s3int -->|s3-credentials| hs
137+
azint -->|azure-storage-credentials| hs
138+
hub -->|spark-service-account| kyuubi
139+
hub -->|cos| pgw
140+
traefik -->|ingress| hs
141+
kyuubi -->|auth-db| authdb
142+
kyuubi -->|metastore-db| metadb
143+
kyuubi -->|zookeeper| zk
144+
kyuubi -->|certificates| tls
145+
di -->|jdbc| kyuubi
146+
hs -->|metrics · logs · dashboards| agent
147+
pgw --- scrape -->|metrics| agent
148+
cosconf -->|dashboards| agent
149+
agent -->|remote-write| prom
150+
agent -->|push API| loki_n
151+
agent -->|dashboards| graf
60152
```
61153

62-
The Charmed Apache Spark solution can be used to deploy and manage Apache Spark workloads using the
63-
provided distribution on any conformant Kubernetes (recommended versions `1.32` and above), like:
154+
The `spark-client` tools communicate with Kubernetes directly — the Integration Hub writes configuration into K8s Secrets associated with service accounts, and `spark-client` reads them at job submission time.
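As a minimal local sketch of the Secret-reading step described above (no cluster needed; the Spark property and the secret name are illustrative, the point being that Kubernetes stores Secret values base64-encoded):

```shell
# Kubernetes Secrets hold their values base64-encoded; client tooling decodes
# them to recover the Spark properties attached to a service account.
# Simulated locally with an illustrative property.
encoded=$(printf 'spark.eventLog.enabled=true' | base64)
printf '%s' "$encoded" | base64 -d
# prints: spark.eventLog.enabled=true

# On a live cluster, the equivalent lookup would be along the lines of:
#   kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}'
```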
64155

65-
* [MicroK8s](https://microk8s.io/), which is the simplest production-grade conformant K8s.
66-
Lightweight and focused. Single command install on Linux, Windows and macOS. See the
67-
[installation guide](https://microk8s.io/#install-microk8s) for more information.
68-
* [Charmed Kubernetes](https://ubuntu.com/kubernetes/charmed-k8s), which is a platform independent,
69-
model-driven distribution of Kubernetes powered by [Juju](https://juju.is/)
70-
* [AWK EKS](https://ubuntu.com/kubernetes/charmed-k8s), which is the managed Kubernetes service
71-
provided by Amazon Web Services to run Kubernetes in the AWS cloud and on-premises data centers.
156+
For a step-by-step walkthrough of setting up these components, see the [Tutorial](tutorial-introduction). For supported Kubernetes distributions, see the [Requirements](reference-requirements) page.
72157

73-
Setup instructions are available in the [Tutorial](tutorial-introduction), specifically
74-
the [environment setup](tutorial-1-environment-setup) page.

docs/how-to/deploy/spark.md

Lines changed: 19 additions & 2 deletions
@@ -9,8 +9,8 @@ myst:
99

1010
Charmed Apache Spark comes with a bundled set of components that allow you to easily manage Apache
1111
Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation.
12-
For an overview on the different components that form Charmed Apache Spark, please refer to the
13-
[components overview](explanation-component-overview) page.
12+
13+
For an overview of all components and how they relate to each other, see the [Components overview](explanation-component-overview).
1414

1515
## Prerequisites
1616

@@ -219,6 +219,23 @@ The following table provides the description of the different configuration opti
219219
| `s3.bucket` | Name of the S3 bucket to be used for storing logs and data |
220220
| `cos_model` | (Optional) Name of the model where COS is deployed. If omitted, the resource of the cos-integration submodules will not be deployed |
221221

222+
#### Example .tfvars.json
223+
224+
For example, to point the deployment at a MinIO instance and the bucket to use:
225+
226+
```json
227+
{
228+
"storage_backend": "s3",
229+
"s3": {
230+
"region": "eu-central-1",
231+
"bucket": "spark-test",
232+
"endpoint": "http://<host>:80"
233+
}
234+
}
235+
```
236+
237+
For more information on this particular example, see the [MicroK8s MinIO demo](https://github.com/deusebio/datawarehousing-with-spark/blob/main/docs/microk8s_minio.md). For other examples, see the [repository](https://github.com/deusebio/datawarehousing-with-spark/blob/main/README.md).
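The `.tfvars.json` snippet above can be sanity-checked before handing it to Terraform; a sketch, assuming a hypothetical file name (the endpoint keeps the `<host>` placeholder from the example):

```shell
# Write the example variables to a file (hypothetical name spark.tfvars.json).
cat > spark.tfvars.json <<'EOF'
{
  "storage_backend": "s3",
  "s3": {
    "region": "eu-central-1",
    "bucket": "spark-test",
    "endpoint": "http://<host>:80"
  }
}
EOF

# Fail fast on malformed JSON before Terraform ever sees the file.
python3 -m json.tool spark.tfvars.json > /dev/null && echo "tfvars OK"

# Then, from the Terraform module directory:
#   terraform plan -var-file=spark.tfvars.json
```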
238+
222239
```{caution}
223240
The Juju Terraform provider does not yet support cross-controller relations with COS.
224241
Therefore, COS model must be hosted in the same controller as the Charmed Apache Spark model.

docs/how-to/enable-monitoring.md

Lines changed: 15 additions & 5 deletions
@@ -5,7 +5,7 @@ myst:
55
---
66

77
(how-to-monitoring)=
8-
# Enable and configuring monitoring
8+
# Enable and configure monitoring
99

1010
Charmed Apache Spark supports native integration with the Canonical Observability Stack (COS). If you want to enable monitoring on top of Charmed Apache Spark, make sure that you have a Juju model with COS correctly deployed.
1111

@@ -14,9 +14,9 @@ For more information about Charmed Apache Spark and COS integration, refer to th
1414

1515
Once COS is correctly deployed, to enable monitoring it is necessary to:
1616

17-
1. Integrate and configure the COS bundle with Charmed Apache Spark
17+
1. Deploy and integrate the monitoring components (Pushgateway, scrape config, Grafana agent, etc.) with Charmed Apache Spark
1818
2. Configure the Apache Spark service account
19-
3. (Optional) Integrate the optional components of Charmed Apache Spark (such as the Spark History Server charm and Charmed Apache Kyuubi) with the COS bundle
19+
3. (Optional) Integrate the optional components of Charmed Apache Spark (such as the Spark History Server charm and Charmed Apache Kyuubi) with COS
2020

2121
## Integrating/configuring with COS
2222

@@ -27,6 +27,16 @@ The deployments of these resources can be enabled/disabled using either overlays
2727
(for Juju bundles) or input variables (for Terraform bundles).
2828
Please refer to the [how-to deploy](how-to-deploy-spark) guide for more information.
2929

30+
The monitoring components (`grafana-agent-k8s`, `prometheus-pushgateway-k8s`,
31+
`prometheus-scrape-config-k8s`, `cos-configuration-k8s`) and their integrations
32+
are included in the [COS overlay](https://github.com/canonical/spark-k8s-bundle/blob/main/releases/3.4/yaml/overlays/cos-integration.yaml.j2)
33+
for Juju bundles or deployed automatically when `cos_model` is set in the Terraform module.
34+
See the [how-to deploy](how-to-deploy-spark) guide for details.
35+
36+
If you are not using the overlay or Terraform module, you can inspect the
37+
[COS overlay YAML](https://github.com/canonical/spark-k8s-bundle/blob/main/releases/3.4/yaml/overlays/cos-integration.yaml.j2)
38+
for the full list of charms and relations to deploy manually.
39+
3040
After the deployment settles on an `active/idle` state, you can make sure that
3141
Grafana is correctly set up with dedicated dashboards.
3242
To do so, retrieve the credentials for logging into the Grafana dashboard, by
@@ -52,7 +62,7 @@ In particular, it is crucial to configure the scraping interval to make sure
5262
data points have proper sampling frequency, e.g.:
5363

5464
```shell
55-
juju config scrape-config --config scrape_interval=<SCRAPE_INTERVAL>
65+
juju config scrape-config scrape_interval=<SCRAPE_INTERVAL>
5666
```
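Later in this guide, the `spark.metrics.conf.driver.sink.prometheus.pushgateway-address` property is checked against the Pushgateway endpoint. A small sketch composing that property from hypothetical address and port values (in a real deployment, take these from `juju status` for the `prometheus-pushgateway-k8s` application):

```shell
# Hypothetical values; read the real ones from `juju status`.
PUSHGATEWAY_ADDRESS=10.1.0.42
PUSHGATEWAY_PORT=9091

# Compose the property exactly as the guide expects to find it.
printf 'spark.metrics.conf.driver.sink.prometheus.pushgateway-address=%s:%s\n' \
  "$PUSHGATEWAY_ADDRESS" "$PUSHGATEWAY_PORT"
```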
5767

5868
For more information about the properties that can be set using `prometheus-scrape-config-k8s`,
@@ -104,7 +114,7 @@ and check that the following property:
104114
spark.metrics.conf.driver.sink.prometheus.pushgateway-address=<PROMETHEUS_GATEWAY_ADDRESS>:<PROMETHEUS_PORT>
105115
```
106116

107-
is configured with the correct values. The Prometheus Pushgateway address and port should be can be
117+
is configured with the correct values. The Prometheus Pushgateway address and port should be
108118
consistent with what is exposed by Juju, e.g.:
109119

110120
```shell
