Commit 455e338

docs: FE Feedback fixes (#210)
Major update for the component overview page. Minor update for the monitoring guide. Added a tfvars example. The class-not-found error has been resolved with the latest release - no docs update needed.
1 parent 1c03ef9 commit 455e338

4 files changed

Lines changed: 178 additions & 66 deletions


docs/.custom_wordlist.txt

Lines changed: 2 additions & 0 deletions
@@ -96,6 +96,8 @@ xindy
9696
xml
9797
yaml
9898
YouTube
99+
walkthrough
100+
Alertmanager
99101
Ory
100102
middleware
101103
Entra

docs/explanation/component-overview.md

Lines changed: 142 additions & 59 deletions
@@ -7,68 +7,151 @@ myst:
77
(explanation-component-overview)=
88
# Components overview
99

10-
The Charmed Apache Spark solution bundles the following components:
11-
12-
* [spark8t](https://github.com/canonical/spark-k8s-toolkit-py), which is a Python package to enhance
13-
Apache Spark capabilities allowing to manage Spark jobs and service accounts, with hierarchical
14-
level of configuration
15-
* [Charmed Apache Spark
16-
Rock](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) OCI-compliant
17-
Image, that bundles Apache Spark binaries together with Canonical tooling to be used to start your
18-
Apache Spark workload on Kubernetes, to use Charmed Apache Spark CLI tooling or derive your own
19-
images from secured and supported bases;
20-
* [Apache Spark Client Snap](https://snapcraft.io/spark-client), to simplify Apache Spark
21-
installation on edge nodes or local machines, by leveraging on confined
22-
[snaps](https://snapcraft.io/) and exposing simple Snap commands to run and manage Spark jobs
23-
* [Charmed Bundle](https://charmhub.io/spark-k8s-bundle) to deploy, manage and operate Charmed
24-
Apache Spark using [Juju](https://juju.is/). This includes:
25-
* [Spark History Server](https://charmhub.io/spark-history-server-k8s) to expose a web UI for
26-
analysing the logs of previous Spark jobs
27-
* [Charmed Apache Kyuubi](https://charmhub.io/kyuubi-k8s) to provide a JDBC/ODBC endpoint for
28-
running Hive powered by Apache Spark engines
29-
* [Integration Hub for Apache Spark](https://charmhub.io/spark-integration-hub-k8s) to enable easy
30-
configuration of Apache Spark service accounts, providing a native Juju integration with [S3
31-
Integrator](https://charmhub.io/s3-integrator) and [Azure Storage
32-
Integrator](https://charmhub.io/azure-storage-integrator) for enabling object-storage
33-
persistence and with the [Canonical Observability Stack (COS)](https://charmhub.io/cos-lite) for
34-
enabling resource usage monitoring and alerting.
35-
36-
The following image shows how the different artifacts interacts with each other:
10+
Charmed Apache Spark is composed of foundational software artifacts and a set of Juju operators (charms) that together provide a fully managed Apache Spark platform on Kubernetes. All charms are available individually on [Charmhub](https://charmhub.io/) and can also be deployed together via the [Charmed Apache Spark bundle](https://charmhub.io/spark-k8s-bundle) or [Terraform modules](https://github.com/canonical/spark-k8s-bundle/tree/main/releases/3.4/terraform).
11+
12+
## Software artifacts
13+
14+
Three foundational components can be used independently of Juju: [spark8t](explanation-component-overview-spark8t), the [Charmed Apache Spark Rock](explanation-component-overview-rock), and the [spark-client snap](explanation-component-overview-snap).
15+
16+
(explanation-component-overview-spark8t)=
17+
### spark8t
18+
19+
[spark8t](https://github.com/canonical/spark-k8s-toolkit-py) is a Python library that extends Apache Spark with tooling to manage Spark jobs and service accounts through hierarchical configuration. It is the foundation shared by the `spark-client` snap, the OCI image, and the Juju charms.
20+
21+
(explanation-component-overview-rock)=
22+
### Charmed Apache Spark Rock
23+
24+
The [Charmed Apache Spark Rock](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) is an OCI-compliant container image that bundles Apache Spark binaries together with Canonical tooling. It is used as the base image for Spark driver and executor pods on Kubernetes, and as the foundation for the `spark-client` snap.
25+
26+
(explanation-component-overview-snap)=
27+
### spark-client snap
28+
29+
The [spark-client snap](https://snapcraft.io/spark-client) provides CLI tools for working with Charmed Apache Spark from a workstation or edge node. It communicates with the Kubernetes API to submit jobs and manage service accounts — it does not connect to any Juju charm directly.
30+
31+
| Command | Description |
32+
|---|---|
33+
| `spark-client.spark-submit` | Submit Spark applications to a Kubernetes cluster |
34+
| `spark-client.pyspark` | Start an interactive PySpark shell |
35+
| `spark-client.service-account-registry` | Create, configure, and manage Spark service accounts |
36+
| `spark-client.beeline` | JDBC client for connecting to Apache Kyuubi endpoints |
37+
| `spark-client.import-certificate` | Import TLS certificates for encrypted Kyuubi connections |
38+
39+
## Juju operators (charms)
40+
41+
Each subsection below groups charms by function. All charms can be deployed individually or together via the [bundle](https://charmhub.io/spark-k8s-bundle).
42+
43+
### Core components
44+
45+
The following charms form the foundation of any Charmed Apache Spark deployment, connecting Spark service accounts to external services and providing a UI for completed job logs:
46+
47+
| Charm | Description |
48+
|---|---|
49+
| [`spark-integration-hub-k8s`](https://charmhub.io/spark-integration-hub-k8s) | Central hub that manages Spark service account configurations and writes them into Kubernetes Secrets. It allows high-level configuration of Spark properties and seamless integration with external services, such as object storage backends and COS deployments. |
50+
| [`s3-integrator`](https://charmhub.io/s3-integrator) | Supplies S3-compatible object storage credentials (endpoint, bucket, access key) to the Integration Hub and History Server. Supports MinIO, AWS S3, and any S3-compatible backend. |
51+
| [`azure-storage-integrator`](https://charmhub.io/azure-storage-integrator) | Alternative to `s3-integrator` for deployments using Azure Blob Storage. |
52+
| [`spark-history-server-k8s`](https://charmhub.io/spark-history-server-k8s) | Exposes a web UI for browsing and analyzing event logs of completed Spark jobs stored in object storage. Receives credentials from `s3-integrator` or `azure-storage-integrator`. |
53+
| [`traefik-k8s`](https://charmhub.io/traefik-k8s) | Kubernetes ingress proxy. Exposes the History Server web UI at a stable URL outside the cluster. |
54+
55+
### Apache Kyuubi (SQL / JDBC)
56+
57+
[Charmed Apache Kyuubi](https://charmhub.io/kyuubi-k8s) provides a JDBC/ODBC endpoint for running SQL queries against data in object storage, powered by Apache Spark engines.
58+
59+
| Charm | Description |
60+
|---|---|
61+
| [`kyuubi-k8s`](https://charmhub.io/kyuubi-k8s) | Provides a JDBC/ODBC endpoint for SQL queries. Integrates with the Integration Hub to obtain Spark service account configuration. Supports horizontal scaling and external metastore. |
62+
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `auth-db`) | Required authentication database for Kyuubi. Kyuubi will remain blocked without this integration. |
63+
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `metastore`) | External Hive metastore providing persistent metadata storage. Without it, metadata is stored on pod-local storage and lost on pod restarts. |
64+
| [`zookeeper-k8s`](https://charmhub.io/zookeeper-k8s) | Required for multi-node Kyuubi deployments. Coordinates distributed Kyuubi instances. |
65+
| [`self-signed-certificates`](https://charmhub.io/self-signed-certificates) | Provides TLS certificates to Kyuubi for encrypted JDBC connections. For production environments, use a CA-backed certificates operator instead. |
66+
| [`data-integrator`](https://charmhub.io/data-integrator) | Retrieves JDBC credentials (endpoint, username, password, TLS certificate) from Kyuubi via the `get-credentials` action, making them available to external clients. |
67+
68+
### Observability (COS integration)
69+
70+
Charmed Apache Spark integrates natively with the [Canonical Observability Stack (COS)](https://charmhub.io/cos-lite), which is deployed in a separate Juju model and includes Grafana, Prometheus, Loki, and Alertmanager.
71+
72+
The [Tutorial](tutorial-introduction) demonstrates a simplified COS setup using only `prometheus-pushgateway-k8s` and `cos-configuration-k8s`, integrated directly with the COS model charms. The full bundle overlay additionally deploys `grafana-agent-k8s` and `prometheus-scrape-config-k8s` as a cross-model observability bridge:
73+
74+
| Charm | Tutorial | Bundle | Description |
75+
|---|---|---|---|
76+
| [`prometheus-pushgateway-k8s`](https://charmhub.io/prometheus-pushgateway-k8s) | Yes | Yes | Accepts metrics pushed by ephemeral Spark jobs (which are too short-lived for pull-based scraping) and exposes them to Prometheus. In the full bundle, integrates with the Integration Hub to automatically configure service accounts with the pushgateway address. |
77+
| [`cos-configuration-k8s`](https://charmhub.io/cos-configuration-k8s) | Yes | Yes | Syncs Grafana dashboard definitions from a git repository into Grafana. Pre-configured in the bundle to use the dashboards from this repository. |
78+
| [`grafana-agent-k8s`](https://charmhub.io/grafana-agent-k8s) | No | Yes | Cross-model bridge that ships metrics, log streams, and dashboard definitions from the Spark model to the COS model via remote-write and the Loki push API. |
79+
| [`prometheus-scrape-config-k8s`](https://charmhub.io/prometheus-scrape-config-k8s) | No | Yes | Configures the Prometheus scrape interval for the Pushgateway metrics endpoint. |
80+
81+
## Architecture
82+
83+
The following diagram shows how the components relate in a full deployment:
3784

3885
```{mermaid}
39-
flowchart TD
40-
spark8t["`**spark8t**
41-
(*python package*)
42-
exposes functionalities to create, configure and manage Apache Spark users via a Python SDK`"]
43-
44-
spark-rock["`**Charmed Apache Spark Rock**
45-
(*OCI Image*)
46-
provides a reliable Apache Spark image to run Apache Spark applications and Apache Spark CLI tooling`"]
47-
48-
spark-client["`**Spark Client Snap**
49-
(*snap*)
50-
simplify client integration with an Apache Spark Kubernetes cluster via a snap package to be installed in edge nodes or locally`"]
51-
52-
spark-k8s-bundle["`**Charmed Apache Spark**
53-
(*Charmed Operator*)
54-
manages the entire lifecycle of Spark jobs`"]
55-
56-
spark8t --> spark-rock
57-
spark8t --> spark-client
58-
spark-rock --> spark-client
59-
spark-rock --> spark-k8s-bundle
86+
flowchart TB
87+
client["<b>spark-client snap</b><br>spark-submit · pyspark<br>beeline · service-account-registry"]
88+
backend[("<b>Object Storage</b><br>MinIO · AWS S3 · Azure Blob")]
89+
90+
subgraph spark-model["Spark Juju Model"]
91+
direction TB
92+
93+
hub["<b>spark-integration-hub-k8s</b>"]
94+
95+
subgraph integrators["Storage Integrators"]
96+
direction LR
97+
s3int["s3-integrator"]
98+
azint["azure-storage-integrator"]
99+
end
100+
101+
traefik["traefik-k8s"]
102+
hs["spark-history-server-k8s"]
103+
104+
subgraph kyu-grp["Apache Kyuubi"]
105+
direction TB
106+
kyuubi["kyuubi-k8s"]
107+
authdb["postgresql-k8s<br>(auth-db)"]
108+
metadb["postgresql-k8s<br>(metastore)"]
109+
zk["zookeeper-k8s"]
110+
tls["self-signed-certificates"]
111+
di["data-integrator"]
112+
end
113+
114+
subgraph cos-bridge["COS Bridge"]
115+
direction LR
116+
pgw["prometheus-pushgateway-k8s"]
117+
scrape["prometheus-scrape-config-k8s"]
118+
agent["grafana-agent-k8s"]
119+
cosconf["cos-configuration-k8s"]
120+
end
121+
end
122+
123+
subgraph cos-model["COS Juju Model (cos-lite)"]
124+
direction LR
125+
prom["Prometheus"]
126+
graf["Grafana"]
127+
loki_n["Loki"]
128+
end
129+
130+
client -->|"K8s API<br>(spark-submit / pyspark)"| hub
131+
client -->|"JDBC"| kyuubi
132+
backend <-->|credentials| s3int
133+
backend <-->|credentials| azint
134+
s3int -->|s3-credentials| hub
135+
azint -->|azure-storage-credentials| hub
136+
s3int -->|s3-credentials| hs
137+
azint -->|azure-storage-credentials| hs
138+
hub -->|spark-service-account| kyuubi
139+
hub -->|cos| pgw
140+
traefik -->|ingress| hs
141+
kyuubi -->|auth-db| authdb
142+
kyuubi -->|metastore-db| metadb
143+
kyuubi -->|zookeeper| zk
144+
kyuubi -->|certificates| tls
145+
di -->|jdbc| kyuubi
146+
hs -->|metrics · logs · dashboards| agent
147+
pgw --- scrape -->|metrics| agent
148+
cosconf -->|dashboards| agent
149+
agent -->|remote-write| prom
150+
agent -->|push API| loki_n
151+
agent -->|dashboards| graf
60152
```
61153

62-
The Charmed Apache Spark solution can be used to deploy and manage Apache Spark workloads using the
63-
provided distribution on any conformant Kubernetes (recommended versions `1.32` and above), like:
154+
The `spark-client` tools communicate with Kubernetes directly — the Integration Hub writes configuration into K8s Secrets associated with service accounts, and `spark-client` reads them at job submission time.
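As a minimal local sketch of the Secret-reading step described above (no cluster needed; the Spark property and the secret name are illustrative, the point being that Kubernetes stores Secret values base64-encoded):

```shell
# Kubernetes Secrets hold their values base64-encoded; client tooling decodes
# them to recover the Spark properties attached to a service account.
# Simulated locally with an illustrative property.
encoded=$(printf 'spark.eventLog.enabled=true' | base64)
printf '%s' "$encoded" | base64 -d
# prints: spark.eventLog.enabled=true

# On a live cluster, the equivalent lookup would be along the lines of:
#   kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}'
```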
64155

65-
* [MicroK8s](https://microk8s.io/), which is the simplest production-grade conformant K8s.
66-
Lightweight and focused. Single command install on Linux, Windows and macOS. See the
67-
[installation guide](https://microk8s.io/#install-microk8s) for more information.
68-
* [Charmed Kubernetes](https://ubuntu.com/kubernetes/charmed-k8s), which is a platform independent,
69-
model-driven distribution of Kubernetes powered by [Juju](https://juju.is/)
70-
* [AWK EKS](https://ubuntu.com/kubernetes/charmed-k8s), which is the managed Kubernetes service
71-
provided by Amazon Web Services to run Kubernetes in the AWS cloud and on-premises data centers.
156+
For a step-by-step walkthrough of setting up these components, see the [Tutorial](tutorial-introduction). For supported Kubernetes distributions, see the [Requirements](reference-requirements) page.
72157

73-
Setup instructions are available in the [Tutorial](tutorial-introduction), specifically
74-
the [environment setup](tutorial-1-environment-setup) page.

docs/how-to/deploy/spark.md

Lines changed: 19 additions & 2 deletions
@@ -9,8 +9,8 @@ myst:
99

1010
Charmed Apache Spark comes with a bundled set of components that allow you to easily manage Apache
1111
Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation.
12-
For an overview on the different components that form Charmed Apache Spark, please refer to the
13-
[components overview](explanation-component-overview) page.
12+
13+
For an overview of all components and how they relate to each other, see the [Components overview](explanation-component-overview).
1414

1515
## Prerequisites
1616

@@ -219,6 +219,23 @@ The following table provides the description of the different configuration opti
219219
| `s3.bucket` | Name of the S3 bucket to be used for storing logs and data |
220220
| `cos_model` | (Optional) Name of the model where COS is deployed. If omitted, the resource of the cos-integration submodules will not be deployed |
221221

222+
#### Example .tfvars.json
223+
224+
For example, to point the deployment at a MinIO instance and the bucket to use:
225+
226+
```json
227+
{
228+
"storage_backend": "s3",
229+
"s3": {
230+
"region": "eu-central-1",
231+
"bucket": "spark-test",
232+
"endpoint": "http://<host>:80"
233+
}
234+
}
235+
```
236+
237+
For more information on this particular example, see the [MicroK8s MinIO demo](https://github.com/deusebio/datawarehousing-with-spark/blob/main/docs/microk8s_minio.md). For other examples, see the [repository](https://github.com/deusebio/datawarehousing-with-spark/blob/main/README.md).
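The `.tfvars.json` snippet above can be sanity-checked before handing it to Terraform; a sketch, assuming a hypothetical file name (the endpoint keeps the `<host>` placeholder from the example):

```shell
# Write the example variables to a file (hypothetical name spark.tfvars.json).
cat > spark.tfvars.json <<'EOF'
{
  "storage_backend": "s3",
  "s3": {
    "region": "eu-central-1",
    "bucket": "spark-test",
    "endpoint": "http://<host>:80"
  }
}
EOF

# Fail fast on malformed JSON before Terraform ever sees the file.
python3 -m json.tool spark.tfvars.json > /dev/null && echo "tfvars OK"

# Then, from the Terraform module directory:
#   terraform plan -var-file=spark.tfvars.json
```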
238+
222239
```{caution}
223240
The Juju Terraform provider does not yet support cross-controller relations with COS.
224241
Therefore, COS model must be hosted in the same controller as the Charmed Apache Spark model.

docs/how-to/enable-monitoring.md

Lines changed: 15 additions & 5 deletions
@@ -5,7 +5,7 @@ myst:
55
---
66

77
(how-to-monitoring)=
8-
# Enable and configuring monitoring
8+
# Enable and configure monitoring
99

1010
Charmed Apache Spark supports native integration with the Canonical Observability Stack (COS). If you want to enable monitoring on top of Charmed Apache Spark, make sure that you have a Juju model with COS correctly deployed.
1111

@@ -14,9 +14,9 @@ For more information about Charmed Apache Spark and COS integration, refer to th
1414

1515
Once COS is correctly deployed, to enable monitoring it is necessary to:
1616

17-
1. Integrate and configure the COS bundle with Charmed Apache Spark
17+
1. Deploy and integrate the monitoring components (Pushgateway, scrape config, Grafana agent, etc.) with Charmed Apache Spark
1818
2. Configure the Apache Spark service account
19-
3. (Optional) Integrate the optional components of Charmed Apache Spark (such as the Spark History Server charm and Charmed Apache Kyuubi) with the COS bundle
19+
3. (Optional) Integrate the optional components of Charmed Apache Spark (such as the Spark History Server charm and Charmed Apache Kyuubi) with COS
2020

2121
## Integrating/configuring with COS
2222

@@ -27,6 +27,16 @@ The deployments of these resources can be enabled/disabled using either overlays
2727
(for Juju bundles) or input variables (for Terraform bundles).
2828
Please refer to the [how-to deploy](how-to-deploy-spark) guide for more information.
2929

30+
The monitoring components (`grafana-agent-k8s`, `prometheus-pushgateway-k8s`,
31+
`prometheus-scrape-config-k8s`, `cos-configuration-k8s`) and their integrations
32+
are included in the [COS overlay](https://github.com/canonical/spark-k8s-bundle/blob/main/releases/3.4/yaml/overlays/cos-integration.yaml.j2)
33+
for Juju bundles or deployed automatically when `cos_model` is set in the Terraform module.
34+
See the [how-to deploy](how-to-deploy-spark) guide for details.
35+
36+
If you are not using the overlay or Terraform module, you can inspect the
37+
[COS overlay YAML](https://github.com/canonical/spark-k8s-bundle/blob/main/releases/3.4/yaml/overlays/cos-integration.yaml.j2)
38+
for the full list of charms and relations to deploy manually.
39+
3040
After the deployment settles on an `active/idle` state, you can make sure that
3141
Grafana is correctly set up with dedicated dashboards.
3242
To do so, retrieve the credentials for logging into the Grafana dashboard, by
@@ -52,7 +62,7 @@ In particular, it is crucial to configure the scraping interval to make sure
5262
data points have proper sampling frequency, e.g.:
5363

5464
```shell
55-
juju config scrape-config --config scrape_interval=<SCRAPE_INTERVAL>
65+
juju config scrape-config scrape_interval=<SCRAPE_INTERVAL>
5666
```
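Later in this guide, the `spark.metrics.conf.driver.sink.prometheus.pushgateway-address` property is checked against the Pushgateway endpoint. A small sketch composing that property from hypothetical address and port values (in a real deployment, take these from `juju status` for the `prometheus-pushgateway-k8s` application):

```shell
# Hypothetical values; read the real ones from `juju status`.
PUSHGATEWAY_ADDRESS=10.1.0.42
PUSHGATEWAY_PORT=9091

# Compose the property exactly as the guide expects to find it.
printf 'spark.metrics.conf.driver.sink.prometheus.pushgateway-address=%s:%s\n' \
  "$PUSHGATEWAY_ADDRESS" "$PUSHGATEWAY_PORT"
```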
5767

5868
For more information about the properties that can be set using `prometheus-scrape-config-k8s`,
@@ -104,7 +114,7 @@ and check that the following property:
104114
spark.metrics.conf.driver.sink.prometheus.pushgateway-address=<PROMETHEUS_GATEWAY_ADDRESS>:<PROMETHEUS_PORT>
105115
```
106116

107-
is configured with the correct values. The Prometheus Pushgateway address and port should be can be
117+
is configured with the correct values. The Prometheus Pushgateway address and port should be
108118
consistent with what is exposed by Juju, e.g.:
109119

110120
```shell
