(explanation-component-overview)=
# Components overview

Charmed Apache Spark is composed of foundational software artifacts and a set of Juju operators (charms) that together provide a fully managed Apache Spark platform on Kubernetes. All charms are available individually on [Charmhub](https://charmhub.io/) and can also be deployed together via the [Charmed Apache Spark bundle](https://charmhub.io/spark-k8s-bundle) or [Terraform modules](https://github.com/canonical/spark-k8s-bundle/tree/main/releases/3.4/terraform).

## Software artifacts

Three foundational components are used independently of Juju: [spark8t](explanation-component-overview-spark8t), the [Charmed Apache Spark Rock](explanation-component-overview-rock), and the [spark-client snap](explanation-component-overview-snap).

(explanation-component-overview-spark8t)=
### spark8t

[spark8t](https://github.com/canonical/spark-k8s-toolkit-py) is a Python library that extends Apache Spark with tooling to manage Spark jobs and service accounts with hierarchical configuration. It is the shared foundation of the `spark-client` snap, the OCI image, and the Juju charms.

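The hierarchical configuration can be pictured as ordered layers of Spark properties, where more specific layers override more general ones. The following is a minimal illustrative sketch of that idea only — it is not spark8t's actual API, and all names in it are made up for the example:

```python
# Illustrative sketch of hierarchical property resolution (NOT spark8t's API):
# layers are applied from least to most specific, so later layers win.
def resolve_properties(*layers: dict) -> dict:
    merged = {}
    for layer in layers:
        merged.update(layer)  # a more specific layer overrides earlier ones
    return merged

cluster_defaults = {"spark.executor.instances": "2", "spark.eventLog.enabled": "true"}
service_account = {"spark.executor.instances": "4"}   # per-account override
submission_flags = {"spark.app.name": "my-job"}       # per-submission override

props = resolve_properties(cluster_defaults, service_account, submission_flags)
```

Here the per-account executor count (`4`) overrides the cluster default, while unrelated defaults pass through untouched.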
(explanation-component-overview-rock)=
### Charmed Apache Spark Rock

The [Charmed Apache Spark Rock](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) is an OCI-compliant container image that bundles Apache Spark binaries together with Canonical tooling. It is used as the base image for Spark driver and executor pods on Kubernetes, and as the foundation for the `spark-client` snap.

(explanation-component-overview-snap)=
### spark-client snap

The [spark-client snap](https://snapcraft.io/spark-client) provides CLI tools for working with Charmed Apache Spark from a workstation or edge node. It communicates with the Kubernetes API to submit jobs and manage service accounts; it does not connect to any Juju charm directly.

| Command | Description |
|---|---|
| `spark-client.spark-submit` | Submit Spark applications to a Kubernetes cluster |
| `spark-client.pyspark` | Start an interactive PySpark shell |
| `spark-client.service-account-registry` | Create, configure, and manage Spark service accounts |
| `spark-client.beeline` | JDBC client for connecting to Apache Kyuubi endpoints |
| `spark-client.import-certificate` | Import TLS certificates for encrypted Kyuubi connections |

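As an illustration, a typical first workflow looks roughly like this — the channel, namespace, username, and example jar path below are assumptions about a particular setup, not fixed values:

```shell
# Install the CLI tooling (channel is an example)
sudo snap install spark-client --channel 3.4/stable

# Create a Spark service account in the target namespace
spark-client.service-account-registry create --username spark --namespace spark

# Submit the bundled SparkPi example using that account
# (jar path and version inside the image are assumptions)
spark-client.spark-submit \
  --username spark --namespace spark \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.4.2.jar 100
```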
## Juju operators (charms)

Each subsection below groups charms by function. All charms can be deployed individually or together via the [bundle](https://charmhub.io/spark-k8s-bundle).

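For instance, deploying everything at once from the bundle can look like the following sketch (the model name is an example, and whether `--trust` is required depends on your setup):

```shell
# Create a Juju model on the Kubernetes cloud and deploy the full bundle
juju add-model spark
juju deploy spark-k8s-bundle --trust
```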
### Core components

The following charms form the foundation of any Charmed Apache Spark deployment, connecting Spark service accounts to external services and providing a UI for completed job logs:

| Charm | Description |
|---|---|
| [`spark-integration-hub-k8s`](https://charmhub.io/spark-integration-hub-k8s) | Central hub that manages Spark service account configurations and writes them into Kubernetes Secrets. It allows high-level configuration of Spark properties and seamless integration with external services, such as object storage backends and COS deployments. |
| [`s3-integrator`](https://charmhub.io/s3-integrator) | Supplies S3-compatible object storage credentials (endpoint, bucket, access key) to the Integration Hub and History Server. Supports MinIO, AWS S3, and any S3-compatible backend. |
| [`azure-storage-integrator`](https://charmhub.io/azure-storage-integrator) | Alternative to `s3-integrator` for deployments using Azure Blob Storage. |
| [`spark-history-server-k8s`](https://charmhub.io/spark-history-server-k8s) | Exposes a web UI for browsing and analyzing event logs of completed Spark jobs stored in object storage. Receives credentials from `s3-integrator` or `azure-storage-integrator`. |
| [`traefik-k8s`](https://charmhub.io/traefik-k8s) | Kubernetes ingress proxy. Exposes the History Server web UI at a stable URL outside the cluster. |

### Apache Kyuubi (SQL / JDBC)

[Charmed Apache Kyuubi](https://charmhub.io/kyuubi-k8s) provides a JDBC/ODBC endpoint for running SQL queries against data in object storage, powered by Apache Spark engines.

| Charm | Description |
|---|---|
| [`kyuubi-k8s`](https://charmhub.io/kyuubi-k8s) | Provides a JDBC/ODBC endpoint for SQL queries. Integrates with the Integration Hub to obtain Spark service account configuration. Supports horizontal scaling and external metastore. |
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `auth-db`) | Required authentication database for Kyuubi. Kyuubi will remain blocked without this integration. |
| [`postgresql-k8s`](https://charmhub.io/postgresql-k8s) (as `metastore`) | External Hive metastore providing persistent metadata storage. Without it, metadata is stored on pod-local storage and lost on pod restarts. |
| [`zookeeper-k8s`](https://charmhub.io/zookeeper-k8s) | Required for multi-node Kyuubi deployments. Coordinates distributed Kyuubi instances. |
| [`self-signed-certificates`](https://charmhub.io/self-signed-certificates) | Provides TLS certificates to Kyuubi for encrypted JDBC connections. For production environments, use a CA-backed certificates operator instead. |
| [`data-integrator`](https://charmhub.io/data-integrator) | Retrieves JDBC credentials (endpoint, username, password, TLS certificate) from Kyuubi via the `get-credentials` action, making them available to external clients. |

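Wiring these charms together follows the usual Juju integration pattern. A sketch, using the endpoint names from the table and diagram (application names like `auth-db` and `metastore` are examples; channels are omitted):

```shell
# Deploy Kyuubi and its two PostgreSQL databases
juju deploy kyuubi-k8s --trust
juju deploy postgresql-k8s auth-db --trust
juju deploy postgresql-k8s metastore --trust

# Relate Kyuubi to its authentication database and external metastore
juju integrate kyuubi-k8s:auth-db auth-db
juju integrate kyuubi-k8s:metastore-db metastore

# Expose JDBC credentials to external clients
juju deploy data-integrator
juju integrate data-integrator kyuubi-k8s
juju run data-integrator/leader get-credentials
```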
### Observability (COS integration)

Charmed Apache Spark integrates natively with the [Canonical Observability Stack (COS)](https://charmhub.io/cos-lite), which is deployed in a separate Juju model and includes Grafana, Prometheus, Loki, and Alertmanager.

The [Tutorial](tutorial-introduction) demonstrates a simplified COS setup using only `prometheus-pushgateway-k8s` and `cos-configuration-k8s`, integrated directly with the COS model charms. The full bundle overlay additionally deploys `grafana-agent-k8s` and `prometheus-scrape-config-k8s` as a cross-model observability bridge:

| Charm | Tutorial | Bundle | Description |
|---|---|---|---|
| [`prometheus-pushgateway-k8s`](https://charmhub.io/prometheus-pushgateway-k8s) | Yes | Yes | Accepts metrics pushed by ephemeral Spark jobs (which are too short-lived for pull-based scraping) and exposes them to Prometheus. In the full bundle, integrates with the Integration Hub to automatically configure service accounts with the pushgateway address. |
| [`cos-configuration-k8s`](https://charmhub.io/cos-configuration-k8s) | Yes | Yes | Syncs Grafana dashboard definitions from a git repository into Grafana. Pre-configured in the bundle to use the dashboards from this repository. |
| [`grafana-agent-k8s`](https://charmhub.io/grafana-agent-k8s) | — | Yes | Cross-model bridge that ships metrics, log streams, and dashboard definitions from the Spark model to the COS model via remote-write and the Loki push API. |
| [`prometheus-scrape-config-k8s`](https://charmhub.io/prometheus-scrape-config-k8s) | — | Yes | Configures the Prometheus scrape interval for the Pushgateway metrics endpoint. |

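Cross-model integration in Juju relies on offers. A sketch of the pattern, assuming cos-lite runs in a model named `cos` and the Spark charms in a model named `spark` (model, offer, and endpoint names are all assumptions):

```shell
# In the COS model: offer the Prometheus remote-write endpoint
juju offer cos.prometheus:receive-remote-write prometheus-rw

# In the Spark model: consume the offer and bridge metrics via grafana-agent
juju switch spark
juju consume cos.prometheus-rw
juju integrate grafana-agent-k8s prometheus-rw
```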
## Architecture

The following diagram shows how the components relate in a full deployment:

```{mermaid}
flowchart TB
    client["<b>spark-client snap</b><br>spark-submit · pyspark<br>beeline · service-account-registry"]
    backend[("<b>Object Storage</b><br>MinIO · AWS S3 · Azure Blob")]

    subgraph spark-model["Spark Juju Model"]
        direction TB

        hub["<b>spark-integration-hub-k8s</b>"]

        subgraph integrators["Storage Integrators"]
            direction LR
            s3int["s3-integrator"]
            azint["azure-storage-integrator"]
        end

        traefik["traefik-k8s"]
        hs["spark-history-server-k8s"]

        subgraph kyu-grp["Apache Kyuubi"]
            direction TB
            kyuubi["kyuubi-k8s"]
            authdb["postgresql-k8s<br>(auth-db)"]
            metadb["postgresql-k8s<br>(metastore)"]
            zk["zookeeper-k8s"]
            tls["self-signed-certificates"]
            di["data-integrator"]
        end

        subgraph cos-bridge["COS Bridge"]
            direction LR
            pgw["prometheus-pushgateway-k8s"]
            scrape["prometheus-scrape-config-k8s"]
            agent["grafana-agent-k8s"]
            cosconf["cos-configuration-k8s"]
        end
    end

    subgraph cos-model["COS Juju Model (cos-lite)"]
        direction LR
        prom["Prometheus"]
        graf["Grafana"]
        loki_n["Loki"]
    end

    client -->|"K8s API<br>(spark-submit / pyspark)"| hub
    client -->|"JDBC"| kyuubi
    backend <-->|credentials| s3int
    backend <-->|credentials| azint
    s3int -->|s3-credentials| hub
    azint -->|azure-storage-credentials| hub
    s3int -->|s3-credentials| hs
    azint -->|azure-storage-credentials| hs
    hub -->|spark-service-account| kyuubi
    hub -->|cos| pgw
    traefik -->|ingress| hs
    kyuubi -->|auth-db| authdb
    kyuubi -->|metastore-db| metadb
    kyuubi -->|zookeeper| zk
    kyuubi -->|certificates| tls
    di -->|jdbc| kyuubi
    hs -->|metrics · logs · dashboards| agent
    pgw --- scrape -->|metrics| agent
    cosconf -->|dashboards| agent
    agent -->|remote-write| prom
    agent -->|push API| loki_n
    agent -->|dashboards| graf
```

The `spark-client` tools communicate with Kubernetes directly: the Integration Hub writes configuration into K8s Secrets associated with service accounts, and `spark-client` reads them at job submission time.

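One way to observe this in practice is to inspect the Secrets the hub manages — the namespace below and the secret name placeholder are assumptions about a particular deployment:

```shell
# List the Secrets in the namespace backing the Spark service account
kubectl get secrets -n spark

# Inspect the Spark properties stored for a given account
# (replace the placeholder with the actual secret name from the listing)
kubectl describe secret <service-account-secret> -n spark
```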
For a step-by-step walkthrough of setting up these components, see the [Tutorial](tutorial-introduction). For supported Kubernetes distributions, see the [Requirements](reference-requirements) page.