
Commit bb96875

docs: Home page remodeling (#181)

* Home page remodeling
* Replaced Multipass broken links

Signed-off-by: Vladimir Izmalkov <48120135+izmalk@users.noreply.github.com>
Co-authored-by: Bikalpa Dhakal <theoctober19th@gmail.com>
1 parent a81e1db commit bb96875

10 files changed: 44 additions & 24 deletions


docs/.custom_wordlist.txt

Lines changed: 9 additions & 0 deletions
````diff
@@ -81,6 +81,15 @@ kubeconfig
 CVEs
 Canonical's
 metastore
+DBeaver
+lakehouse
+serverless
+Kaggle
+GPUs
+Kubeconfig
+ConfigMaps
+plaintext
+databag
 serverless
 performant
 configmap
````

docs/explanation/security.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -177,7 +177,7 @@ Charmed Apache Spark K8s provides external authentication capabilities for:
 Authentication to the Kubernetes API follows standard implementations, as described in the [upstream Kubernetes documentation](https://kubernetes.io/docs/reference/access-authn-authz/authentication/).
 Please make sure that the distribution being used supports the authentication used by clients, and that the Kubernetes cluster has been correctly configured.
 
-Generally, client applications store credentials information locally in a `KUBECONFIG` file.
+Generally, client applications store credentials information locally in a `kubeconfig` file.
 On the other hand, pods created by the charms and the Spark Job workloads receive credentials via shared secrets, mounted to the default locations `/var/run/secrets/kubernetes.io/serviceaccount/`.
 See the [upstream documentation](https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/#directly-accessing-the-rest-api) for more information.
````
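For context, the upstream page linked in this hunk shows how a pod can call the API server with those mounted credentials. A minimal sketch, following the upstream Kubernetes example and the mount path named in the diff:

```bash
# Run from inside a pod: read the mounted service account credentials
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN=$(cat ${SERVICEACCOUNT}/token)
CACERT=${SERVICEACCOUNT}/ca.crt

# Query the API server directly, authenticating with the bearer token
curl --cacert ${CACERT} \
  --header "Authorization: Bearer ${TOKEN}" \
  https://kubernetes.default.svc/api
```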

docs/how-to/deploy/environment.md

Lines changed: 8 additions & 8 deletions
````diff
@@ -98,7 +98,7 @@ Make sure that the MicroK8s cluster is now up and running:
 microk8s status --wait-ready
 ```
 
-Export the Kubernetes config file associated with admin rights and store it in the $KUBECONFIG file, e.g. `~/.kube/config`:
+Export the Kubernetes config file associated with admin rights and store it in the `$KUBECONFIG` file, e.g. `~/.kube/config`:
 
 ```bash
 export KUBECONFIG=path/to/file # Usually ~/.kube/config
@@ -113,9 +113,9 @@ microk8s.enable dns rbac storage hostpath-storage
 
 The MicroK8s cluster is now ready to be used.
 
-#### External LoadBalancer
+#### External load balancer
 
-If you want to expose the Spark History Server UI via a Traefik ingress, we need to enable an external loadbalancer:
+If you want to expose the Spark History Server UI via a Traefik ingress, we need to enable an external load balancer:
 
 ```bash
 IPADDR=$(ip -4 -j route get 2.2.2.2 | jq -r '.[] | .prefsrc')
@@ -185,13 +185,13 @@ You can then create the EKS via CLI:
 eksctl create cluster -f cluster.yaml
 ```
 
-The EKS cluster creation process may take several minutes. The cluster creation process should already update the `KUBECONFIG` file with the new cluster information. By default, `eksctl` creates a user that generates a new access token on the fly via the `aws` CLI. However, this conflicts with the `spark-client` snap that is strictly confined and does not have access to the `aws` command. Therefore, we recommend you to manually retrieve a token:
+The EKS cluster creation process may take several minutes. The cluster creation process should already update the `kubeconfig` file with the new cluster information. By default, `eksctl` creates a user that generates a new access token on the fly via the `aws` CLI. However, this conflicts with the `spark-client` snap that is strictly confined and does not have access to the `aws` command. Therefore, we recommend you to manually retrieve a token:
 
 ```bash
 aws eks get-token --region <AWS_REGION_NAME> --cluster-name spark-cluster --output json
 ```
 
-and paste the token in the KUBECONFIG file:
+and paste the token in the `kubeconfig` file:
 
 ```yaml
 users:
````
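The hunk above ends where the `users:` block begins. As an illustration of where the token from `aws eks get-token` would go, a hypothetical `kubeconfig` `users` entry with a statically pasted token might look like this (the user name and token value are placeholders, not taken from the commit):

```yaml
users:
  - name: spark-cluster-user
    user:
      # Paste the "token" value returned by `aws eks get-token` here
      token: k8s-aws-v1.EXAMPLE_TOKEN_VALUE
```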
````diff
@@ -302,9 +302,9 @@ terraform output
 # resource_group_name = "TestSparkAKSRG"
 ```
 
-#### Generating Kubeconfig file
+#### Generating the Kubeconfig file
 
-To generate the Kubeconfig file for connecting the client to the newly created cluster:
+To generate the `kubeconfig` file for connecting the client to the newly created cluster:
 
 ```bash
 az aks get-credentials --resource-group <resource_group_name> --name <aks_cluster_name> --file ~/.kube/config
@@ -437,7 +437,7 @@ The RadosGW API can then be reached at `<hostname>:<port>`, where `hostname` is
 
 #### MicroK8s MinIO
 
-If you have already a MicroK8s cluster running, you can enable the MinIO storage with the dedicated addon
+If you have already a MicroK8s cluster running, you can enable the MinIO storage with the dedicated add-on
 
 ```shell
 microk8s.enable minio
````
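After generating a `kubeconfig` with `az aks get-credentials` (AKS) or editing one by hand (EKS), it is worth confirming that the client can actually reach the cluster. A quick check, assuming `kubectl` is installed:

```bash
# Confirm the kubeconfig grants access to the new cluster
kubectl --kubeconfig ~/.kube/config get nodes
```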

docs/how-to/manage-service-accounts/using-python.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -15,7 +15,7 @@ Furthermore, you need to make sure that `PYTHONPATH` contains the location where
 
 The following snippet allows you to import relevant environment variables
 into a confined object, among which there should be an auto-inference of your
-kubeconfig file location.
+`kubeconfig` file location.
 
 ```python
 import os
````
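The Python snippet in the file is truncated in this diff after `import os`. As a rough sketch of the kind of auto-inference the paragraph describes (the variable names are illustrative, not the actual library API):

```python
import os

# Prefer an explicitly exported KUBECONFIG; otherwise fall back to the
# default location that kubectl also uses.
kubeconfig = os.environ.get("KUBECONFIG", os.path.expanduser("~/.kube/config"))
print(f"Inferred kubeconfig location: {kubeconfig}")
```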

docs/how-to/spark-history-server/expose-web-gui.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -19,7 +19,7 @@ IP=$(kubectl get pod spark-history-server-k8s-0 -n spark --template '{{.status.p
 
 ## With Ingress
 
-The Spark History server can be exposed outside a K8s cluster by means of an ingress. This is the recommended way in production for any K8s distribution. Exposing Kubernetes services through an ingress generally requires the cloud provider/infrastructure to have an external load balancer integrated with the Kubernetes cluster. Most cloud providers (such as AWS, Google and Azure) provide this integration out-of-the-box. If you are running on MicroK8s, make sure that you have enabled `metallb`, as shown in the "How-To Setup K8s" user guide.
+The Spark History server can be exposed outside a K8s cluster by means of an ingress. This is the recommended way in production for any K8s distribution. Exposing Kubernetes services through an ingress generally requires the cloud provider/infrastructure to have an external load balancer integrated with the Kubernetes cluster. Most cloud providers (such as AWS, Google and Azure) provide this integration out-of-the-box. If you are running on MicroK8s, make sure that you have enabled `metallb`, as shown in the "How-To Setup K8s" user guide.
 
 Spark History server can be exposed outside of the K8s cluster using `traefik-k8s` charm.
 If COS is enabled, you can use the ingress already provided as part of the COS bundle. Otherwise, you can deploy one using
````
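The text above breaks off just before the deployment command. A plausible sketch of the flow it describes, assuming default charm and application names (the exact relation endpoints are not shown in this commit; check the charm documentation):

```bash
# Deploy the Traefik ingress charm into the same model
juju deploy traefik-k8s --trust

# Relate it to the Spark History Server so the UI gets an external URL
juju integrate traefik-k8s spark-history-server-k8s
```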

docs/how-to/use-gpu.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -5,7 +5,7 @@ The Charmed Apache Spark solution offers an OCI image that supports the [Apache
 
 ## Setup
 
-After installing [spark-client](https://snapcraft.io/spark-client) and [Microk8s](https://microk8s.io/) with the GPU addon enabled, now we can look into how to launch Spark jobs with GPU in Kubernetes.
+After installing [spark-client](https://snapcraft.io/spark-client) and [Microk8s](https://microk8s.io/) with the GPU add-on enabled, now we can look into how to launch Spark jobs with GPU in Kubernetes.
 
 First, we need to create a pod template to limit the amount of GPU per container.
````
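The pod template itself is not shown in this hunk. For orientation, a minimal sketch of the kind of template that sentence describes, capping each executor container at one NVIDIA GPU (file name and structure are illustrative; such a template is typically passed to Spark via `spark.kubernetes.executor.podTemplateFile`):

```yaml
# gpu-pod-template.yaml (hypothetical name)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: executor
      resources:
        limits:
          # Allow at most one GPU per executor container
          nvidia.com/gpu: 1
```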
1111

docs/index.md

Lines changed: 13 additions & 2 deletions
````diff
@@ -25,8 +25,19 @@ easy-to-use application integration, and monitoring.
 
 | | |
 |--|--|
-| [Tutorial](tutorial-introduction)</br> Learn how to use Charmed Apache Spark with our step-by-step guidance. Get started from [step one](tutorial-1-environment-setup). </br> | [How-to guides](how-to-deploy-index) </br> Practical instructions for key tasks, like [deploy](how-to-deploy-index), [manage service accounts](how-to-manage-service-accounts-index), [monitor metrics](how-to-monitoring), [process streams](how-to-streaming-jobs), [use GPU](how-to-use-gpu). |
-| [Reference](reference-index) </br> Technical information, for example: [release notes](reference-releases-index), [system requirements](reference-requirements), and [contact information](reference-contacts). | [Explanation](explanation-index) </br> Explore and grow your understanding of key topics, such as: [security](explanation-security), [cryptography](explanation-cryptography), [solution components](explanation-component-overview), [configuration](explanation-configuration), and [monitoring](explanation-monitoring). |
+| **Tutorial** | [Introduction](tutorial-introduction) • [Step 1: Environment setup](tutorial-1-environment-setup) |
+| **Deployment** | [Environment setup](how-to-deploy-environment) • [Charmed Apache Spark](how-to-deploy-spark) • [Charmed Apache Kyuubi](how-to-deploy-kyuubi) • [Requirements](reference-requirements) |
+| **Service account management** | [Integration hub](how-to-service-accounts-integration-hub) • [Python](how-to-service-accounts-python) • [Spark-client](how-to-service-accounts-spark-client) |
+| **Operations** | [Monitoring](how-to-monitoring) • Spark History Server: [Auth](how-to-spark-history-server-auth) and [web GUI](how-to-spark-history-server-expose-web-gui) • [Use K8s pods](how-to-use-k8s-pods) • [Streaming jobs](how-to-streaming-jobs) • [Use GPUs](how-to-use-gpu) |
+| **Apache Kyuubi** | [External connections](how-to-apache-kyuubi-external-connections) • [Integrate](how-to-apache-kyuubi-integrate-with-applications) • [Metastore](how-to-apache-kyuubi-external-metastore) • [Backups](how-to-apache-kyuubi-back-up-and-restore) • [Upgrades](how-to-apache-kyuubi-upgrade) • [GPU support](how-to-apache-kyuubi-gpu) |
+| **Security** | [Overview](explanation-security) • [Enable encryption (Apache Kyuubi)](how-to-apache-kyuubi-encryption-and-passwords) • [Cryptography](explanation-cryptography) • [Self-signed certificates](how-to-self-signed-certificates) |
+
+## How the documentation is organised
+
+[Tutorial](tutorial-introduction): For new users needing to learn how to use Charmed Apache Spark <br>
+[How-to guides](how-to-index): For users needing step-by-step instructions to achieve a practical goal <br>
+[Reference](reference-index): For precise, theoretical, factual information to be used while working with the charm <br>
+[Explanation](explanation-index): For deeper understanding of key Charmed Apache Spark concepts <br>
 
 ## Project and community
````

docs/reference/contacts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -18,4 +18,4 @@ Please do NOT file GitHub issues on security topics.
 * [Charmed Apache Kafka](https://charmhub.io/kafka)
 * [Git sources for Charmed Apache Spark](https://github.com/canonical/spark-k8s-bundle)
 * [Canonical Data on Launchpad](https://launchpad.net/~data-platform)
-* [Canonical Data on Matrix](https://matrix.to/#/#charmhub-data-platform:ubuntu.com)
+* [Canonical Data on Matrix](https://matrix.to/#/#charmhub-data-platform:ubuntu.com)
````

docs/tutorial/1-environment-setup.md

Lines changed: 6 additions & 6 deletions
````diff
@@ -31,8 +31,8 @@ multipass launch --cpus 4 --memory 8G --disk 50G --name spark-tutorial 24.04
 ```{note}
 See also:
 
-* [How to create an instance](https://canonical.com/multipass/docs/create-an-instance#create-an-instance-with-a-specific-image) guide from Multipass documentation
-* [`multipass launch` command reference](https://canonical.com/multipass/docs/launch-command)
+* [How to create an instance](https://documentation.ubuntu.com/multipass/latest/how-to-guides/manage-instances/create-an-instance/#create-an-instance-with-a-specific-image) guide from Multipass documentation
+* [`multipass launch` command reference](https://documentation.ubuntu.com/multipass/latest/reference/command-line-interface/launch/)
 ```
 
 Check the status of the provisioned virtual machine:
@@ -116,13 +116,13 @@ addons:
 ```
 
 Let's generate a Kubernetes configuration file using MicroK8s and write it to `~/.kube/config`.
-This is where `kubectl` looks for the Kubeconfig file by default.
+This is where `kubectl` looks for the `kubeconfig` file by default.
 
 ```bash
 microk8s config | tee ~/.kube/config
 ```
 
-Now let's enable a few addons for using features like role based access control, usage of local volume for storage, and load balancing.
+Now let's enable a few add-ons for using features like role based access control, usage of local volume for storage, and load balancing.
 
 ```bash
 sudo microk8s enable rbac
@@ -133,7 +133,7 @@ IPADDR=$(ip -4 -j route get 2.2.2.2 | jq -r '.[] | .prefsrc')
 sudo microk8s enable metallb:$IPADDR-$IPADDR
 ```
 
-Wait for the commands to finish running and check the list of enabled addons:
+Wait for the commands to finish running and check the list of enabled add-ons:
 
 ```bash
 microk8s status --wait-ready
@@ -293,7 +293,7 @@ Apache Spark can be configured to use S3 for object storage.
 However, for this tutorial, instead of AWS S3, we'll use [MinIO](https://min.io/): a lightweight S3-compatible object storage.
 It is available as a MicroK8s [add-on](https://microk8s.io/docs/addon-minio) by default, allowing us to create a local S3 bucket, which is more convenient for our local tests.
 
-Let's enable the MinIO addon for MicroK8s.
+Let's enable the MinIO add-on for MicroK8s.
 
 ```bash
 sudo microk8s enable minio
````
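Taken together, the MicroK8s steps touched by this commit amount to a short sequence. A recap sketch, assuming the same add-ons and the `jq` address trick from the hunks above:

```bash
# Enable role-based access control and local volume storage
sudo microk8s enable rbac
sudo microk8s enable hostpath-storage

# Enable MetalLB for load balancing, using the host's primary IP as the range
IPADDR=$(ip -4 -j route get 2.2.2.2 | jq -r '.[] | .prefsrc')
sudo microk8s enable metallb:$IPADDR-$IPADDR

# Enable the MinIO add-on for S3-compatible object storage
sudo microk8s enable minio

# Wait for everything to settle and list the enabled add-ons
microk8s status --wait-ready
```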

docs/tutorial/2-distributed-data-processing.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -145,7 +145,7 @@ sudo apt install zip
 unzip twitter.zip
 ```
 
-This archive unpacks a directory called `twcs` with a single csv file of the same in it.
+This archive unpacks a directory called `twcs` with a single `CSV` file of the same name in it.
 Let's upload it to our S3 storage:
 
 ```bash
@@ -170,15 +170,15 @@ spark-client.pyspark --username spark --namespace spark
 
 For distributed and parallel data processing Apache Spark actively uses the concept of a [resilient distributed dataset (RDD)](https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds), which is a fault-tolerant collection of elements that can be operated on in parallel across the nodes of the cluster.
 
-Read CSV from S3 and create an RDD from our sample dataset:
+Read `CSV` from S3 and create an RDD from our sample dataset:
 
 ```python
 rdd = spark.read.csv("s3a://spark-tutorial/twitter.csv", header=True).rdd
 ```
 
 Now that RDD can be used for parallel processing by multiple Apache Spark executors.
 
-Count the number of tweets (lines in CSV) with "text" field containing "Ubuntu" in a case insensitive way:
+Count the number of tweets (lines in `CSV`) with "text" field containing "Ubuntu" in a case insensitive way:
 
 ```python
 from operator import add
````
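The counting snippet is cut off after the import in this hunk. A sketch of how the described count could be completed with the RDD defined above (illustrative, not necessarily the tutorial's exact code):

```python
from operator import add

# Keep rows whose "text" field contains "ubuntu", ignoring case,
# then count them by summing one per matching row.
ubuntu_tweets = (
    rdd.filter(lambda row: row["text"] is not None
               and "ubuntu" in row["text"].lower())
       .map(lambda row: 1)
       .reduce(add)
)
print(ubuntu_tweets)
```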
