
Commit 8f0c137

TELCODOCS-1874: Adding hub cluster RDS - Tech Preview
1 parent 10220ca commit 8f0c137

33 files changed: +1287 -0 lines changed

_topic_maps/_topic_map.yml

+2
@@ -3332,6 +3332,8 @@ Topics:
       File: telco-core-rds
     - Name: Telco RAN DU reference design specifications
       File: telco-ran-du-rds
+    - Name: Telco hub reference design specifications
+      File: telco-hub-rds
     - Name: Comparing cluster configurations
       Dir: cluster-compare
       Distros: openshift-origin,openshift-enterprise
(Binary image file, 89.3 KB)
+78
@@ -0,0 +1,78 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-acm-observability_{context}"]
= {rh-rhacm} Observability

Cluster observability is provided by the multicluster engine and {rh-rhacm}.

* Observability storage requires several `PV` resources and S3-compatible bucket storage for long-term retention of metrics.
* Calculating storage requirements is complex and depends on the specific workloads and characteristics of the managed clusters.
Requirements for `PV` resources and the S3 bucket depend on many aspects, including data retention, the number of managed clusters, managed cluster workloads, and so on.
* Estimate the required storage for observability by using the observability sizing calculator in the {rh-rhacm} capacity planning repository.
See the Red Hat Knowledgebase article link:https://access.redhat.com/articles/7103886[Calculating storage need for MultiClusterHub Observability on telco environments] for an explanation of how to use the calculator to estimate observability storage requirements.
The following table uses inputs derived from the telco RAN DU RDS and the hub cluster RDS as representative values.

[NOTE]
====
The following numbers are estimates.
Tune the values for more accurate results.
Add an engineering margin, for example +20%, to the results to account for potential estimation inaccuracies.
====

.Cluster requirements
[cols="42%,42%,16%",options="header"]
|====
|Capacity planner input
|Data source
|Example value

|Number of control plane nodes
|Hub cluster RDS (scale) and telco RAN DU RDS (topology)
|3500

|Number of additional worker nodes
|Hub cluster RDS (scale) and telco RAN DU RDS (topology)
|0

|Days for storage of data
|Hub cluster RDS
|15

|Total number of pods per cluster
|Telco RAN DU RDS
|120

|Number of namespaces (excluding OCP namespaces)
|Telco RAN DU RDS
|4

|Number of metric samples per hour
|Default value
|12

|Number of hours of retention in Receiver PV
|Default value
|24
|====

With these input values, the sizing calculator described in the Red Hat Knowledgebase article link:https://access.redhat.com/articles/7103886[Calculating storage need for MultiClusterHub Observability on telco environments] indicates the following storage needs:

.Storage requirements
[options="header"]
|====
2+|alertmanager PV 2+|thanos-receive PV 2+|thanos-compactor PV

|*Per replica* |*Total* |*Per replica* |*Total* 2+|*Total*

|10 GiB |30 GiB |10 GiB |30 GiB 2+|100 GiB
|====

.Storage requirements
[options="header"]
|====
2+|thanos-rule PV 2+|thanos-store PV 2+|Object bucket^[1]^

|*Per replica* |*Total* |*Per replica* |*Total* |*Per day* |*Total*

|30 GiB |90 GiB |100 GiB |300 GiB |15 GiB |101 GiB
|====
[1] For the object bucket, downsampling is assumed to be disabled, so storage is calculated for raw data only.
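
The per-replica `PV` sizes and the object storage configuration map to the `storageConfig` stanza of the `MultiClusterObservability` CR. The following is a minimal sketch only; the secret name, key, and storage class are placeholders, and the reference CRs in the telco-reference repository remain the authoritative source:

[source,yaml]
----
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  # Downsampling is assumed to be disabled, matching the sizing example above
  enableDownsampling: false
  storageConfig:
    # Secret pointing at the S3-compatible bucket (placeholder name and key)
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
    # Storage class backing the PVs (placeholder name)
    storageClass: general-storage
    # Per-replica PV sizes from the sizing example above
    alertmanagerStorageSize: 10Gi
    receiveStorageSize: 10Gi
    ruleStorageSize: 30Gi
    storeStorageSize: 100Gi
    compactStorageSize: 100Gi
----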

modules/telco-hub-acmMCH-yaml.adoc

+7
@@ -0,0 +1,7 @@
[id="telco-hub-acmMCH-yaml"]
.acmMCH.yaml
[source,yaml]
----
link:https://raw.githubusercontent.com/openshift-kni/telco-reference/release-4.19/telco-hub/configuration/reference-crs/required/acm/acmMCH.yaml[role=include]
----
@@ -0,0 +1,35 @@
:_mod-docs-content-type: CONCEPT
[id="telco-hub-architecture-overview_{context}"]
= Hub cluster architecture overview

Use the features and components running on the management hub cluster to manage many other clusters in a hub-and-spoke topology.
The hub cluster provides a highly available and centralized interface for managing the configuration, lifecycle, and observability of the fleet of deployed clusters.

[NOTE]
====
All management hub functionality can be deployed on a dedicated {product-title} cluster or as applications that are co-resident on an existing cluster.
====

Managed cluster lifecycle::
Using a combination of Day 2 Operators, the hub cluster provides the necessary infrastructure to deploy and configure the fleet of clusters by using a GitOps methodology.
Over the lifetime of the deployed clusters, further management of upgrades, scaling of the number of clusters, node replacement, and other lifecycle management functions can be declaratively defined and rolled out.
You can control the timing and progression of the rollout across the fleet.

Monitoring::
+
--
The hub cluster provides monitoring and status reporting for the managed clusters through the Observability pillar of the {rh-rhacm} Operator.
This includes aggregated metrics, alerts, and compliance monitoring through the Governance policy framework.
--

The Telco management hub reference design specifications (RDS) and the associated reference CRs describe the telco engineering and QE validated method for deploying, configuring, and managing the lifecycle of telco managed cluster infrastructure.
The reference configuration includes the installation and configuration of the hub cluster components on top of {product-title}.

.Hub cluster reference design components
image::telco-hub-cluster-reference-design-components.png[]

.Hub cluster reference design architecture
image::telco-hub-cluster-rds-architecture.png[]
+21
@@ -0,0 +1,21 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-assisted-service_{context}"]
= Assisted Service

The Assisted Service is deployed with the multicluster engine and {rh-rhacm}.

.Assisted Service storage requirements
[cols="1,2", options="header"]
|====
|Persistent volume resource
|Size (GB)

|`imageStorage`
|50

|`filesystemStorage`
|700

|`databaseStorage`
|20
|====
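
These sizes correspond to the persistent volume claims defined in the `AgentServiceConfig` CR. The following is a minimal sketch, assuming a default storage class is available; the reference CRs remain the authoritative source:

[source,yaml]
----
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  # Stores discovery ISO and boot artifacts
  imageStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
  # Caches OS images used during cluster installation
  filesystemStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 700Gi
  # Backs the Assisted Service database
  databaseStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
----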
+17
@@ -0,0 +1,17 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-cluster-topology_{context}"]
= Cluster topology

In production settings, the {product-title} hub cluster must be highly available to maintain high availability of the management functions.

Limits and requirements::
Use a highly available cluster topology for the hub cluster, for example:
* Compact (3 combined control plane and compute nodes)
* Standard (3 control plane nodes + N compute nodes)

Engineering considerations::
* In non-production settings, a {sno} cluster can be used for limited hub cluster functionality.
* Certain capabilities, for example {rh-storage}, are not supported on {sno}.
In this configuration, some hub cluster features might not be available.
* The number of optional compute nodes can vary depending on the scale of the specific use case.
* Compute nodes can be added later as required.
@@ -0,0 +1,5 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-engineering-considerations_{context}"]
= Hub cluster engineering considerations

The following sections describe the engineering considerations for hub cluster resource scaling targets and utilization.

modules/telco-hub-git-repository.adoc

+27
@@ -0,0 +1,27 @@
:_mod-docs-content-type: CONCEPT
[id="telco-hub-git-repository_{context}"]
= Git repository

The telco management hub cluster supports a GitOps-driven methodology for installing and managing the configuration of OpenShift clusters for various telco applications.
This methodology requires an accessible Git repository that serves as the authoritative source of truth for cluster definitions and configuration artifacts.

Red Hat does not offer a commercially supported Git server.
You can use an existing Git server provided in the production environment.
Gitea and Gogs are examples of self-hosted Git servers that you can use.

The Git repository is typically provided in the production network, external to the hub cluster.
In a large-scale deployment, multiple hub clusters can use the same Git repository to maintain the definitions of managed clusters. With this approach, you can easily review the state of the complete network.
As the source of truth for cluster definitions, the Git repository should be highly available and recoverable in disaster scenarios.

[NOTE]
====
For disaster recovery and multi-hub considerations, run the Git repository separately from the hub cluster.
====

Limits and requirements::
* A Git repository is required to support the {ztp} functions of the hub cluster, including installation, configuration, and lifecycle management of the managed clusters.
* The Git repository must be accessible from the management cluster.

Engineering considerations::
* The Git repository is used by the GitOps Operator to ensure continuous deployment and a single source of truth for the applied configuration.
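
For illustration only, the following is one possible repository layout that keeps cluster definitions, policy generation CRs, and source CRs in separate directories. The directory and file names are hypothetical and are not part of the reference configuration:

----
site-configs/        # ClusterInstance CRs, one per managed cluster
├── example-sno-1.yaml
└── kustomization.yaml
policies/            # PolicyGenerator or PolicyGenTemplate CRs
├── common-policies.yaml
└── kustomization.yaml
source-crs/          # source CRs that are wrapped into policies
└── custom/          # custom CRs kept separate from the reference CRs
----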
@@ -0,0 +1,30 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-gitops-operator-and-ztp-plugins_{context}"]
= GitOps Operator and {ztp}

New in this release::
* No reference design updates in this release

Description::
The GitOps Operator and {ztp} provide a GitOps-based infrastructure for managing cluster deployment and configuration.
Cluster definitions and configurations are maintained as a declarative state in Git.
You can apply `ClusterInstance` CRs to the hub cluster, where the SiteConfig Operator renders them as installation CRs.
In earlier releases, a {ztp} plugin supported the generation of installation CRs from `SiteConfig` CRs.
This plugin is now deprecated.
A separate {ztp} plugin is available to enable automatic wrapping of configuration CRs into policies based on the `PolicyGenerator` or `PolicyGenTemplate` CR.
+
You can deploy and manage multiple versions of {product-title} on managed clusters by using the baseline reference configuration CRs.
You can use custom CRs alongside the baseline CRs.
To maintain multiple per-version policies simultaneously, use Git to manage the versions of the source and policy CRs by using `PolicyGenerator` or `PolicyGenTemplate` CRs.

Limits and requirements::
* 300 single-node `SiteConfig` CRs can be synchronized for each ArgoCD application.
You can use multiple applications to achieve the maximum number of clusters supported by a single hub cluster.
* To ensure consistent and complete cleanup of managed clusters and their associated resources during cluster or node deletion, you must configure ArgoCD to use background deletion mode, as shown in the example at the end of this section.

Engineering considerations::
* To avoid confusion or unintentional overwrites when updating content, use unique and distinguishable names for custom CRs in the `source-crs` directory and extra manifests.
* Keep reference source CRs in a separate directory from custom CRs.
This facilitates easy updates of reference CRs as required.
* To help with managing multiple versions, keep all source CRs and policy creation CRs in versioned Git repositories to ensure consistent generation of policies for each {product-title} version.
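
The background deletion mode mentioned in the limits above is selected on the ArgoCD `Application` that manages the cluster definitions. A minimal sketch, assuming the application name, namespace, and repository URL are placeholders for your environment:

[source,yaml]
----
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clusters
  namespace: openshift-gitops
  finalizers:
    # Selects the background cascading deletion policy so that resources of
    # deleted clusters are cleaned up asynchronously
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
  source:
    repoURL: https://git.example.com/ztp/site-configs.git
    targetRevision: main
    path: site-configs
  syncPolicy:
    automated: {}
----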
@@ -0,0 +1,22 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-hub-cluster-day-2-operators_{context}"]
= Day 2 Operators in the hub cluster

The management hub cluster relies on a set of Day 2 Operators to provide critical management services and infrastructure.
Use Operator versions that match the set of managed cluster versions in your fleet.

Install Day 2 Operators by using Operator Lifecycle Manager (OLM) and `Subscription` CRs.
`Subscription` CRs identify the specific Day 2 Operator to install, the catalog in which the Operator is found, and the appropriate version channel for the Operator.
By default, OLM installs Operators and attempts to keep them updated with the latest z-stream version available in the channel.
By default, all `Subscription` CRs are set with an `installPlanApproval: Automatic` value.
In this mode, OLM automatically installs new Operator versions when they become available in the catalog and channel.

[NOTE]
====
Setting `installPlanApproval` to `Automatic` exposes the risk of Operators being updated outside of defined maintenance windows if the catalog index is updated to include newer Operator versions.
In a disconnected environment where you build and maintain a curated set of Operators and versions in the catalog, and where you create a new catalog index for updated versions, the risk of Operators being inadvertently updated is largely removed.
However, to further reduce this risk, you can set the `Subscription` CRs to `installPlanApproval: Manual`, which prevents Operators from being updated without explicit administrator approval.
====
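
A minimal `Subscription` sketch that shows the `installPlanApproval` setting; the channel, namespace, and catalog source values are assumptions, and the reference CRs define the exact values for each Operator:

[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  name: advanced-cluster-management
  channel: release-2.13               # assumed version channel
  source: redhat-operators            # use your disconnected catalog source if applicable
  sourceNamespace: openshift-marketplace
  # Automatic is the OLM default; set to Manual to require explicit
  # administrator approval before Operator updates are applied
  installPlanApproval: Automatic
----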

Limits and requirements::
* When upgrading a Telco hub cluster, the versions of {product-title} and the Operators must meet the requirements of all relevant compatibility matrixes.
@@ -0,0 +1,52 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-hub-cluster-openshift-deployment_{context}"]
= {product-title} installation on the hub cluster

Description::
+
--
The reference method for installing {product-title} on the hub cluster is the Agent-based Installer.

The Agent-based Installer provides installation capabilities without requiring additional centralized infrastructure.
The Agent-based Installer creates an ISO image, which you mount on the server to be installed.
When you boot the server, {product-title} is installed along with optionally supplied extra manifests, such as {ztp} custom resources.

[NOTE]
====
You can also install {product-title} on the hub cluster by using other installation methods.
====

If hub cluster functions are applied to an existing {product-title} cluster, the Agent-based Installer installation is not required.
The remaining steps to install Day 2 Operators and configure the cluster for these functions remain the same.
When the {product-title} installation is complete, the set of additional Operators and their configuration must be installed on the hub cluster.

The reference configuration includes all of these CRs, which you can apply manually, for example:

[source,terminal]
----
$ oc apply -f <reference_cr>
----

You can also add the reference configuration to the Git repository and apply it by using ArgoCD.

[NOTE]
====
If you apply the CRs manually, take care to apply them in the order indicated by the ArgoCD wave annotations.
Any CRs without annotations are in the initial wave.
====
--
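
For reference, generating the installation ISO with the Agent-based Installer follows this general pattern. This is a sketch only; `./hub-cluster` is a placeholder assets directory that already contains `install-config.yaml`, `agent-config.yaml`, and any extra manifests, such as {ztp} CRs:

[source,terminal]
----
$ openshift-install agent create image --dir ./hub-cluster
----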

Limits and requirements::
* The Agent-based Installer requires an accessible image repository containing all required {product-title} and Day 2 Operator images.
* The Agent-based Installer builds ISO images based on a specific {product-title} release and specific cluster details.
Installation of a second hub cluster requires a separate ISO image to be built.

Engineering considerations::
* The Agent-based Installer provides a baseline {product-title} installation.
You apply Day 2 Operators and other configuration CRs after the cluster is installed.
* The reference configuration supports Agent-based Installer installation in a disconnected environment.
* A limited set of additional manifests can be supplied at installation time.
* Include any `MachineConfig` CRs that you require as extra manifests during installation.
modules/telco-hub-hub-components.adoc

+4
@@ -0,0 +1,4 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-hub-components_{context}"]
= Hub cluster components
@@ -0,0 +1,16 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-hub-disaster-recovery_{context}"]
= Hub cluster disaster recovery

Loss of the hub cluster does not typically create a service outage on the managed clusters.
However, functions provided by the hub cluster are lost, such as observability, configuration and lifecycle management (LCM) updates driven through the hub cluster, and so on.

Limits and requirements::

* Backup, restore, and disaster recovery are offered by the cluster backup and restore Operator, which depends on the OpenShift API for Data Protection (OADP) Operator.

Engineering considerations::

* The cluster backup and restore Operator can be extended to third-party resources of the hub cluster based on user configuration.
* The cluster backup and restore Operator is not enabled by default in {rh-rhacm}.
The reference configuration enables this feature.
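
The cluster backup feature is typically enabled through the `MultiClusterHub` CR. The following is a minimal sketch of the relevant override, assuming the default `open-cluster-management` namespace; the `acmMCH.yaml` reference CR is the authoritative source:

[source,yaml]
----
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  overrides:
    components:
      # Enables the cluster backup and restore Operator, which is disabled by default
      - name: cluster-backup
        enabled: true
----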
@@ -0,0 +1,17 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-local-storage-operator_{context}"]
= Local Storage Operator

New in this release::
* No reference design updates in this release

Description::
With the Local Storage Operator, you can create persistent volumes that applications can consume as `PVC` resources.
The number and type of `PV` resources that you create depend on your requirements.

Engineering considerations::
* Create backing storage for `PV` CRs before creating the `PV`.
This can be a partition, a local volume, an LVM volume, or a full disk.
* Refer to the device listing in `LocalVolume` CRs by the hardware path used to access each device to ensure correct allocation of disks and partitions, for example, `/dev/disk/by-path/<id>`.
Logical names (for example, `/dev/sda`) are not guaranteed to be consistent across node reboots.
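
A minimal `LocalVolume` sketch that follows the by-path guidance above; the storage class name and device path are placeholders for your environment:

[source,yaml]
----
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: general-storage    # placeholder storage class name
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        # Use stable by-path identifiers rather than logical names such as /dev/sda
        - /dev/disk/by-path/pci-0000:00:1f.2-ata-1
----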

modules/telco-hub-logging.adoc

+21
@@ -0,0 +1,21 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-logging_{context}"]
= Logging

New in this release::
* No reference design updates in this release

Description::
The Cluster Logging Operator enables collection and shipping of logs off the node for remote archival and analysis.
The reference configuration uses Kafka to ship audit and infrastructure logs to a remote archive.

Limits and requirements::
* The reference configuration does not include local log storage.
* The reference configuration does not include aggregation of managed cluster logs at the hub cluster.

Engineering considerations::
* The impact on cluster CPU use is based on the number or size of logs generated and the amount of log filtering configured.
* The reference configuration does not include shipping of application logs.
Including application logs in the configuration requires you to evaluate the application logging rate and to allocate sufficient additional CPU resources to the reserved set.
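
A minimal sketch of the Kafka log forwarding described above. The API version, output URL, and service account name are assumptions for illustration; the reference CRs define the exact version and fields:

[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: collector    # assumed service account with log collection permissions
  outputs:
    - name: kafka-archive
      type: kafka
      kafka:
        url: tls://kafka.example.com:9093/hub-cluster-logs    # placeholder broker and topic
  pipelines:
    - name: audit-and-infrastructure-logs
      inputRefs:
        - audit
        - infrastructure
      outputRefs:
        - kafka-archive
----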
@@ -0,0 +1,20 @@
:_mod-docs-content-type: REFERENCE
[id="telco-hub-managed-cluster-deployment_{context}"]
= Managed cluster deployment

Description::
As of {rh-rhacm} 2.12, using the SiteConfig Operator is the recommended method for deploying managed clusters.
The SiteConfig Operator introduces a unified ClusterInstance API that decouples the parameters that define the cluster from the manner in which it is deployed.
The SiteConfig Operator uses a set of cluster templates that are instantiated with the data from a `ClusterInstance` CR to dynamically generate installation manifests.
Following the GitOps methodology, the `ClusterInstance` CR is sourced from a Git repository through ArgoCD.
The `ClusterInstance` CR can be used to initiate cluster installation by using either the Assisted Installer or the image-based installation available in the multicluster engine.

Limits and requirements::
* The SiteConfig ArgoCD plugin, which handles `SiteConfig` CRs, is deprecated starting with {product-title} 4.18.

Engineering considerations::
* You must create a `Secret` CR with the login information for the cluster baseboard management controller (BMC).
This Secret is then referenced in the `SiteConfig` CR.
Integration with a secret store such as Vault can be used to manage the secrets.
* Besides offering deployment method isolation and unification of Git and non-Git workflows, the SiteConfig Operator provides better scalability, greater flexibility with the use of custom templates, and an enhanced troubleshooting experience.
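
A minimal sketch of the BMC credentials `Secret`; the name, namespace, and values are illustrative and must match the reference in your cluster definition:

[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: example-sno-bmc-secret    # referenced from the cluster definition CR
  namespace: example-sno
type: Opaque
data:
  username: <base64_encoded_username>
  password: <base64_encoded_password>
----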
