Commit 150cce4

TELCODOCS-1874: Adding hub cluster RDS - Tech Preview
1 parent 10220ca commit 150cce4

31 files changed: +1301 −0 lines changed

_topic_maps/_topic_map.yml

+2

[source,diff]
----
@@ -3332,6 +3332,8 @@ Topics:
     File: telco-core-rds
   - Name: Telco RAN DU reference design specifications
     File: telco-ran-du-rds
+  - Name: Telco hub reference design specifications
+    File: telco-hub-rds
   - Name: Comparing cluster configurations
     Dir: cluster-compare
     Distros: openshift-origin,openshift-enterprise
----
+78

:_mod-docs-content-type: REFERENCE
[id="telco-hub-acm-observability_{context}"]
= {rh-rhacm} Observability

Cluster observability is provided by the multicluster engine and {rh-rhacm}.

* Observability storage requires several `PV` resources and an S3-compatible bucket storage for long-term retention of the metrics.
* Storage requirements calculation is complex and depends on the specific workloads and characteristics of the managed clusters.
Requirements for `PV` resources and the S3 bucket depend on many aspects, including data retention, the number of managed clusters, managed cluster workloads, and so on.
* Estimate the required storage for observability by using the observability sizing calculator in the {rh-rhacm} capacity planning repository.
See the Red Hat Knowledgebase article link:https://access.redhat.com/articles/7103886[Calculating storage need for MultiClusterHub Observability on telco environments] for an explanation of using the calculator to estimate observability storage requirements.
The following table uses inputs derived from the telco RAN DU RDS and the hub cluster RDS as representative values.

[NOTE]
====
The following numbers are estimates.
Tune the values for more accurate results.
Add an engineering margin, for example +20%, to the results to account for potential estimation inaccuracies.
====

.Cluster requirements
[cols="42%,42%,16%",options="header"]
|====
|Capacity planner input
|Data source
|Example value

|Number of control plane nodes
|Hub cluster RDS (scale) and telco RAN DU RDS (topology)
|3500

|Number of additional worker nodes
|Hub cluster RDS (scale) and telco RAN DU RDS (topology)
|0

|Days for storage of data
|Hub cluster RDS
|15

|Total number of pods per cluster
|Telco RAN DU RDS
|120

|Number of namespaces (excluding OCP)
|Telco RAN DU RDS
|4

|Number of metric samples per hour
|Default value
|12

|Number of hours of retention in receiver PV
|Default value
|24
|====

With these input values, the sizing calculator described in the Red Hat Knowledgebase article link:https://access.redhat.com/articles/7103886[Calculating storage need for MultiClusterHub Observability on telco environments] indicates the following storage needs:

.Storage requirements
[options="header"]
|====
2+|alertmanager PV 2+|thanos-receive PV 2+|thanos-compactor PV

|*Per replica* |*Total* |*Per replica* |*Total* 2+|*Total*

|10GiB |30GiB |10GiB |30GiB 2+|100GiB
|====

.Storage requirements
[options="header"]
|====
2+|thanos-rule PV 2+|thanos-store PV 2+|Object bucket^[1]^

|*Per replica* |*Total* |*Per replica* |*Total* |*Per day* |*Total*

|30GiB |90GiB |100GiB |300GiB |15GiB |101GiB
|====
^[1]^ Downsampling is assumed to be disabled for the object bucket, so only storage for raw data needs to be calculated.
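The per-replica sizes in these tables correspond to fields in the `storageConfig` section of the `MultiClusterObservability` CR. The following is an illustrative sketch only, not part of the reference configuration: the CR name, storage class, and object storage secret name are hypothetical placeholders that must be adapted to your environment.

[source,yaml]
----
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  storageConfig:
    # Hypothetical storage class; use one available on your hub cluster
    storageClass: ocs-storagecluster-ceph-rbd
    alertmanagerStorageSize: 10Gi
    receiveStorageSize: 10Gi
    compactStorageSize: 100Gi
    ruleStorageSize: 30Gi
    storeStorageSize: 100Gi
    # Secret holding the S3-compatible bucket configuration (name is illustrative)
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
----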

modules/telco-hub-acmMCH-yaml.adoc

+7

[id="telco-hub-acmMCH-yaml"]
.acmMCH.yaml
[source,yaml]
----
link:https://raw.githubusercontent.com/openshift-kni/telco-reference/release-4.19/telco-hub/configuration/reference-crs/required/acm/acmMCH.yaml[role=include]
----
+35

:_mod-docs-content-type: CONCEPT
[id="telco-hub-architecture-overview_{context}"]
= Hub cluster architecture overview

Use the features and components running on the management hub cluster to manage many other clusters in a hub-and-spoke topology.
The hub cluster provides a highly available and centralized interface for managing the configuration, lifecycle, and observability of the fleet of deployed clusters.

[NOTE]
====
All management hub functionality can be deployed on a dedicated {product-title} cluster or as applications that are co-resident on an existing cluster.
====

Managed cluster lifecycle::
Using a combination of Day 2 Operators, the hub cluster provides the necessary infrastructure to deploy and configure the fleet of clusters by using a GitOps methodology.
Over the lifetime of the deployed clusters, further management of upgrades, scaling the number of clusters, node replacement, and other lifecycle management functions can be declaratively defined and rolled out.
You can control the timing and progression of the rollout across the fleet.

Monitoring::
+
--
The hub cluster provides monitoring and status reporting for the managed clusters through the Observability pillar of the {rh-rhacm} Operator.
This includes aggregated metrics, alerts, and compliance monitoring through the Governance policy framework.
--

The telco management hub reference design specifications (RDS) and the associated reference CRs describe the telco engineering and QE validated method for deploying, configuring, and managing the lifecycle of telco managed cluster infrastructure.
The reference configuration includes the installation and configuration of the hub cluster components on top of {product-title}.

.Hub cluster reference design components
image::telco-hub-cluster-reference-design-components.png[]

.Hub cluster reference design architecture
image::telco-hub-cluster-rds-architecture.png[]
+28

:_mod-docs-content-type: REFERENCE
[id="telco-hub-assisted-service_{context}"]
= Assisted Service

The Assisted Service is deployed with the multicluster engine and {rh-rhacm}.

.Assisted Service storage requirements
[cols="1,2", options="header"]
|====
|Persistent volume resource
|Size (GB)

|`imageStorage`
|50

|`filesystemStorage`
|700

|`databaseStorage`
|20
|====

[role="_additional-resources"]
.Additional resources

* link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12/html/clusters/cluster_mce_overview#enable-cim-disconnected[Enabling central infrastructure management in disconnected environments]
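These persistent volume sizes are typically declared in the `AgentServiceConfig` CR that configures the Assisted Service. The following sketch is illustrative only, assuming `ReadWriteOnce` access modes and the cluster default storage class; adapt the sizes to your own estimates.

[source,yaml]
----
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent  # the Assisted Service expects a single CR with this name
spec:
  imageStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 700Gi
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
----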
+23

:_mod-docs-content-type: REFERENCE
[id="telco-hub-cluster-topology_{context}"]
= Cluster topology

In production settings, the {product-title} hub cluster must be highly available to maintain high availability of the management functions.

Limits and requirements::
Use a highly available cluster topology for the hub cluster, for example:

* Compact (3 nodes combined control plane and compute nodes)
* Standard (3 control plane nodes + N compute nodes)

Engineering considerations::
* In non-production settings, a {sno} cluster can be used for limited hub cluster functionality.
* Certain capabilities, for example {rh-storage}, are not supported on {sno}.
In this configuration, some hub cluster features might not be available.
* The number of optional compute nodes can vary depending on the scale of the specific use case.
* Compute nodes can be added later as required.

[role="_additional-resources"]
.Additional resources

* xref:../welcome/learn_more_about_openshift.adoc#architecture[{product-title} architecture]
* xref:../post_installation_configuration/node-tasks.adoc#post-install-node-tasks[Postinstallation node tasks]
+60

:_mod-docs-content-type: REFERENCE
[id="telco-hub-engineering-considerations_{context}"]
= Hub cluster engineering considerations

The following sections describe the engineering considerations for hub cluster resource scaling targets and utilization.

Reference configuration scaling target::
+
--
The resource requirements for the hub cluster are directly dependent on the number of clusters being managed by the hub, the number of policies used for each managed cluster, and the set of features that are configured in {rh-rhacm}.

The hub cluster reference configuration can support up to 3500 managed {sno} clusters under the following conditions:

* 5 policies for each cluster with hub-side templating configured with a 10 minute evaluation interval.
* Only the following {rh-rhacm} add-ons are enabled:
** Policy controller
** Observability with the default configuration
* You deploy managed clusters by using {ztp} in batches of up to 500 clusters at a time.

The reference configuration is also validated for deployment and management of a mix of managed cluster topologies.
The specific limits depend on the mix of cluster topologies, enabled {rh-rhacm} features, and so on.
In a mixed topology scenario, the reference hub configuration is validated with a combination of 1200 {sno} clusters, 400 compact clusters (3 nodes combined control plane and compute nodes), and 230 standard clusters (3 control plane and 2 worker nodes).

[NOTE]
====
Specific dimensioning requirements are highly dependent on the cluster topology and workload.
See "Storage requirements" for details.
Adjust cluster dimensions for the specific characteristics of your fleet of managed clusters.
====
--

Resource utilization::
+
--
Resource utilization was measured for hub clusters in the following scenario:

* Under reference load managing 3500 {sno} clusters
* 3-node compact cluster for the management hub running on dual-socket bare-metal servers
* Network impairment of 50 ms round-trip latency, 100 Mbps bandwidth limit, and 0.02% packet loss

.Resource utilization values
[options="header"]
|====
|Metric |Peak measurement
|OpenShift Platform CPU |106 cores (52 cores per node)
|OpenShift Platform memory |504 GB (168 GB per node)
|Persistent storage |<pending data from scale test>
|====
--

[role="_additional-resources"]
.Additional resources

* link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12/html-single/governance/index#template-comparison-table[Comparison of hub cluster and managed cluster templates]

modules/telco-hub-git-repository.adoc

+27

:_mod-docs-content-type: CONCEPT
[id="telco-hub-git-repository_{context}"]
= Git repository

The telco management hub cluster supports a GitOps-driven methodology for installing and managing the configuration of OpenShift clusters for various telco applications.
This methodology requires an accessible Git repository that serves as the authoritative source of truth for cluster definitions and configuration artifacts.

Red Hat does not offer a commercially supported Git server.
An existing Git server provided in the production environment can be used.
Gitea and Gogs are examples of self-hosted Git servers that you can use.

The Git repository is typically provided in the production network external to the hub cluster.
In a large-scale deployment, multiple hub clusters can use the same Git repository for maintaining the definitions of managed clusters.
Using this approach, you can easily review the state of the complete network.
As the source of truth for cluster definitions, the Git repository should be highly available and recoverable in disaster scenarios.

[NOTE]
====
For disaster recovery and multi-hub considerations, run the Git repository separately from the hub cluster.
====

Limits and requirements::
* A Git repository is required to support the {ztp} functions of the hub cluster, including installation, configuration, and lifecycle management of the managed clusters.
* The Git repository must be accessible from the management cluster.

Engineering considerations::
* The Git repository is used by the GitOps Operator to ensure continuous deployment and a single source of truth for the applied configuration.
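For illustration, the GitOps Operator typically consumes such a repository through an Argo CD `Application` CR. The following sketch is an assumption-laden example, not part of the reference CRs: the application name, repository URL, path, and branch are hypothetical placeholders.

[source,yaml]
----
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clusters
  namespace: openshift-gitops
spec:
  project: default
  source:
    # Hypothetical repository URL and path; point these at your own Git server
    repoURL: https://git.example.com/telco/site-configs.git
    targetRevision: main
    path: site-configs
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
----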
+38

:_mod-docs-content-type: REFERENCE
[id="telco-hub-gitops-operator-and-ztp-plugins_{context}"]
= GitOps Operator and {ztp}

New in this release::
* No reference design updates in this release

Description::
GitOps Operator and {ztp} provide a GitOps-based infrastructure for managing cluster deployment and configuration.
Cluster definitions and configurations are maintained as a declarative state in Git.
You can apply `ClusterInstance` CRs to the hub cluster where the `SiteConfig` Operator renders them as installation CRs.
In earlier releases, a {ztp} plugin supported the generation of installation CRs from `SiteConfig` CRs.
This plugin is now deprecated.
A separate {ztp} plugin is available to enable automatic wrapping of configuration CRs into policies based on the `PolicyGenerator` or `PolicyGenTemplate` CR.
+
You can deploy and manage multiple versions of {product-title} on managed clusters by using the baseline reference configuration CRs.
You can use custom CRs alongside the baseline CRs.
To maintain multiple per-version policies simultaneously, use Git to manage the versions of the source and policy CRs by using `PolicyGenerator` or `PolicyGenTemplate` CRs.

Limits and requirements::
* 300 single-node `SiteConfig` CRs can be synchronized for each ArgoCD application.
You can use multiple applications to achieve the maximum number of clusters supported by a single hub cluster.
* To ensure consistent and complete cleanup of managed clusters and their associated resources during cluster or node deletion, you must configure ArgoCD to use background deletion mode.

Engineering considerations::
* To avoid confusion or unintentional overwrites when updating content, use unique and distinguishable names for custom CRs in the `source-crs` directory and extra manifests.
* Keep reference source CRs in a separate directory from custom CRs.
This facilitates easy updates of reference CRs as required.
* To help with multiple versions, keep all source CRs and policy creation CRs in versioned Git repositories to ensure consistent generation of policies for each {product-title} version.
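As a minimal illustration of the `ClusterInstance`-based flow described above, the following sketch shows a {sno} cluster definition. All names, the domain, the BMC address, the image set reference, and the template references are hypothetical placeholders and must match the resources available in your environment.

[source,yaml]
----
apiVersion: siteconfig.open-cluster-management.io/v1alpha1
kind: ClusterInstance
metadata:
  name: example-sno
  namespace: example-sno
spec:
  clusterName: example-sno
  baseDomain: example.com                # hypothetical domain
  clusterImageSetNameRef: img4.19.0      # must match an available ClusterImageSet
  pullSecretRef:
    name: pull-secret
  templateRefs:
  - name: ai-cluster-templates-v1        # placeholder cluster template reference
    namespace: open-cluster-management
  nodes:
  - hostName: example-sno.example.com
    role: master
    bmcAddress: redfish-virtualmedia://192.0.2.10/redfish/v1/Systems/1  # example BMC address
    bmcCredentialsName:
      name: example-sno-bmc-secret
    bootMACAddress: "00:00:00:00:00:01"
    templateRefs:
    - name: ai-node-templates-v1         # placeholder node template reference
      namespace: open-cluster-management
----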
[role="_additional-resources"]
.Additional resources

* link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12/html/multicluster_engine_operator_with_red_hat_advanced_cluster_management/siteconfig-intro[ClusterInstance CR]
* xref:../edge_computing/policygentemplate_for_ztp/ztp-configuring-managed-clusters-policies.adoc#ztp-configuring-managed-clusters-policies[PolicyGenTemplate CRs]
* xref:../edge_computing/ztp-preparing-the-hub-cluster.adoc#ztp-preparing-the-ztp-git-repository-ver-ind_ztp-preparing-the-hub-cluster[{ztp} version independence]
+39

:_mod-docs-content-type: REFERENCE
[id="telco-hub-hub-cluster-day-2-operators_{context}"]
= Day 2 Operators in the hub cluster

The management hub cluster relies on a set of Day 2 Operators to provide critical management services and infrastructure.
Use Operator versions that match the set of managed cluster versions in your fleet.

Install Day 2 Operators by using Operator Lifecycle Manager (OLM) and `Subscription` CRs.
`Subscription` CRs identify the specific Day 2 Operator to install, the catalog in which the Operator is found, and the appropriate version channel for the Operator.
By default, OLM installs Operators and attempts to keep them updated with the latest z-stream version available in the channel.
By default, all Subscriptions are set with an `installPlanApproval: Automatic` value.
In this mode, OLM automatically installs new Operator versions when they are available in the catalog and channel.

[NOTE]
====
Setting `installPlanApproval` to `Automatic` exposes the risk of the Operator being updated outside of defined maintenance windows if the catalog index is updated to include newer Operator versions.
In a disconnected environment where you build and maintain a curated set of Operators and versions in the catalog, and where you follow a strategy of creating a new catalog index for updated versions, the risk of Operators being inadvertently updated is largely removed.
However, to further reduce this risk, you can set the `Subscription` CRs to `installPlanApproval: Manual`, which prevents Operators from being updated without explicit administrator approval.
====
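As an illustration of the manual approval strategy, a `Subscription` might be sketched as follows. The Operator name, namespace, channel, and catalog source are examples only and must match your environment and curated catalog.

[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  # Channel and source are illustrative; match them to your curated catalog
  channel: release-2.12
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual  # require explicit approval for Operator updates
----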
Limits and requirements::
* When upgrading a telco hub cluster, the versions of {product-title} and Operators must meet the requirements of all relevant compatibility matrices.

[role="_additional-resources"]
.Additional resources

* link:https://access.redhat.com/articles/7073065[Red Hat Advanced Cluster Management for Kubernetes 2.11 Support Matrix]
* link:https://access.redhat.com/support/policy/updates/openshift_operators[OpenShift Operator Life Cycles]

* For more information about telco hub cluster update requirements, see:
** xref:../edge_computing/ztp-preparing-the-hub-cluster.adoc#ztp-gitops-ztp-max-spoke-clusters_ztp-preparing-the-hub-cluster[Recommended hub cluster specifications and managed cluster limits for {ztp}]
** link:https://access.redhat.com/articles/7073065[Red Hat Advanced Cluster Management for Kubernetes 2.11 Support Matrix]
** link:https://access.redhat.com/support/policy/updates/openshift_operators[OpenShift Operator Life Cycles]

* For more information about updating the hub cluster, see:
** xref:../updating/understanding_updates/intro-to-updates.adoc#understanding-openshift-updates[Introduction to OpenShift updates]
** link:https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.13/html-single/install/index#upgrading-hub[Upgrading your hub cluster]
** xref:../edge_computing/ztp-updating-gitops.adoc#ztp-updating-gitops[Updating {ztp}]
