Commit 17aef87

Intel AMX accelerated Multicloud GitOps pattern with OpenShift AI - documentation

1 parent 2ab8569 · 32 files changed · +978 −4 lines changed
@@ -0,0 +1,120 @@
---
title: Intel AMX accelerated Multicloud GitOps with OpenShift AI
date: 2024-02-27
validated: false
summary: This is an extension of the Multicloud GitOps pattern with the Red Hat OpenShift AI component to show the value of using Intel AMX.
products:
- Red Hat OpenShift Container Platform
- Red Hat Advanced Cluster Management
- Red Hat OpenShift AI
- OpenVINO Toolkit Operator
- 5th Gen Intel Xeon Scalable processors with Intel Advanced Matrix Extensions (Intel AMX)
industries:
- General
aliases: /multicloud-gitops-amx-rhoai/
# uncomment once this exists
# pattern_logo: multicloud-gitops.png
pattern_logo: amx-intel-ai.png
links:
  install: mcg-amx-rhoai-getting-started
  help: https://groups.google.com/g/hybrid-cloud-patterns
  bugs: https://github.com/validatedpatterns-sandbox/amx-accelerated-multicloud-gitops/issues
---

include::modules/comm-attributes.adoc[]

:toc:
:imagesdir: /images
:_content-type: CONCEPT

[id="about-multicloud-gitops-rhoai-amx-pattern"]
== About the {amx-rhoai-mcg-pattern}

Use case::

* Use a GitOps approach to manage hybrid and multi-cloud deployments across both public and private clouds.
* Enable cross-cluster governance and application lifecycle management.
* Accelerate AI operations and improve computational performance by using Intel Advanced Matrix Extensions together with the OpenShift AI operator.
* Securely manage secrets across the deployment.
+
[NOTE]
====
Based on the requirements of a specific implementation, certain details might differ. However, all validated patterns that are based on a portfolio architecture generalize one or more successful deployments of a use case.
====

Background::
Organizations aim to develop, deploy, and operate applications on an open hybrid cloud in a stable, simple, and secure way. This hybrid strategy includes multi-cloud deployments where workloads might be running on multiple clusters and on multiple clouds, private or public.
This strategy requires an infrastructure-as-code approach: GitOps. GitOps uses Git repositories as a single source of truth to deliver infrastructure-as-code. Submitted code starts the continuous integration (CI) process, while the continuous delivery (CD) process checks and applies requirements for things like security, infrastructure-as-code, or any other boundaries set for the application framework. All changes to code are tracked, making updates easy while also providing version control should a rollback be needed.
Moreover, organizations are looking for solutions that increase efficiency and at the same time reduce costs, which is possible using *{intel-5th-gen-xeon-processors}* with a new built-in accelerator, *Intel Advanced Matrix Extensions*.
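The GitOps flow described above can be illustrated with a minimal Argo CD `Application` manifest. This is a generic sketch: the repository URL, paths, and namespaces below are placeholder assumptions, not values used by this pattern.

```shell
# Write a minimal, hypothetical Argo CD Application manifest. All names and the
# repository URL are illustrative placeholders, not values from this pattern.
cat > app.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-workload
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: main
    path: charts/example
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  syncPolicy:
    automated: {}   # keep the cluster state in sync with Git automatically
EOF
# Sanity check: the manifest declares exactly one Application resource.
grep -c 'kind: Application' app.yaml
```

With `syncPolicy.automated` set, Argo CD continuously reconciles the cluster against Git, which is the mechanism behind the tracked, rollback-friendly updates described above.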

[id="about-solution"]
== About the solution

This architecture covers hybrid and multi-cloud management with GitOps as shown in the following figure. At a high level, this requires a management hub for DevOps and GitOps, and infrastructure that extends to one or more managed clusters running on private or public clouds. The automated infrastructure-as-code approach can manage the versioning of components and deploy according to the infrastructure-as-code configuration.

Benefits of hybrid multicloud management with GitOps:

* Unify management across cloud environments.
* Dynamic infrastructure security.
* Infrastructural continuous delivery best practices.

//figure 1 originally
.Overview of the solution including the business drivers, management hub, and the clusters under management
image::multicloud-gitops-amx-rhoai/hybrid-multicloud-management-gitops-hl-arch.png[Multicloud Architecture]


[id="about-technology"]
== About the technology

The following technologies are used in this solution:

https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[Red Hat OpenShift Platform]::
An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments. It delivers a complete application platform for both traditional and cloud-native applications, allowing them to run anywhere. OpenShift has a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. It also enables the use of external secret management systems, for example, HashiCorp Vault in this case, to securely add secrets into the OpenShift platform.

https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[Red Hat OpenShift GitOps]::
A declarative application continuous delivery tool for Kubernetes based on the Argo CD project. Application definitions, configurations, and environments are declarative and version controlled in Git. It can automatically push the desired application state into a cluster, quickly find out if the application state is in sync with the desired state, and manage applications in multi-cluster environments.

https://www.redhat.com/en/technologies/management/advanced-cluster-management[Red Hat Advanced Cluster Management for Kubernetes]::
Controls clusters and applications from a single console, with built-in security policies. Extends the value of Red Hat OpenShift by deploying applications, managing multiple clusters, and enforcing policies across multiple clusters at scale.

https://www.redhat.com/en/technologies/management/ansible[Red Hat Ansible Automation Platform]::
Provides an enterprise framework for building and operating IT automation at scale across hybrid clouds, including edge deployments. It enables users across an organization to create, share, and manage automation, from development and operations to security and network teams.

https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai[Red Hat OpenShift AI]::
A flexible, scalable MLOps platform with tools to build, deploy, and manage AI-enabled applications. OpenShift AI (previously called Red Hat OpenShift Data Science) supports the full lifecycle of AI/ML experiments and models, on premises and in the public cloud.

https://github.com/openvinotoolkit/operator[OpenVINO Toolkit Operator]::
The Operator includes OpenVINO™ Notebooks for development of AI-optimized workloads and OpenVINO™ Model Server for deployment, which enables AI inference execution at scale and exposes AI models through gRPC and REST API interfaces.
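As a sketch of the REST interface mentioned above, OpenVINO Model Server accepts TensorFlow Serving-style `:predict` requests. The service host, port, model name, and input tensor below are all hypothetical assumptions, so the request is printed rather than sent.

```shell
# Build a hypothetical prediction request for a model served by OVMS.
# Service name, port, model name ("bert"), and the input tensor are
# illustrative assumptions, not values from this pattern.
payload='{"instances": [{"input_ids": [101, 7592, 102]}]}'
request="curl -s -X POST http://ovms-service:8500/v1/models/bert:predict -d '${payload}'"
# Print the request instead of sending it, since no server is running here.
echo "$request"
```

In a real deployment you would point the request at the OVMS service created by the operator and use the input names expected by your model.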

https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html[Intel® Advanced Matrix Extensions]::
A new built-in accelerator that improves the performance of deep-learning training and inference on the CPU and is ideal for workloads like natural-language processing, recommendation systems, and image recognition.
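Whether the CPU you are running on exposes AMX can be checked from the kernel's reported CPU flags. This is a generic Linux check, not part of the pattern itself; the `amx_tile` flag appears only on AMX-capable processors.

```shell
# Look for the AMX tile flag among the CPU feature flags reported by the kernel.
# On non-AMX hardware (or where /proc/cpuinfo is unavailable) this prints
# "AMX not detected".
if grep -qw amx_tile /proc/cpuinfo 2>/dev/null; then
  echo "AMX supported"
else
  echo "AMX not detected"
fi
```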

HashiCorp Vault::
Provides a secure centralized store for dynamic infrastructure and applications across clusters, including over low-trust networks between clouds and data centers.

This solution also uses a variety of _observability tools_ including the Prometheus monitoring and Grafana dashboard that are integrated with OpenShift, as well as components of the Observatorium meta-project, which includes Thanos and the Loki API.


[id="extension-of-mcg"]
== {amx-rhoai-mcg-pattern}

// RHODS pattern description
The basic {mcg-pattern} has been extended to highlight the *{intel-5th-gen-xeon-processors}* capabilities, offering developers a streamlined pathway to accelerate their workloads through the integration of cutting-edge *{intel-amx}*, providing efficiency and performance optimization in AI workloads.

The basic pattern has been extended with two components: OpenShift AI and the OpenVINO Toolkit Operator.

* OpenShift AI serves as a robust AI/ML platform for the creation of AI-driven applications and provides a collaborative environment for data scientists and developers that helps them move easily from experiment to production. It offers a Jupyter application with a selection of notebook servers, equipped with pre-configured environments and the necessary support and optimizations (such as CUDA, PyTorch, TensorFlow, HabanaAI, and so on).

* The OpenVINO Toolkit Operator manages OpenVINO components within an OpenShift environment. The first one, OpenVINO™ Model Server (OVMS), is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The other component used in the proposed pattern is the Notebook resource. This element integrates Jupyter from OpenShift AI with a container image that includes developer tools from the OpenVINO toolkit. It also enables selecting a defined OpenVINO™ Toolkit image from the Jupyter Spawner choice list.

The BERT-Large model is used as an example of an AI workload using {intel-amx} in the pattern. The BERT-Large inference runs in a Jupyter notebook that uses OpenVINO optimizations.

As a side note, BERT-Large is a widely known model used by various enterprise natural language processing workloads. Intel has demonstrated that *{intel-5th-gen-xeon-processors}* perform up to *1.49* times better in NLP flows on Red Hat OpenShift versus the prior generation of processors. Read more:
https://community.intel.com/t5/Blogs/Tech-Innovation/Data-Center/Level-Up-Your-NLP-applications-on-Red-Hat-OpenShift-and-5th-Gen/post/1572320[Level Up Your NLP applications on Red Hat OpenShift and 5th Gen]

[id="next-steps_mcg-index"]
== Next steps

* link:mcg-amx-rhoai-getting-started[Deploy the management hub] using Helm.
* Add a managed cluster to link:mcg-amx-rhoai-managed-cluster[deploy the managed cluster piece using ACM].
@@ -0,0 +1,70 @@
---
title: Intel AMX Demo
weight: 30
aliases: /multicloud-gitops-amx-rhoai/bert/
---

include::modules/comm-attributes.adoc[]
:toc:
:imagesdir: /images
:_content-type: REFERENCE

[id="demo-intro"]
== Introduction
The {amx-rhoai-mcg-pattern} provides developers and data scientists with the {rhoai} product, fully configured and ready to go. It also helps to boost their workloads by integrating AMX, which ensures efficiency and performance optimization for AI workloads.

== AMX demo

On {intel-5th-gen-xeon-processors}, the kernel detects Intel® AMX at run time, so there is no need to enable or configure it additionally to improve performance. However, Intel-optimized tools and frameworks, such as the OpenVINO Toolkit, are needed to take advantage of AMX acceleration.

Before proceeding with the demo steps, make sure you have deployed your pattern as described in link:../mcg-amx-rhoai-getting-started[Getting started].

. Verify that the *openvino-notebooks-v2022.3-1* build has completed under *Builds* > *Builds*. The build might take some time, and before it is finished it is not accessible from the OpenShift AI console.

. Open the OpenShift AI dashboard and go to the *Applications* > *Enabled* window.

. Open Jupyter by clicking *Launch application*.

. Choose the *OpenVINO™ Toolkit v2022.3* notebook image with the *X Large* container size and start the notebook server. Launching the server takes several minutes. Once it is ready, access the notebook server.

. On the Jupyter Launcher window, choose a Notebook with *Python 3 (ipykernel)*.

. Download the BERT-Large example, which uses the AMX accelerator, by typing the following in the opened notebook:
+
[source,terminal]
----
!wget https://raw.githubusercontent.com/validatedpatterns-sandbox/amx-accelerated-rhoai-multicloud-gitops/main/scripts/BERT.ipynb -O BERT.ipynb
----

. On the left-hand side menu, the `BERT.ipynb` script should show up. Open it and run the instructions one by one with the play button or with `Ctrl+Enter`.

All necessary tools like https://docs.openvino.ai/2022.3/omz_tools_downloader.html#installation[*Model Downloader*] and https://docs.openvino.ai/2022.3/openvino_inference_engine_tools_benchmark_tool_README.html#doxid-openvino-inference-engine-tools-benchmark-tool-r-e-a-d-m-e[*Benchmark Python Tool*] are built in and ready to use.


=== Description of `BERT.ipynb`
In case of issues with downloading the script, you can copy the following steps into your notebook and run them.
[source,terminal]
----
%env ONEDNN_VERBOSE=1
----

Download the BERT-Large model compatible with FP32 and BF16 precision, https://docs.openvino.ai/2022.3/omz_models_model_bert_large_uncased_whole_word_masking_squad_0001.html[bert-large-uncased-whole-word-masking-squad-0001]:
[source,terminal]
----
!omz_downloader --name bert-large-uncased-whole-word-masking-squad-0001
----

Go to the directory with the downloaded model and run the benchmark tool with the parameter *infer_precision bf16* to use BF16 precision:
[source,terminal]
----
%cd /opt/app-root/src/intel/bert-large-uncased-whole-word-masking-squad-0001/FP32/

!benchmark_app -m bert-large-uncased-whole-word-masking-squad-0001.xml -infer_precision bf16
----

In the oneDNN verbose output you should see an *`avx_512_core_amx`* entry, which confirms that AMX instructions are being used.

.BERT inference log
image::multicloud-gitops-amx-rhoai/amx-rhoai-bert-logs.png[Logs from amx-app pod]

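The confirmation step can also be scripted. The following minimal sketch scans a captured log for the AMX ISA tag; the sample log line is an illustrative stand-in, not output from a real run.

```shell
# Scan oneDNN verbose output for the AMX ISA tag. The sample line below is an
# illustrative stand-in for real benchmark_app output with ONEDNN_VERBOSE=1.
log='onednn_verbose,exec,cpu,matmul,brg:avx_512_core_amx,undef'
case "$log" in
  *avx_512_core_amx*) echo "AMX instructions used" ;;
  *)                  echo "AMX not detected" ;;
esac
```

In practice you would pipe the actual benchmark output through the same check, for example `benchmark_app ... 2>&1 | grep avx_512_core_amx`.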
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
---
2+
title: Cluster sizing
3+
weight: 60
4+
aliases: /multicloud-gitops-amx/mcg-amx-cluster-sizing/
5+
---
6+
7+
include::modules/comm-attributes.adoc[]
8+
9+
:toc:
10+
:imagesdir: /images
11+
:_content-type: ASSEMBLY
12+
13+
[id="about-openshift-cluster-sizing-mcg"]
14+
== About OpenShift cluster sizing for the {amx-mcg-pattern}
15+
16+
The minimum requirements for an {ocp} cluster depend on your installation platform, for example:
17+
18+
* For AWS, see link:https://docs.openshift.com/container-platform/4.13/installing/installing_aws/preparing-to-install-on-aws.html#requirements-for-installing-ocp-on-aws[Installing {ocp} on AWS].
19+
20+
* For bare-metal, see link:https://docs.openshift.com/container-platform/4.13/installing/installing_bare_metal/installing-bare-metal.html#installation-minimum-resource-requirements_installing-bare-metal[Installing {ocp} on bare metal].
21+
22+
To understand cluster sizing requirements for the {amx-rhoai-mcg-pattern}, consider the following components that the pattern deploys on the datacenter or the hub OpenShift cluster:
23+
24+
|===
25+
| Name | Kind | Namespace | Description
26+
27+
| multicloud-gitops-amx-rhoai-hub
28+
| Application
29+
| multicloud-gitops-amx-rhoai-hub
30+
| Hub GitOps management
31+
32+
| Red Hat Advanced Cluster Management
33+
| Operator
34+
| open-cluster-management
35+
| Advance Cluster Management
36+
37+
| Red Hat OpenShift GitOps
38+
| Operator
39+
| openshift-operators
40+
| OpenShift GitOps
41+
42+
| Node Feature Discovery
43+
| Operator
44+
| openshift-nfd
45+
| Manages the detection and labeling of hardware features and configuration (for example {intel-amx})
46+
47+
| Red Hat OpenShift Data Foundation
48+
| Operator
49+
| openshift-storage
50+
| Cloud Native storage solution
51+
|===
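Node Feature Discovery advertises CPU capabilities as node labels, so AMX-capable nodes can be found with a label selector. The label key below follows NFD's `cpu-cpuid` label convention but should be treated as an assumption to verify against your cluster; the `oc` command is printed rather than executed here.

```shell
# NFD publishes CPUID feature labels on nodes; AMX-capable nodes can then be
# selected with a label selector. The exact label key is an assumption based on
# NFD's cpu-cpuid convention -- verify it against your cluster's node labels.
selector='feature.node.kubernetes.io/cpu-cpuid.AMXTILE=true'
# Print the command instead of running it, since no cluster is available here.
echo "oc get nodes -l ${selector}"
```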

The {amx-rhoai-mcg-pattern} also includes the Red Hat Advanced Cluster Management (RHACM) supporting operator that is installed by OpenShift GitOps using Argo CD.

[id="mcg-openshift-datacenter-hub-cluster-size"]
== {amx-rhoai-mcg-pattern} OpenShift cluster sizes

The datacenter hub OpenShift cluster needs to be somewhat bigger than the factory or edge clusters because this is where developers run pipelines to build and deploy the {amx-rhoai-mcg-pattern} on the cluster. The cluster sizing above is close to the minimum size for a datacenter hub cluster. In the next few sections we take some snapshots of the cluster utilization while the {amx-rhoai-mcg-pattern} is running. Keep in mind that resources have to be added as more developers are building their applications.

The recommended cluster sizes for the datacenter hub and for the managed datacenter are the same in this case:

include::modules/intel-recommended-cluster-sizing-5th-gen-amx.adoc[]
