diff --git a/concepts/NVIDIA-Operator.xml b/concepts/NVIDIA-Operator.xml index 7b9178df2..c53e544ae 100644 --- a/concepts/NVIDIA-Operator.xml +++ b/concepts/NVIDIA-Operator.xml @@ -1,103 +1,96 @@ - + - - %entities; ]> + + %entities; +]> + xmlns="http://docbook.org/ns/docbook" version="5.2" + xmlns:its="http://www.w3.org/2005/11/its" + xmlns:xi="http://www.w3.org/2001/XInclude" + xmlns:xlink="http://www.w3.org/1999/xlink" + xmlns:trans="http://docbook.org/ns/transclusion"> Introduction to the &nvoperator; - This article explains the &nvoperator;, outlines the &nvidia; GPU - components it manages, and summarizes the benefits of using it. + This article explains the &nvoperator;. It outlines the &nvidia; GPU + components it manages. It also summarizes the benefits of using it.
What is the &nvoperator;? - The &nvoperator; is a &kube; operator that simplifies the management and - deployment of &nvidia; GPU resources in a &kube; cluster. It automates the - configuration and monitoring of &nvidia; GPU drivers, as well as - associated components like CUDA, container runtimes, and other GPU-related - software. + The &nvoperator; is a &kube; operator. It simplifies the management and + deployment of &nvidia; GPU resources in a &kube; cluster. The + operator automates configuration and monitoring of &nvidia; GPU drivers. + It also manages associated components. These include CUDA, container + runtimes, and other GPU-related software.
- How does the &nvoperator; work? + How the &nvoperator; works The &nvoperator; follows this workflow: - Operator deployment. The &nvidia; - Operator is deployed as a &helm; chart or using &kube; manifests. + Operator deployment. You can deploy + the &nvoperator; as a &helm; chart. You can also use &kube; manifests. - Node labeling & GPU discovery. - Once installed, the operator deploys the GPU Feature - Discovery (GFD) daemon, which scans the hardware on each - node for &nvidia; GPUs. It labels nodes with GPU-specific information, - making it easier for &kube; to schedule GPU workloads based on - available hardware. + Node labeling and GPU discovery. Once + installed, the operator deploys the GPU Feature Discovery (GFD) + daemon. GFD scans node hardware for &nvidia; GPUs. It labels nodes + with GPU-specific information. This makes it easier for &kube; to + schedule GPU workloads based on available hardware. - - - - NVIDIA Container Toolkit + &nvidia; Container Toolkit configuration. The GPU operator installs and configures the - &nvidia; Container Toolkit, which allows GPU-accelerated containers to - run in &kube;. + &nvidia; Container Toolkit. The toolkit allows GPU-accelerated + containers to run in &kube;. CUDA runtime and libraries. The - operator ensures that the CUDA toolkit is properly installed, making - it easier for applications requiring CUDA to work seamlessly without - manual intervention. + operator ensures the CUDA toolkit is properly installed. This lets + applications that require CUDA work seamlessly without manual + intervention. Validation and health monitoring. - After setting up the environment, the operator continuously monitors - the health of the GPU resources. It also exposes health metrics for - administrators to view and use for decision-making. + After setup, the operator monitors the health of GPU resources. It + also exposes health metrics. Administrators can use these metrics for + decision-making. - Scheduling GPU workloads. Once the - environment is configured, you can install workloads that require GPU - acceleration. &kube; will use the node labels and available GPU - resources to schedule these jobs on GPU-enabled nodes automatically. + Scheduling GPU workloads. Once + configured, you can install workloads that require GPU acceleration. + &kube; uses node labels and available GPU resources to schedule + these jobs on GPU-enabled nodes automatically.
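To make the workflow concrete, the following sketch shows what a deployment and a first GPU workload could look like. This example is illustrative and not part of the article: the Helm repository URL, chart name, namespace, pod name, and container image follow NVIDIA's public upstream conventions and may differ for your cluster.

&prompt.root;helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
&prompt.root;helm repo update
&prompt.root;helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
# Check the labels that GPU Feature Discovery added to the nodes
&prompt.root;kubectl get nodes --show-labels | grep nvidia.com
# Request one GPU through the resource the NVIDIA device plugin advertises
&prompt.root;cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example image tag, adjust as needed
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

&kube; then schedules the pod on a GPU-enabled node whose labels and free nvidia.com/gpu capacity match the request.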
- Benefits of using the &nvoperator; + Benefits of the &nvoperator; - Using the &nvoperator; has the following key benefits: + Using the &nvoperator; has these key benefits: @@ -108,20 +101,21 @@ - Cluster-wide management. Works across - the entire &kube; cluster, scaling with node additions or removals. + Cluster-wide management. It works + across the entire &kube; cluster. It scales with node additions or + removals. - Simplified updates. Automates + Simplified updates. It automates updates for GPU-related components. - Optimized GPU usage. Ensures that GPU - resources are efficiently allocated and used. + Optimized GPU usage. It ensures + efficient allocation and use of GPU resources. diff --git a/tasks/keylime-enable-ima-tracking.xml b/tasks/keylime-enable-ima-tracking.xml index b6a8803a8..ff17bc4de 100644 --- a/tasks/keylime-enable-ima-tracking.xml +++ b/tasks/keylime-enable-ima-tracking.xml @@ -16,37 +16,39 @@ xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:trans="http://docbook.org/ns/transclusion"> - Enabling IMA tracking for &keylime; + Enabling IMA tracking - &keylime; is a remote attestation solution that enables you to monitor - the health of remote nodes. Integrity management - architecture (IMA) is a kernel integrity subsystem that - provides a means of detecting malicious changes to files. + &keylime; is a remote attestation solution. It enables you to + monitor the health of remote nodes. Integrity Measurement + Architecture (IMA) is a kernel integrity subsystem. It provides a + means of detecting malicious changes to files. - When using IMA, the kernel calculates a hash of accessed files. The hash is - then used to extend the PCR 10 in the TPM and also log a list of accessed - files. The verifier can request a signed quote to the agent for PCR 10 to - get the logs of all accessed files including the file hashes. Verifiers - then compare the accessed files with a local allowlist of approved files. - If any of the hashes are not recognized, the system is considered unsafe, - and a revocation event is triggered. + When using IMA, the kernel calculates a hash of accessed files. The + kernel then uses the hash to extend the Platform Configuration + Register (PCR) 10 in the Trusted Platform Module (TPM). It also logs a + list of accessed files. The verifier can request a signed quote from + the agent for PCR 10. This retrieves the logs of all accessed files, + including the file hashes. Verifiers then compare the accessed files + with a local allowlist of approved files. If the verifier does not + recognize a hash, it considers the system unsafe. A + revocation event is then triggered. - Before &keylime; can collect information, IMA/EVM needs to be enabled. To + Before &keylime; can collect information, you must enable IMA/EVM. To enable the process, boot a kernel of the agent with the ima_appraise=log and ima_policy=tcb - parameters: + parameters. Update the option with the - parameters in /etc/default/grub: + parameters in /etc/default/grub. GRUB_CMDLINE_LINUX_DEFAULT="ima_appraise=log ima_policy=tcb" @@ -58,23 +60,21 @@ - Reboot your system. + Reboot the system. - The procedure above uses the default kernel IMA policy. To avoid monitoring - too many files and therefore creating long logs, create a new custom - policy. Find more details in the - &keylime; - documentation. + The procedure above uses the default kernel IMA policy. To avoid + monitoring too many files and creating long logs, create a new custom + policy.
Find more details in the &keylime; documentation. - To indicate the expected hashes, use the option - of the keylime_tenant command when registering the - agent. To view the excluded or ignored files, use the - option of the keylime_tenant - command: + To indicate the expected hashes, use the + option of the keylime_tenant command when + registering the agent. To view the excluded or ignored files, use the + option of the + keylime_tenant command. &prompt.root;keylime_tenant --allowlist -v 127.0.0.1 \ diff --git a/tasks/keylime-run-with-podman.xml b/tasks/keylime-run-with-podman.xml index cc9f7da56..e06b3ac7a 100644 --- a/tasks/keylime-run-with-podman.xml +++ b/tasks/keylime-run-with-podman.xml @@ -20,25 +20,23 @@ Running the &keylime; workload using &podman; - &keylime; is a remote attestation solution that enables you to monitor - the health of remote nodes. The verifier and - registrar are essential components of &keylime; on - remote systems to perform the registration and attestation of &keylime; - agents. + &keylime; is a remote attestation solution. It helps you monitor the + health of remote nodes. The verifier and registrar are essential + &keylime; components. You use these components on remote systems to + register and attest &keylime; agents. - The container described in this article delivers control plane services - verifier and registrar and a - tenant command-line tool (CLI) that are part of the - &keylime; project. + This article describes a container. The container delivers &keylime; + control plane services. These include the verifier and registrar services, + and the tenant command-line interface (CLI). - Before you start installing and registering agents, prepare the verifier - and the registrar on remote hosts, as described in the following procedure. + Before installing and registering agents, prepare the verifier and registrar + on remote hosts. The following procedure describes how to prepare them. @@ -60,53 +58,51 @@ registry.opensuse.org/devel/microos/containers/containerfile/opensuse/keylime-co - Create the keylime-control-plane volume to - persist the database and certificates required during the attestation - process. + Create the keylime-control-plane volume. This + persists the database and certificates for attestation. &prompt.root;podman container runlabel install \ registry.opensuse.org/devel/microos/containers/containerfile/opensuse/keylime-control-plane:latest - Start the container and related services. + Start the container and its services. &prompt.root;podman container runlabel run \ registry.opensuse.org/devel/microos/containers/containerfile/opensuse/keylime-control-plane:latest - The keylime-control-plane container is - created. It includes configured and running registrar and verifier - services. Internally, the container exposes ports 8881, 8890 and 8891 - to the host using the default values. Validate the firewall - configuration to allow access to the ports and to allow communication - between containers, because the tenant CLI requires it. + This creates the keylime-control-plane container. The + container includes configured and running registrar and verifier + services. It internally exposes ports 8881, 8890, and 8891 to the host. + These are the default port values. Validate your firewall configuration. + Ensure it allows access to these ports and communication between + containers. The tenant CLI requires this communication.
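As an illustration of the firewall step, the following sketch assumes the host runs firewalld; the commands are not part of the article, and the zone or syntax may need adjustment for your setup. The port numbers are the ones listed above.

&prompt.root;firewall-cmd --permanent --add-port=8881/tcp \
  --add-port=8890/tcp --add-port=8891/tcp
&prompt.root;firewall-cmd --reload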
- If you need to stop &keylime; services, run the following command: + To stop the &keylime; services, run this command: &prompt.root;podman kill keylime-control-plane-container
Monitoring &keylime; services - To get the status of running containers on the host, run the following - command: + To check the status of running containers on the host, run this command: &prompt.root;podman ps - To view the logs of &keylime; services, run the following command: + To view the &keylime; service logs, run this command: &prompt.root;podman logs keylime-control-plane-container
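To keep watching the logs while the services run, you can also follow them, for example:

&prompt.root;podman logs --follow keylime-control-plane-container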
Executing the tenant CLI - The tenant CLI tool is included in the container, and if the host - firewall does not interfere with the ports exposed by &keylime; services, - you can execute it using the same image, for example: + The container includes the tenant CLI tool. If the host firewall does not + interfere with the exposed ports, you can execute the CLI using the same + image: &prompt.root;podman run --rm \ -v keylime-control-plane-volume:/var/lib/keylime/ \ @@ -116,9 +112,9 @@ keylime_tenant -v 10.88.0.1 -r 10.88.0.1 --cert default -c reglist Extracting the &keylime; certificate - The first time that the &keylime; container is executed, its services - create a certificate required by several agents. You need to extract the - certificate from the container and copy it to the agent's + When you first execute the &keylime; container, its services create a + certificate. Several agents require this certificate. Extract the + certificate from the container. Then, copy it to the agent's /var/lib/keylime/cv_ca/ directory. &prompt.root;podman cp \ @@ -128,7 +124,7 @@ keylime-control-plane-container:/var/lib/keylime/cv_ca/cacert.crt AGENT_HOST:/var/lib/keylime/cv_ca/ - Find more details about installing the agent in + For more details about installing the agent, see . diff --git a/tasks/klp-troubleshoot.xml b/tasks/klp-troubleshoot.xml index 20df42c6c..f7719ae6b 100644 --- a/tasks/klp-troubleshoot.xml +++ b/tasks/klp-troubleshoot.xml @@ -16,67 +16,61 @@
- Checking expiration date of the live patch + Checking the live patch expiration date - Make sure that the - lifecycle-data-sle-module-live-patching is installed, - then run the zypper lifecycle command. You should see - expiration dates for live patches in the Package end of support - if different from product section of the output. + Ensure the lifecycle-data-sle-module-live-patching + package is installed. Then, run the zypper lifecycle + command. The output shows expiration dates for live patches in the + Package end of support if different from product section. - Every live patch receives updates for 13 months from the release of the - underlying kernel package. The - Maintained - kernels, patch updates and lifecycle page allows you to check - expiration dates based on the running kernel version without installing - the product extension. + Every live patch receives updates for 13 months from the release of the + underlying kernel package. The Maintained kernels, patch updates, and + lifecycle page lets you check expiration dates + based on the running kernel version without installing the product + extension.
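For example, the check described above could look as follows; the output depends on the installed products and patches:

&prompt.root;zypper install lifecycle-data-sle-module-live-patching
&prompt.root;zypper lifecycle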
- Checking what kernel live patch packages are installed + Checking installed kernel live patch packages - The kernel is live-patched if a kernel-livepatch-* - package has been installed for the running kernel. You can use the command - zypper se --details kernel-livepatch-* to check what - kernel live patch packages are installed on your system. + The kernel is live-patched if a kernel-livepatch-* + package is installed for the running kernel. Use the + zypper se --details kernel-livepatch-* command to check + which kernel live patch packages are installed.
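For example, quote the pattern so the shell does not expand it against local file names:

&prompt.root;zypper se --details 'kernel-livepatch-*'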
- Preventing reboot + Preventing a reboot - When the kernel-default package is installed, the - update manager prompts you to reboot the system. To prevent this message - from appearing, you can filter out kernel updates from the patching - operation. This can be done by adding package locks with Zypper. &susemgr; - also makes it possible to filter channel contents (see - Live - Patching with SUSE Manager). + When you install the kernel-default package, the update + manager prompts you to reboot the system. To prevent this, filter kernel + updates from the patching operation. You can do this by adding package locks + with Zypper. &susemgr; also lets you filter channel contents. See Live + Patching with SUSE Manager for more information.
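For example, a package lock added with Zypper could look like the following sketch; the exact package names to lock depend on your kernel flavor:

&prompt.root;zypper addlock 'kernel-default*'
&prompt.root;zypper locks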
- Check patching status + Checking the patching status - You can check patching status using the klp status - command. To examine installed patches, run the klp -v - patches command. + Check patching status using the klp status command. To + examine installed patches, run klp -v patches.
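For example:

&prompt.root;klp status
&prompt.root;klp -v patches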
Downgrading a kernel patch - If you find the latest live patch problematic, you can downgrade the - currently installed live patch back to its previous version. Keep in mind - that a system with kernel warnings or kernel error traces in the system - log may not be suitable for the patch downgrade procedure. If you are - unsure whether the system meets the requirements for a patch downgrade, - contact SUSE Technical Support for help. + If the latest live patch is problematic, downgrade it to the previous + version. A system with kernel warnings or kernel error traces in the system + log may not be suitable for downgrading. If you are unsure whether your + system is suitable for a patch downgrade, contact SUSE Technical Support. - To downgrade the latest kernel live patch, use the klp - downgrade command. This command automatically detects the - version of the latest live patch and installs the preceding one. + To downgrade the latest kernel live patch, use the + klp downgrade command. This command automatically detects + the latest live patch version and installs the previous one.
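For example, assuming the system log shows no kernel warnings or error traces:

&prompt.root;klp downgrade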