Commit 80a0f54

Fixed minor style and wording issues

According to doc review and style guide policies, fixed some minor wording and style issues.

1 parent 8530b06 commit 80a0f54

2 files changed: +42 -42 lines changed
xml/MAIN-SBP-AMD-EPYC-4-SLES15SP4.xml

Lines changed: 22 additions & 22 deletions
@@ -214,7 +214,7 @@
 EPYC Processors, there are several micro-architectural differences. The <emphasis
 role="italic">Instructions Per Cycle (IPC)</emphasis> has improved by 13% on average across
 a selected range of workloads, although the exact improvement is workload-dependent. The
-improvements are due to a variety of factors including a larger L2 cache, improvements in
+improvements result from a variety of factors including a larger L2 cache, improvements in
 branch prediction, the execution engine, the front-end fetching/decoding of instructions and
 additional instructions such as supporting AVX-512. The degree to which these changes affect
 performance varies between applications.</para>
@@ -323,7 +323,7 @@ node 0 1
 (TDPs)</emphasis> differ for the AMD EPYC 9004 Series Dense processor, with different
 frequency scaling limits and generally a lower peak frequency. While each individual core may
 achieve less peak performance than the AMD EPYC 9004 Series Classic Processor, the total peak
-compute throughput available is higher due to the increased number of cores.</para>
+compute throughput available is higher because of the increased number of cores.</para>

 <para>The intended use case and workloads determine which processor is superior. The key
 advantage of the AMD EPYC 9004 Series Dense Processor is packing more cores within the same
@@ -1699,11 +1699,11 @@ epyc:~ # perf script
 9004 Series Dense, the most important task is to set expectations. While super-linear scaling
 is possible, it should not be expected. It may be possible to achieve super-linear scaling in
 Cloud Environments for the number of instances hosted without performance loss if individual
-containers or virtual machines are not utilising 100% of CPU. However, it should be planned
+containers or virtual machines are not utilizing 100% of CPU. However, it should be planned
 carefully and tested. This would be particularly true in cases where multiple instances are
 hosted that have different times of day or year for active phases. The normal expectation is a
-best case of 33% gain for CPU-intensive workloads due to the increased number of cores. But
-sub-linear scaling is common due to resource contention. Contention between SMT siblings,
+best case of 33% gain for CPU-intensive workloads because of the increased number of cores. But
+sub-linear scaling is common because of resource contention. Contention between SMT siblings,
 memory bandwidth, memory availability, memory interconnects, thread communication overhead or
 peripheral devices may prevent perfect linear scaling even for perfectly parallelized
 applications. Similarly, not all applications can scale perfectly. It is possible for
@@ -1845,7 +1845,7 @@ epyc:~ # perf script
 <sect3 xml:id="sec-allocating-resources-hostos-kvm">
 <title>Reserving CPUs and memory for the host on KVM</title>

-<para> When using KVM, sparing, for example, 24 cores (i.e., one full core for each CCX on both NUMA nodes) and 64 GB of RAM for the host OS is
+<para> When using KVM, sparing, for example, 24 cores (that is, one full core for each CCX on both NUMA nodes) and 64 GB of RAM for the host OS is
 done by no longer creating VMs when the total number of vCPUs of all VMs has reached 336
 (as each core has 2 threads) and when the total cumulative amount of allocated RAM has
 reached 690 GB. </para>
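The arithmetic behind the 336-vCPU cap in the hunk above can be sketched as follows. This is a minimal sketch: the 192-core, SMT-2 host size is inferred from the 384-vCPU oversubscription threshold and the P#192 thread numbering cited elsewhere in this diff, not stated in this hunk itself.

```python
# Host size inferred from the diff: 2 NUMA nodes, 192 physical cores,
# SMT-2, so 384 hardware threads in total.
TOTAL_CORES = 192
THREADS_PER_CORE = 2

# Reserve one full core per CCX on both nodes: 24 cores for the host OS.
RESERVED_CORES = 24

# vCPUs that may be handed out to guests before eating into the reservation.
guest_vcpu_cap = (TOTAL_CORES - RESERVED_CORES) * THREADS_PER_CORE
print(guest_vcpu_cap)  # 336, matching the limit stated in the <para>
```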
@@ -2017,7 +2017,7 @@ epyc:~ # perf script
 <title>Secure Encrypted Virtualization (SEV)</title>

 <para> Secure Encrypted Virtualization (SEV) and SEV with Encrypted State (SEV-ES) technologies are available on the 5th Generation AMD
-EPYC Processors when SUSE Linux Enterprise Server 15 SP4 is used both as an host and guest OS. They allow the memory of the VMs to be encrypted, enabling an high level of
+EPYC Processors when SUSE Linux Enterprise Server 15 SP4 is used both as a host and guest OS. They allow the memory of the VMs to be encrypted, enabling a high level of
 confidentiality. SEV-ES is considered superior to plain SEV, as CPU registers are also encrypted when they are saved into the host memory.
 To use SEV-ES for a VM, it must be enabled in the VM's own configuration file, but there are preparation steps that need to occur at
 the host level. </para>
@@ -2178,7 +2178,7 @@ Total 346060.48 346246.34 692306.82
 are where the mapping between vCPUs and pCPUs is established (<parameter>vcpu</parameter>
 being the vCPU ID and <parameter>cpuset</parameter> being either one or a list of pCPU IDs). </para>

-<para> In order to be able to create VMs with more than 255 vCPUs,
+<para> To be able to create VMs with more than 255 vCPUs,
 the following element should be added in the
 <parameter>&lt;device></parameter> section:</para>
@@ -2262,7 +2262,7 @@ node 0 1
 <para>As said already, full cores must always be used. If possible, always fully use CCXes/dies too.
 Since each die has 16 CPUs, a VM with 288 vCPUs will use 9 CCXes on each of the 2 nodes
 (as 9 x 16 x 2 is indeed 288).
-So, for instance, vCPUs 0 to 15 can be assigned to Cores L#0 to L#7 (and hence to CPUs P#0 to P#7 and P#192 to P#199), on node P#0;
+So, for example, vCPUs 0 to 15 can be assigned to Cores L#0 to L#7 (and hence to CPUs P#0 to P#7 and P#192 to P#199), on node P#0;
 vCPUs 16 to 31 to Cores L#8 to L#15, and so on. No vCPU will be pinned, on the other hand, on Cores L#72 to L#95.
 And the same on node P#1.
 In fact, this is what we call coherent 1-to-1 mapping between virtual and physical topologies.</para>
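The coherent 1-to-1 mapping described above can be generated mechanically. The sketch below is a hypothetical illustration, not part of the guide: it assumes, as the text states, that core L#c exposes hardware threads P#c and P#(c+192), and that each pair of consecutive vCPUs lands on the two threads of one core (shown here for node P#0 only).

```python
# Hypothetical generator for the coherent 1-to-1 <vcpupin> mapping:
# core L#c has SMT siblings P#c and P#(c + 192).
THREAD_OFFSET = 192

def vcpu_to_pcpu(vcpu):
    core = vcpu // 2       # two vCPUs per physical core
    sibling = vcpu % 2     # first or second SMT thread of that core
    return core + sibling * THREAD_OFFSET

# One <vcpupin> element per vCPU, for the first CCX (vCPUs 0 to 15).
pins = [f'<vcpupin vcpu="{v}" cpuset="{vcpu_to_pcpu(v)}"/>' for v in range(16)]
print(pins[0])   # vCPU 0  -> P#0
print(pins[1])   # vCPU 1  -> P#192
print(pins[15])  # vCPU 15 -> P#199, as in the text above
```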
@@ -2323,7 +2323,7 @@ node 0
 them, respectively. And this is all possible without the need for any of the VMs to span
 more than one host NUMA node. </para>

-<para>Of course, in all these cases, VMs are sharing dies, which means they will interfere among each others via the L3 caches.
+<para>Of course, in all these cases, VMs are sharing dies, which means they will interfere with each other via the L3 caches.
 This may or may not be a problem, but there is no way around it as soon as more VMs than the number of available dies are necessary.
 The performance impact of such sharing should be tolerable, in most cases, for these configurations, but this needs to be assessed with tests and benchmarks.
 If that is not the case, then the recommendation is to not go above 24 VMs.
@@ -2397,8 +2397,8 @@ node 0
 <bridgehead>CPU Oversubscription</bridgehead>

 <para> CPU oversubscription happens when the total cumulative number of vCPUs from all VMs
-becomes higher than 384. Such situation unavoidably introduces latencies, and leads to lower
-performance than when host resources are just enough. It is, however, impossible to tell a
+becomes higher than 384. Such a situation inevitably introduces latencies, resulting in lower
+performance compared to when host resources are sufficient. It is, however, impossible to tell a
 priori to what extent this happens, at least not without a detailed knowledge of the actual
 workload. </para>
@@ -2498,7 +2498,7 @@ node 0

 <para> The <parameter>&lt;topology></parameter> element specifies the CPU
 characteristics. In this case, we are creating vCPUs which will be seen by the guest OS as
-being arranged in 2 sockets, each of which has 9 dies, each of which has 8 cores with 2 threads (i.e., 16 CPUs).
+being arranged in 2 sockets, each of which has 9 dies, each of which has 8 cores with 2 threads (that is, 16 CPUs).
 And this is how we match, for the one big VM, the topology of the host. </para>

 <para> Each <parameter>&lt;cell></parameter> element defines one virtual NUMA node,
@@ -2605,7 +2605,7 @@ vm1:~ # cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
 0-15
 </screen>

-<para>Note that, in order to achieve the outcome shown above, it is very important to use the following CPU topology description string, in the VM configuration:</para>
+<para>Note that, to achieve the outcome shown above, it is very important to use the following CPU topology description string in the VM configuration:</para>

 <screen>&lt;topology sockets="2" dies="9" cores="8" threads="2"/></screen>
@@ -2715,7 +2715,7 @@ vm1:~ # modprobe cpuidle-haltpoll
 &lt;/memoryBacking>
 </screen>

-<para>In order to verify that the appropriate type of memory is being used by the VMs,
+<para>To verify that the appropriate type of memory is being used by the VMs,
 one can check the content of <parameter>/proc/meminfo</parameter>, with the VMs running,
 and observe that all the pre-allocated Huge Pages are actually occupied.</para>
@@ -2878,21 +2878,21 @@ dmesg | grep SEV
 </mediaobject>
 </figure>

-<para>The single thread results are basically identical between baremetal (blue rectangles) and inside of the VM (orange rectangles), for
+<para>The single thread results are identical between baremetal (blue rectangles) and inside of the VM (orange rectangles), for
 all the operations (<parameter>Copy</parameter>, <parameter>Scale</parameter>, <parameter>Add</parameter> and
 <parameter>Triadd</parameter>) of the benchmark.
 </para>

 <para> About the parallel case, remember that the VM is slightly "smaller" than the host, in terms of number of CPUs. Therefore,
 we cannot run the parallel version of STREAM with as many threads as on the host. In fact, a very good result is reached, on the host, with twice as many threads as there are
-Last Level Caches (LLCs), i.e., 48. If we do the same inside of the VM, it means 36 threads.
+Last Level Caches (LLCs), that is, 48. If we do the same inside of the VM, it means 36 threads.
 We can see, however, that the performance reached inside of the VM, even with 36 threads (dark red rectangles, ~616 GBytes/sec for the <parameter>Copy</parameter> operation)
 is close enough to the values achieved on the host with 48 threads (yellow rectangles, ~640 GBytes/sec for the <parameter>Copy</parameter> operation).
 For completeness, we also ran the benchmark on the host with 36 threads (green rectangles, ~605 GBytes/sec for the <parameter>Copy</parameter> operation); and as we
 could have expected, the results of such baremetal runs and of the VM runs are very close.</para>

 <para> This clearly shows how proper tuning allows a single VM running on an AMD EPYC 7004 Series
-Processor server to achieve a memory bandwidth performance that basically matches the one that we can reach directly on the host. </para>
+Processor server to achieve a memory bandwidth performance that matches the one that we can reach directly on the host. </para>

 <note>
 <para>Inside of the VM, the STREAM benchmark was configured almost identically to what has been
@@ -2907,8 +2907,8 @@ dmesg | grep SEV
 is selected. In fact, since using that model builds a VM with only 2 LLCs (in addition to other problems with the cache topology),
 running STREAM with twice as many threads as there are LLCs in the system results in the benchmark spawning only 4 of them (yellow and green rectangles).
 And this, of course, dramatically reduces the performance. We can also see that, if we instead manually set the number of threads
-to the value that we know to be the best for this VM (i.e., 36, dark red and cyan rectangles) performance are restored to how good
-we know things can be from <xref linkend="fig-stream-bm-vm"/> (and from the <parameter>cpumodel</parameter> results, i.e. the blue and orange rectangles).</para>
+to the value that we know to be the best for this VM (that is, 36, dark red and cyan rectangles), performance is restored to how good
+we know things can be from <xref linkend="fig-stream-bm-vm"/> (and from the <parameter>cpumodel</parameter> results, see the blue and orange rectangles).</para>

 <para>Actually, the cyan rectangles represent the absolute best result (~620 GBytes/sec for the <parameter>Copy</parameter> operation), probably thanks to the fact that the CPUIdle
 <parameter>haltpoll</parameter> governor is the most effective when coupled with <parameter>cpupassthrough</parameter>. However, the orange rectangles follow very closely
@@ -2990,10 +2990,10 @@ dmesg | grep SEV
 </mediaobject>
 </figure>

-<para>We see in <xref linkend="fig-stream-single-vms-avg"/> how the single STREAM bandwidth stays basically flat until (and including when) 6 VMs are used.
+<para>We see in <xref linkend="fig-stream-single-vms-avg"/> how the single STREAM bandwidth stays flat until (and including when) 6 VMs are used.
 Then it starts to decline a bit, as more VMs are packed on the NUMA nodes and, hence, compete for the bandwidth of the memory controllers. However,
 <xref linkend="fig-stream-single-vms-sum"/> reminds us how the total bandwidth achieved, if we consider all the VMs involved in each experiment, actually goes up,
-until it reaches the same level that we know (e.g., from <xref linkend="fig-stream-bm-vm"/>) it can touch.
+until it reaches the same level that we know (for example, from <xref linkend="fig-stream-bm-vm"/>) it can reach.
 </para>

 <para><xref linkend="fig-stream-omp-vms-avg"/> and <xref linkend="fig-stream-omp-vms-sum"/> show the same, but when the parallel (via OpenMP) version of STREAM
