55 changes: 55 additions & 0 deletions content/en/docs/installation.md
@@ -117,3 +117,58 @@ job.batch/volcano-admission-init 1/1 28s 6m10s
```

After the configuration is complete, you can use Volcano to deploy the AI/ML and big data workloads.


## Resource Requirements

The resources requested for Volcano pods can be customized as follows:

```yaml
# container resources
resources:
  requests:
    cpu: 500m
    memory: 500Mi
  limits:
    cpu: 2
    memory: 2Gi
```

The resource requests and limits of the volcano-admission component depend on the cluster scale. For details, see Table 1.

Table 1 Recommended resource requests and limits for volcano-admission

| Cluster Scale | CPU Request (m) | CPU Limit (m) | Memory Request (Mi) | Memory Limit (Mi) |
| ------------------ | --------------- | ------------- | ------------------- | ----------------- |
| 50 nodes | 200 | 500 | 500 | 500 |
| 200 nodes | 500 | 1000 | 1000 | 2000 |
| 1000 or more nodes | 1500 | 2500 | 3000 | 4000 |
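
As an illustration only, the tiers in Table 1 can be written as a small lookup. The Python sketch below assumes that a cluster whose size falls between two listed scales takes the next larger tier; the table does not say this. The function name `admission_resources` is hypothetical and not part of Volcano.

```python
# Tiers copied from Table 1.
# (largest node count covered, CPU request m, CPU limit m, memory request Mi, memory limit Mi)
ADMISSION_TIERS = [
    (50, 200, 500, 500, 500),
    (200, 500, 1000, 1000, 2000),
]
# Row "1000 or more nodes". Assumption: also used for anything above 200 nodes.
ADMISSION_LARGEST = (1500, 2500, 3000, 4000)


def admission_resources(nodes: int) -> dict:
    """Return a Kubernetes-style resources mapping for volcano-admission."""
    values = ADMISSION_LARGEST
    for max_nodes, *tier_values in ADMISSION_TIERS:
        if nodes <= max_nodes:
            values = tuple(tier_values)
            break
    cpu_request, cpu_limit, mem_request, mem_limit = values
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        "limits": {"cpu": f"{cpu_limit}m", "memory": f"{mem_limit}Mi"},
    }
```

The returned mapping has the same shape as the `resources` block shown earlier, so it could be dumped to YAML and pasted into a container spec.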

The resource requests and limits of volcano-controller and volcano-scheduler depend on the number of cluster nodes and pods. The recommended values are as follows:

- If the number of nodes is less than 100, retain the default configuration: a CPU request of 500m with a limit of 2000m, and a memory request of 500Mi with a limit of 2000Mi.
- If the number of nodes is greater than 100, increase the CPU request by 500m and the memory request by 1000Mi each time 100 nodes (10,000 pods) are added. Set the CPU limit 1500m above the CPU request, and the memory limit 1000Mi above the memory request.
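
The two rules above can be sketched in Python as follows. This is a minimal sketch under two assumptions that the text does not state: a cluster of exactly 100 nodes already counts as the first increment (this matches the 100/10000 row of Table 2), and only complete groups of 100 nodes are counted. The function name `scaled_resources` is hypothetical.

```python
def scaled_resources(nodes: int) -> dict:
    """Estimate volcano-controller/volcano-scheduler resources from the node count.

    CPU values are in millicores (m), memory values in Mi.
    """
    if nodes < 100:
        # Default configuration for small clusters.
        return {"cpu_request": 500, "cpu_limit": 2000,
                "mem_request": 500, "mem_limit": 2000}

    # Assumption: count only complete groups of 100 nodes.
    steps = nodes // 100
    cpu_request = 500 + 500 * steps
    mem_request = 500 + 1000 * steps
    return {
        "cpu_request": cpu_request,
        "cpu_limit": cpu_request + 1500,   # limit is 1500m above the request
        "mem_request": mem_request,
        "mem_limit": mem_request + 1000,   # limit is 1000Mi above the request
    }
```

With these assumptions the function reproduces the rows of Table 2 for 100 to 700 nodes.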

Recommended formula for calculating the requested value:

- Requested CPUs: Calculate the number of target nodes multiplied by the number of target pods, perform interpolation search based on the number of nodes in the cluster multiplied by the number of target pods in Table 2, and round up the request value and limit value that are closest to the specifications.

  For example, for 2000 nodes and 20,000 pods, the number of target nodes × the number of target pods = 40 million. The closest specification at or above this value is 700/70,000 (number of cluster nodes × number of pods = 49 million). According to Table 2, set the CPU request to 4000m and the CPU limit to 5500m.
Comment on lines +154 to +156

medium

The explanation for calculating the requested CPU is a bit complex and could be hard to follow. Phrases like "perform interpolation search" and "round up the request value and limit value that are closest to the specifications" could be simplified for better readability.

Consider rephrasing to be more direct, for example:
"To determine the CPU request, first calculate the product of the target number of nodes and pods. Then, find the row in Table 2 where the product of 'Nodes' and 'Pods' is the smallest value that is greater than or equal to your calculated product. Use the CPU request and limit from that row."


- Requested memory: Allocate 2.4 GiB of memory for every 1000 nodes and 1 GiB of memory for every 10,000 pods. The requested memory is the sum of these two values. (The result may differ from the recommended value in Table 2. You can use either of them.)

  Requested memory = Number of target nodes / 1000 × 2.4 GiB + Number of target pods / 10,000 × 1 GiB. For example, for 2000 nodes and 20,000 pods, the requested memory is 6.8 GiB (2000 / 1000 × 2.4 GiB + 20,000 / 10,000 × 1 GiB).
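
The memory formula is simple enough to check in a few lines of Python. The sketch below is illustrative; the function names are hypothetical, and the conversion to Mi (1 GiB = 1024 Mi, rounded up) is an added convenience rather than something the text prescribes.

```python
import math


def requested_memory_gib(nodes: int, pods: int) -> float:
    """2.4 GiB per 1000 nodes plus 1 GiB per 10,000 pods."""
    return nodes / 1000 * 2.4 + pods / 10_000 * 1.0


def requested_memory_mi(nodes: int, pods: int) -> int:
    """Same value expressed in Mi and rounded up to a whole number."""
    return math.ceil(requested_memory_gib(nodes, pods) * 1024)


# Worked example from the text: 2000 nodes and 20,000 pods.
print(round(requested_memory_gib(2000, 20_000), 1))  # 6.8
```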

Table 2 Recommended resource requests and limits for volcano-controller and volcano-scheduler

| Nodes/Pods in a Cluster | CPU Request (m) | CPU Limit (m) | Memory Request (Mi) | Memory Limit (Mi) |
| ----------------------- | --------------- | ------------- | ------------------- | ----------------- |
| 50/5000 | 500 | 2000 | 500 | 2000 |
| 100/10000 | 1000 | 2500 | 1500 | 2500 |
| 200/20000 | 1500 | 3000 | 2500 | 3500 |
| 300/30000 | 2000 | 3500 | 3500 | 4500 |
| 400/40000 | 2500 | 4000 | 4500 | 5500 |
| 500/50000 | 3000 | 4500 | 5500 | 6500 |
| 600/60000 | 3500 | 5000 | 6500 | 7500 |
| 700/70000 | 4000 | 5500 | 7500 | 8500 |
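
The CPU lookup against Table 2 can be sketched as below, following the reading that the row to use is the first one whose nodes × pods product is at least the target product. The function name `cpu_for` is hypothetical, and raising an error for workloads beyond the last row is an assumption, since the document does not say what to do in that case.

```python
# (nodes, pods, CPU request m, CPU limit m), copied from Table 2.
TABLE_2 = [
    (50, 5_000, 500, 2000),
    (100, 10_000, 1000, 2500),
    (200, 20_000, 1500, 3000),
    (300, 30_000, 2000, 3500),
    (400, 40_000, 2500, 4000),
    (500, 50_000, 3000, 4500),
    (600, 60_000, 3500, 5000),
    (700, 70_000, 4000, 5500),
]


def cpu_for(nodes: int, pods: int) -> tuple:
    """Return (CPU request, CPU limit) in millicores for the target scale."""
    target = nodes * pods
    # Rows are sorted by scale, so the first match is the smallest sufficient row.
    for row_nodes, row_pods, cpu_request, cpu_limit in TABLE_2:
        if row_nodes * row_pods >= target:
            return cpu_request, cpu_limit
    # Assumption: the table gives no guidance beyond its last row.
    raise ValueError("target scale exceeds the largest row in Table 2")


# Worked example from the text: 2000 nodes x 20,000 pods = 40 million.
print(cpu_for(2000, 20_000))  # (4000, 5500)
```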

55 changes: 55 additions & 0 deletions content/zh/docs/installation.md
@@ -122,3 +122,58 @@ job.batch/volcano-admission-init 1/1 28s 6m10s
```

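Everything is now configured, and you can start using Volcano to deploy AI/ML and big data workloads. Now that Volcano is fully installed, you can run the following example to verify the installation: [example](https://github.com/volcano-sh/volcano/tree/master/example)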


## Resource Requirements

The resources requested for Volcano pods can be customized as follows:

```yaml
# container resources
resources:
  requests:
    cpu: 500m
    memory: 500Mi
  limits:
    cpu: 2
    memory: 2Gi
```

The resource quota settings of the volcano-admission component depend on the cluster scale; see Table 1.

Table 1 Recommended values for volcano-admission

| Cluster Scale      | CPU Request (m) | CPU Limit (m) | Memory Request (Mi) | Memory Limit (Mi) |
| ------------------ | --------------- | ------------- | ------------------- | ----------------- |
| 50 nodes           | 200             | 500           | 500                 | 500               |
| 200 nodes          | 500             | 1000          | 1000                | 2000              |
| 1000 or more nodes | 1500            | 2500          | 3000                | 4000              |

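The resource quota settings of the volcano-controller and volcano-scheduler components depend on the number of cluster nodes and pods. The recommended values are as follows:

- Fewer than 100 nodes: the default configuration can be used, that is, a CPU request of 500m with a limit of 2000m, and a memory request of 500Mi with a limit of 2000Mi.
- More than 100 nodes: for every additional 100 nodes (10,000 pods), increase the CPU request by 500m and the memory request by 1000Mi. The CPU limit should be 1500m higher than the CPU request, and the memory limit 1000Mi higher than the memory request.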

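Recommended formulas for calculating the request values:

- CPU request: calculate "number of target nodes × number of target pods", perform an interpolation search in Table 2 based on the "number of cluster nodes × number of pods" values, and round up to the request and limit values of the closest specification.

  For example, for 2000 nodes and 20,000 pods, "number of target nodes × number of target pods" equals 40 million. Rounding up, the closest specification is 700/70,000 ("number of cluster nodes × number of pods" equals 49 million), so the recommended CPU request is 4000m and the limit is 5500m.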
Comment on lines +159 to +161

medium

The explanation for calculating the requested CPU is a bit complex and could be hard to follow. Phrases like "插值查找" (interpolation search) and "向上取最接近规格" (round up to the closest spec) could be simplified for better readability.

Consider rephrasing to be more direct, for example:
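"To determine the CPU request, first calculate the product of the target number of nodes and pods. Then, in Table 2, find the row whose 'cluster nodes/pods' product is greater than and closest to the target product, and use the CPU request and limit from that row."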


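- Memory request: allocate 2.4Gi of memory for every 1000 nodes and 1Gi of memory for every 10,000 pods, and add the two together. (This calculation may deviate somewhat from the recommended values in Table 2; either looking up the table or calculating is acceptable.)

  That is: memory request = number of target nodes / 1000 × 2.4Gi + number of target pods / 10,000 × 1Gi. For example, for 2000 nodes and 20,000 pods, memory request = 2 × 2.4Gi + 2 × 1Gi = 6.8Gi.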

Table 2 Recommended values for volcano-controller and volcano-scheduler

| Nodes/Pods in a Cluster | CPU Request (m) | CPU Limit (m) | Memory Request (Mi) | Memory Limit (Mi) |
| ------------------ | -------------- | ------------ | ------------------ | ---------------- |
| 50/5k | 500 | 2000 | 500 | 2000 |
| 100/1w | 1000 | 2500 | 1500 | 2500 |
| 200/2w | 1500 | 3000 | 2500 | 3500 |
| 300/3w | 2000 | 3500 | 3500 | 4500 |
| 400/4w | 2500 | 4000 | 4500 | 5500 |
| 500/5w | 3000 | 4500 | 5500 | 6500 |
| 600/6w | 3500 | 5000 | 6500 | 7500 |
| 700/7w | 4000 | 5500 | 7500 | 8500 |
Comment on lines +171 to +178

medium

In Table 2, the "集群节点数/Pod规模" (Nodes/Pods in a Cluster) column uses a mix of k (for thousands) and w (for ten thousands) to abbreviate numbers. This is inconsistent. For clarity and consistency with the English version of this document, it would be better to write out the full numbers (e.g., 5000, 10000, 20000, etc.).

Suggested change
-| 50/5k | 500 | 2000 | 500 | 2000 |
-| 100/1w | 1000 | 2500 | 1500 | 2500 |
-| 200/2w | 1500 | 3000 | 2500 | 3500 |
-| 300/3w | 2000 | 3500 | 3500 | 4500 |
-| 400/4w | 2500 | 4000 | 4500 | 5500 |
-| 500/5w | 3000 | 4500 | 5500 | 6500 |
-| 600/6w | 3500 | 5000 | 6500 | 7500 |
-| 700/7w | 4000 | 5500 | 7500 | 8500 |
+| 50/5000 | 500 | 2000 | 500 | 2000 |
+| 100/10000 | 1000 | 2500 | 1500 | 2500 |
+| 200/20000 | 1500 | 3000 | 2500 | 3500 |
+| 300/30000 | 2000 | 3500 | 3500 | 4500 |
+| 400/40000 | 2500 | 4000 | 4500 | 5500 |
+| 500/50000 | 3000 | 4500 | 5500 | 6500 |
+| 600/60000 | 3500 | 5000 | 6500 | 7500 |
+| 700/70000 | 4000 | 5500 | 7500 | 8500 |