Commit 38fbfdc

Feat gpu (#585)
* feat: add GPU support for karpenter
* fix
1 parent c5ff709 commit 38fbfdc

4 files changed

Lines changed: 43 additions & 18 deletions

website/docs/using-qovery/configuration/application.md

Lines changed: 11 additions & 1 deletion
@@ -1,5 +1,5 @@
 ---
-last_modified_on: "2025-09-16"
+last_modified_on: "2025-10-14"
 title: "Application"
 description: "Learn how to configure your Application on Qovery"
 ---
@@ -102,6 +102,7 @@ Within this section, you will need to define the resources to be assigned to you

 - vCPU: the vCPU assigned to each instance of your application. The default is 500m (0.5 vCPU).
 - RAM: the amount of RAM assigned to each instance of your application. The default is 512MB.
+- GPU: the amount of GPU assigned to each instance of your application. The default is 0. The GPU nodepool must be enabled on your cluster to be able to use GPU instances (see [AWS with Karpenter cluster setup documentation][docs.using-qovery.configuration.clusters.aws-with-karpenter]).
 - Number of instances (Application Auto-scaling): select the minimum and the maximum number of instances of your application that can run within your cluster. The number of instances running at an insant t is automatically managed by Kubernetes (Application auto-scaling) and it is based on real-time CPU consumption. When your app goes above 60% of CPU consumption for 5 minutes, your app will be auto-scaled and more instances will be added. It is transparent.
 Qovery runs your application on Kubernetes and relies on [metrics-server](https://github.com/kubernetes-sigs/metrics-server) service to auto-scale your app.

@@ -282,6 +283,14 @@ Default is 512MB.

 </Alert>

+#### GPU
+
+To configure the amount of GPU that your app needs, adjust the setting in `Resources` section of the application configuration.
+
+<Alert type="info">
+Default is 0. The GPU nodepool must be enabled on your cluster to be able to use GPU instances.
+</Alert>
+
 Please note that in this section you configure the CPU allocated by the cluster for your application and that cannot consume more than this value. Even if the application is underused and consume less resources, the cluster will still reserve the selected amount of CPU. If your application requires more RAM than requested, it will be killed by the kubernetes scheduler.

 #### Auto-scaling
@@ -559,6 +568,7 @@ In the application overview, click on the `3 dots` button and remove the applica
 [docs.using-qovery.configuration.advanced-settings]: /docs/using-qovery/configuration/advanced-settings/
 [docs.using-qovery.configuration.application-health-checks]: /docs/using-qovery/configuration/application-health-checks/
 [docs.using-qovery.configuration.clusters#use-custom-domain-and-wildcard-tls-for-the-whole-cluster-beta]: /docs/using-qovery/configuration/clusters/#use-custom-domain-and-wildcard-tls-for-the-whole-cluster-beta
+[docs.using-qovery.configuration.clusters.aws-with-karpenter]: /docs/using-qovery/configuration/clusters/aws-with-karpenter/
 [docs.using-qovery.configuration.environment-variable#connecting-to-a-database]: /docs/using-qovery/configuration/environment-variable/#connecting-to-a-database
 [docs.using-qovery.configuration.environment-variable#connecting-to-another-application]: /docs/using-qovery/configuration/environment-variable/#connecting-to-another-application
 [docs.using-qovery.configuration.environment-variable]: /docs/using-qovery/configuration/environment-variable/

website/docs/using-qovery/configuration/application.md.erb

Lines changed: 9 additions & 0 deletions
@@ -90,6 +90,7 @@ Within this section, you will need to define the resources to be assigned to you

 - vCPU: the vCPU assigned to each instance of your application. The default is 500m (0.5 vCPU).
 - RAM: the amount of RAM assigned to each instance of your application. The default is 512MB.
+- GPU: the amount of GPU assigned to each instance of your application. The default is 0. The GPU feature must be enabled on your cluster to be able to use GPU instances (see [AWS with Karpenter cluster setup documentation][docs.using-qovery.configuration.clusters.aws-with-karpenter]).
 - Number of instances (Application Auto-scaling): select the minimum and the maximum number of instances of your application that can run within your cluster. The number of instances running at an insant t is automatically managed by Kubernetes (Application auto-scaling) and it is based on real-time CPU consumption. When your app goes above 60% of CPU consumption for 5 minutes, your app will be auto-scaled and more instances will be added. It is transparent.
 Qovery runs your application on Kubernetes and relies on [metrics-server](https://github.com/kubernetes-sigs/metrics-server) service to auto-scale your app.

@@ -270,6 +271,14 @@ Default is 512MB.

 </Alert>

+#### GPU
+
+To configure the amount of GPU that your app needs, adjust the setting in `Resources` section of the application configuration.
+
+<Alert type="info">
+Default is 0. The GPU feature must be enabled on your cluster to be able to use GPU instances.
+</Alert>
+
 Please note that in this section you configure the CPU allocated by the cluster for your application and that cannot consume more than this value. Even if the application is underused and consume less resources, the cluster will still reserve the selected amount of CPU. If your application requires more RAM than requested, it will be killed by the kubernetes scheduler.

 #### Auto-scaling

website/docs/using-qovery/configuration/clusters/aws-with-karpenter.md

Lines changed: 12 additions & 9 deletions
@@ -1,5 +1,5 @@
 ---
-last_modified_on: "2025-07-07"
+last_modified_on: "2025-10-14"
 title: "AWS EKS with Karpenter"
 description: "Learn how to configure your AWS Kubernetes clusters with Karpenter on Qovery"
 ---
@@ -44,6 +44,7 @@ To confirm, click `Next`.
 In the `Set Resources` window, select:

 * `Karpenter`: Toggle the switch to enable Karpenter on your AWS EKS cluster
+* `Node disk size (GB)`: Specify the disk capacity allocated per worker node, determining the amount of data each node can store. The minimum value is 20GB.
 * `Instance types scopes`: By editing it, you can apply different filters to the node architectures, categories, families, and sizes. On the right, you can view all the instance types that match the applied filters. This means Karpenter will be able to spawn nodes on any of the listed instance types.
 * `Architectures`: by default both `AMD64` and `ARM64` architectures are selected.
 * `Default build architecture`: by default `AMD64`. If you build your application with the Qovery CI, your application will be built using this architecture by default.
@@ -55,6 +56,9 @@ In the `Set Resources` window, select:
 <img src="/img/configuration/clusters/spot_usage.png" alt="Spot usage" />
 </p>

+* `Enable GPU Nodepool configuration`: If you want to run GPU workloads on your cluster, you can enable this option to create a dedicated nodepool for GPU instances. You will then be able to select the GPU instance types you want to use on this nodepool. To enable spot instances, toggle the spot instance flag.
+
+
 <Alert type="warning">
 Instance type selection from your Qovery Console has direct consequences on your cloud provider’s bill. While Qovery allows you to switch to a different instance type whenever you want, it is your sole responsibility to keep an eye on your infrastructure costs, especially when you want to upsize.

@@ -334,12 +338,14 @@ Qovery deploys two node pools by default:
 - **Stable node pool**: Used for single instances and internal Qovery applications. For example, any containerized databases or application having the number of minimum instances set to 1, will be deployed on this nodepool. On this nodepool the consolidation is deactivated by default.
 - **Default node pool**: Designed to handle general workloads and serves as the foundation for deploying most applications.

-Qovery allows you to modify the resources allocated to your cluster:
+An additional GPU node pool can be present if you have enabled the GPU node pool configuration when creating your cluster (can be enabled afterwards).

-##### Shared settings for both nodepools:
-- **Instance types**: Define the list of instance types that can be used.
-- **Spot instances**: Enable or disable spot instances.
-- **Node disk size (GB)**: Specify the disk capacity allocated per worker node, determining the amount of data each node can store.
+##### Settings for nodepools:
+- **Instance types**: Define the list of instance types that can be used. (Shared for Stable and Default nodepools)
+- **Spot instances**: Enable or disable spot instances. (Shared across the three nodepools)
+- **Node disk size (GB)**: Specify the disk capacity allocated per worker node, determining the amount of data each node can store. (Shared for Stable and Default nodepools)
+- **Consolidation schedule** *(Stable nodepool only)*: Optimizes resource usage by consolidating workloads onto fewer nodes. This feature is not available for the default nodepool, as consolidation can happen at any time. We recommend enabling this option; otherwise, nodes will never be consolidated, leading to unnecessary infrastructure costs.
+- **Node pool limits**: Configure CPU and memory limits to ensure nodes stay within defined resource constraints, preventing excessive costs.

 <Alert type="warning">
 Instance type selection from your Qovery Console has direct consequences on your cloud provider’s bill. While Qovery allows you to switch to a different instance type whenever you want, it is your sole responsibility to keep an eye on your infrastructure costs, especially when you want to upsize.
@@ -348,9 +354,6 @@ For more information on the instance types provided by each cloud provider and t

 </Alert>

-##### Nodepool specific settings:
-- **Consolidation schedule** *(Stable nodepool only)*: Optimizes resource usage by consolidating workloads onto fewer nodes. This feature is not available for the default nodepool, as consolidation can happen at any time. We recommend enabling this option; otherwise, nodes will never be consolidated, leading to unnecessary infrastructure costs.
-- **Node pool limits**: Configure CPU and memory limits to ensure nodes stay within defined resource constraints, preventing excessive costs.

 #### Mirroring registry

website/docs/using-qovery/configuration/clusters/aws-with-karpenter.md.erb

Lines changed: 11 additions & 8 deletions
@@ -41,6 +41,7 @@ To confirm, click `Next`.
 In the `Set Resources` window, select:

 * `Karpenter`: Toggle the switch to enable Karpenter on your AWS EKS cluster
+* `Node disk size (GB)`: Specify the disk capacity allocated per worker node, determining the amount of data each node can store. The minimum value is 20GB.
 * `Instance types scopes`: By editing it, you can apply different filters to the node architectures, categories, families, and sizes. On the right, you can view all the instance types that match the applied filters. This means Karpenter will be able to spawn nodes on any of the listed instance types.
 * `Architectures`: by default both `AMD64` and `ARM64` architectures are selected.
 * `Default build architecture`: by default `AMD64`. If you build your application with the Qovery CI, your application will be built using this architecture by default.
@@ -52,6 +53,9 @@ In the `Set Resources` window, select:
 <img src="/img/configuration/clusters/spot_usage.png" alt="Spot usage" />
 </p>

+* `Enable GPU Nodepool configuration`: If you want to run GPU workloads on your cluster, you can enable this option to create a dedicated nodepool for GPU instances. You will then be able to select the GPU instance types you want to use on this nodepool. To enable spot instances, toggle the spot instance flag.
+
+
 <Alert type="warning">
 Instance type selection from your Qovery Console has direct consequences on your cloud provider’s bill. While Qovery allows you to switch to a different instance type whenever you want, it is your sole responsibility to keep an eye on your infrastructure costs, especially when you want to upsize.

@@ -331,12 +335,14 @@ Qovery deploys two node pools by default:
 - **Stable node pool**: Used for single instances and internal Qovery applications. For example, any containerized databases or application having the number of minimum instances set to 1, will be deployed on this nodepool. On this nodepool the consolidation is deactivated by default.
 - **Default node pool**: Designed to handle general workloads and serves as the foundation for deploying most applications.

-Qovery allows you to modify the resources allocated to your cluster:
+An additional GPU node pool can be present if you have enabled the GPU node pool configuration when creating your cluster (can be enabled afterwards).

-##### Shared settings for both nodepools:
-- **Instance types**: Define the list of instance types that can be used.
-- **Spot instances**: Enable or disable spot instances.
-- **Node disk size (GB)**: Specify the disk capacity allocated per worker node, determining the amount of data each node can store.
+##### Settings for nodepools:
+- **Instance types**: Define the list of instance types that can be used. (Shared for Stable and Default nodepools)
+- **Spot instances**: Enable or disable spot instances. (Shared across the three nodepools)
+- **Node disk size (GB)**: Specify the disk capacity allocated per worker node, determining the amount of data each node can store. (Shared for Stable and Default nodepools)
+- **Consolidation schedule** *(Stable nodepool only)*: Optimizes resource usage by consolidating workloads onto fewer nodes. This feature is not available for the default nodepool, as consolidation can happen at any time. We recommend enabling this option; otherwise, nodes will never be consolidated, leading to unnecessary infrastructure costs.
+- **Node pool limits**: Configure CPU and memory limits to ensure nodes stay within defined resource constraints, preventing excessive costs.

 <Alert type="warning">
 Instance type selection from your Qovery Console has direct consequences on your cloud provider’s bill. While Qovery allows you to switch to a different instance type whenever you want, it is your sole responsibility to keep an eye on your infrastructure costs, especially when you want to upsize.
@@ -345,9 +351,6 @@ For more information on the instance types provided by each cloud provider and t

 </Alert>

-##### Nodepool specific settings:
-- **Consolidation schedule** *(Stable nodepool only)*: Optimizes resource usage by consolidating workloads onto fewer nodes. This feature is not available for the default nodepool, as consolidation can happen at any time. We recommend enabling this option; otherwise, nodes will never be consolidated, leading to unnecessary infrastructure costs.
-- **Node pool limits**: Configure CPU and memory limits to ensure nodes stay within defined resource constraints, preventing excessive costs.

 #### Mirroring registry

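Reviewer note: the rule this commit documents (GPU defaults to 0, and a non-zero GPU request only makes sense when the cluster's GPU nodepool is enabled) can be illustrated with a small sketch. This is not Qovery's implementation or API; the names `ClusterFeatures`, `AppResources`, and `validate` are hypothetical and exist only for illustration. The default values are the ones stated in the documentation above.

```python
from dataclasses import dataclass


@dataclass
class ClusterFeatures:
    """Hypothetical view of the cluster settings relevant to GPU workloads."""
    gpu_nodepool_enabled: bool = False  # assumption: disabled unless opted in


@dataclass
class AppResources:
    """Hypothetical per-instance resources, using the documented defaults."""
    vcpu_milli: int = 500  # 500m (0.5 vCPU)
    ram_mb: int = 512      # 512MB
    gpu: int = 0           # no GPU by default


def validate(resources: AppResources, cluster: ClusterFeatures) -> list[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    if resources.gpu < 0:
        problems.append("GPU count cannot be negative")
    if resources.gpu > 0 and not cluster.gpu_nodepool_enabled:
        problems.append("GPU requested but the GPU nodepool is not enabled")
    return problems
```

With the defaults, `validate(AppResources(), ClusterFeatures())` reports no problems. Requesting `gpu=1` on a cluster without the GPU nodepool yields one problem, which mirrors the documented prerequisite.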