
create guide/recommendation for custom cluster sizes #32444

Open · wants to merge 3 commits into base: self-managed-docs/v25.1
32 changes: 32 additions & 0 deletions doc/user/content/sql/appendix-cluster-sizes.md
@@ -7,8 +7,40 @@ menu:
weight: 900
---

## Default Cluster Sizes

{{% self-managed/materialize-cluster-sizes %}}

## Custom Cluster Sizes

Contributor: Moved the content into the appendix to lower the prominence, since we don't want to encourage people to override the defaults.


When installing the Materialize Helm chart, you can override the [default
cluster sizes and resource allocations](/sql/appendix-cluster-sizes/). These
cluster sizes are used both for internal clusters, such as the `system_cluster`,
and for user clusters.

{{< tip >}}

In general, you should not have to override the defaults. At a minimum, we
recommend that you keep the default `25cc`-`200cc` cluster sizes.

{{</ tip >}}

```yaml
operator:
  clusters:
    sizes:
      <size>:
        workers: <int>
        scale: 1 # Generally, should be set to 1.
        cpu_exclusive: <bool>
        cpu_limit: <float> # e.g., 6
        credits_per_hour: "0.00" # N/A for self-managed.
        disk_limit: <string> # e.g., "93150MiB"
        memory_limit: <string> # e.g., "46575MiB"
```

Contributor: And, ... I'm guessing that if using Terraform, people would set these via https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_helm_values ?

Contributor Author: Yup!

Contributor Author: I think we hide this from our documentation (which is probably good), but if you pull down any of the sample values or defaults, it'll be there for people to change.

Contributor: Something to consider for the future: I'm familiar with adding "If you need guidance on ..., offer ...."-style blurbs. We're not there yet, but custom cluster sizing might be one of those.

{{< yaml-table data="best_practices/sizing_recommendation" >}}
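
As a worked illustration of the recommendations in the table above (1 worker per CPU core, roughly 1:8 CPU-to-memory, 1:2 memory-to-disk), here is a hedged sketch of a single custom size. The size name `custom-6cpu`, the file name, and the exact values are assumptions for illustration, not values from this PR:

```yaml
# custom-sizes.yaml -- hypothetical override file; adjust values to your nodes.
operator:
  clusters:
    sizes:
      custom-6cpu:                # illustrative size name
        workers: 6                # 1 worker per CPU core
        scale: 1                  # keep at 1 unless a replica must exceed single-node limits
        cpu_exclusive: true       # whole-number cpu_limit + static CPU manager policy
        cpu_limit: 6              # whole number so CPU pinning can apply
        credits_per_hour: "0.00"  # N/A for self-managed
        memory_limit: "49152MiB"  # 48 GiB, ~1:8 CPU-to-memory
        disk_limit: "98304MiB"    # 96 GiB, 1:2 memory-to-disk for spill-to-disk
```

A file like this could be passed to the Helm chart with `-f custom-sizes.yaml`, or, when using the Terraform module mentioned in the thread above, supplied through its `helm_values` input.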

Contributor: We also have some random blurbs scattered across the docs (listing here):

- If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory
- If spill-to-disk is enabled (Recommended): 1:16 ratio of vCPU to GiB local instance storage
- 2:1 disk-to-RAM ratio with spill-to-disk enabled.
- 2:1 disk-to-RAM ratio with spill-to-disk enabled.

Once we settle on what's what, I'll rework these into unified statements.

Contributor Author:

> If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory
> If spill-to-disk is enabled (Recommended): 1:16 ratio of vCPU to GiB local instance storage

Where is this coming from? I'm pretty sure we always want 1:8.

> 2:1 disk-to-RAM ratio with spill-to-disk enabled

Sounds right.

Contributor Author: Oh yeah, that's right. I've never seen a cores-to-disk ratio before, but this makes sense to me.


{{< note >}}

If you have modified the default cluster size configurations, you can query the
81 changes: 81 additions & 0 deletions doc/user/data/best_practices/sizing_recommendation.yml
@@ -0,0 +1,81 @@

columns:
  - column: "Field"
  - column: "Type"
  - column: "Description"
  - column: "Recommendation"
rows:
  - Field: "**workers**"
    Type: int
    Description: |

      The number of timely workers in your cluster replica.

    Recommendation: |

      Use 1 worker per CPU core, with a minimum of 1 worker.

  - Field: "**scale**"
    Type: int
    Description: |

      The number of pods (i.e., processes) to use in a cluster replica; used to
      scale out replicas horizontally. Each pod is provisioned using the
      settings defined in the size definition.

    Recommendation: |

      Generally, set this to 1. Use a value greater than 1 only when a replica
      needs limits larger than the maximum permitted on a single node.

  - Field: "**cpu_exclusive**"
    Type: bool
    Description: |

      Whether the workers should attempt to pin to particular CPU cores.
    Recommendation: |

      <a name="cpu_exclusive"></a>

      Set to true **if and only if** the [`cpu_limit`](#cpu_limit) is a whole
      number and the kubelet CPU manager policy in the Kubernetes cluster is
      set to `static`.

  - Field: "**cpu_limit**"
    Type: float
    Description: |

      <a name="cpu_limit"></a>
      The k8s CPU limit for a replica pod, in cores.
    Recommendation: |

      Prefer whole-number values to enable CPU affinity. Kubernetes only allows
      CPU affinity for pods requesting a whole number of cores.

      If the value is not a whole number, set [`cpu_exclusive`](#cpu_exclusive) to false.

  - Field: "**memory_limit**"
    Type: string
    Description: |

      The k8s memory limit for a replica pod, as a quantity string (e.g., "46575MiB").
    Recommendation: |

      For most workloads, use an approximate **1:8** CPU-to-memory ratio (1 core
      : 8 GiB). This can vary depending on your workload characteristics.

  - Field: "**disk_limit**"
    Type: string
    Description: |

      The size of the NVMe persistent volume to provision for a replica pod,
      as a quantity string (e.g., "93150MiB").
    Recommendation: |

      When spill-to-disk is enabled, use a **1:2** memory-to-disk ratio.
      Materialize spills data to disk when memory is insufficient, which can
      impact performance.

  - Field: "**credits_per_hour**"
    Type: string
    Description: |

      A Materialize Cloud billing attribute; not applicable to self-managed deployments.
    Recommendation: |

      Set to "0.00" for self-managed deployments.
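
The `cpu_exclusive` recommendation above depends on the kubelet's CPU manager policy being `static`, which is a cluster-level Kubernetes setting rather than something this chart configures. For reference, a minimal sketch of that kubelet setting (an assumption about your cluster setup, not part of this PR):

```yaml
# KubeletConfiguration fragment -- cluster-level kubelet setting, shown for reference only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static  # assigns exclusive CPUs to Guaranteed pods that request whole cores
```

With the default `none` policy, the kubelet does not assign exclusive CPUs, which is why the recommendation ties `cpu_exclusive: true` to both a whole-number `cpu_limit` and the static policy.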