create guide/recommendation for custom cluster sizes #32444


Open

jubrad wants to merge 3 commits into self-managed-docs/v25.1 from wip-custom-cluster-size-recomendations

Conversation

@jubrad jubrad (Contributor) commented May 9, 2025

Motivation

Currently, self-managed users are more or less flying blind when it comes to cluster sizes. We should offer some recommendations/guidance here. This is probably not the way to do it, but it's a start: it largely matches the defaults we've provided, with some explanation added. This should probably get some product review.

*This is a very rough draft, just trying to get the ball rolling on this.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jubrad jubrad requested a review from a team as a code owner May 9, 2025 00:21
@jubrad jubrad requested a review from kay-kim May 9, 2025 16:30
@jubrad jubrad force-pushed the wip-custom-cluster-size-recomendations branch from 80938d7 to 94a9bf0 on May 9, 2025 19:59
{{% self-managed/materialize-cluster-sizes %}}

## Custom Cluster Sizes
Contributor:

Moved the content into the appendix to lower the prominence since we don't want to encourage people to override the defaults.

```yaml
memory_limit: <string> # e.g., "46575MiB"
```

{{< yaml-table data="best_practices/sizing_recommendation" >}}
Contributor:

We also have some random blurbs scattered across the docs (listing them here; see the sketch after this list):

  • If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory

  • If spill-to-disk is enabled (recommended): 1:16 ratio of vCPU to GiB local instance storage

  • 2:1 disk-to-RAM ratio with spill-to-disk enabled

Once we settle on what's what, I'll rework these into unified statements.
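
To make the ratios concrete, here's the arithmetic for a hypothetical 6-vCPU size with spill-to-disk enabled (illustrative numbers only; note that composing the first and third blurbs yields exactly the 1:16 vCPU-to-disk ratio of the second):

```yaml
# 1:8 vCPU-to-memory → 6 vCPU × 8 GiB  = 48 GiB memory
# 2:1 disk-to-RAM    → 48 GiB × 2      = 96 GiB local disk
# composed           → 6 vCPU × 16 GiB = 96 GiB disk (the 1:16 blurb)
cpu_limit: 6
memory_limit: "49152MiB" # 48 GiB, following the 1:8 ratio
```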

Contributor Author (jubrad):

> If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory
> If spill-to-disk is enabled (Recommended): 1:16 ratio of vCPU to GiB local instance storage

Where is this coming from? I'm pretty sure we always want 1:8.

> 2:1 disk-to-RAM ratio with spill-to-disk enabled

Sounds right.


Contributor Author (jubrad):

Oh yeah, that's right. I've never seen a cores-to-disk ratio before, but this makes sense to me.

@@ -7,8 +7,40 @@ menu:
weight: 900
---

## Default Cluster Sizes

{{</ tip >}}

```yaml
operator:
```
Contributor:

And ... I'm guessing that if using Terraform, people would set these via https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_helm_values ?

Contributor Author (jubrad):

Yup!
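
For anyone following along, a minimal sketch of what that override might look like as Helm-style values (the nesting under `operator:` is elided where I'm unsure of the exact path, and the size entry is hypothetical):

```yaml
operator:
  # (intermediate nesting elided; mirror the chart's default values.yaml)
  sizes:
    my-custom-size: # hypothetical size name
      workers: 2
      scale: 1
      cpu_limit: 2
      memory_limit: "16384MiB" # 16 GiB, keeping the 1:8 ratio
```

With the AWS Terraform module, the same map would be passed through its helm_values input.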

Contributor Author (jubrad):

I think we hide this from our documentation (which is probably good), but if you pull down any of the sample values or defaults, it'll be there for people to change.

Contributor:

Something to consider for the future: I'm familiar with adding blurbs along the lines of "If you need guidance on ..., offer ....". We're not there yet, but custom cluster sizing might be one of those cases.

@jubrad jubrad (Contributor Author) left a comment

Meaningless review since I "authored" this PR, but I approve!

```yaml
sizes:
  <size>:
    workers: <int>
    scale: <int>
```
Member:

Do we want to let users set scale to anything but 1? Larger scales aren't much tested and might come with caveats (network bandwidth requirements) that aren't well documented.

Contributor:

👍 Will update to have `scale: 1 # Generally, should be set to 1.`

We do show a scale value of 2 in our default settings for 6400cc, however. Hope that's okay ... since it is what it is set to. https://preview.materialize.com/materialize/32444/self-managed/v25.1/sql/appendix-cluster-sizes/#default-cluster-sizes

```yaml
    scale: <int>
    cpu_exclusive: <bool>
    cpu_limit: <float> # e.g., 6
    credits_per_hour: <string> # e.g., "0.0"
```
Member:

`credits_per_hour` could be optional? That's a different change, but we should document that it's just a number for accounting purposes.

Contributor Author (jubrad):

Yeah, probably worth back-porting a "0.00" default. I can do this.
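
Putting the fields from the hunks above together, a complete size entry might then look like this (all values illustrative; assumes the back-ported "0.00" default lands):

```yaml
sizes:
  my-custom-size: # hypothetical size name
    workers: 6               # one worker per vCPU (assumption)
    scale: 1                 # generally, should be set to 1
    cpu_exclusive: true
    cpu_limit: 6
    memory_limit: "49152MiB" # 48 GiB, 1:8 vCPU-to-memory
    credits_per_hour: "0.00" # accounting only; proposed default
```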

```yaml
Recommendation: |
  Prefer whole number values to enable CPU affinity. Kubernetes only allows
  CPU Affinity for pods taking a whole number of cores (not hyperthreads).
```
Member:

I think the hyperthread mention is incorrect here. Kubernetes doesn't really distinguish between cores and their hyperthreads.
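
For context on the affinity point: the mechanism here is Kubernetes' static CPU manager policy, which grants exclusive CPUs only to containers in Guaranteed-QoS pods that request an integer number of CPUs (the kubelet must run with --cpu-manager-policy=static). A minimal sketch, with placeholder names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-affinity-example # hypothetical
spec:
  containers:
    - name: clusterd
      image: example.com/clusterd:latest # placeholder image
      resources:
        # requests == limits with an integer cpu value → Guaranteed QoS,
        # which makes the container eligible for exclusive CPUs under
        # the static CPU manager policy
        requests:
          cpu: "6"
          memory: "49152Mi"
        limits:
          cpu: "6"
          memory: "49152Mi"
```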
