create guide/recommendation for custom cluster sizes #32444
Conversation
Force-pushed from 80938d7 to 94a9bf0.
{{% self-managed/materialize-cluster-sizes %}}

## Custom Cluster Sizes
Moved the content into the appendix to lower the prominence since we don't want to encourage people to override the defaults.
    memory_limit: <string> # e.g., "46575MiB"
```

{{< yaml-table data="best_practices/sizing_recommendation" >}}
We also have some random blurbs scattered across the docs (listing them here):

- If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory
- If spill-to-disk is enabled (Recommended): 1:16 ratio of vCPU to GiB local instance storage
- 2:1 disk-to-RAM ratio with spill-to-disk enabled.
- 2:1 disk-to-RAM ratio with spill-to-disk enabled.

Once we settle on what's what, I'll rework these into unified statements.
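For concreteness, here's what those ratios would work out to for a hypothetical 4-vCPU size, once the numbers are settled. This is a sketch only: the size name is made up, and the `disk_limit` field name is an assumption rather than something shown in this diff.

```yaml
# Illustrative only: a hypothetical size derived from the blurbs above,
# assuming 1 vCPU : 8 GiB memory and a 2:1 disk-to-RAM ratio with
# spill-to-disk enabled.
sizes:
  example-4vcpu:            # hypothetical size name
    workers: 4
    scale: 1
    cpu_exclusive: true
    cpu_limit: 4            # 4 vCPUs
    memory_limit: "32GiB"   # 4 vCPU x 8 GiB = 32 GiB (1:8)
    disk_limit: "64GiB"     # 2 x 32 GiB (2:1); field name assumed
    credits_per_hour: "0.0"
```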
> If spill-to-disk is not enabled: 1:8 ratio of vCPU to GiB memory
> If spill-to-disk is enabled (Recommended): 1:16 ratio of vCPU to GiB local instance storage

Where is this coming from? I'm pretty sure we always want 1:8.

> 2:1 disk-to-RAM ratio with spill-to-disk enabled

sounds right
> Where is this coming from? I'm pretty sure we always want 1:8.
Oh yeah, that's right. I've never seen a cores-to-disk ratio before, but this makes sense to me.
@@ -7,8 +7,40 @@ menu:
weight: 900
---

## Default Cluster Sizes
{{</ tip >}}

```yaml
operator:
And, ... I'm guessing that if using Terraform, people would set these via https://github.com/MaterializeInc/terraform-aws-materialize?tab=readme-ov-file#input_helm_values ?
Yup!
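For the record, here's a rough sketch of the values one might pass through that `helm_values` input, assuming the module merges them verbatim into the chart's values. The size name and the nesting under `operator` are assumptions; check the chart's default values for the exact keys.

```yaml
# Hypothetical helm_values payload for the terraform-aws-materialize module;
# the nesting below `operator` is assumed, not taken from this diff.
operator:
  sizes:
    my-custom-size:          # hypothetical size name
      workers: 2
      scale: 1
      cpu_exclusive: false
      cpu_limit: 2
      memory_limit: "16GiB"
      credits_per_hour: "0.0"
```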
I think we hide this from our documentation (which is probably good) but if you pull down any of the sample values or defaults it'll be there for people to change.
Something to consider for the future: I'm familiar with adding blurbs along the lines of "If you need guidance on ..., offer ....". We're not there yet, but custom cluster sizing might be one of those cases.
Meaningless review since I "authored" this PR but I approve!
sizes:
  <size>:
    workers: <int>
    scale: <int>
Do we want to let users set scale to anything but 1? Larger scales aren't much tested and might come with caveats (network bandwidth requirements) that aren't well documented.
👍 Will update to have `scale: 1 # Generally, should be set to 1.`

We do show, however, a scale value of 2 in our default settings for 6400cc. Hope that's okay ... since it is what it is set to. https://preview.materialize.com/materialize/32444/self-managed/v25.1/sql/appendix-cluster-sizes/#default-cluster-sizes
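The resulting snippet would presumably look something like this:

```yaml
sizes:
  <size>:
    workers: <int>
    scale: 1   # Generally, should be set to 1; larger scales are not well
               # tested and may have undocumented network-bandwidth needs.
```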
    scale: <int>
    cpu_exclusive: <bool>
    cpu_limit: <float> # e.g., 6
    credits_per_hour: <string> # e.g., "0.0"
credits_per_hour could be optional? That's a different change, but we should document that it's just a number for accounting purposes.
yeah, probably worth back-porting a "0.00" default. I can do this.
Recommendation: |

  Prefer whole number values to enable CPU affinity. Kubernetes only allows
  CPU Affinity for pods taking a whole number of cores (not hyperthreads).
I think the hyperthread mention is incorrect here. Kubernetes doesn't really distinguish between cores and their hyperthreads.
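For context, the whole-number recommendation itself is sound: the kubelet's static CPU Manager policy only grants exclusive cores to Guaranteed-QoS containers whose CPU request is an integer equal to the limit. A minimal sketch of what that means for this field, with illustrative values:

```yaml
sizes:
  <size>:
    cpu_exclusive: true
    cpu_limit: 6      # whole number: eligible for exclusive cores under the
                      # kubelet's static CPU Manager policy
    # cpu_limit: 5.5  # fractional: the container shares the CPU pool instead
```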
Motivation
Currently, self-managed users are more or less flying blind when it comes to cluster sizes. We should offer some recommendations/guidance here. This is probably not the way to do it, but it's a start. It largely matches the defaults we've provided, with some explanation. This should probably have some product review.

*This is a very rough draft, just trying to get the ball rolling on this.*
Tips for reviewer

Checklist

- If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.