[ci] Update pulumi k8s operator to v2 #1017
Conversation
/hdm_test
2 similar comments
/hdm_test
/hdm_test
Force-pushed from 62bc31b to b8c57bf
moritzkiefer-da left a comment:
Thanks! Do you have a link to a successful HDM test?
I am slightly nervous about just replacing this. Ideally, we would get a few weeks of experience running this in internal clusters instead of forcibly rolling it out to dev/test/mainnet with the next upgrade.
How hard would it be to keep this as an option that we only enable on internal clusters for now?
# Upgrade workarounds, include a GH issue to remove them once the base version changes
# TODO(#14679): Remove
why do we have this stuff at all?
Indeed, it does not make sense; let me clean it up.
namespace: namespaceName,
},
spec: {
  image: `pulumi/pulumi:${semver.gt(pulumiVersion, minimumPulumiVersionRequired) ? pulumiVersion : minimumPulumiVersionRequired}-nonroot`,
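For context, the template expression resolves to whichever of the two versions is newer. A minimal sketch of the same selection logic, assuming the `semver` npm package and illustrative version values:

```typescript
import * as semver from 'semver';

// Illustrative values; the real ones come from the repo's configuration.
const pulumiVersion = '3.130.0';
const minimumPulumiVersionRequired = '3.134.1';

// Use the configured Pulumi version unless it is older than the minimum the
// operator requires, then append the -nonroot image variant suffix.
const resolvedVersion = semver.gt(pulumiVersion, minimumPulumiVersionRequired)
  ? pulumiVersion
  : minimumPulumiVersionRequired;

const image = `pulumi/pulumi:${resolvedVersion}-nonroot`;
// => 'pulumi/pulumi:3.134.1-nonroot'
```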
does this still go through our cache?
My understanding is yes: we basically cache everything, since it always goes through that proxy. @isegall-da, is that correct?
I don't think so; I think only the DinD is configured to use the cache, so that images we fetch with "docker pull" in tests come from there.
But I'm not sure we care. How often do we expect this to be re-pulled per cluster? (the rate limit is per IP AFAIK)
We needed the cache for CI jobs that were pulling multiple images per run and hence hit rate limits on docker.io.
> But I'm not sure we care. How often do we expect this to be re-pulled per cluster? (the rate limit is per IP AFAIK)

I guess ciperiodic might pull the most, with one pod per stack? Hopefully that's still infrequent enough.
The rate limit seems to be 100 pulls per IP address per 6 hours. I think we should be OK. Famous last words...
And I think that k8s has some internal cache too, doesn't it? (It didn't help with docker pull in DinD, but it will for a pod's image.)
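On the kubelet point: yes, each node keeps its own image cache, so a workspace pod scheduled onto a node that already holds the image will not re-pull it, provided the pull policy allows reuse. A sketch of the relevant container fragment (the name and tag are illustrative):

```typescript
// With a pinned, immutable tag, IfNotPresent lets kubelet reuse the image
// already present on the node instead of re-pulling from docker.io.
// (IfNotPresent is also the Kubernetes default for non-:latest tags.)
const workspaceContainer = {
  name: 'pulumi-workspace',               // illustrative name
  image: 'pulumi/pulumi:3.134.1-nonroot', // illustrative pinned tag
  imagePullPolicy: 'IfNotPresent',
};
```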
I have one run where the steps touched by this change succeeded, but the full workflow failed because it ran during the switch to splice.
It's actually fairly simple, as we can keep the operator deployment on an old reference if needed. That said, I wouldn't postpone it more than required, tbh. Once it's merged to main, updating CILR should give us the feedback we need within a few days of testing.
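A gate like that could be as small as an env-var switch. A rough sketch, where the flag name and the two deploy helpers are hypothetical stand-ins for the repo's real deployment code:

```typescript
// Hypothetical flag that only internal clusters would set while v2 soaks.
const useOperatorV2 = process.env.ENABLE_PULUMI_OPERATOR_V2 === 'true';

// Stand-ins for the actual code paths.
function deployOperatorV1(namespace: string): void {
  // existing v1 Deployment, pinned to the old image reference
}
function deployOperatorV2(namespace: string): void {
  // new v2 operator install
}

(useOperatorV2 ? deployOperatorV2 : deployOperatorV1)('operator');
```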
Fair, I guess we could also revert reasonably easily if needed.
moritzkiefer-da left a comment:
thanks!
/hdm_test
1 similar comment
/hdm_test
- cpu: 5,
- memory: config.optionalEnv('OPERATOR_MEMORY_LIMIT') || '20G',
+ cpu: 2,
+ memory: config.optionalEnv('OPERATOR_MEMORY_LIMIT') || '4G',
This is now per stack, which is why we need 80% less, right?
Yes. I will tune it further after we deploy to prod and see actual usage, but the main container should be fairly light.
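To make the per-stack point concrete: in v2 each stack runs in its own workspace pod, so these limits bound one stack's workspace rather than a single shared operator process. A sketch of the resulting limits block, reusing the override pattern from the diff above (the example value is illustrative):

```typescript
// Per-workspace limits: each stack's pod gets its own allocation, hence the
// much lower defaults than the old shared operator (cpu 5 / 20G).
// Set OPERATOR_MEMORY_LIMIT (e.g. '8G') per cluster to override the default.
const limits = {
  cpu: 2,
  memory: process.env.OPERATOR_MEMORY_LIMIT || '4G',
};
```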
isegall-da left a comment:
LGTM, thank you!
Pull Request Checklist

Cluster Testing
- Comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
- Comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines
- Reference fixed issues with "Fixes #n", and mention issues worked on using #n.

Merge Guidelines