
Draft a policy for updating Go versions across the Kubernetes codebase/infra #1139

Open · ixdy opened this issue Jan 24, 2019 · 65 comments

Labels: area/release-eng, kind/documentation, lifecycle/frozen, priority/important-soon

@ixdy (Member) commented Jan 24, 2019

Kubernetes produces a new minor release every 3 months, and currently supports the 3 most recent releases. This means that a given release is supported for around 8-9 months.

Go produces a new release approximately every 6 months, and supports the 2 most recent releases. This means that a given release is supported for about a year.

The release schedules are slightly out of sync, however; the newest Go release usually occurs close to code freeze for a Kubernetes release, so we do not update until the next minor release, trailing the Go release by a few months.

This means that we often end up supporting a Kubernetes release well past the end of support for the Go version it uses; for example, we supported Kubernetes 1.10 through December 2018, but Go 1.9 (which it used) went out of support at the end of August 2018.

Similarly, Kubernetes 1.11 (expected to be supported through March 2019) and Kubernetes 1.12 (expected to be supported through June 2019) use Go 1.10, but that release will likely go out of support by February 2019.
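
To put rough numbers on the mismatch, here is a back-of-envelope sketch (the cadences and the 5-month lag are illustrative assumptions, not official dates):

```python
# Back-of-envelope sketch of the support-window mismatch described above.
# All numbers are rough approximations of the cadences, not official dates.

k8s_release_cadence_months = 3  # new Kubernetes minor roughly every 3 months
k8s_supported_releases = 3      # the 3 most recent minors are supported
go_release_cadence_months = 6   # new Go release roughly every 6 months
go_supported_releases = 2       # the 2 most recent Go releases are supported

k8s_support_window = k8s_release_cadence_months * k8s_supported_releases  # ~9 months
go_support_window = go_release_cadence_months * go_supported_releases     # ~12 months

# A Kubernetes minor usually ships a Go release that is already a few months
# old, because the new Go release lands near code freeze and is only picked
# up one Kubernetes minor later.
go_age_at_k8s_release = 5  # months; illustrative

go_support_remaining = go_support_window - go_age_at_k8s_release      # ~7 months
months_on_unsupported_go = k8s_support_window - go_support_remaining  # ~2 months
print("Kubernetes support window:           ~%d months" % k8s_support_window)
print("Go support remaining at k8s release: ~%d months" % go_support_remaining)
print("Months spent on unsupported Go:      ~%d" % months_on_unsupported_go)
```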

Historically we had some performance issues with new Go releases, which is why we generally avoided updating close to code freeze. There have also been occasional incompatible changes (particularly around gofmt), which make updates messier.

Still, we've tested the latest Go release extensively on master (and even have a release using it, Kubernetes 1.13), so we should probably update our older release branches, and perhaps make this a new policy going forward.


From @justaugustus in kubernetes/test-infra#25355:

#25723 bumped the golang version to go1.18, so I've updated this to only include the new main variant. @kubernetes/release-engineering -- Needs /lgtm!

I missed it -- have we tested on -experimental at all yet? 1.18 looks like it has some exciting/major changes :D

@BenTheElder -- We created the go-canary variant in #18328 to prevent disruption for other experimental jobs when changing Golang major versions.

The experimental variant is used in enough non-SIG Release/Testing jobs that I've never felt comfortable subjecting that many jobs to the change.

So right now, if someone is feeling adventurous: use go-canary. If not, they can use the master variant which doesn't get moved to "next" Golang until it lands in k/k.

That said, we don't really have a policy re: variants. @kubernetes/sig-testing-leads -- WDYT?

From @BenTheElder:

The experimental variant is used in enough non-SIG Release/Testing jobs that I've never felt comfortable subjecting that many jobs to the change.

Users opting into -experimental know what they're getting into :-); we've rolled go ahead of k/k here many, many times in the past.

IMHO we should be running with new go versions pretty extensively in CI before rolling them to the main repo; that's why -experimental exists (though test-infra now manages the go version in-repo anyhow ...). Any project choosing to use -experimental is signing up to be experimented on; if that's not suitable for their usage, they need to use another image.

KRTE in particular is even more explicitly not supported for purposes other than testing Kubernetes with kind, for this reason (to avoid other usage blocking changes), but "experimental" in the tag should be a pretty good hint, as should the past behavior of regular go upgrades.

I don't think go-canary is used widely enough, and the point of -experimental vs "go-canary" is that it might not only be go that we need to ensure is safe to upgrade; the same CI jobs may also be used to trial e.g. docker upgrades. It's wasteful to have jobs running that are only intended for go upgrade testing.

At different points in time we have different things to update in the CI environment, and they should receive the same level of soaking through some e2e canary jobs etc. before rolling to main/master/...

I'm not sure what happened where this go-round, but the go1.18 update was extremely not smooth; it doesn't feel like it got soaked in CI ahead of updating.

From @liggitt:

If the intent of this is just to enable the master/main kubekins tags to be identical, this seems harmless.

I agree we should stop bumping master (and now main) images to a new version of go before demonstrating green runs on the go-canary / experimental image, but that seems outside the scope of this PR.

@justaugustus, is there an issue tracking the fallout of the last CI go bump to 1.18 and how we'll change the rollout for go 1.19?

@ixdy (Member, Author) commented Jan 24, 2019

/sig release

@BenTheElder (Member) commented Jan 24, 2019

cc @dims @cblecker

FWIW, I support upgrading. It will be a bit of work, but it will mean we continue to use supported go going forward.

The gofmt incompatibility is development-time only, other changes should be safe and compatible, and we've hopefully already verified that there are no scaling issues with the newer Go.

@ixdy (Member, Author) commented Jan 24, 2019

This issue was at least partially motivated by the recent Go security releases.

I think in the short term we should make new patch releases of k8s 1.11 and 1.12 with go1.10.8, but we should aim to move those to go1.11.x before go1.10 goes out of support next month.

@dims (Member) commented Jan 24, 2019

@BenTheElder @ixdy - I'd support this new policy.

cc @kubernetes/sig-release

@dims (Member) commented Jan 24, 2019

@BenTheElder @ixdy we should get buy-in from the current patch release managers too

@ixdy (Member, Author) commented Jan 24, 2019

cc @foxish @feiskyer @aleksandra-malinowska @tpepper @kubernetes/product-security-team

@cblecker (Member) commented

I'm supportive of this policy too, but it will mean more work maintaining our release branches, especially with things like rules_go/bazel.

@ixdy (Member, Author) commented Jan 25, 2019

if we don't want to maintain compatibility with the latest bazel + rules_go (historically we've generally kept those locked), we can always override the go version. We're already doing this on release-1.11:
https://github.com/kubernetes/kubernetes/blob/3a10094374f22b621b3379a2f6cb81025ffe543f/build/root/WORKSPACE#L50-L67
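
For illustration, the override amounts to something along these lines in the WORKSPACE (a minimal sketch, not the exact contents of the linked file; go_register_toolchains and its go_version attribute are from the rules_go releases we pin in this era, so check them against the branch's actual rules_go version):

```python
# Minimal WORKSPACE sketch (Starlark): keep the pinned bazel + rules_go, but
# override the Go SDK used for builds. The version string is illustrative.
load("@io_bazel_rules_go//go:def.bzl", "go_register_toolchains", "go_rules_dependencies")

go_rules_dependencies()

go_register_toolchains(
    go_version = "1.10.8",  # patched Go release for the branch, instead of the rules_go default
)
```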

@feiskyer (Member) commented

+1.

if we don't want to maintain compatibility with the latest bazel + rules_go (historically we've generally kept those locked), we can always override the go version.

This seems like an easier way to do the upgrade. But with this, will there be any build issues because of the override?

@tpepper (Member) commented Jan 25, 2019

Seems like we must at least trial moving our old branches forward to golang 1.11.x over the next few weeks, then (i.e., see if the scale tests especially are okay). In the meantime we need to release updated k8s builds with the golang patch releases of the versions we're currently using.

@ixdy (Member, Author) commented Jan 25, 2019

But with this, will there be any build issues because of the override?

There shouldn't be. rules_go is pretty agnostic about Go versions.

@ixdy (Member, Author) commented Feb 22, 2019

One catch: go1.10 -> go1.11 had a lot of breaking changes (gofmt and go vet), which makes the update harder than just "update the version".

@spiffxp (Member) commented Feb 24, 2019

/priority important-soon
No milestone; AFAICT this isn't directly relevant to v1.14, but rather to all previous releases.

@ixdy (Member, Author) commented Feb 26, 2019

Go 1.12 is out now, which means that Kubernetes 1.11 and 1.12 are both now using an unsupported version of Go (1.10).

@liggitt (Member) commented Feb 26, 2019

One catch: go1.10 -> go1.11 had a lot of breaking changes (gofmt and go vet), which makes the update harder than just "update the version".

Bumping go versions can cut both ways. As an example, golang/go#27044 caused bottleneck issues for clients that needed to open more than a few hundred watches (aggregator, kubelet).

That issue is resolved in go1.12, which could be good to pick up, but the opposite would have happened when bumping to the version of go that first included golang/go#27044.

@ixdy (Member, Author) commented Feb 26, 2019

Hm, so it sounds like go1.11 had some pretty serious regressions for Kubernetes? What should we do, then?

@dims (Member) commented Feb 26, 2019

If we want to do this, we should do it ASAP! We still have a week and a half until code freeze, and we will be slammed with PRs the closer we get. Bisects are going to get harder with that volume if we run into regressions.

@spiffxp (Member) commented Feb 26, 2019

Speaking about this in SIG Release, I have no strong opinion at present on the version of go used for previous releases.

I am willing to say yes if someone wants to try landing 1.12 support in master prior to Friday; otherwise I would rather punt this to the next release.

@ixdy (Member, Author) commented Feb 26, 2019

I originally created this issue only to discuss updating the version of Go in older Kubernetes releases (notably k8s 1.11 and 1.12).

@calebamiles (Contributor) commented

I'm going to put on my RelEng lead hat and say that, at least for me, we're outside of the window where we should attempt this for 1.14. I think in general we need a pipeline that encodes the criteria we would use to evaluate the correctness of a go version bump. I would like to see such tooling applied to previously released versions of Kubernetes so we can test again on lower-risk targets. Let's perhaps work on some concrete requirements, in collaboration with SIGs like Scalability, and see some demos of how validation could work.

/cc @hoegaarden

@liggitt (Member) commented Feb 26, 2019

Go 1.12 is out now, which means that Kubernetes 1.11 and 1.12 are both now using an unsupported version of Go (1.10).

This is the part that concerns me the most. Is it better to set ourselves up for needing to scramble to update and validate if (when?) go security releases come out, or to get ourselves on versions that will be supported for the duration of the k8s release's support life?

@cblecker (Member) commented

Open PR to update master to 1.12: kubernetes/kubernetes#74632

Dunno if this is a 1.14.0 thing, or if we should leave it for later... but at least we can get CI rolling and see how it performs.

@BenTheElder (Member) commented

Please note that such scrambling still goes through a few maintainers right now, such as @ixdy, who have pretty low remaining bandwidth (AFAICT anyhow...) to push the necessary build images etc.
IMHO it's preferable to have a plan for upgrading instead of scrambling at the last minute to figure out how to get patched go.

@ixdy (Member, Author) commented Mar 28, 2019

The kubernetes 1.12 branch is now using an unsupported version of go (1.10.8). Have we come to a consensus on what we want to do about this? (Kubernetes 1.11 is also affected, but I'm not sure if we're planning any more support of that branch given that 1.14 has been released.)

@feiskyer (Member) commented

I'm OK with upgrading to golang 1.12 for release-1.12. @cblecker @ixdy, would you like to validate and land it?

@liggitt (Member) commented Mar 29, 2019

There is a latency regression and a proxy flushing regression we need to resolve before we transition any other release branches to go1.12.

@fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@justaugustus (Member) commented

/remove-lifecycle stale
/milestone v1.16

@wojtek-t (Member) commented

There is a latency regression and a proxy flushing regression we need to resolve before we transition any other release branches to go1.12.

This has been fixed in the meantime - the flush issue was fixed in 1.12.1 (or .2) IIRC, and the performance regression was fixed in 1.12.5 - so we should be safe.

[BTW, we're also working with the golang team now to validate the upcoming 1.13 release, so that we won't be surprised again (at least from a performance POV).]

@dims (Member) commented Jul 13, 2019

@justaugustus @wojtek-t @feiskyer @ixdy - are we there yet? :) (what's left to be done)

@RobertKielty (Member) commented

@cpanato @LappleApple unfortunately I will not be able to contribute to this work.

@LappleApple commented

@sethmccombs Hey, would you like to work on this?

@fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on May 23, 2021
@justaugustus (Member) commented

/remove-lifecycle stale
/milestone v1.22

@k8s-ci-robot removed the lifecycle/stale label on May 23, 2021
@k8s-ci-robot modified the milestones: v1.21, v1.22 on May 23, 2021
@justaugustus (Member) commented

@kubernetes/release-engineering is pairing on the draft here: https://docs.google.com/document/d/1npm6aXRyXIgrgL1SRbFn16GkNGkTH7zWEYy4sO5JccU/edit?usp=drivesdk

Dan is point.
/assign @hasheddan

@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 20, 2021
@k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 20, 2021
@cpanato (Member) commented Oct 21, 2021

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label on Oct 21, 2021
@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 19, 2022
@justaugustus added the lifecycle/frozen label and removed the lifecycle/stale label on Apr 4, 2022
@justaugustus modified the milestones: v1.22, v1.25, v1.24 on Apr 4, 2022