dotnet · richlander · May 13, 2025
diff --git a/docs/project/os-onboarding.md b/docs/project/os-onboarding.md
@@ -14,8 +14,9 @@ Continuing with the idea of pragmatism, if you only read this far, you've got th
 
 References:
 
-- [New Operating System Version Onboarding Guide](https://github.com/dotnet/dnceng/blob/main/Documentation/ProjectDocs/OS%20Onboarding/Guidance.md)
+- [OS Support Matrix](./os-support.md)
 - [.NET OS Support Tracking](https://github.com/dotnet/core/issues/9638)
+- [New Operating System Version Onboarding Guide](https://github.com/dotnet/dnceng/blob/main/Documentation/ProjectDocs/OS%20Onboarding/Guidance.md)
 
 ## Context
 
@@ -25,43 +26,34 @@ Nearly all the APIs that touch native code (networking, cryptography) and deal w
 
 ## Approach
 
-Our rule is that we declare support (for all [supported .NET releases](https://github.com/dotnet/core/blob/main/releases.md)) for a new OS version after it is validated in dotnet/runtime `main`. We will only hold support on additional testing in special cases (which are uncommon).
+For all OSes we intend to support, we aim to provide support on OS release day. To declare support for all [supported .NET releases](https://github.com/dotnet/core/blob/main/releases.md), we require validation only in dotnet/runtime `main`, often relying on [non-final OS builds](https://github.com/dotnet/runtime/pull/111768#issuecomment-2617229139).
 
-We aim to have "day of" support for about half the OSes we support, including Azure Linux, Ubuntu LTS, and Windows. This means we need to perform ahead-of-time signoff on [non-final builds](https://github.com/dotnet/runtime/pull/111768#issuecomment-2617229139).
 
 Our testing philosophy is based on perceived risk and past experience. The effective test matrix is huge, the product of OSes \* supported versions \* architectures.  We try to make smart choices to **skip testing most of the matrix** while retaining much of the **practical coverage**. We also know where we tend to get bitten most when we don't pay sufficient attention. For example, our bug risk across Linux, macOS, and Windows is not uniform.
 
-We  use pragmatism and efficiency to drive our decision making. All things being equal, we'll choose the lowest cost approach.
+We use pragmatism and efficiency to drive our decision making. All things being equal, we'll choose the lowest cost approach.
 
-## Testing
+## OS Lifecycle
 
-Testing is the bread and butter of OS onboarding, particularly for a mature runtime like ours. New OS support always needs some form of test enablement.
+We update `main` to bleeding-edge OS versions, even pre-release versions. This approach provides us with confidence for new OS releases and reduces remediation cost in release branches. We also find that new OS releases require product and test updates to support significant changes in foundational components.
 
-Linux, Wasm, and some Windows testing is done in container images. This approach enables us to test many and regularly changing OS versions in a fixed/limited VM environment. The container image creation/update process is self-service (discussed later).
-
-We use VMs (Linux and Windows) and raw metal hardware (Android and Apple) in cases where containers are not practical or where direct testing is desired. This is the primary model for Apple and Windows OSes. The VMs and mobile/Apple hardware are relatively slow to change and require support from dnceng (discussed later).
+There are special considerations when `main` is the next .NET LTS (odd years):
 
-### Adding coverage
+- New Debian releases tend to ship in the middle of odd years (our LTS year). It is best to [add coverage as early as possible](https://github.com/dotnet/runtime/pull/111768), in part because [Preview 1 ships a pre-release Debian version](https://github.com/dotnet/dotnet-docker/discussions/6272) in container images.
+- Ubuntu LTS releases ship roughly 6 months after a .NET LTS release. It is important that this combination has excellent support. We recommend moving `main` forward with Ubuntu interim releases (examples: [24.10](https://github.com/dotnet/runtime/pull/111504), [25.04](https://github.com/dotnet/runtime/pull/113405)), which requires us to update to the next Ubuntu LTS during servicing (and then not update again).
 
-New OS coverage should be added/tested first in `main`. If changes are required, we should prove them out first in `main` before committing to shipping them in a servicing release, if necessary.
+We update `release` branches primarily to remediate EOL OS references. Alpine, Azure Linux, and Fedora are examples of OSes with shorter support lifecycles than .NET that require regular remediation.
 
-There are multiple reasons to add a new OS reference in a release branch:
+We avoid testing multiple versions of an OS in a single branch. We will often have multiple versions of an OS across branches, with older branches having references to older OSes. We believe that this approach provides sufficient coverage and is most likely to align with user behavior. It also aligns with the container images that we publish.
 
-- Known product (as opposed to test) breaks that require validation and regression testing.
-- Past experience suggests that coverage is required to protect against risk.
-- OS version is or [will soon go EOL](https://github.com/dotnet/runtime/issues/111818#issuecomment-2613642202) and should be replaced by a newer version.
 
-For example, we frequently need to backport Alpine updates to release branches to avoid EOL references but less commonly for Ubuntu, given the vast difference in support length.
-
-A good strategy is to keep `main` at the bleeding edge of new OS versions. That way those references have a decent chance of never needing remediation once they end up in release branches.
-
-### Updating or removing coverage
+## Testing
 
-We will often replace an older OS version with a new one, when it comes available. This approach is an effective strategy of maintaining the same level of coverage and of remediating EOL OSes ahead of time. For the most part, we don't need to care about a specific version. We just want coverage for the OS, like Alpine.
+Testing is the bread and butter of OS onboarding, particularly for a mature runtime like ours. New OS support always needs some form of test enablement.
 
-We should remediate any EOL OS references in our codebase. They don't serve any benefit and come with some risk. They are also likely to result in compliance tickets (that come with a deadline) that we want to avoid.
+Linux, Wasm, and some Windows testing is done in container images. This approach enables us to test many and regularly changing OS versions in a fixed/limited VM environment. The container image creation/update process is self-service (discussed later).
 
-In the case that a .NET version will be EOL in <6 (and certainly <3) months, new coverage can typically be skipped. We may even be able to skip remediating EOL OS references. We often opt to stop updating [supported OSes](https://github.com/dotnet/core/blob/main/os-lifecycle-policy.md) late in support period for related reasons. A lazy approach is often the best approach late in the game. Don't upset what's working.
+We use VMs (Linux and Windows) and raw metal hardware (Android and Apple) in cases where containers are not practical or where direct testing is desired. This is the primary model for Apple and Windows OSes. The VMs and mobile/Apple hardware are relatively slow to change and require support from dnceng (discussed later).
 
 ## Building
 
@@ -71,7 +63,7 @@ We use both containers and VMs for building, depending on the OS. If we test in
 
 Our primary concern is ensuring that we are using [supported operating systems and tools for our build](https://github.com/dotnet/runtime/tree/main/docs/workflow/requirements).
 
-Our Linux build containers are based on Azure Linux. We [typically need to update them](https://github.com/dotnet/runtime/issues/112191) with a new version of Azure Linux once per release. We do not update the toolset, however. That's fixed, per release.
+Our Linux build containers are based on Azure Linux. We [typically need to update them](https://github.com/dotnet/runtime/issues/112191) with a new version of Azure Linux once per release. Toolset updates are [limited to patch versions](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1422).
 
 For Apple, we likely need to make an adjustment at each macOS or iOS release to account for an Xcode version no longer being supported.
 

diff --git a/docs/project/os-support.md b/docs/project/os-support.md
@@ -0,0 +1,38 @@
+# OS Support Matrix
+
+.NET is a cross-platform product, requiring good coverage across [supported OSes](https://github.com/dotnet/core/blob/main/release-notes/10.0/supported-os.md).
+
+This document describes our coverage intent. It is a higher-level description than our pipelines.
+
+Pipelines:
+
+- [Runtime](https://github.com/dotnet/runtime/blob/main/eng/pipelines/coreclr/templates/helix-queues-setup.yml)
+- [Libraries](https://github.com/dotnet/runtime/blob/main/eng/pipelines/libraries/helix-queues-setup.yml)
+
+## Run types
+
+We rely on multiple levels of testing to provide good coverage at reasonable cost. Our testing uses OSes both as a vehicle to test .NET itself and to discover distro-specific breakage. This is why we test multiple OSes for each run type.
+
+- Inner loop -- Baseline set of tests that validate correct functional behavior, for PRs and branch builds.
+- Outer loop -- A much larger set of tests that validate expected edge case behavior, for [on-demand builds](https://github.com/dotnet/runtime/pull/115415#issuecomment-2864759316).
+- Extra platforms -- Additional OSes that run in a rolling build, targeting either inner or outer loop tests.
+
+The libraries pipeline defines these run types. The runtime pipeline has only the inner loop run type.
+
+The remainder of the document defines the tiers we apply to each operating system family. The tiers will be adapted over time, as needed. Architecture is another aspect of coverage. It isn't covered in this document.
+
+## Linux Tiers
+
+The following tiers apply for Linux.
+
+- Tier 1: Azure Linux, Debian, Ubuntu (in inner and outer loop)
+- Tier 2: Alpine and CentOS Stream (in inner loop only)
+- Tier 3: Fedora and OpenSUSE (in extra platforms only)
+
+## Windows Tiers
+
+The following tiers apply for Windows.
+
+- Tier 1: Windows Server 2016, Windows Server 2022, Windows Server 2025, Windows 11
+
+Note: "Windows 10" references in pipelines are really Windows Server 2016.