Add new 'az zones' extension command #8704

Open
wants to merge 29 commits into
base: main
29 commits
8f3b4ed
initial working version
nielsams Apr 23, 2025
8826e4d
add table output
nielsams Apr 23, 2025
21b85ee
added location data lookup
nielsams Apr 24, 2025
526cc47
added network provider
nielsams Apr 24, 2025
c6039ab
fix output formatting
nielsams Apr 24, 2025
15eb080
add basic compute types
nielsams Apr 24, 2025
13fb891
formatting fixes
nielsams Apr 24, 2025
393cc72
add more common resource types
nielsams Apr 24, 2025
30ec6c1
allowing to omit dependent resources for clarity
nielsams Apr 24, 2025
4c7e672
add more resource types
nielsams Apr 24, 2025
1bf201e
improve readme
nielsams Apr 25, 2025
0cc3b73
bug fixes and new resource types
nielsams Apr 25, 2025
35e1c89
more resource types
nielsams Apr 25, 2025
605c497
temp remove notificationhubs
nielsams Apr 25, 2025
0234dec
add more resource types
nielsams Apr 25, 2025
be94d39
adding tests and bugfixes
nielsams Apr 28, 2025
2d01598
minor structural changes
nielsams Apr 28, 2025
e36cb81
fix whitespace issues
nielsams Apr 28, 2025
c2697be
new types and bugfixes
nielsams Apr 28, 2025
70dbfd4
syntax improvements
nielsams Apr 29, 2025
19fb1b1
formatting fixes
nielsams Apr 29, 2025
71005bd
added resource types
nielsams Apr 29, 2025
233ab5b
readme updates
nielsams Apr 29, 2025
1aad2dd
fix note box syntax
nielsams Apr 29, 2025
7f6dd37
fix note box syntax
nielsams Apr 29, 2025
c8a201a
shorten parameter name
nielsams Apr 29, 2025
9026850
metadata updates
nielsams Apr 29, 2025
2fb2f69
requested metadata updates
nielsams Apr 29, 2025
0d3b99b
update incorrect codeowners path
nielsams Apr 30, 2025
4 changes: 3 additions & 1 deletion .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -324,4 +324,6 @@

/src/azext_durabletask/ @RyanLettieri

-/src/acat @qinqingxu @Sherylueen @yongxin-ms @wh-alice
+/src/acat @qinqingxu @Sherylueen @yongxin-ms @wh-alice
+
+/src/zones/ @nielsams
8 changes: 8 additions & 0 deletions src/zones/HISTORY.rst
@@ -0,0 +1,8 @@
.. :changelog:

Release History
===============

1.0.0b1
+++++++
* Initial preview release.
69 changes: 69 additions & 0 deletions src/zones/README.md
@@ -0,0 +1,69 @@
# Microsoft Azure CLI 'zones' Extension

This package provides the 'zones' extension for the Azure CLI, i.e. the `az zones` command group.

This CLI Extension helps validate the zone redundancy status of resources within a specific scope.
For each resource, one of the following statuses will be returned:

    Unknown          # Unable to verify status. You'll need to check the resource manually.
    Yes              # Resource is configured for zone redundancy
    Always           # Resource is always zone redundant, no configuration needed
    No               # Resource is not configured for zone redundancy, but could be in another configuration
    Never            # Resource cannot be configured for zone redundancy
    Dependent        # Resource is zone redundant if parent or related resource is zone redundant
    NoZonesInRegion  # The region the resource is deployed in does not have Availability Zones

> [!NOTE]
> This extension is in active development. While an effort has been made to include the most common resource types and their zone redundancy configuration, many resource types are still missing. More will be added in future releases. In the meantime, if you need specific resources added or have found errors, please raise a GitHub issue.

## When should you use this?

In order to build a fully zone redundant application, you need to satisfy three criteria:

1) Enable zone redundancy on all PaaS resources in the application
2) Ensure zonal resources are spread across all zones. These are the resources that take a 'zones' attribute in their definition.
3) Validate that your application code is able to handle the loss of a zone, e.g. that connections are retried properly when a dependency is unreachable.

The _zones_ CLI extension can help with the first two steps. By running it against the resource group that contains your production resources, you can check that you have not overlooked any resources in your quest for zone redundancy. If the results show 'No' for one of your resources, you need to change its configuration to enable zone redundancy. If they show 'Never', you probably need to deploy multiple instances of that resource across the different zones manually.

The third step can be validated using Chaos Engineering practices. On Azure, look into Chaos Studio to get started with that.

Suggested use for this extension:
- Manually run this against the production subscription or resource group(s) to validate that all resources have zone redundancy enabled.
- Run this as part of your CI/CD pipelines to validate zone redundancy of the resources after deployment to the (pre-)production environment. Consider failing the pipeline if any resource reports _No_. Note that _No_ only occurs when zone redundancy is not enabled but could be if the resource were configured differently.
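
The CI/CD suggestion above could be sketched roughly as follows in Python. This is a minimal sketch and not part of the extension: it assumes the JSON output of `az zones validate` is a list of objects, and the field names `name` and `zoneRedundant` are assumptions that should be checked against the actual output.

```python
import json


def find_non_redundant(results, status_key="zoneRedundant"):
    """Return the result entries whose status is exactly 'No'.

    `status_key` is an assumed field name; adjust it to the real output.
    """
    return [r for r in results if r.get(status_key) == "No"]


def exit_code(results):
    """Return 1 if any resource reports 'No', otherwise 0."""
    return 1 if find_non_redundant(results) else 0


# Hypothetical sample standing in for the parsed CLI output.
sample = json.loads("""
[
  {"name": "stprod001", "zoneRedundant": "Yes"},
  {"name": "sqlprod001", "zoneRedundant": "No"},
  {"name": "nicprod001", "zoneRedundant": "Dependent"}
]
""")

for entry in find_non_redundant(sample):
    print(f"Not zone redundant: {entry['name']}")
```

In a pipeline step, the parsed output would be passed to `exit_code` and the returned value used as the process exit status, so that only _No_ results fail the build while _Never_, _Dependent_ and _Unknown_ are left for manual review.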

## USAGE

Validate all resources in current scope to which you have read access:

```bash
az zones validate
```

Get the results in human-readable table format:

```bash
az zones validate --output table
```

Validate all resources in specific resource groups to which you have read access:

```bash
az zones validate --resource-groups <resource_group1>,<resource_group2>,...
```
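
For illustration, the comma-separated value is turned into an Azure Resource Graph filter roughly as sketched below. This mirrors `build_arg_query` in `_argHelper.py` from this pull request; the function name here is hypothetical, and stripping whitespace around names is an addition made for the sketch.

```python
def build_resource_group_filter(resource_groups):
    """Build a Resource Graph query limited to the given resource groups.

    `resource_groups` is a comma-separated string, as passed on the command line.
    """
    # Split on commas and drop empty entries and surrounding whitespace.
    names = [n.strip() for n in resource_groups.split(",") if n.strip()]
    # Quote each name for the KQL 'in' operator.
    quoted = ",".join(f"'{n}'" for n in names)
    return f"Resources | where resourceGroup in ({quoted})"


print(build_resource_group_filter("rg-app,rg-data"))
```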

Omit 'dependent' resources from the output. These are resources that by themselves cannot be zone redundant, but take on the status of their parent or related resource. This can be useful for improving readability of the results:

```bash
az zones validate --omit-dependent
```

## Important Notes

- The extension still has missing resource types. These are shown as _Unknown_ in the results. It is essential that you validate zone redundancy of these resources yourself, since your whole application is only zone redundant if all resources are zone redundant.

- The _zones_ CLI extension can only help with resources you can view, i.e. for which you have read access. You must ensure that all relevant resources are indeed listed in the results.

- While this extension is a useful tool in validating zone redundancy on resources, you are still responsible for reviewing the [Reliability Guides](https://learn.microsoft.com/azure/reliability/overview-reliability-guidance) for all the services you use in your applications, as these may contain important information regarding operation in high availability scenarios. Ultimately, the product reliability guides are the authoritative source for zone redundancy guidance.

- Zonal services are considered to be Zone Redundant if they are deployed to at least 2 zones.
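
The last rule can be illustrated with a small Python sketch. It is not the extension's code: `zones` stands for the value of a resource's 'zones' attribute, and the function name is hypothetical.

```python
def is_zonal_resource_redundant(zones):
    """Return True when a zonal resource is deployed to at least 2 distinct zones.

    `zones` is the list found in the resource's 'zones' attribute, or None
    when the attribute is absent.
    """
    # Normalise to strings so that 1 and "1" count as the same zone.
    distinct_zones = {str(z) for z in (zones or [])}
    return len(distinct_zones) >= 2


print(is_zonal_resource_redundant(["1", "2", "3"]))
print(is_zonal_resource_redundant(["1"]))
```

Counting distinct zones rather than list length guards against a duplicated zone entry being mistaken for a second zone.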
41 changes: 41 additions & 0 deletions src/zones/azext_zones/__init__.py
@@ -0,0 +1,41 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------


import importlib
from pathlib import Path
from azure.cli.core import AzCommandsLoader
from azext_zones._help import helps # pylint: disable=unused-import

# Import all the resource type validator modules dynamically:
validators_dir = Path(__file__).parent / "resource_type_validators"
for file in validators_dir.glob("*.py"):
    if file.name != "__init__.py":
        module_name = f".resource_type_validators.{file.stem}"
        importlib.import_module(module_name, package=__package__)


class ZonesCommandsLoader(AzCommandsLoader):

    def __init__(self, cli_ctx=None):
        from azure.cli.core.commands import CliCommandType
        from azext_zones._client_factory import cf_zones
        zones_custom = CliCommandType(
            operations_tmpl='azext_zones.custom#{}',
            client_factory=cf_zones)
        super(ZonesCommandsLoader, self).__init__(cli_ctx=cli_ctx,
                                                  custom_command_type=zones_custom)

    def load_command_table(self, args):
        from azext_zones.commands import load_command_table
        load_command_table(self, args)
        return self.command_table

    def load_arguments(self, command):
        from azext_zones._params import load_arguments
        load_arguments(self, command)


COMMAND_LOADER_CLS = ZonesCommandsLoader
124 changes: 124 additions & 0 deletions src/zones/azext_zones/_argHelper.py
@@ -0,0 +1,124 @@
import json
from collections import OrderedDict
from knack.util import todict
from knack.log import get_logger

from .vendored_sdks.resourcegraph.models import ResultTruncated
from .vendored_sdks.resourcegraph.models import QueryRequest, QueryRequestOptions, QueryResponse, ResultFormat, Error
from azure.cli.core._profile import Profile
from azure.core.exceptions import HttpResponseError
from azure.cli.core.azclierror import BadRequestError, AzureInternalError


__SUBSCRIPTION_LIMIT = 1000
__MANAGEMENT_GROUP_LIMIT = 10
__logger = get_logger(__name__)


def build_arg_query(resource_groups, attributes):
    # type: (str, list[str]) -> str
    # resource_groups is a comma-separated string of resource group names.

    query = "Resources"
    if resource_groups is not None and len(resource_groups) > 0:
        # Quote each name for the KQL 'in' operator; strip stray whitespace around commas.
        query += " | where resourceGroup in ({0})".format(
            ','.join(f"'{item.strip()}'" for item in resource_groups.split(',')))

    if attributes is not None and len(attributes) > 0:
        query += " | project {0}".format(', '.join(attributes))

    return query


def execute_arg_query(
        client, graph_query, first, skip, subscriptions, management_groups, allow_partial_scopes, skip_token):

    mgs_list = management_groups
    if mgs_list is not None and len(mgs_list) > __MANAGEMENT_GROUP_LIMIT:
        mgs_list = mgs_list[:__MANAGEMENT_GROUP_LIMIT]
        warning_message = "The query included more management groups than allowed. "\
                          "Only the first {0} management groups were included for the results. "\
                          "To use more than {0} management groups, "\
                          "see the docs for examples: "\
                          "https://aka.ms/arg-error-toomanysubs".format(__MANAGEMENT_GROUP_LIMIT)
        __logger.warning(warning_message)

    subs_list = None
    if mgs_list is None:
        subs_list = subscriptions or _get_cached_subscriptions()
        if subs_list is not None and len(subs_list) > __SUBSCRIPTION_LIMIT:
            subs_list = subs_list[:__SUBSCRIPTION_LIMIT]
            warning_message = "The query included more subscriptions than allowed. "\
                              "Only the first {0} subscriptions were included for the results. "\
                              "To use more than {0} subscriptions, "\
                              "see the docs for examples: "\
                              "https://aka.ms/arg-error-toomanysubs".format(__SUBSCRIPTION_LIMIT)
            __logger.warning(warning_message)

    response = None
    try:
        result_truncated = False

        request_options = QueryRequestOptions(
            top=first,
            skip=skip,
            skip_token=skip_token,
            result_format=ResultFormat.object_array,
            allow_partial_scopes=allow_partial_scopes
        )

        request = QueryRequest(
            query=graph_query,
            subscriptions=subs_list,
            management_groups=mgs_list,
            options=request_options)
        response = client.resources(request)  # type: QueryResponse
        if response.result_truncated == ResultTruncated.true:
            result_truncated = True

        if result_truncated and first is not None and len(response.data) < first:
            __logger.warning("Unable to paginate the results of the query. "
                             "Some resources may be missing from the results. "
                             "To rewrite the query and enable paging, "
                             "see the docs for an example: https://aka.ms/arg-results-truncated")

    except HttpResponseError as ex:
        if ex.model.error.code == 'BadRequest':
            raise BadRequestError(json.dumps(_to_dict(ex.model.error), indent=4)) from ex

        raise AzureInternalError(json.dumps(_to_dict(ex.model.error), indent=4)) from ex

    result_dict = dict()
    result_dict['data'] = response.data
    result_dict['count'] = response.count
    result_dict['total_records'] = response.total_records
    result_dict['skip_token'] = response.skip_token

    return result_dict


def _get_cached_subscriptions():
    # type: () -> list[str]

    cached_subs = Profile().load_cached_subscriptions()
    return [sub['id'] for sub in cached_subs]


def _to_dict(obj):
    if isinstance(obj, Error):
        return _to_dict(todict(obj))

    if isinstance(obj, dict):
        result = OrderedDict()

        # Complex objects should be displayed last
        sorted_keys = sorted(obj.keys(), key=lambda k: (isinstance(obj[k], dict), isinstance(obj[k], list), k))
        for key in sorted_keys:
            if obj[key] is None or obj[key] == [] or obj[key] == {}:
                continue

            result[key] = _to_dict(obj[key])
        return result

    if isinstance(obj, list):
        return [_to_dict(v) for v in obj]

    return obj
9 changes: 9 additions & 0 deletions src/zones/azext_zones/_client_factory.py
@@ -0,0 +1,9 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

def cf_zones(cli_ctx, _):
    from azure.cli.core.commands.client_factory import get_mgmt_service_client
    from .vendored_sdks.resourcegraph import ResourceGraphClient
    return get_mgmt_service_client(cli_ctx, ResourceGraphClient)
19 changes: 19 additions & 0 deletions src/zones/azext_zones/_clients.py
@@ -0,0 +1,19 @@
from azure.cli.core.util import send_raw_request
from azure.cli.core.commands.client_factory import get_subscription_id


# pylint: disable=too-few-public-methods
class MgmtApiClient():

    def query(self, cmd, method, resource, api_version, requestBody):
        management_hostname = cmd.cli_ctx.cloud.endpoints.resource_manager
        sub_id = get_subscription_id(cmd.cli_ctx)
        url_fmt = ("{}/subscriptions/{}/{}?api-version={}")
        request_url = url_fmt.format(
            management_hostname.strip('/'),
            sub_id,
            resource,
            api_version)

        r = send_raw_request(cmd.cli_ctx, method, request_url, body=requestBody)
        return r.json()
25 changes: 25 additions & 0 deletions src/zones/azext_zones/_help.py
@@ -0,0 +1,25 @@
# coding=utf-8
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

from knack.help_files import helps # pylint: disable=unused-import


helps['zones'] = """
    type: group
    short-summary: Commands to validate Availability Zone Configuration. Use one of the options below.
"""

helps['zones validate'] = """
    type: command
    short-summary: Validates zone redundancy status of all resources in the current subscription context for which you have read access.
    examples:
        - name: Validate zone redundancy status of all resources in the specified resource group
          text: |-
               az zones validate --resource-groups myProductionRG
        - name: Validate zone redundancy status of all resources in the specified resource group, but omit the dependent/child resources
          text: |-
               az zones validate --resource-groups myProductionRG --omit-dependent
"""
41 changes: 41 additions & 0 deletions src/zones/azext_zones/_locationDataHelper.py
@@ -0,0 +1,41 @@
from ._clients import MgmtApiClient
from knack.log import get_logger


class LocationDataHelper:

    # Location data is cached on the class so it is fetched at most once per process.
    _location_data = None
    _logger = None

    def __init__(self, cmd):
        self.cmd = cmd
        self._logger = get_logger(__name__)

    def fetch_location_data(self):
        if not LocationDataHelper._location_data:
            # query(self, cmd, method, resource, api_version, requestBody)
            LocationDataHelper._location_data = MgmtApiClient().query(
                self.cmd,
                "GET",
                "locations",
                "2022-12-01",
                None
            )

        self._logger.debug("Loaded location data successfully.")

    def region_has_zones(self, region):
        if LocationDataHelper._location_data is None:
            return None

        # While 'global' is not a valid region, we want to return true for global resources
        if region == 'global':
            return True

        if LocationDataHelper._location_data:
            location_data = LocationDataHelper._location_data.get('value', [])
            for location in location_data:
                if location['name'].lower() == region.lower():
                    return 'availabilityZoneMappings' in location

        return None
14 changes: 14 additions & 0 deletions src/zones/azext_zones/_params.py
@@ -0,0 +1,14 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
# pylint: disable=line-too-long

from azure.cli.core.commands.parameters import get_three_state_flag


def load_arguments(self, _):

    with self.argument_context('zones validate') as c:
        c.argument('resource_group_names', options_list=['--resource-groups', '-g'], help='Name of the resource groups, comma separated.', required=False)
        c.argument('omit_dependent', options_list=['--omit-dependent'], help='Omit dependent resources from validation.', arg_type=get_three_state_flag(), required=False)