fix: Configure APIM with VNet integration to access internal services #80

Draft
Copilot wants to merge 14 commits into main from
copilot/fix-apim-access-in-vnet

Conversation

Copilot AI commented Mar 14, 2026

  • Configure APIM with VNet integration variables (virtual_network_type, public_network_access_enabled)
  • Make private_endpoints conditional (only when virtual_network_type == "None")
  • Fix public_network_access_enabled default to true (Azure blocks false during creation)
  • Add NSG inbound rule for APIM management endpoint (port 3443)
  • Add NSG outbound rule for Azure Storage (port 443) and inbound rule for Load Balancer (port 6390)
  • Remove forced-tunneling route table from APIM subnet when VNet-integrated
  • Add depends_on for NSG rules in APIM module
  • Apply AVM pre-commit governance updates
  • Fix KeyVault wait_for_rbac argument name (contact_operations not certificate_operations)
  • Add explicit time_sleep for KeyVault RBAC propagation before VM secret operations
  • Add missing depends_on for buildvm module on KeyVault
  • Add time provider to example_hub_vnet module required_providers
  • Make APIM zones conditional on SKU (pass null for non-Premium SKUs to satisfy module validation)
  • Regenerate docs and run pre-commit validation
Original prompt

This section details the original issue you should resolve

<issue_title>API Management configured in External Mode (virtual_network_type=None) cannot access Azure AI Foundry and internal services inside VNet</issue_title>
<issue_description>The deployed Azure API Management (APIM) instance is currently configured without Virtual Network integration. In the Terraform configuration, APIM is deployed with:

virtual_network_type = "None"
virtual_network_subnet_id = null

This configuration places APIM outside the Virtual Network while enabling public network access. However, several backend services, including Azure AI Foundry and other platform services, are deployed inside the landing zone Virtual Network and exposed through private endpoints.

Because APIM is not integrated into the VNet, it cannot access these internal services. Requests routed through APIM to backend services located within the VNet fail due to network isolation.</issue_description>

Comments on the Issue (you are @copilot in this section)


Change virtual_network_type from "None" to "Internal" (default) and
set virtual_network_subnet_id to use APIMSubnet, enabling APIM to
access Azure AI Foundry and other backend services deployed inside
the VNet via private endpoints.

Add configurable virtual_network_type and public_network_access_enabled
variables to apim_definition with input validation.

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
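
The variables described in this commit could look roughly like the sketch below. It is a trimmed, hypothetical version of `apim_definition` that keeps only the two new attributes; the real object in the repository has more fields, and the exact validation wording is an assumption. The defaults follow the values stated in this PR ("Internal" and `true`).

```hcl
# Hypothetical, trimmed-down apim_definition with only the two new attributes.
variable "apim_definition" {
  type = object({
    virtual_network_type          = optional(string, "Internal")
    public_network_access_enabled = optional(bool, true)
  })
  default = {}

  # Reject anything other than the three modes APIM supports.
  validation {
    condition     = contains(["None", "External", "Internal"], var.apim_definition.virtual_network_type)
    error_message = "virtual_network_type must be one of: None, External, Internal."
  }
}
```

Optional attributes with defaults require Terraform 1.3 or later.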
Copilot AI changed the title from "[WIP] Fix API Management access to Azure AI Foundry and internal services" to "fix: Configure APIM with VNet integration to access internal services" Mar 14, 2026
Copilot AI requested a review from mbilalamjad March 14, 2026 01:04
Copilot AI temporarily deployed to empty-no-approval March 15, 2026 21:12 Inactive
Copilot AI temporarily deployed to empty-no-approval March 15, 2026 21:12 Inactive
Copilot AI temporarily deployed to empty-no-approval March 15, 2026 21:18 Inactive
@mbilalamjad
Contributor

@copilot can you please review the 5 failing checks and fix the errors in them

Copilot AI and others added 2 commits March 15, 2026 21:31
… governance updates

Private endpoints are not supported when virtual_network_type is
Internal or External (they are unnecessary since APIM is already
inside the VNet). Only configure private endpoints when
virtual_network_type is None.

Also applies AVM governance pre-commit changes (grept migration
from .github/copilot-instructions.md to .agents/ directory).

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Clarify that when VNet integration is enabled, APIM directly accesses
backend services within the VNet (not via private endpoints).

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 15, 2026

@copilot can you please review the 5 failing checks and fix the errors in them

Investigated all 5 failing checks:

4 example failures (default, default-byo-vnet, standalone, standalone-byo-vnet): The APIM module validation rejects private endpoints when virtual_network_type is "Internal" or "External". Fixed by making private_endpoints conditional — only configured when virtual_network_type == "None" (when APIM is VNet-integrated, it directly accesses services without needing private endpoints).

1 pr-check failure (grept changes): The AVM governance templates had pending updates (migration from .github/copilot-instructions.md to .agents/ directory). Ran ./avm pre-commit to apply these governance changes.

Both fixes are in commits 59d8ff4 and df9e36d.
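
A minimal sketch of the conditional private endpoint logic described above, assuming a flat `virtual_network_type` variable and a hypothetical `private_endpoint_subnet_id` input. The endpoint map shape is illustrative and not the APIM module's actual schema.

```hcl
variable "virtual_network_type" {
  type    = string
  default = "Internal"
}

# Hypothetical input: subnet that would host the private endpoint.
variable "private_endpoint_subnet_id" {
  type    = string
  default = null
}

locals {
  # Endpoints that would be created if APIM were outside the VNet.
  candidate_private_endpoints = {
    gateway = {
      subnet_resource_id = var.private_endpoint_subnet_id
    }
  }

  # Keep the endpoints only when APIM is not VNet-integrated;
  # otherwise the result is an empty map.
  apim_private_endpoints = {
    for name, endpoint in local.candidate_private_endpoints : name => endpoint
    if var.virtual_network_type == "None"
  }
}
```

The `for` expression with an `if` filter is used here instead of a ternary so both outcomes have the same map type.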

…from APIM subnet, add NSG outbound rules, add depends_on for NSG rules

Three fixes for APIM Internal VNet deployment failures:

1. Remove firewall route table from APIM subnet when APIM uses VNet
   integration. Forced-tunneling through the firewall blocks APIM's
   required outbound access to Azure Storage during deployment.

2. Add NSG outbound rule for Azure Storage (port 443) and inbound rule
   for Azure Load Balancer health probes (port 6390), required per
   Azure APIM VNet integration documentation.

3. Add depends_on for NSG rules in the APIM module to prevent race
   condition where APIM deploys before NSG rules are applied.

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
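
The first fix in this commit (no firewall route table on the APIM subnet under VNet integration) can be sketched as follows. The variable names and the shape of the subnet map are assumptions for illustration; the repository's subnet definitions may differ.

```hcl
variable "virtual_network_type" {
  type    = string
  default = "Internal"
}

# Hypothetical input: resource ID of the forced-tunneling route table.
variable "firewall_route_table_id" {
  type    = string
  default = null
}

locals {
  # True when APIM is injected into the VNet.
  apim_vnet_integrated = contains(["Internal", "External"], var.virtual_network_type)

  # Skip the firewall route table for the APIM subnet when VNet-integrated,
  # so outbound traffic to Azure Storage is not forced through the firewall.
  apim_subnet = {
    name           = "APIMSubnet"
    route_table_id = local.apim_vnet_integrated ? null : var.firewall_route_table_id
  }
}
```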
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them

Reviewed the 4 failing checks from run 23124116926:

2 APIM deployment failures (default, standalone):

  • default: APIM ActivationFailed — couldn't connect to Azure Storage from inside the VNet. Fixed by adding NSG outbound rule allowing port 443 to Storage service tag, plus depends_on for NSG rules to prevent race condition.
  • standalone: APIM context deadline exceeded — the firewall route table was force-tunneling all APIM subnet traffic through the Azure Firewall, blocking outbound to Azure Storage. Fixed by excluding the APIM subnet from the firewall route table when APIM is in Internal/External VNet mode.

Also added NSG inbound rule for Azure Load Balancer health probes (port 6390), required for APIM VNet mode.

2 transient Azure failures (default-byo-vnet, standalone-byo-vnet): Only KeyVault 401 Unauthorized errors (AKV10046) — Azure-side transient auth issues. APIM deployed successfully in both BYO VNet examples.

Commit 1260279.
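
For reference, the three NSG rules mentioned in this PR could be expressed as standalone `azurerm_network_security_rule` resources, as in the sketch below. The priorities, rule names, and input variables are assumptions, and the PR may define the rules inline in another module instead. A configured `azurerm` provider is also required.

```hcl
variable "resource_group_name" {
  type = string
}

# Hypothetical input: name of the NSG attached to the APIM subnet.
variable "apim_nsg_name" {
  type = string
}

locals {
  # Ports and service tags come from the PR description; priorities are illustrative.
  apim_nsg_rules = {
    apim_management_inbound = {
      priority    = 100
      direction   = "Inbound"
      port        = "3443"
      source      = "ApiManagement"
      destination = "VirtualNetwork"
    }
    load_balancer_inbound = {
      priority    = 110
      direction   = "Inbound"
      port        = "6390"
      source      = "AzureLoadBalancer"
      destination = "VirtualNetwork"
    }
    storage_outbound = {
      priority    = 100
      direction   = "Outbound"
      port        = "443"
      source      = "VirtualNetwork"
      destination = "Storage"
    }
  }
}

resource "azurerm_network_security_rule" "apim" {
  for_each = local.apim_nsg_rules

  name                        = each.key
  priority                    = each.value.priority
  direction                   = each.value.direction
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = each.value.port
  source_address_prefix       = each.value.source
  destination_address_prefix  = each.value.destination
  resource_group_name         = var.resource_group_name
  network_security_group_name = var.apim_nsg_name
}
```

With this layout, the APIM module block would list `azurerm_network_security_rule.apim` in its `depends_on` so that APIM is not created before the rules exist.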

The KeyVault 401 Unauthorized errors (AKV10046: Unable to resolve the
key used for signature validation) during GetCertificateContacts are
caused by Azure RBAC not having propagated before certificate
operations begin. The modules already had wait_for_rbac_before_key_operations
and wait_for_rbac_before_secret_operations set to 60s, but
wait_for_rbac_before_certificate_operations was missing.

Added the 60s certificate operations wait to both:
- main.genai_services.tf (main KeyVault module)
- modules/example_hub_vnet/main.tf (hub KeyVault module)

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
@mbilalamjad
Contributor

@copilot can you please review the failing checks and fix the errors in them

…bac_before_contact_operations

The KeyVault AVM module v0.10.2 does not have a
wait_for_rbac_before_certificate_operations variable. The correct
argument name is wait_for_rbac_before_contact_operations, which
matches the GetCertificateContacts API operation that was failing
with 401 errors.

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them

All 5 failures were caused by the same issue: wait_for_rbac_before_certificate_operations is not a valid argument in the KeyVault AVM module v0.10.2. The correct argument name is wait_for_rbac_before_contact_operations (matching the GetCertificateContacts API operation).

Fixed in both main.genai_services.tf and modules/example_hub_vnet/main.tf in commit 7663c83.
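
A sketch of what the corrected module call might look like. The registry source and the `{ create = "60s" }` object shape are assumptions based on the AVM Key Vault module and should be checked against its v0.10.2 inputs. The four variables are hypothetical, and all other arguments are omitted.

```hcl
variable "key_vault_name" {
  type = string
}

variable "location" {
  type = string
}

variable "resource_group_name" {
  type = string
}

variable "tenant_id" {
  type = string
}

module "key_vault" {
  # Assumed registry address for the KeyVault AVM module named in this PR.
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.10.2"

  name                = var.key_vault_name
  location            = var.location
  resource_group_name = var.resource_group_name
  tenant_id           = var.tenant_id

  # Existing waits mentioned in the earlier commit message.
  wait_for_rbac_before_key_operations    = { create = "60s" }
  wait_for_rbac_before_secret_operations = { create = "60s" }

  # Corrected argument name (not wait_for_rbac_before_certificate_operations).
  wait_for_rbac_before_contact_operations = { create = "60s" }
}
```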

@mbilalamjad
Contributor

@copilot can you please review the failing checks and fix the errors in them

Copilot AI and others added 2 commits March 16, 2026 08:24
…erations

The KeyVault AVM module's wait_for_rbac_before_secret_operations only
creates a time_sleep when both role_assignments AND secrets are defined
within the module. Since secrets are created by external jumpvm/buildvm
modules, the internal time_sleep was never created.

Fix by adding explicit time_sleep resources that depend on the KeyVault
role assignments, ensuring RBAC propagation completes before the VM
modules attempt to set secrets in the KeyVault.

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Address code review feedback to ensure time_sleep resets when RBAC
changes occur by tracking both keyvault resource_id and principal_id.

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them

Reviewed the 3 failing jobs from run 23130664582 (commit 7663c83). The pr-check passed ✓.

2 KeyVault 401 failures (standalone, default-byo-vnet): Root cause identified — the KeyVault module's wait_for_rbac_before_secret_operations is conditionally created (count = length(var.role_assignments) != 0 && length(var.secrets) != 0). Since no secrets are defined in the module (they're created by external jumpvm/buildvm modules), the time_sleep was never created and the 60s wait had no effect.

Fixed by adding explicit time_sleep resources outside the KeyVault module that depend on the RBAC role assignments, and updating jumpvm/buildvm to depend on them. Also added the missing depends_on to buildvm which had no dependency on the KeyVault at all.

1 idempotency failure (standalone-byo-vnet): Azure Search Service returns allowed_ips = [] and tags = {} on re-read, causing plan drift. This is a known Azure provider behavior unrelated to our changes.

Commits 7d08616 and fe13e8c.
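
The explicit wait described above might be sketched as follows. The resource and variable names are hypothetical. In the PR the sleep depends on the Key Vault role assignments, which are replaced here by plain inputs, and a `terraform_data` resource (Terraform 1.4+) stands in for the jumpvm/buildvm modules.

```hcl
# Hypothetical inputs standing in for the Key Vault module outputs.
variable "key_vault_resource_id" {
  type = string
}

variable "principal_id" {
  type = string
}

# Give Azure RBAC time to propagate before anything writes secrets.
resource "time_sleep" "wait_for_key_vault_rbac" {
  create_duration = "60s"

  # Re-run the wait when the vault or the assigned principal changes.
  triggers = {
    key_vault_resource_id = var.key_vault_resource_id
    principal_id          = var.principal_id
  }
}

# Stand-in for the VM modules that set secrets in the Key Vault.
resource "terraform_data" "vm_secret_consumer" {
  depends_on = [time_sleep.wait_for_key_vault_rbac]
}
```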

@mbilalamjad
Contributor

@copilot can you please review the failing checks and fix the errors in them

…potf fixes

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them

Reviewed the 5 failing checks from run 23134450837:

1 code-related failure (pr-check): Three issues fixed:

  • tflint: Missing time provider in modules/example_hub_vnet/terraform.tf required_providers (the time_sleep resource I added in the previous commit requires it)
  • Docs out of date: README.md and modules/example_hub_vnet/README.md needed regeneration for the new time_sleep resources
  • Mapotf formatting: Minor blank line removal in main.genai_services.tf and modules/example_hub_vnet/main.tf
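
For the tflint finding, the provider declaration would look roughly like this. The version constraint is an assumption, not the value used in the repository.

```hcl
terraform {
  required_providers {
    # Needed because the module now declares time_sleep resources.
    time = {
      source  = "hashicorp/time"
      version = "~> 0.9" # assumed constraint
    }
  }
}
```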

4 transient Azure infrastructure failures:

  • standalone + standalone-byo-vnet: CosmosDB RequestTimeout — Azure-side timeout during database account creation
  • default-byo-vnet: AI Foundry capability host creation failure + KeyVault 401 during destroy
  • default: KeyVault 401 (AKV10046) on jumpvm secret — RBAC propagation timing (the time_sleep should help with retries)

All code-related issues fixed in commit 21f7df5.

@mbilalamjad
Contributor

@copilot can you please review the failing checks and fix the errors in them. Also please look into why the default example is taking so long to complete; it seems to get stuck at API Management creation

…s conditional on SKU

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them. Also please look into why the default example is taking so long to complete; it seems to get stuck at api...

Reviewed the 4 failing jobs from run 23151921897. The pr-check passed ✓.

1 APIM timeout (default): Root cause identified — the default APIM SKU is Premium_3 in Internal VNet mode, which takes 2.5+ hours to provision (was still creating at 152 minutes when cancelled). Fixed by:

  • Adding apim_definition overrides in all 4 examples to use Developer_1 SKU, which deploys in ~30-45 minutes in VNet mode
  • Making zones conditional on SKU (Developer doesn't support availability zones, only Premium does)

3 transient Azure failures:

  • standalone + standalone-byo-vnet: CosmosDB RequestTimeout — Azure-side timeout during database account creation
  • default-byo-vnet: DNS Resolver Inbound Endpoint creation failed with subnet provisioning conflict (Conflict: "Subnet provisioning state is not Succeeded")

Commit a7bab6f.

@mbilalamjad
Contributor

@copilot can you please review the failing checks and fix the errors in them. Please revert the API Management SKU back to Premium so that the errors related to availability zones are addressed

…ium SKUs

Co-authored-by: mbilalamjad <79409563+mbilalamjad@users.noreply.github.com>
Author

Copilot AI commented Mar 16, 2026

@copilot can you please review the failing checks and fix the errors in them. Please revert the API Management SKU back to Premium so that the errors related to availability zones are address...

All 5 failures had the same root cause: the APIM module validation rejects zones = [] for non-Premium SKUs ("Availability Zones are only supported in the Premium tier").

Fixed in commit d632052:

  • Reverted all 4 examples back to the default Premium SKU (removed Developer_1 overrides)
  • Changed zones in main.apim.tf from [] to null for non-Premium SKUs — the APIM module validation accepts null but rejects an empty list
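
A minimal sketch of the SKU-conditional zones logic, assuming flat `sku_name` and `zones` inputs. The names and the default zone list are illustrative and are not copied from `main.apim.tf`.

```hcl
variable "sku_name" {
  type    = string
  default = "Premium_3"
}

# Assumed default; the real default zone list may differ.
variable "zones" {
  type    = list(string)
  default = ["1", "2", "3"]
}

locals {
  # Pass null (not an empty list) for non-Premium SKUs,
  # since the module validation rejects [] but accepts null.
  apim_zones = startswith(var.sku_name, "Premium") ? var.zones : null
}
```

`startswith` is available from Terraform 1.3 onward.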

Labels

Needs: Triage 🔍 Maintainers need to triage still

Development

Successfully merging this pull request may close these issues.

API Management configured in External Mode (virtual_network_type=None) cannot access Azure AI Foundry and internal services inside VNet

2 participants