Description
Area
Infrastructure (Bicep/infra)
Chore Type
Documentation
Description
The infra/main.bicep file is insufficiently documented, which makes it difficult for developers to understand, maintain, and extend the infrastructure. Key documentation gaps include:
- Missing comprehensive header documentation explaining the template's purpose and architecture
- Inadequate parameter descriptions lacking usage examples and validation requirements
- No documentation of resource dependencies and deployment order
- Missing inline comments explaining complex logic and calculations
- No examples of typical parameter configurations for different environments
- Lack of troubleshooting guidance for common deployment issues
- No change impact documentation for breaking changes
Justification
Comprehensive documentation reduces onboarding time for new team members, prevents deployment errors through clear guidance, enables faster troubleshooting, supports compliance and audit requirements, and aligns with Azure Well-Architected Framework operational excellence principles.
Acceptance Criteria
- Add comprehensive template header with purpose, architecture overview, and usage instructions
- Enhance all parameter descriptions with examples, validation rules, and impact warnings
- Document resource dependencies and deployment sequence
- Add inline comments explaining complex logic and business rules
- Create parameter configuration examples for different deployment scenarios
- Add troubleshooting section with common issues and solutions
- Document breaking changes and migration guidance
- Include links to relevant Azure documentation and best practices
- Add version history and changelog documentation
- Update the infrastructure-deployment-bicep-avm.md specification with documentation requirements
Resolution Documentation
Step 1: Add Comprehensive Template Header
/*
=============================================================================
AZURE AI FOUNDRY JUMPSTART - MAIN INFRASTRUCTURE TEMPLATE
=============================================================================
PURPOSE:
This Bicep template deploys a complete Azure AI Foundry environment with
supporting infrastructure including networking, security, storage, and AI services.
ARCHITECTURE:
- Azure AI Foundry Hub and Projects for AI/ML workloads
- Azure Cognitive Services for AI capabilities
- Azure Storage Account for data and model storage
- Virtual Network with private endpoints for security
- Key Vault for secrets management
- Log Analytics for monitoring and diagnostics
DEPLOYMENT MODES:
- Public: Resources accessible from internet with IP restrictions
- Private: Resources only accessible through private networking
- Hub-Only: Deploy only the hub without projects
- Full: Complete deployment with sample projects and configurations
PREREQUISITES:
- Azure subscription with appropriate permissions
- Resource group (created automatically if it does not exist)
- Valid Azure AD principal ID for role assignments
- Network configuration planning for private deployments
ESTIMATED DEPLOYMENT TIME: 15-30 minutes
ESTIMATED COST: $50-200/month depending on configuration
VERSION: 2.0.0
LAST UPDATED: 2024-01-15
COMPATIBILITY: Bicep 0.24+, Azure CLI 2.57+, PowerShell 7.0+
AUTHORS: Azure AI Foundry Team
REPOSITORY: https://github.com/PlagueHO/azure-ai-foundry-jumpstart
DOCUMENTATION: https://aka.ms/aifoundry-jumpstart-docs
*/
Step 2: Enhanced Parameter Documentation
@description('''
Environment name used for resource naming and tagging.
USAGE: Short identifier for the deployment environment (dev, test, prod, etc.)
NAMING PATTERN: Resources will be named as {workload}-{environment}-{resourceType}-{instance}
EXAMPLES:
- "dev" -> Results in resources like "aifoundry-dev-kv-001"
- "prod" -> Results in resources like "aifoundry-prod-kv-001"
CONSTRAINTS:
- Must be 1-10 characters
- Only lowercase letters and numbers allowed
- Cannot start or end with a number
- Avoid Azure reserved words
IMPACT: Changing this value will create new resources and may cause data loss
''')
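@minLength(1)
@maxLength(10)
// Bicep has no regex decorator, so the character rules listed under CONSTRAINTS
// are not enforced by the template; validate them in CI or a pre-deployment script.
param environmentName string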
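Where the set of environments is known in advance, an explicit allow-list is one way to have the template itself reject unexpected values. This is a minimal sketch, not the template's actual parameter: the name `fixedEnvironmentName` is hypothetical, and the three values are an assumption about which environments exist.

```bicep
// Sketch only: `fixedEnvironmentName` is a hypothetical parameter name, and the
// three allowed values are an assumption about which environments exist.
@description('Deployment environment. Restricted to a known set of values.')
@allowed([
  'dev'
  'test'
  'prod'
])
param fixedEnvironmentName string = 'dev'

// Any value outside the list is rejected before resources are created.
output resourceNamePrefix string = 'aif-${fixedEnvironmentName}'
```

The trade-off is flexibility: ad hoc names such as "minimal" or "secure" would have to be added to the list before they could be used.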
@description('''
Azure AD Object ID of the user or service principal for role assignments.
USAGE: Principal that will have administrative access to deployed resources
HOW TO FIND:
- Azure CLI: az ad signed-in-user show --query id -o tsv
- PowerShell: (Get-AzContext).Account.ExtendedProperties.HomeAccountId.Split('.')[0]
- Azure Portal: Microsoft Entra ID (formerly Azure Active Directory) > Users > Select user > Object ID
PERMISSIONS GRANTED:
- Cognitive Services Contributor on AI services
- Storage Blob Data Contributor on storage account
- Key Vault Administrator on key vault
- Machine Learning Workspace Contributor on AI Foundry hub
SECURITY NOTE: This principal will have significant permissions. Apply the principle of least privilege.
''')
// Bicep has no regex decorator; the length checks below only confirm that the
// value has the 36 characters of a GUID.
@minLength(36)
@maxLength(36)
param principalId string
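To make the PERMISSIONS GRANTED list concrete, the following sketch shows how one of those grants (Storage Blob Data Contributor) might be expressed as a role assignment. It is an illustration under assumptions, not the template's actual code: `storageAccountName` and `principalType` are hypothetical parameters, and the storage account is assumed to exist already in the target resource group.

```bicep
// Sketch only: `storageAccountName` and `principalType` are hypothetical parameters.
param principalId string
param storageAccountName string

@allowed([
  'User'
  'Group'
  'ServicePrincipal'
])
param principalType string = 'User'

// Built-in role definition ID for Storage Blob Data Contributor
var storageBlobDataContributorRoleId = 'ba92f5b4-2d11-453d-a403-e96b0029c9fe'

// Reference a storage account that is assumed to exist in this resource group
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

resource blobDataContributor 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Deterministic GUID so repeated deployments target the same assignment
  name: guid(storageAccount.id, principalId, storageBlobDataContributorRoleId)
  scope: storageAccount
  properties: {
    principalId: principalId
    principalType: principalType
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', storageBlobDataContributorRoleId)
  }
}
```

Scoping the assignment to the individual resource rather than the resource group keeps the grant as narrow as the security note recommends.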
@description('''
Array of public IPv4 addresses or CIDR ranges for Azure AI Foundry network access control.
USAGE: Restricts access to AI Foundry services to specified IP addresses/ranges
FORMAT: ["203.0.113.1", "198.51.100.0/24", "203.0.113.0/28"]
EXAMPLES:
- Single IP: ["203.0.113.1"]
- Multiple IPs: ["203.0.113.1", "203.0.113.2"]
- CIDR range: ["198.51.100.0/24"]
- Mixed: ["203.0.113.1", "198.51.100.0/24"]
DEFAULT BEHAVIOR: Empty array allows access from any IP (not recommended for production)
SECURITY IMPACT: Only specified IPs can access AI Foundry services
TROUBLESHOOTING: If you can't access services, verify your public IP is in this list
PRIVATE NETWORKING: When deployPrivateNetworking=true, this setting is supplementary to private access
''')
param aiFoundryIpAllowList array = []
Step 3: Resource Dependency Documentation
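Bicep derives most deployment ordering from symbolic references between resources, so the sequence documented in this step is largely a consequence of how resources refer to one another. The sketch below illustrates the idea with two independent resources and one that references both. Resource names and API versions are illustrative assumptions, not taken from infra/main.bicep.

```bicep
// Sketch only: resource names and API versions are illustrative assumptions.
param location string = resourceGroup().location

// No references between these two resources, so they can deploy in parallel
resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'log-example'
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
  }
}

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-example-${uniqueString(resourceGroup().id)}'
  location: location
  properties: {
    tenantId: tenant().tenantId
    sku: {
      family: 'A'
      name: 'standard'
    }
    enableRbacAuthorization: true
  }
}

// Implicit dependencies: `scope: keyVault` and `logAnalytics.id` make this
// resource wait until both of the resources above have been deployed
resource keyVaultDiagnostics 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'send-to-log-analytics'
  scope: keyVault
  properties: {
    workspaceId: logAnalytics.id
    logs: [
      {
        categoryGroup: 'allLogs'
        enabled: true
      }
    ]
  }
}
```

An explicit `dependsOn` is only needed when an ordering requirement is not already expressed through such a reference.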
/*
=============================================================================
RESOURCE DEPLOYMENT SEQUENCE AND DEPENDENCIES
=============================================================================
DEPLOYMENT ORDER:
1. Virtual Network and Subnets (if deployPrivateNetworking=true)
2. Network Security Groups
3. Storage Account
4. Key Vault
5. Log Analytics Workspace
6. Azure AI Foundry Hub (depends on storage, key vault, log analytics)
7. Private Endpoints (if deployPrivateNetworking=true)
8. Cognitive Services (AI Foundry account)
9. AI Foundry Projects (depends on hub and cognitive services)
10. Sample Data Upload (if deploySampleData=true)
CRITICAL DEPENDENCIES:
- AI Foundry Hub REQUIRES: Storage Account, Key Vault, Log Analytics
- AI Foundry Projects REQUIRE: AI Foundry Hub, Cognitive Services
- Private Endpoints REQUIRE: Virtual Network, Target Resources
- Role Assignments REQUIRE: Principal ID and Target Resources
PARALLEL DEPLOYMENT:
- Storage Account, Key Vault, and Log Analytics can deploy in parallel
- Network resources can deploy in parallel with other services
- AI Foundry Projects can deploy in parallel once hub is ready
CONDITIONAL DEPLOYMENTS:
- Private networking resources only deploy when deployPrivateNetworking=true
- Sample projects only deploy when aiFoundryProjectsFromJson=true
- OpenAI models only deploy when deploySampleOpenAiModels=true
*/
Step 4: Complex Logic Documentation
// RESOURCE NAMING LOGIC
// Uses Azure Cloud Adoption Framework naming conventions with abbreviations
// Pattern: {workload}-{environment}-{resourceType}-{instance}
var resourceNamePrefix = 'aif-${environmentName}'
var uniqueSuffix = take(uniqueString(resourceGroup().id), 6) // Deterministic per resource group; makes name collisions unlikely
/*
STORAGE ACCOUNT NAMING CHALLENGE:
Storage accounts have strict naming requirements:
- Must be globally unique across ALL of Azure
- 3-24 characters, lowercase letters and numbers only
- Cannot contain hyphens or other special characters
SOLUTION: Combine abbreviated prefix + environment + unique hash
*/
var storageAccountName = 'st${environmentName}aif${uniqueSuffix}'
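The naming rules above can also be enforced defensively in the expression itself. The following is a minimal sketch, assuming the same inputs as the original variable; `safeStorageAccountName` is a hypothetical name introduced for illustration.

```bicep
// Sketch only: `safeStorageAccountName` is a hypothetical variable name.
@minLength(1)
@maxLength(10)
param environmentName string

var uniqueSuffix = take(uniqueString(resourceGroup().id), 6)

// Strip hyphens, lowercase the value, and cap the result at the 24-character limit
var safeStorageAccountName = take(toLower(replace('st${environmentName}aif${uniqueSuffix}', '-', '')), 24)

output storageAccountName string = safeStorageAccountName
output storageAccountNameLength int = length(safeStorageAccountName)
```

With a 10-character environment name the result is 21 characters (2 + 10 + 3 + 6), so `take(..., 24)` only acts as a safeguard if the length limit on `environmentName` is later relaxed.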
/*
CONDITIONAL DEPLOYMENT LOGIC:
Many resources are conditionally deployed based on feature flags.
This allows the template to support multiple deployment scenarios
from a single template while maintaining simplicity.
PATTERN USED:
resource conditionalResource 'Microsoft.Resource/type@version' = if (deployCondition) {
  // Resource definition
}
BENEFITS:
- Single template for all scenarios
- Reduced complexity compared to multiple templates
- Easier maintenance and testing
*/
// Private endpoints are deployed only when private networking, Key Vault, and Log Analytics are all enabled
var deployPrivateEndpoints = deployPrivateNetworking && deployKeyVault && deployLogAnalytics
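A concrete instance of the conditional pattern might look like the sketch below. The virtual network name, address ranges, and API version are assumptions chosen for illustration and are not taken from the template.

```bicep
// Sketch only: names, address ranges, and API version are illustrative assumptions.
param deployPrivateNetworking bool = false
param location string = resourceGroup().location

// The resource is included in the deployment only when the flag is true
resource virtualNetwork 'Microsoft.Network/virtualNetworks@2023-09-01' = if (deployPrivateNetworking) {
  name: 'vnet-example'
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.0.0.0/16'
      ]
    }
    subnets: [
      {
        name: 'snet-private-endpoints'
        properties: {
          addressPrefix: '10.0.1.0/24'
        }
      }
    ]
  }
}

// Guard every reference with the same condition, because the resource may not exist
output virtualNetworkId string = deployPrivateNetworking ? virtualNetwork.id : ''
```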
/*
SAMPLE DATA CONDITIONAL LOADING:
Sample configurations are loaded from JSON files and used only when the
corresponding feature flag is set.
LOADING PATTERN:
var conditionalData = condition ? loadJsonContent('./file.json') : []
LIMITATION:
loadJsonContent() is resolved when the template is compiled, so the file
must exist at build time even if the condition is false. The condition only
controls whether the loaded content is used.
*/
var aiFoundryProjects = aiFoundryProjectsFromJson ? loadJsonContent('./sample-ai-foundry-projects.json') : []
Step 5: Configuration Examples
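The configurations in this step could also be kept as Bicep parameter files rather than as comments. The sketch below expresses the development configuration that way. It assumes a hypothetical file named main.dev.bicepparam stored next to main.bicep, and that every parameter shown is declared in the template.

```bicep
// main.dev.bicepparam (hypothetical file name, assumed to sit next to main.bicep)
using './main.bicep'

param environmentName = 'dev'
// Placeholder GUID; replace with a real principal object ID
param principalId = '12345678-1234-1234-1234-123456789012'
param deployPrivateNetworking = false
param aiFoundryIpAllowList = [
  '203.0.113.0/24'
]
param deploySampleData = true
param aiFoundryProjectsFromJson = true
```

Because the file names its template with `using`, tooling can check the values against the template's parameter declarations, which a configuration kept in a comment cannot offer.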
/*
=============================================================================
COMMON PARAMETER CONFIGURATIONS
=============================================================================
DEVELOPMENT ENVIRONMENT:
{
"environmentName": "dev",
"principalId": "12345678-1234-1234-1234-123456789012",
"deployPrivateNetworking": false,
"aiFoundryIpAllowList": ["203.0.113.0/24"],
"deploySampleData": true,
"aiFoundryProjectsFromJson": true
}
PRODUCTION ENVIRONMENT:
{
"environmentName": "prod",
"principalId": "12345678-1234-1234-1234-123456789012",
"deployPrivateNetworking": true,
"aiFoundryIpAllowList": ["198.51.100.0/24", "203.0.113.0/24"],
"deploySampleData": false,
"aiFoundryProjectsFromJson": false,
"enableDiagnosticSettings": true
}
MINIMAL DEPLOYMENT (Hub Only):
{
"environmentName": "minimal",
"principalId": "12345678-1234-1234-1234-123456789012",
"deployPrivateNetworking": false,
"aiFoundryIpAllowList": [],
"deploySampleData": false,
"aiFoundryProjectsFromJson": false,
"deployKeyVault": false,
"deployLogAnalytics": false
}
ZERO-TRUST SECURITY:
{
"environmentName": "secure",
"principalId": "12345678-1234-1234-1234-123456789012",
"deployPrivateNetworking": true,
"aiFoundryIpAllowList": [],
"deploySampleData": false,
"enableDiagnosticSettings": true,
"keyVaultPurgeProtectionEnabled": true
}
*/
Step 6: Troubleshooting Guide
/*
=============================================================================
COMMON DEPLOYMENT ISSUES AND SOLUTIONS
=============================================================================
ERROR: "Storage account name already exists"
CAUSE: Storage account names must be globally unique
SOLUTION: Change environmentName parameter or delete existing storage account
PREVENTION: Use unique environment names for each deployment
ERROR: "Principal ID not found or insufficient permissions"
CAUSE: Invalid principal ID or user lacks necessary permissions
SOLUTION:
1. Verify principal ID format (GUID)
2. Ensure the deploying user has the Owner or User Access Administrator role on the subscription/resource group (Contributor cannot create role assignments)
3. For service principals, ensure proper API permissions
COMMAND: az ad signed-in-user show --query id -o tsv
ERROR: "Private endpoint subnet not found"
CAUSE: Private networking enabled but subnet doesn't exist
SOLUTION: Ensure deployPrivateNetworking=true creates required subnets
PREVENTION: Deploy virtual network resources before private endpoints
ERROR: "AI Foundry hub deployment failed"
CAUSE: Dependencies not ready (storage, key vault, or log analytics)
SOLUTION: Enable all required dependencies or wait for sequential deployment
COMMAND: az deployment group show --name <deployment-name> --resource-group <rg>
ERROR: "Access denied to Key Vault"
CAUSE: Principal doesn't have Key Vault permissions
SOLUTION: Verify principalId has Key Vault Administrator role
TIMING: Role assignments can take 5-10 minutes to propagate
ERROR: "IP address not in allow list"
CAUSE: Your IP address is not in aiFoundryIpAllowList
SOLUTION: Add your public IP to the allow list
COMMAND: curl -s https://api.ipify.org (to find your public IP)
PERFORMANCE ISSUES:
- Large file uploads may timeout: Increase timeout values
- Many projects deployment slow: Deploy projects in smaller batches
- Private endpoint DNS resolution: Allow time for DNS propagation (5-15 minutes)
*/
Step 7: Version History Documentation
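Alongside a comment-based changelog, the current version could be recorded as template metadata so that it is carried into the compiled ARM template. This is a minimal sketch: the metadata keys and the `templateVersion` variable are illustrative choices, not existing conventions in the repository.

```bicep
// Sketch only: metadata keys and `templateVersion` are illustrative choices.
metadata name = 'Azure AI Foundry Jumpstart - Main Infrastructure Template'
metadata description = 'Deploys an Azure AI Foundry environment with supporting infrastructure.'
metadata version = '2.0.0'

// Metadata values cannot be referenced from expressions, so the version is
// repeated in a variable to make it available to outputs or tags
var templateVersion = '2.0.0'

output deployedTemplateVersion string = templateVersion
```

Emitting the version as an output makes it visible in the deployment history, which can help when correlating a failed deployment with a specific changelog entry.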
/*
=============================================================================
VERSION HISTORY AND BREAKING CHANGES
=============================================================================
VERSION 2.0.0 (Current):
- Added support for Azure AI Foundry (new service)
- Migrated from Azure ML Workspace to AI Foundry Hub
- Added private networking support
- Implemented Azure Verified Modules (AVM)
- BREAKING: Parameter names changed to camelCase convention
- BREAKING: Resource naming pattern changed
VERSION 1.5.0:
- Added Key Vault integration
- Enhanced security with private endpoints
- Added diagnostic settings support
- Improved error handling and validation
VERSION 1.0.0:
- Initial release with basic Azure ML Workspace
- Storage account and cognitive services integration
- Basic networking support
MIGRATION FROM 1.x TO 2.0:
1. Update parameter names (snake_case -> camelCase)
2. Update resource references in dependent templates
3. Test deployment in non-production environment
4. Plan for resource recreation (some resources will be replaced)
5. Backup data before migration
DEPRECATION NOTICES:
- Azure ML Workspace direct deployment (use AI Foundry Hub instead)
- Legacy parameter names (will be removed in v3.0)
- Hard-coded resource names (use parameter overrides)
*/
Side Effects
- File Size: Adding comprehensive documentation will increase template file size
- Maintenance: Documentation must be kept in sync with code changes
- Complexity: More comprehensive documentation may initially appear overwhelming to new users
- Translation: Multi-language support may be needed for global teams
- Tool Compatibility: Some deployment tools may have character limits on descriptions
Priority
Medium - Improves developer experience and maintainability
Additional Context
This addresses the technical debt category of insufficient documentation identified in the infrastructure audit. Good documentation is essential for team collaboration, onboarding, troubleshooting, and compliance. The Azure Well-Architected Framework emphasizes documentation as a key component of operational excellence. This enhancement will significantly improve the developer experience and reduce support overhead.