Version: 1.0 | Last Updated: 2026-04-12
Agentic Microservices extend the traditional microservices pattern by embedding AI agent capabilities directly inside each service boundary. Instead of bolting a monolithic AI layer on top of existing services, each microservice carries its own agent runtime that can:
- Plan dynamically at runtime based on context (not just execute static call sequences)
- Use tools via the Model Context Protocol (MCP) to call other services
- Maintain memory across interactions using a tiered persistence strategy
- Route between models (SLM for simple queries, LLM for complex ones) based on real-time complexity assessment
- Self-heal by detecting, classifying, and remediating infrastructure incidents autonomously
This pattern is distinct from both traditional microservices (deterministic, code-defined flows) and monolithic AI gateways (single-point bottleneck for all AI interactions).
Holiday Peak Hub is a framework AND a product, not just a reference implementation. lib/holiday_peak_lib/ is an opinionated agentic-microservices framework with stable, versioned seams that architects fork and adopt. apps/ is a production-grade retail platform built on it — 26 agents running in production-shape under real SLOs, AGC weighted-canary routing, continuous evaluation, three-tier memory, and full observability. Both halves ship together. Distribution under Azure-Samples/ is a channel, not a quality tier.
Canonical positioning: .github/instructions/repository-purpose.instructions.md.
| Characteristic | Implementation |
|---|---|
| Framework layer | holiday-peak-lib provides BaseRetailAgent, AgentBuilder, ModelTarget, FastAPIMCPServer, three-tier memory, guardrails, routing strategy, evaluation runners, telemetry — versioned, contracted, designed for adoption |
| Product layer | 1 transactional microservice (crud-service) + 26 agent services across CRM/eCommerce/Inventory/Logistics/Product Management/Search/Truth Layer + 1 Next.js frontend |
| Microsoft Agent Framework (MAF) | agent-framework>=1.2.0 runs agent_framework.Agent in-process over a pluggable ChatClient (default agent_framework_foundry.FoundryChatClient). The retired portal-agent runtime path was removed in Wave 4c; Foundry remains the model-deployment, telemetry, and evaluation backend. |
| SLM-first routing | Every request starts with GPT-5-nano (fast, cheap); complex queries escalate to GPT-5 (rich) |
| Three-tier memory | Hot (Redis, <50ms) → Warm (Cosmos DB, 100-500ms) → Cold (Blob, archival) |
| Agent-to-agent communication | MCP (Model Context Protocol) for structured tool calls between agents |
| Event-driven async | Azure Event Hubs for decoupled CRUD → Agent processing |
| GitOps deployment | Flux CD reconciles rendered Helm manifests to AKS with namespace isolation |
| Self-healing runtime | Incident lifecycle: detect → classify → remediate → verify → escalate |
| 1796 automated tests | 1136 lib + 660 app tests, 89% coverage, CI/CD enforced |
This reference architecture is built entirely on Microsoft's AI and cloud platform:
| Layer | Technology | Purpose | Documentation |
|---|---|---|---|
| AI Runtime | Microsoft Agent Framework | Agent execution, tool forwarding, message protocol | MAF Python API |
| AI Models | Azure AI Foundry | Model hosting, Agents V2 API, prompt governance | Foundry Agents quickstart |
| Search | Azure AI Search | Vector + hybrid search, semantic ranking | AI Search overview |
| Warm Memory | Azure Cosmos DB | User profiles, search history, agent state | Cosmos DB for NoSQL |
| Hot Memory | Azure Cache for Redis | Session state, real-time context | Redis quickstart |
| Cold Memory | Azure Blob Storage | Archival, catalog snapshots, images | Blob Storage overview |
| Messaging | Azure Event Hubs | Async CRUD → Agent event processing | Event Hubs Python SDK |
| Compute | Azure Kubernetes Service | Namespace-isolated agent pods with KEDA autoscaling | AKS overview |
| API Gateway | Azure API Management | Traffic management, auth, AI policies | APIM overview |
| CRUD Data | Azure Database for PostgreSQL | Transactional data (orders, products, users) | PostgreSQL Flexible Server |
| Secrets | Azure Key Vault | Connection strings, API keys, certificates | Key Vault overview |
| Observability | Azure Monitor + Application Insights | Distributed tracing, KQL queries | OpenTelemetry for Azure |
| GitOps | Flux CD on AKS | Continuous reconciliation, drift detection | Flux on AKS |
| Frontend | Azure Static Web Apps | Next.js 15 hosting with managed SSL | SWA overview |
| Identity | Microsoft Entra ID | JWT validation, RBAC, managed identity | Entra ID overview |
| CI/CD | GitHub Actions | Build, test, deploy workflows | Actions documentation |
Each agent service is a self-contained FastAPI application that:
- Owns its domain logic
- Hosts its own agent runtime (via `BaseRetailAgent`)
- Exposes REST endpoints for Frontend/CRUD and MCP tools for agents
- Subscribes to Event Hub topics for async processing
- Maintains independent memory namespaces
Every request starts with the fast (SLM) model. The agent evaluates complexity and only escalates to the rich (LLM) model when needed. This reduces cost by 60-80% while maintaining quality for complex queries.
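The escalation rule above can be sketched in a few lines. This is a minimal illustration, not the logic in `base_agent.py`: the `assess_complexity` heuristic, the `route` function, the `RoutingDecision` type, and the 0.5 threshold are all hypothetical, and only the two model names come from the characteristics table.

```python
from dataclasses import dataclass

# Model names as listed in the characteristics table; the exact deployment
# identifiers are an assumption.
FAST_MODEL = "gpt-5-nano"
RICH_MODEL = "gpt-5"

# Words that hint at a multi-step or analytical request (hypothetical list).
CONNECTIVES = {"and", "then", "compare", "why"}


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    score: float


def assess_complexity(query: str) -> float:
    """Toy heuristic: longer queries with more connectives score higher (0.0 to 1.0)."""
    words = query.split()
    length_score = min(len(words) / 40.0, 1.0)
    connective_count = sum(1 for word in words if word.lower() in CONNECTIVES)
    return min(length_score + 0.2 * connective_count, 1.0)


def route(query: str, threshold: float = 0.5) -> RoutingDecision:
    """Default to the fast model; escalate only when the score reaches the threshold."""
    score = assess_complexity(query)
    model = RICH_MODEL if score >= threshold else FAST_MODEL
    return RoutingDecision(model=model, score=score)
```

A short lookup such as `route("order status")` stays on the fast model, while a multi-part comparison crosses the threshold and escalates. A real assessment would likely use signals richer than word counts.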
Agents operate on context assembled from three tiers with different latency and cost profiles, read in parallel via asyncio.gather:
- Hot (Redis): Session state, <50ms reads, TTL-based expiry
- Warm (Cosmos DB): User profiles, search history, enrichment state, 100–500ms
- Cold (Blob Storage): Interaction logs, catalog snapshots, archival
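As a rough sketch of the parallel read, the three tiers can be awaited together with `asyncio.gather`. The `read_hot`, `read_warm`, `read_cold`, and `assemble_context` functions below are hypothetical stand-ins that simulate latency with `asyncio.sleep`; no Redis, Cosmos DB, or Blob client is involved, and the cold read is made to fail on purpose to show one way a missing tier could degrade to an empty result.

```python
import asyncio
from typing import Any


async def read_hot(session_id: str) -> dict[str, Any]:
    """Stand-in for a Redis session read."""
    await asyncio.sleep(0.01)
    return {"session": session_id, "cart_items": 2}


async def read_warm(user_id: str) -> dict[str, Any]:
    """Stand-in for a Cosmos DB profile read."""
    await asyncio.sleep(0.05)
    return {"user": user_id, "segment": "loyal"}


async def read_cold(user_id: str) -> dict[str, Any]:
    """Stand-in for a Blob Storage read that fails, to exercise the fallback."""
    await asyncio.sleep(0.02)
    raise TimeoutError("cold tier unavailable")


async def assemble_context(session_id: str, user_id: str) -> dict[str, dict[str, Any]]:
    """Read all tiers concurrently; a failed tier contributes an empty dict."""
    results = await asyncio.gather(
        read_hot(session_id),
        read_warm(user_id),
        read_cold(user_id),
        return_exceptions=True,  # collect failures instead of raising the first one
    )
    context: dict[str, dict[str, Any]] = {}
    for tier, result in zip(("hot", "warm", "cold"), results):
        context[tier] = {} if isinstance(result, BaseException) else result
    return context


if __name__ == "__main__":
    print(asyncio.run(assemble_context("s-1", "u-1")))
```

Because the reads run concurrently, total latency tracks the slowest tier rather than the sum of all three.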
CRUD operations publish domain events to Azure Event Hubs. Agents subscribe to relevant topics and process asynchronously:

    Frontend → CRUD Service → Event Hubs → Agent(s)
                 ↑ publish            ↓ consume
              (order.created)    (enrich, classify, alert)
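The publish/consume flow can be illustrated without any Azure dependency. The sketch below swaps Event Hubs for a hypothetical in-memory bus built on `asyncio.Queue`; `InMemoryEventBus`, `DomainEvent`, and the two handlers are illustrative names, not part of the repository or the Event Hubs SDK.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class DomainEvent:
    topic: str
    payload: dict


Handler = Callable[[DomainEvent], Awaitable[None]]


class InMemoryEventBus:
    """In-memory stand-in for Azure Event Hubs (illustration only)."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    async def publish(self, event: DomainEvent) -> None:
        # The publisher returns as soon as the event is queued.
        await self._queue.put(event)

    async def drain(self) -> None:
        """Deliver each queued event to every handler subscribed to its topic."""
        while not self._queue.empty():
            event = self._queue.get_nowait()
            for handler in self._handlers.get(event.topic, []):
                await handler(event)


async def demo() -> list[str]:
    bus = InMemoryEventBus()
    processed: list[str] = []

    async def enrich_order(event: DomainEvent) -> None:
        processed.append(f"enrich:{event.payload['order_id']}")

    async def alert_on_order(event: DomainEvent) -> None:
        processed.append(f"alert:{event.payload['order_id']}")

    bus.subscribe("order.created", enrich_order)
    bus.subscribe("order.created", alert_on_order)

    # The CRUD side publishes and moves on; agents consume later.
    await bus.publish(DomainEvent("order.created", {"order_id": "o-1"}))
    await bus.drain()
    return processed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the decoupling is that the CRUD service never calls an agent directly: adding a new consumer means adding a subscription, not changing the publisher.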
Agents expose tools via FastAPIMCPServer that other agents can invoke:
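
    # `mcp` is assumed to be the service's FastAPIMCPServer instance and
    # `adapter` its inventory adapter; both are defined elsewhere in the service.
    @mcp.tool()
    async def get_inventory_status(sku: str) -> dict:
        """Check real-time stock level for an SKU."""
        return await adapter.check_stock(sku)

This enables compositional intelligence: the checkout agent can call the inventory agent's tools without tight coupling.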
The EnrichmentGuardrail validates agent outputs before they reach downstream consumers:
- Schema conformance (Pydantic model validation)
- Content policy enforcement (no hallucinated data)
- Confidence thresholds (reject low-confidence enrichments)
- Audit trail (every decision logged with evidence)
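The four checks above might look roughly like the following. This is a sketch under stated assumptions, not the `EnrichmentGuardrail` implementation: it uses a plain dataclass in place of Pydantic to stay dependency-free, and `Enrichment`, `GuardrailResult`, `check_enrichment`, the `source_ids` field, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Enrichment:
    sku: str
    description: str
    confidence: float
    source_ids: list[str] = field(default_factory=list)  # evidence backing the text


@dataclass
class GuardrailResult:
    accepted: bool
    reasons: list[str]


def check_enrichment(
    item: Enrichment,
    min_confidence: float = 0.8,
    audit_log: Optional[list[dict]] = None,
) -> GuardrailResult:
    """Run schema, policy, and confidence checks, then record the decision."""
    reasons: list[str] = []

    # Schema conformance: required fields must be present and non-empty.
    if not item.sku or not item.description.strip():
        reasons.append("schema: sku and description are required")

    # Content policy: an enrichment with no source evidence is rejected.
    if not item.source_ids:
        reasons.append("policy: no source evidence attached")

    # Confidence threshold: low-confidence enrichments are rejected.
    if item.confidence < min_confidence:
        reasons.append(f"confidence: {item.confidence:.2f} below {min_confidence:.2f}")

    result = GuardrailResult(accepted=not reasons, reasons=reasons)

    # Audit trail: every decision is logged together with its evidence.
    if audit_log is not None:
        audit_log.append(
            {
                "sku": item.sku,
                "accepted": result.accepted,
                "reasons": list(reasons),
                "evidence": list(item.source_ids),
            }
        )
    return result
```

Collecting every failed reason, rather than stopping at the first, keeps the audit entry useful when an enrichment fails several checks at once.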
All services are deployed via rendered Kubernetes manifests reconciled by Flux CD:
- CI builds container images → pushes to ACR
- Helm templates rendered → committed to manifests branch
- Flux detects changes → applies to AKS namespaces
- Health checks validate rollout → auto-rollback on failure
- Drift detection ensures desired state is maintained
The platform includes autonomous incident management:
- Detection: Health probes, Azure Monitor alerts, APIM diagnostics
- Classification: Incident type mapping (pod crash, memory pressure, model degradation)
- Remediation: Strategy-specific handlers (AKS restart, APIM circuit break, Redis flush)
- Verification: Post-remediation health check with configurable thresholds
- Escalation: Human notification when automated remediation fails
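The lifecycle can be read as a small state progression. The sketch below is a hypothetical outline, not the platform's incident manager: `IncidentType`, `Incident`, `classify`, and `run_lifecycle` are illustrative names, and classification is reduced to matching two Kubernetes status strings.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class IncidentType(Enum):
    POD_CRASH = "pod_crash"
    MEMORY_PRESSURE = "memory_pressure"
    UNKNOWN = "unknown"


@dataclass
class Incident:
    signal: str
    kind: IncidentType = IncidentType.UNKNOWN
    history: list[str] = field(default_factory=list)


def classify(signal: str) -> IncidentType:
    """Map a raw signal to an incident type (toy string matching)."""
    if "CrashLoopBackOff" in signal:
        return IncidentType.POD_CRASH
    if "OOMKilled" in signal:
        return IncidentType.MEMORY_PRESSURE
    return IncidentType.UNKNOWN


def run_lifecycle(
    signal: str,
    handlers: dict[IncidentType, Callable[[], None]],
    is_healthy: Callable[[], bool],
) -> Incident:
    """detect → classify → remediate → verify, escalating when a step cannot complete."""
    incident = Incident(signal=signal, history=["detected"])

    incident.kind = classify(signal)
    incident.history.append(f"classified:{incident.kind.value}")

    handler = handlers.get(incident.kind)
    if handler is None:
        # No remediation strategy is registered for this type: hand off to a human.
        incident.history.append("escalated")
        return incident

    handler()
    incident.history.append("remediated")

    # Verification decides between closing the incident and escalating it.
    incident.history.append("verified" if is_healthy() else "escalated")
    return incident
```

Registering handlers in a dictionary keyed by incident type means only explicitly listed strategies can ever run; anything unrecognized falls through to escalation.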
The 26 agent services are organized into 7 bounded contexts:
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#FFB3BA','primaryTextColor':'#000','primaryBorderColor':'#FF8B94','lineColor':'#BAE1FF','secondaryColor':'#BAE1FF','tertiaryColor':'#FFFFFF'}}}%%
graph TB
subgraph CRUD["CRUD Service (PostgreSQL)"]
API["REST API<br/>31 endpoints"]
end
subgraph ECOM["eCommerce Domain"]
CS["Catalog Search"]
PDE["Product Detail Enrichment"]
CI["Cart Intelligence"]
CKS["Checkout Support"]
OS["Order Status"]
end
subgraph CRM["CRM Domain"]
PA["Profile Aggregation"]
SP["Segmentation &<br/>Personalization"]
CAM["Campaign Intelligence"]
SA["Support Assistance"]
end
subgraph INV["Inventory Domain"]
AT["Alerts & Triggers"]
HC["Health Check"]
JIT["JIT Replenishment"]
RV["Reservation Validation"]
end
subgraph LOG["Logistics Domain"]
CSel["Carrier Selection"]
ETA["ETA Computation"]
RS["Returns Support"]
RID["Route Issue Detection"]
end
subgraph PM["Product Management Domain"]
ACP["ACP Transformation"]
ASO["Assortment Optimization"]
CSV["Consistency Validation"]
NC["Normalization &<br/>Classification"]
end
subgraph SRCH["Search Domain"]
SEA["Search Enrichment Agent"]
end
subgraph TRUTH["Truth Layer Domain"]
TI["Truth Ingestion"]
TE["Truth Enrichment"]
TH["Truth HITL"]
TX["Truth Export"]
end
UI["Next.js Frontend"] --> API
API -->|events| EH["Azure Event Hubs"]
EH --> ECOM & CRM & INV & LOG & PM & SRCH & TRUTH
API <-->|sync + circuit breaker| ECOM
Start with Architecture Overview and the ADR Index to understand the decision landscape. Review the MAF Integration Rationale for the agent runtime design.
Read the lib README to understand the micro-framework, then look at any app's main.py + adapters.py + agents.py for the service pattern. The Standalone Deployment Guide covers single-service deployment.
Start with the Infrastructure README and Deployment Guide. Review ADR-017 (Flux CD) and ADR-026 (Namespace Isolation) for the GitOps model.
Review the Foundry Agent Invocation Flow and MAF Integration Rationale. The SLM-first routing logic is in lib/src/holiday_peak_lib/agents/base_agent.py.
| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| Agentic Microservices (this repo) | Domain isolation, independent scaling, per-agent memory | More services to operate, cross-agent latency | Multi-domain platforms with distinct AI capabilities per domain |
| Monolithic AI Gateway | Single deployment, centralized model management | Single point of failure, no domain isolation, all-or-nothing scaling | Simple chatbot or single-purpose AI applications |
| AI Sidecar Pattern | Minimal code changes to existing services | Limited agent autonomy, no inter-agent communication | Adding AI to legacy services without rewrite |
| Orchestrator Agent | Centralized planning, simpler routing | Bottleneck at orchestrator, hard to scale horizontally | Workflows with a single decision-maker |
| Topic | Official Documentation |
|---|---|
| Building AI agents with Microsoft Agent Framework | MAF Python SDK |
| Azure AI Foundry agent creation and management | Foundry Agents overview |
| Model Context Protocol (MCP) for tool integration | MCP specification |
| Microservices architecture on Azure | Azure Architecture Center — Microservices |
| Event-driven architecture patterns | Azure Architecture Center — Event-driven |
| AKS best practices | AKS baseline architecture |
| Cosmos DB data modeling | Cosmos DB data modeling |
| Azure Well-Architected Framework | WAF overview |
- Architecture Overview — System context and container views
- ADR Index — 35 architecture decision records
- MAF Integration Rationale — Why MAF is wrapped in the lib
- Standalone Deployment Guide — Single-service AKS deployment
- Lib README — Micro-framework API reference
- Infrastructure README — Bicep provisioning and AKS operations
- Project Status — Current state and recent changes
The warm tier (Cosmos DB) holds profiles and history with 100-500ms reads; the cold tier (Blob) holds archival data with reads on the order of seconds.
The EnrichmentGuardrail enforces that AI-generated content is always grounded in company-owned data (PIM, DAM, CRM). Agents never generate without a verifiable internal data source.
Flux CD reconciles rendered Helm manifests from a manifests branch. Each domain gets its own Kubernetes namespace with RBAC boundaries and network policies (ADR-017, ADR-026).
An autonomous incident lifecycle (detect → classify → remediate → verify → escalate) handles infrastructure misconfigurations without human intervention when remediation succeeds and escalates to a human when it does not. Every step leaves an audit trail, and remediation is limited to allowlisted actions.
- Understand the architecture: Start with Solution Architecture Overview
- Explore the lib: Read lib/README.md for the shared framework
- Deploy a single service: Follow the Standalone Deployment Guide
- Deploy everything: Use `azd up` with the Deployment Guide
- Run tests: `python -m pytest` at the repository root (1796 tests)