This document outlines the standards and best practices for designing, building, and operating microservices-based applications at Bayat. Following these guidelines ensures consistent, maintainable, and scalable microservices architectures.
- Microservices Principles
- Service Design
- Communication Patterns
- Data Management
- API Design
- Service Discovery
- Deployment and Scaling
- Monitoring and Observability
- Security
- Testing
- DevOps Practices
- Service Templates
- Governance
- Migration Strategies
All microservices-based applications at Bayat should adhere to the following core principles:
- Single Responsibility: Each service should focus on a specific business capability
- Autonomy: Services should be developed, deployed, and scaled independently
- Resilience: Services should be designed to gracefully handle failures
- Decentralization: Avoid shared databases and centralized governance
- Observable: Services must expose metrics, logs, and traces
- Evolutionary Design: Design for change and continuous refactoring
- Domain-Driven: Align services with business domains
- Automation: Maximize automation for development, testing, and operations
Define service boundaries based on:
- Business capabilities and domains
- Team structure and ownership
- Data cohesion and access patterns
- Change frequency and scalability requirements
A well-designed service should:
- Encapsulate a clear business capability
- Own its data and expose it only through well-defined interfaces
- Be independently deployable and testable
- Have a focused and maintainable codebase (< 10,000 LOC guideline)
Consider the following factors when determining service size:
- Too Large: Multiple domains, overlapping concerns, complex interfaces, diverse data needs
- Too Small: Tight coupling to other services, excessive inter-service communication, limited functionality
- Just Right: Clear responsibility, manageable codebase, reasonable interface, team ownership
Standardize on the following service patterns:
- API Services: Expose functionality to clients and other services
- Processing Services: Handle background and asynchronous workloads
- Integration Services: Adapt and translate between external systems
- Aggregation Services: Combine multiple service responses for clients
- Infrastructure Services: Provide common platform capabilities
For real-time interactions:
- Use REST for simple CRUD operations and queries
- Use GraphQL for complex data queries and aggregations
- Use gRPC for high-performance internal service communication
- Implement proper timeouts, retries, and circuit breakers
- Document all synchronous APIs comprehensively
For event-driven and decoupled operations:
- Use message queues (RabbitMQ, Amazon SQS) for task distribution
- Use event streaming (Kafka, Kinesis) for event sourcing and analysis
- Implement idempotent receivers to handle duplicate events
- Design for at-least-once delivery semantics
- Maintain event schemas and versioning
Scenario | Recommended Pattern |
---|---|
User-initiated actions | Synchronous REST/GraphQL |
Data querying | Synchronous REST/GraphQL |
Long-running operations | Asynchronous with callbacks |
System events | Event streaming |
Cross-service workflows | Orchestration service or choreography |
High-volume data processing | Event streaming |
- All service-to-service communication must be encrypted (TLS)
- Authentication required for all service calls
- Use standard headers for tracing, correlation IDs, and tenant information
- Implement graceful degradation when dependent services are unavailable
- Each service owns and manages its data
- No direct database access from outside the service
- Data is only exposed through service APIs
- Consider CQRS for complex read/write patterns
Implement database-per-service pattern:
- Each service has its own logical database
- Services should not share database instances in production
- Choose the appropriate database type for each service (relational, NoSQL, time-series, etc.)
- Implement proper data backup and recovery procedures
For maintaining consistency across services:
- Prefer eventual consistency where possible
- Use the Saga pattern for distributed transactions
- Implement compensating transactions for failure recovery
- Consider event sourcing for complex state tracking
- Document consistency guarantees for each service
When duplicating data across services:
- Clearly document the source of truth
- Implement data synchronization mechanisms
- Use event-based updates to propagate changes
- Track synchronization health and alert on inconsistencies
All service APIs must follow these standards:
- Use HTTP status codes correctly
- Implement consistent error response formats
- Use hypermedia links when appropriate (HATEOAS)
- Follow RESTful resource naming conventions
- Version all APIs explicitly
Document all APIs using:
- OpenAPI (Swagger) for REST APIs
- GraphQL Schema with descriptions for GraphQL APIs
- Protocol Buffers with comments for gRPC APIs
- Include example requests and responses
- Document error conditions and handling
Implement proper API versioning:
- Use semantic versioning for API changes
- Include version in URL path (
/v1/resources
) - Support at least one previous version during transition periods
- Document deprecation timelines
- Provide migration guides for breaking changes
Use API gateways for:
- Request routing
- Authentication and authorization
- Rate limiting and throttling
- Response caching
- Analytics and monitoring
- Cross-cutting concerns (CORS, compression)
Implement service discovery using:
- DNS-based discovery for simplicity
- Service registry (Consul, etcd) for dynamic environments
- Load balancers for stable endpoints
- Kubernetes service discovery when applicable
For client-side discovery:
- Use service registries for real-time service information
- Implement health-check-aware client libraries
- Consider client-side load balancing for high-volume calls
- Cache service information with appropriate TTLs
For server-side discovery:
- Use load balancers or API gateways
- Configure proper health checks
- Implement proper failover mechanisms
- Document discovery endpoints
Package services as containers:
- Use Docker for containerization
- Create minimal container images
- Avoid storing sensitive data in containers
- Scan container images for vulnerabilities
- Follow container security best practices
Use Kubernetes for container orchestration:
- Deploy each service as a separate Kubernetes deployment
- Use namespaces for environment isolation
- Implement proper resource requests and limits
- Use horizontal pod autoscaling based on metrics
- Configure appropriate liveness and readiness probes
Define scaling policies for each service:
- Identify scaling metrics (CPU, memory, requests per second, queue depth)
- Set appropriate minimum and maximum instance counts
- Configure scaling thresholds based on performance testing
- Document scaling behaviors and limitations
- Test scaling under load
Example Kubernetes HPA configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: order-service
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: order-service
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: requests_per_second
target:
type: AverageValue
averageValue: 500
Each service must expose the following metrics:
- Request count, latency, and error rates
- Resource utilization (CPU, memory, disk, network)
- Business-specific metrics
- Dependencies health and performance
- Queue depths and processing rates
Use Prometheus or similar systems for metrics collection and alerting.
Implement standardized logging:
- Use structured logging formats (JSON)
- Include correlation IDs in all logs
- Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
- Centralize logs with Elasticsearch, Splunk, or similar
- Implement log retention policies
Log entry example:
{
"timestamp": "2023-03-12T15:22:31.123Z",
"level": "INFO",
"service": "order-service",
"instance": "order-service-7d9f6b5b9c-2jkvn",
"traceId": "4c0f8c2b-cb4e-11ed-afa1-0242ac120002",
"userId": "user-123",
"message": "Order created successfully",
"orderId": "order-456",
"orderValue": 129.99
}
Implement distributed tracing:
- Use OpenTelemetry for instrumentation
- Propagate trace context across service boundaries
- Sample traces based on environment and traffic volume
- Collect traces in Jaeger, Zipkin, or similar systems
- Analyze traces for performance optimization
Implement comprehensive health checks:
- Liveness: Basic check if service is running
- Readiness: Check if service can handle requests
- Dependency: Check status of critical dependencies
- Business logic: Verify critical functionalities
- Deep health: End-to-end verification of key flows
Example health check implementation:
@Controller('health')
export class HealthController {
constructor(
private db: DatabaseService,
private messageQueue: MessageQueueService,
) {}
@Get('liveness')
async liveness() {
return { status: 'UP' };
}
@Get('readiness')
async readiness() {
const dbStatus = await this.db.ping();
const mqStatus = await this.messageQueue.ping();
return {
status: dbStatus.healthy && mqStatus.healthy ? 'UP' : 'DOWN',
details: {
database: dbStatus,
messageQueue: mqStatus,
}
};
}
}
Implement robust authentication and authorization:
- Use OAuth 2.0 or OpenID Connect for authentication
- Implement role-based access control (RBAC)
- Use short-lived tokens and proper token validation
- Implement proper API key management for service-to-service communication
- Audit all authentication and authorization decisions
Secure the network layer:
- Encrypt all service-to-service communication (mutual TLS)
- Implement network policies to restrict traffic between services
- Use API gateways for external traffic
- Scan for network vulnerabilities regularly
- Document network topology and security controls
Follow secure coding practices:
- Validate all inputs
- Protect against common vulnerabilities (OWASP Top 10)
- Use secure dependencies and keep them updated
- Perform regular security scans of codebase and containers
- Implement proper error handling to avoid information disclosure
Manage secrets securely:
- Use dedicated secrets management solutions (HashiCorp Vault, AWS Secrets Manager)
- Never store secrets in code or configuration files
- Rotate secrets regularly
- Audit secret access
- Implement least privilege for secret access
Test individual service components:
- Aim for high test coverage (>80%)
- Mock external dependencies
- Focus on business logic and error handling
- Automate unit tests in CI/CD pipeline
Test service integrations:
- Test API contracts
- Verify database interactions
- Test messaging patterns
- Use test containers for dependencies
Test services in isolation:
- Deploy single service with test dependencies
- Verify all endpoints and functionality
- Test scaling and resource utilization
- Verify metrics and logging
Test complete service interactions:
- Deploy realistic service topology
- Test critical user journeys
- Verify distributed transactions
- Test failure scenarios and recovery
Verify service performance:
- Establish performance baselines
- Test maximum throughput
- Measure response times under load
- Identify bottlenecks
- Verify scaling behavior
Implement comprehensive CI/CD:
- Automated builds for every commit
- Unit and integration tests for every build
- Vulnerability scanning
- Static code analysis
- Automated deployments to development environments
- Controlled promotions to higher environments
Example CI/CD Pipeline:
# GitLab CI/CD pipeline example
stages:
- build
- test
- scan
- deploy-dev
- integration-test
- deploy-staging
- performance-test
- deploy-prod
build:
stage: build
script:
- docker build -t ${SERVICE_NAME}:${CI_COMMIT_SHA} .
- docker push ${SERVICE_NAME}:${CI_COMMIT_SHA}
unit-test:
stage: test
script:
- npm run test:unit
integration-test:
stage: test
script:
- npm run test:integration
security-scan:
stage: scan
script:
- trivy image ${SERVICE_NAME}:${CI_COMMIT_SHA}
- sonarqube-scanner
deploy-dev:
stage: deploy-dev
script:
- helm upgrade --install ${SERVICE_NAME} ./charts/${SERVICE_NAME} --set image.tag=${CI_COMMIT_SHA} -n development
environment:
name: development
# Additional stages for higher environments
Manage infrastructure through code:
- Use Terraform, CloudFormation, or similar for infrastructure provisioning
- Use Helm charts for Kubernetes deployments
- Version control all infrastructure code
- Review infrastructure changes
- Test infrastructure changes in non-production environments
Implement feature flags for controlled releases:
- Use centralized feature flag management
- Deploy code with features disabled by default
- Enable features gradually
- Monitor feature impact
- Support easy rollback through feature disablement
Provide standardized service templates for common patterns:
- REST API service
- Event processor service
- GraphQL service
- Background worker service
- Integration service
Each template should include:
- Basic service structure
- Standardized logging, metrics, and tracing
- Health checks
- Documentation templates
- CI/CD configuration
- Deployment manifests
Example Service Structure:
service-name/
├── src/
│ ├── api/ # API controllers and routes
│ ├── core/ # Core business logic
│ ├── config/ # Service configuration
│ ├── models/ # Data models
│ ├── repositories/ # Data access
│ ├── services/ # Business services
│ └── utils/ # Utilities
├── test/
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-end tests
├── Dockerfile # Container definition
├── charts/ # Helm charts for deployment
│ └── service-name/
├── .gitlab-ci.yml # CI/CD configuration
├── package.json # Dependencies
└── README.md # Documentation
Develop and maintain shared libraries for common functionalities:
- Authentication and authorization
- Logging and observability
- Resilience patterns (circuit breakers, retries)
- Common data models
- API clients
Maintain a central service catalog:
- Document all services and their responsibilities
- Track service owners and dependencies a
- Monitor service health and SLAs
- Review service metrics and quality
Example Service Registry Entry:
name: order-service
description: Manages order creation and processing
team: commerce
owner: [email protected]
repository: https://git.bayat.io/commerce/order-service
api-documentation: https://docs.bayat.io/apis/order-service
dependencies:
- user-service
- product-service
- payment-service
sla:
availability: 99.95%
latency_p95: 300ms
technologies:
language: Node.js
framework: NestJS
database: MongoDB
messaging: RabbitMQ
Establish architecture review process:
- Review new service proposals
- Evaluate changes to service boundaries
- Assess technology choices
- Verify compliance with standards
- Provide design feedback and guidance
Continuously improve standards:
- Gather feedback from development teams
- Review and update standards quarterly
- Communicate changes clearly
- Provide migration paths for existing services
- Run architecture town halls for knowledge sharing
Guidelines for decomposing monoliths:
- Start with domains: Identify bounded contexts in the monolith
- Strangler pattern: Gradually replace monolith functionality
- Data separation: Extract service-specific data incrementally
- API facade: Create API layer over the monolith
- Prioritize value: Begin with high-value or problematic areas
Recommended implementation sequence:
- Create the new service with its own database
- Implement data synchronization from monolith to service
- Redirect reads to the new service
- Redirect writes to the new service
- Migrate historical data
- Remove functionality from the monolith
Address common migration challenges:
- Shared data: Use data replication or views initially
- Transactions: Implement sagas for distributed transactions
- Authentication: Create unified authentication service
- Deployment: Use blue/green deployments for seamless transition
- Testing: Create comprehensive test suite before migration
Version | Date | Description |
---|---|---|
1.0 | 2025-03-20 | Initial version |