Skip to content

Latest commit

 

History

History
675 lines (496 loc) · 22.5 KB

event-driven.md

File metadata and controls

675 lines (496 loc) · 22.5 KB

Event-Driven Architecture Patterns

This document outlines the standards, patterns, and best practices for implementing event-driven architectures at Bayat.

Table of Contents

Introduction

Event-driven architecture (EDA) is an architectural pattern that promotes the production, detection, consumption of, and reaction to events. An event can be defined as a significant change in state or an occurrence that is of interest to the system or its users.

Benefits of Event-Driven Architecture

  • Loose Coupling: Services communicate without direct dependencies
  • Scalability: Components can scale independently
  • Resilience: Failure in one component doesn't affect others directly
  • Flexibility: Easier to add new components that react to existing events
  • Real-time Processing: Enables immediate reaction to events as they occur

When to Use Event-Driven Architecture

Consider EDA when:

  • Building systems with asynchronous workflows
  • Implementing real-time features
  • Designing microservices that need to coordinate without tight coupling
  • Managing complex business processes across multiple services
  • Creating systems that need audit trails of all state changes

Core Concepts

Events

An event represents a fact that occurred at a specific point in time. Events are immutable and should be expressed in the past tense.

Event Types

  • Domain Events: Represent business-significant occurrences within a domain
  • Integration Events: Used for communication between different bounded contexts
  • Command Events: Represent instructions to perform actions
  • Query Events: Request information without changing state
  • Notification Events: Inform subscribed parties about system occurrences

Event Producers

Components that detect, capture, and publish events to an event broker.

Event Consumers

Components that subscribe to events and react to them.

Event Broker

Middleware that routes events from producers to consumers.

Event Store

Persistent storage of events, serving as the source of truth.

Event Patterns

Event Notification

The simplest pattern where a service publishes an event to notify other services about a change.

┌──────────┐    ┌──────────┐    ┌──────────┐
│ Producer │───>│  Broker  │───>│ Consumer │
└──────────┘    └──────────┘    └──────────┘

When to use: For simple coordination between services where consumers don't need complete information.

Event-Carried State Transfer

Events contain the complete relevant state, reducing the need for consumers to query back for additional data.

┌──────────┐    ┌──────────┐    ┌──────────┐
│ Producer │───>│  Broker  │───>│ Consumer │
└──────────┘    └──────────┘    └──────────┘
      │                               │
      │                               │
      v                               v
┌──────────┐                    ┌──────────┐
│ Producer │                    │ Consumer │
│ Database │                    │ Database │
└──────────┘                    └──────────┘

When to use: When consumers need comprehensive information and reducing network calls is important.

Event Sourcing

All changes to application state are stored as a sequence of events, which can be replayed to reconstruct current state.

┌──────────┐    ┌──────────┐    ┌──────────┐
│   User   │───>│ Commands │───>│  Domain  │
│   Input  │    │          │    │  Logic   │
└──────────┘    └──────────┘    └──────────┘
                                      │
                                      v
                                ┌──────────┐
                                │  Events  │
                                │          │
                                └──────────┘
                                      │
                                      v
                                ┌──────────┐
                                │  Event   │
                                │  Store   │
                                └──────────┘
                                      │
                                      v
                                ┌──────────┐
                                │  Read    │
                                │  Model   │
                                └──────────┘

When to use: When you need a complete audit trail, time-travel debugging, or complex event processing.

Command Query Responsibility Segregation (CQRS)

Separates read and write operations into different models, often with different data stores.

┌──────────┐                    ┌──────────┐
│  Write   │                    │   Read   │
│ Requests │                    │ Requests │
└──────────┘                    └──────────┘
      │                               │
      v                               v
┌──────────┐                    ┌──────────┐
│  Write   │                    │   Read   │
│  Model   │                    │  Model   │
└──────────┘                    └──────────┘
      │                               ▲
      v                               │
┌──────────┐    ┌──────────┐    ┌──────────┐
│  Events  │───>│  Event   │───>│ Projector│
│          │    │  Store   │    │          │
└──────────┘    └──────────┘    └──────────┘

When to use: When read and write workloads have significantly different requirements or scaling needs.

Saga Pattern

Coordinates a sequence of local transactions where each transaction updates data within a single service.

┌──────────┐    ┌──────────┐    ┌──────────┐
│  Step 1  │───>│  Step 2  │───>│  Step 3  │
└──────────┘    └──────────┘    └──────────┘
      │               │               │
      v               v               v
┌──────────┐    ┌──────────┐    ┌──────────┐
│Compensate│    │Compensate│    │Compensate│
│  Step 1  │<───│  Step 2  │<───│  Step 3  │
└──────────┘    └──────────┘    └──────────┘

When to use: For managing distributed transactions across multiple services.

Communication Patterns

Publish-Subscribe

Multiple consumers subscribe to event types and receive all events of those types.

Best practices:

  • Use topic-based routing for logical organization
  • Implement message filtering at the broker when possible
  • Design with multiple consumers in mind

Point-to-Point

Events are delivered to exactly one consumer from a pool of potential consumers.

Best practices:

  • Use for work distribution and load balancing
  • Ensure idempotent consumers for reliability
  • Consider message expiration policies

Request-Reply over Events

A synchronous interaction pattern implemented with asynchronous events.

Best practices:

  • Include correlation IDs to match replies with requests
  • Set appropriate timeouts
  • Have fallback mechanisms for missed replies

Event Schema Design

Event Structure

All events should follow a consistent structure:

{
  "eventId": "uuid-string",
  "eventType": "domain.entity.action.occurred",
  "eventTime": "ISO-8601 timestamp",
  "version": "schema-version",
  "producer": "service-name",
  "data": {
    // Event-specific payload
  },
  "metadata": {
    // Additional contextual information
    "correlationId": "uuid-string",
    "causationId": "uuid-string",
    "traceId": "uuid-string"
  }
}

Schema Evolution

Follow these principles for evolving event schemas:

  1. Backward Compatibility: New schema versions must accept data produced with older schemas
  2. Forward Compatibility: Old consumers should be able to process events from newer producers
  3. Additive Changes: Only add fields, don't remove or change existing ones
  4. Default Values: Provide sensible defaults for new fields
  5. Versioning: Include schema version in the event metadata

Schema Registry

Use a schema registry to:

  • Centrally manage event schemas
  • Validate event conformance
  • Enable schema evolution governance
  • Generate client code for various languages

Event Storage and Routing

Message Brokers

Choose the appropriate broker based on requirements:

  • Apache Kafka: For high-throughput, persistent event streaming
  • RabbitMQ: For complex routing requirements and traditional message patterns
  • AWS SNS/SQS: For AWS-native applications with moderate throughput
  • Google Pub/Sub: For GCP-native applications
  • Azure Event Hubs/Service Bus: For Azure-native applications

Event Store Considerations

When implementing an event store:

  • Append-Only: The event store should be an immutable, append-only log
  • Optimistic Concurrency: Use optimistic concurrency control for event streams
  • Snapshots: Implement snapshotting for performance with long event streams
  • Partitioning: Partition events by aggregate ID or other logical boundaries
  • Compression: Consider compression for storage efficiency

Error Handling

Message Processing Failures

  1. Dead Letter Queues (DLQ): Route unprocessable messages to a DLQ for later analysis
  2. Retry Policies: Implement exponential backoff for transient failures
  3. Circuit Breakers: Prevent cascading failures when downstream services fail
  4. Poison Message Handling: Identify and isolate messages that consistently cause failures

Event Processing Guarantees

Choose the appropriate guarantee level:

  • At-most-once: Events may be lost but never processed twice (lowest overhead)
  • At-least-once: Events are never lost but may be processed multiple times (requires idempotency)
  • Exactly-once: Events are processed exactly once (highest overhead, not always possible)

Idempotent Consumers

Design consumers to be idempotent by:

  1. Using natural idempotency when possible (e.g., setting vs. incrementing)
  2. Tracking processed event IDs to detect duplicates
  3. Using database transactions to make updates atomic
  4. Designing compensating actions for non-idempotent operations

Monitoring and Observability

Key Metrics to Monitor

  1. Latency: Time from event production to consumption
  2. Throughput: Events processed per time unit
  3. Error Rates: Failed events vs. successful events
  4. Queue Depths: Number of unprocessed events
  5. Processing Time: Time taken to process each event
  6. Dead Letter Counts: Number of events in DLQs

Event Tracing

Implement distributed tracing by:

  1. Propagating trace IDs through events
  2. Correlating events in the same business transaction
  3. Visualizing event flows across services
  4. Measuring time spent in each processing stage

Event Logging

Log the following information:

  1. Event metadata (ID, type, timestamp)
  2. Processing milestones (received, processed, forwarded)
  3. Errors with context
  4. Business-significant event details (sanitized of sensitive data)

Testing Strategies

Unit Testing Event Handlers

Test event handlers in isolation by:

  1. Mocking event sources
  2. Verifying correct state changes
  3. Ensuring idempotency with duplicate events
  4. Testing error handling paths

Integration Testing

Test the interaction between components by:

  1. Using embedded event brokers for testing
  2. Verifying event publication
  3. Confirming subscription and processing
  4. Testing schema compatibility

End-to-End Testing

Validate complete event flows by:

  1. Tracing events through the entire system
  2. Verifying eventual consistency
  3. Testing failure recovery
  4. Measuring performance characteristics

Implementation Guidelines

Event Storming

Use event storming workshops to:

  1. Identify domain events collaboratively
  2. Understand event flows and dependencies
  3. Define service boundaries around events
  4. Discover missing events or processes

Domain-Driven Design Integration

Align event-driven architecture with DDD by:

  1. Modeling events as domain concepts
  2. Using bounded contexts to define event ownership
  3. Designing aggregates that emit domain events
  4. Implementing context maps to show relationships between bounded contexts

Performance Considerations

Optimize for:

  1. Batching: Group events for efficient processing
  2. Partitioning: Distribute events for parallel processing
  3. Backpressure: Implement mechanisms to handle overload
  4. Caching: Cache reference data needed for event processing

Tooling and Infrastructure

Recommended Technologies

Based on project context, consider:

Event Brokers

  • Apache Kafka: For high-throughput event streaming
  • RabbitMQ: For traditional messaging patterns
  • NATS: For lightweight, high-performance messaging
  • Cloud Provider Services:
    • AWS: EventBridge, SNS, SQS, Kinesis
    • Azure: Event Hubs, Service Bus, Event Grid
    • GCP: Pub/Sub, Dataflow

Event Stores

  • EventStoreDB: Purpose-built event sourcing database
  • Apache Kafka: For event sourcing and stream processing
  • PostgreSQL/MySQL: With append-only tables for simpler implementations
  • MongoDB/DynamoDB: For document-based event storage

Stream Processing

  • Apache Kafka Streams: For JVM-based stream processing
  • Apache Flink: For complex event processing with exactly-once semantics
  • Apache Spark Streaming: For batch and stream processing
  • AWS Lambda / Azure Functions / GCP Cloud Functions: For serverless event processing

Monitoring

  • Prometheus & Grafana: For metrics collection and visualization
  • ELK Stack: For log aggregation and analysis
  • Jaeger/Zipkin: For distributed tracing
  • Datadog/New Relic: For integrated observability platforms

Governance

Event Ownership

Define clear ownership by:

  1. Assigning each event type to a specific team/service
  2. Documenting ownership in a central catalog
  3. Establishing change management processes for events
  4. Creating feedback channels for event consumers

Event Discovery

Implement event discovery mechanisms:

  1. Event Catalog: Document all events, their schemas, producers, and consumers
  2. Self-Registration: Enable services to register their events automatically
  3. Runtime Discovery: Allow services to discover available events at runtime
  4. Documentation: Maintain comprehensive, up-to-date event documentation

Change Management

  1. Versioning Strategy: Follow semantic versioning for event schemas
  2. Deprecation Policy: Define timeline and process for deprecating events
  3. Consumer Impact Analysis: Assess impact of changes on existing consumers
  4. Backwards Compatibility Period: Maintain compatibility for an appropriate period

Migration Strategies

Transitioning from Synchronous Architecture

  1. Strangler Fig Pattern: Gradually replace synchronous calls with events
  2. Dual-Writing: Write to both old and new systems during transition
  3. Event Backbone: Introduce an event backbone alongside existing communication
  4. Decomposition: Break monoliths into services connected by events

Implementation Roadmap

  1. Start Small: Begin with non-critical, bounded areas
  2. Prove Value: Demonstrate benefits with metrics
  3. Expand Gradually: Incrementally apply to more domains
  4. Refine Practices: Continuously improve based on experiences

Common Pitfalls to Avoid

  1. Event Overload: Too many fine-grained events creating noise
  2. Missing Events: Forgetting to model important domain events
  3. Tight Coupling: Using events in ways that create hidden dependencies
  4. Inconsistent Schemas: Lacking standardization across events
  5. Insufficient Monitoring: Not tracking event flows end-to-end

Framework-Specific Implementation

Spring Framework (Java)

// Event definition
@Value
public class OrderCreatedEvent {
    private final String orderId;
    private final String customerId;
    private final BigDecimal amount;
    private final LocalDateTime createdAt;
}

// Publishing events
@Service
public class OrderService {
    private final ApplicationEventPublisher eventPublisher;
    
    @Autowired
    public OrderService(ApplicationEventPublisher eventPublisher) {
        this.eventPublisher = eventPublisher;
    }
    
    @Transactional
    public Order createOrder(OrderRequest request) {
        // Business logic to create order
        Order order = // ...
        
        // Publish domain event
        eventPublisher.publishEvent(new OrderCreatedEvent(
            order.getId(),
            order.getCustomerId(),
            order.getTotalAmount(),
            order.getCreatedAt()
        ));
        
        return order;
    }
}

// Consuming events
@Service
public class NotificationService {
    @EventListener
    public void handleOrderCreatedEvent(OrderCreatedEvent event) {
        // Handle the event
        sendOrderConfirmation(event.getOrderId(), event.getCustomerId());
    }
}

Node.js (with EventEmitter)

// Event-driven service
class OrderService extends EventEmitter {
  constructor(orderRepository) {
    super();
    this.orderRepository = orderRepository;
  }
  
  async createOrder(orderData) {
    // Business logic
    const order = await this.orderRepository.save(orderData);
    
    // Emit domain event
    this.emit('orderCreated', {
      orderId: order.id,
      customerId: order.customerId,
      amount: order.totalAmount,
      createdAt: new Date().toISOString()
    });
    
    return order;
  }
}

// Event consumer
const orderService = new OrderService(orderRepository);

// Register event handler
orderService.on('orderCreated', async (event) => {
  await notificationService.sendOrderConfirmation(
    event.orderId, 
    event.customerId
  );
});

Kafka Consumer (Java)

@Service
public class OrderEventConsumer {
    private final NotificationService notificationService;
    
    @Autowired
    public OrderEventConsumer(NotificationService notificationService) {
        this.notificationService = notificationService;
    }
    
    @KafkaListener(topics = "order-events", groupId = "notification-service")
    public void consume(ConsumerRecord<String, String> record) {
        try {
            OrderEvent event = parseEvent(record.value());
            
            if ("ORDER_CREATED".equals(event.getType())) {
                notificationService.sendOrderConfirmation(
                    event.getData().get("orderId"),
                    event.getData().get("customerId")
                );
            }
        } catch (Exception e) {
            // Error handling
            log.error("Error processing order event", e);
        }
    }
    
    private OrderEvent parseEvent(String json) {
        // Parse JSON to OrderEvent object
        // ...
    }
}

Case Studies

Case Study 1: E-commerce Order Processing

Challenge: Building a scalable order processing system that handles variable load.

Solution:

  1. Implemented event-driven architecture with the following events:

    • OrderCreated
    • PaymentProcessed
    • InventoryReserved
    • OrderFulfilled
    • ShipmentCreated
    • OrderDelivered
  2. Used Apache Kafka as the event backbone with:

    • Topic-per-event-type structure
    • Consistent event schema with versioning
    • Dead letter queues for failed events
  3. Implemented the Saga pattern for transaction management:

    • Coordinated steps across multiple services
    • Implemented compensating transactions
    • Handled timeout and failure scenarios

Results:

  • Scaled to handle 10x order volume during peak periods
  • Reduced order processing time by 40%
  • Achieved 99.99% reliability in order processing
  • Enabled new services to be added without modifying existing ones

Case Study 2: Real-time Analytics Pipeline

Challenge: Building a real-time analytics system processing millions of user interactions.

Solution:

  1. Implemented event sourcing for user interaction data:

    • Captured all user events in an immutable log
    • Used event-carried state transfer pattern
    • Applied CQRS for specialized read models
  2. Used stream processing to:

    • Calculate real-time metrics
    • Detect anomalies and patterns
    • Update materialized views
  3. Implemented multiple projections for different analytical needs:

    • Time-series analysis
    • User behavior aggregations
    • Funnel analysis

Results:

  • Achieved sub-second latency for analytics updates
  • Reduced infrastructure costs by 35%
  • Enabled complex analytics without impacting user-facing services
  • Simplified addition of new analytics dimensions