name

adr-028-starlark-saga-cel-valuation

description

Tiered logic execution using Starlark for sagas and CEL for valuations

triggers

Implementing tenant-configurable workflow orchestration

Adding dynamic ledger posting rules that vary by tenant

Defining valuation formulas without code deployment

Breaking 1:1 position-to-posting coupling for multi-tenant flexibility

instructions

Use Starlark (go.starlark.net) for multi-step workflow orchestration (sagas, settlement runs). Use CEL (google/cel-go) for stateless expressions (validation, valuation, fungibility). Starlark scripts MUST call CEL-based valuation strategies for financial calculations. Store saga definitions in Reference Data service for tenant-specific versioning.

28. Starlark Saga Orchestration with CEL Valuation

Date: 2025-01-20

Status

Accepted (PRD approved for baseline)

Context

ADR-0014 establishes CEL for validation and fungibility expressions. However, saga orchestration remains hardcoded in Go, creating limitations:

Current State: Hardcoded 1:1 Coupling

// withdrawal_orchestrator.go:292-467 - RIGID
// One position log entry always creates exactly two ledger postings
saga.AddStep("post_ledger", func(ctx context.Context) error {
    // DEBIT customer account
    o.finAcctClient.CaptureLedgerPosting(ctx, debitRequest)
    // CREDIT clearing account
    o.finAcctClient.CaptureLedgerPosting(ctx, creditRequest)
    // No flexibility for tenant-specific requirements
})

The Multi-Tenant Challenge

Different tenants require different posting patterns for the same business operation:

Tenant	Withdrawal Posting Pattern	Use Case
Basic Bank	2 postings: Customer DEBIT, Clearing CREDIT	Standard double-entry
Tax Jurisdiction	3 postings: + Withholding Tax CREDIT	Regulatory withholding
Fee-Based	4 postings: + Fee Income CREDIT, adjusted customer amount	Transaction fees
Four-Corner	4 postings: Clearing house settlement model	Interbank clearing

Current limitation: Adding a new posting pattern requires code changes, testing, and deployment.

Why CEL Alone Is Insufficient

CEL excels at stateless expressions but cannot handle:

Sequential step execution with compensation
Conditional branching based on intermediate results
Loops over dynamic posting lists
State management across saga steps

The "Golden Stack" Solution

Use each tool for its strength:

Tool	Role	Responsibility
Starlark	The Conductor	Orchestration, branching, loops, step management
CEL	The Calculator	Formula execution, validation, numeric precision

Decision Drivers

Tenant flexibility: Different posting patterns without code deployment
1:N posting support: One position entry can generate multiple ledger postings
Separation of concerns: Business logic (Starlark) vs financial physics (CEL)
Auditability: Versioned saga definitions with full history
Performance: CEL for high-frequency calculations (~100ns), Starlark for orchestration
Safety: Both languages are sandboxed and deterministic; Starlark guarantees bounded execution via recursion depth limits and timeouts (no while loops by language design)
Bi-temporal consistency: Valuation replay requires deterministic execution

Considered Options

Continue hardcoding sagas in Go
CEL-only with complex expression chaining
Starlark for sagas + CEL for valuations (chosen)
Full scripting language (Lua, JavaScript)

Decision Outcome

Chosen option: Starlark for sagas + CEL for valuations, because it provides the optimal balance of flexibility, safety, and performance for a multi-tenant financial platform.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│                        Reference Data Service                             │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Saga Definitions (Starlark)                                       │  │
│  │  • Step sequencing              Per-tenant versioned               │  │
│  │  • posting_rules() function     Hot-reloadable                     │  │
│  │  • Compensation mapping         Auditable                          │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Valuation Rules (CEL)          Already in ADR-014                 │  │
│  │  • validation_expression        ~100ns execution                   │  │
│  │  • fungibility_expression       Compiled bytecode                  │  │
│  │  • valuation_expression (NEW)   Financial calculations             │  │
│  └────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                          Saga Runtime (Go)                                │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────────────┐ │
│  │ Starlark VM     │   │ CEL Evaluator   │   │ Step Handler Registry   │ │
│  │ (go.starlark)   │   │ (existing)      │   │                         │ │
│  │                 │   │                 │   │ position_keeping.*      │ │
│  │ Executes:       │   │ Evaluates:      │   │ financial_accounting.*  │ │
│  │ • posting_rules │   │ • tax calc      │   │ repository.*            │ │
│  │ • step sequence │   │ • fee calc      │   │                         │ │
│  └─────────────────┘   └─────────────────┘   └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘

Saga Definition Schema

Note: Tenant isolation is achieved via PostgreSQL schema-per-tenant (search_path). No tenant_id column needed - each tenant's schema contains only their data.

-- Lives in Reference Data service (each tenant's schema)
CREATE TABLE saga_definitions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Identification
    name VARCHAR(64) NOT NULL,           -- "withdrawal", "deposit", "settlement"
    version INTEGER NOT NULL DEFAULT 1,

    -- Starlark script (the orchestration logic)
    script TEXT NOT NULL,
    script_hash VARCHAR(64) NOT NULL,    -- SHA-256 for cache invalidation

    -- Lifecycle
    status VARCHAR(16) NOT NULL DEFAULT 'DRAFT',  -- DRAFT, ACTIVE, DEPRECATED

    -- Validation tracking
    last_validated_at TIMESTAMPTZ,
    validation_errors JSONB,

    -- Metadata
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    activated_at TIMESTAMPTZ,
    deprecated_at TIMESTAMPTZ,

    -- Bi-temporal for replay
    valid_from TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    valid_to TIMESTAMPTZ,

    UNIQUE(name, version),
    CHECK (status IN ('DRAFT', 'ACTIVE', 'DEPRECATED')),
    CHECK (script <> '')
);

CREATE INDEX idx_saga_definitions_lookup
    ON saga_definitions(name, status);

CREATE INDEX idx_saga_definitions_bitemporal
    ON saga_definitions(name, valid_from, valid_to);

Validation Phases

Saga definitions pass through three validation phases with increasing strictness:

Phase	Trigger	Strictness	Failures
DRAFT	On save	Warnings only	Store with warnings
ACTIVATION	Status → ACTIVE	Hard errors	Reject activation
DEPRECATION	Status → DEPRECATED	Impact analysis	Warn of dependents

DRAFT validation (warnings):

Syntax check (Starlark parses)
Unknown step handler references
Unreachable code paths
Missing compensation handlers

ACTIVATION validation (errors):

All step handlers must exist and be registered
Attribute references must match instrument schema
External handlers without idempotency support must have Pre-Step Check
Circular saga references prohibited
CEL expressions must compile

DEPRECATION validation:

List all sagas that invoke_saga() this definition
Warn operator of downstream impact

Starlark Saga Example: Tenant-Configurable Withdrawal

# sagas/withdrawal/tenant-with-tax.star
# Stored in reference-data service, loaded at runtime

def posting_rules(ctx):
    """Generate ledger postings for a withdrawal.

    Args:
        ctx: Execution context with amount, account, tenant config

    Returns:
        List of posting instructions for Financial Accounting service
    """
    postings = []
    remaining_amount = ctx.amount

    # 1. Withholding tax (if applicable)
    if ctx.tenant.requires_withholding_tax:
        # CEL for calculation - fast, auditable
        tax_amount = cel_eval(ctx.tenant.withholding_tax_expr, {
            "amount": ctx.amount,
            "account_type": ctx.account.type,
        })

        if tax_amount > 0:
            postings.append(posting(
                account_id = resolve_account("tax_withholding", ctx.currency),
                direction = "CREDIT",
                amount = tax_amount,
                description = "Withholding tax",
            ))
            remaining_amount = remaining_amount - tax_amount

    # 2. Transaction fee (if configured)
    if ctx.tenant.transaction_fee_expr:
        # CEL for fee calculation
        fee = cel_eval(ctx.tenant.transaction_fee_expr, {
            "amount": ctx.amount,
            "account_tier": ctx.account.tier,
        })

        if fee > 0:
            postings.append(posting(
                account_id = resolve_account("fee_income", ctx.currency),
                direction = "CREDIT",
                amount = fee,
                description = "Transaction fee",
            ))
            remaining_amount = remaining_amount - fee

    # 3. Customer account debit (net amount after deductions)
    postings.append(posting(
        account_id = ctx.account.id,
        direction = "DEBIT",
        amount = remaining_amount,
        description = "Withdrawal from account",
    ))

    # 4. Clearing account credit (always full original amount)
    postings.append(posting(
        account_id = resolve_account("clearing", ctx.currency),
        direction = "CREDIT",
        amount = ctx.amount,
        description = "Clearing credit",
    ))

    return postings


def compensation_rules(ctx, completed_postings):
    """Generate compensation postings to reverse completed work.

    Args:
        ctx: Original execution context
        completed_postings: List of postings that were successfully created

    Returns:
        List of compensating posting instructions (reversed directions)
    """
    compensations = []
    for p in completed_postings:
        compensations.append(posting(
            account_id = p.account_id,
            direction = "CREDIT" if p.direction == "DEBIT" else "DEBIT",
            amount = p.amount,
            description = "COMP: " + p.description,
        ))
    return compensations


# Saga definition
saga(
    name = "withdrawal",
    version = "2.0.0",

    steps = [
        step(
            name = "log_position",
            action = "position_keeping.initiate_log",
            params = lambda ctx: {
                "direction": "DEBIT",
                "amount": ctx.amount,
                "account_id": ctx.account.id,
            },
            compensation = "position_keeping.cancel_log",
        ),
        step(
            name = "post_ledger",
            action = "financial_accounting.post_entries",
            params = lambda ctx: {
                "postings": posting_rules(ctx),  # Dynamic 1:N posting generation
            },
            compensation = lambda ctx, result: {
                "postings": compensation_rules(ctx, result.completed_postings),
            },
        ),
        step(
            name = "save_account",
            action = "repository.save",
            compensation = None,  # No-op: last step
        ),
    ],

    # CEL preconditions (fast validation before saga starts)
    preconditions = [
        "ctx.amount > 0",
        "ctx.account.status == 'ACTIVE'",
        "ctx.account.available_balance >= ctx.amount",
    ],
)

CEL Valuation Expressions

Extend the existing CEL compiler to support valuation expressions:

-- Add valuation_expression to instrument_definitions (extends ADR-014)
ALTER TABLE instrument_definitions
ADD COLUMN valuation_expression TEXT;

-- Example: Energy pricing with time-of-use tariffs
UPDATE instrument_definitions
SET valuation_expression = 'qty * lookup_tariff(attrs.tou_period, attrs.tariff_zone)'
WHERE code = 'KWH';

-- Example: Transaction fee calculation
-- Stored per-tenant in tenant_config
INSERT INTO tenant_valuation_rules (tenant_id, rule_name, cel_expression)
VALUES
    ('tenant-uuid', 'withholding_tax', 'amount * 0.15'),
    ('tenant-uuid', 'transaction_fee', 'max(1.00, amount * 0.001)');

Go Runtime Implementation

package saga

import (
    "context"

    "go.starlark.net/starlark"
    "github.com/google/cel-go/cel"
)

// SagaRuntime executes Starlark saga definitions with CEL valuation support.
type SagaRuntime struct {
    refDataClient  ReferenceDataClient
    celCompiler    *cel.Compiler
    stepHandlers   StepHandlerRegistry
    logger         *slog.Logger
}

// StepHandlerRegistry maps Starlark action names to Go implementations.
type StepHandlerRegistry struct {
    handlers map[string]StepHandler
}

// StepHandler is the Go function that executes a saga step.
type StepHandler func(ctx context.Context, params map[string]any) (StepResult, error)

// Execute runs a saga definition for a given tenant and input.
func (r *SagaRuntime) Execute(
    ctx context.Context,
    tenantID uuid.UUID,
    sagaName string,
    input ExecutionContext,
) (*SagaResult, error) {
    // 1. Load saga definition from reference data
    def, err := r.refDataClient.GetSagaDefinition(ctx, tenantID, sagaName)
    if err != nil {
        return nil, fmt.Errorf("load saga %s: %w", sagaName, err)
    }

    // 2. Create Starlark thread with CEL builtins
    thread := &starlark.Thread{
        Name: fmt.Sprintf("saga-%s-%s", sagaName, uuid.New().String()),
    }

    // 3. Register built-in functions
    builtins := r.createBuiltins(ctx, tenantID, input)
    // Available: cel_eval, posting, resolve_account, valuate, valuate_batch,
    //            ctx.new_uuid, invoke_saga, fail, log, position_keeping.*,
    //            market_data.*, party.*

    // 4. Execute Starlark script
    globals, err := starlark.ExecFile(thread, sagaName+".star", def.Script, builtins)
    if err != nil {
        return nil, fmt.Errorf("parse saga script: %w", err)
    }

    // 5. Extract saga definition from globals
    sagaDef := globals["saga"]

    // 6. Execute steps with compensation support
    return r.executeSteps(ctx, sagaDef, input)
}

// createBuiltins registers Go functions callable from Starlark.
// NOTE: Production implementation MUST add type assertion error handling:
//   str, ok := args[0].(starlark.String)
//   if !ok { return nil, fmt.Errorf("cel_eval: expected string, got %T", args[0]) }
func (r *SagaRuntime) createBuiltins(
    ctx context.Context,
    tenantID uuid.UUID,
    input ExecutionContext,
) starlark.StringDict {
    return starlark.StringDict{
        // CEL evaluation for calculations
        "cel_eval": starlark.NewBuiltin("cel_eval", func(
            thread *starlark.Thread,
            fn *starlark.Builtin,
            args starlark.Tuple,
            kwargs []starlark.Tuple,
        ) (starlark.Value, error) {
            expr := args[0].(starlark.String).GoString()
            params := toGoMap(args[1])

            result, err := r.celCompiler.Evaluate(ctx, expr, params)
            if err != nil {
                return starlark.None, err
            }
            return toStarlarkValue(result), nil
        }),

        // Create a posting instruction
        "posting": starlark.NewBuiltin("posting", createPostingBuiltin),

        // Resolve internal account
        "resolve_account": starlark.NewBuiltin("resolve_account", func(
            thread *starlark.Thread,
            fn *starlark.Builtin,
            args starlark.Tuple,
            kwargs []starlark.Tuple,
        ) (starlark.Value, error) {
            purpose := args[0].(starlark.String).GoString()
            currency := args[1].(starlark.String).GoString()

            accountID, err := r.resolveAccount(ctx, tenantID, purpose, currency)
            if err != nil {
                return starlark.None, err
            }
            return starlark.String(accountID), nil
        }),
    }
}

Test Strategy

Test Category	Coverage	Tools
Starlark unit tests	posting_rules(), compensation_rules()	Go test + Starlark interpreter
CEL valuation tests	Expression evaluation	Existing CEL test framework
Saga orchestration	Step execution, compensation order	Existing saga_test.go patterns
Integration tests	End-to-end with mocked services	Testcontainers
Multi-tenant scenarios	Different posting patterns per tenant	Parameterized tests
Golden file tests	Saga definition regression detection	Snapshot testing

// Example: Multi-tenant posting variant tests
func TestSagaExecution_MultiTenantPostingVariants(t *testing.T) {
    testCases := []struct {
        tenant           string
        sagaVersion      string
        expectedPostings int
        description      string
    }{
        {"default", "1.0.0", 2, "standard double-entry"},
        {"with-tax", "2.0.0", 3, "with withholding tax"},
        {"with-tax-and-fee", "2.1.0", 4, "with tax and fee"},
        {"four-corner", "3.0.0", 4, "clearing house model"},
    }

    for _, tc := range testCases {
        t.Run(tc.tenant, func(t *testing.T) {
            runtime := NewSagaRuntime(mockRefData, mockCEL, mockHandlers)

            result, err := runtime.Execute(ctx, tc.tenant, "withdrawal", input)

            require.NoError(t, err)
            assert.True(t, result.Success)
            assert.Len(t, capturedPostings, tc.expectedPostings)
        })
    }
}

Security Constraints

Both Starlark and CEL are designed for safe execution of untrusted code:

Constraint	Starlark	CEL
Turing completeness	No (no while loops)	No
Infinite loops	Impossible	Impossible
File system access	None	None
Network access	None	None
Memory limits	Configurable	Configurable
Execution timeout	Configurable	Configurable
Determinism	100%	100%

Additional Meridian constraints:

const (
    MaxStarlarkScriptSize   = 64 * 1024  // 64KB
    MaxStarlarkExecutionMs  = 5000       // 5 seconds
    MaxCELExpressionLength  = 4096       // 4KB (from ADR-014)
    MaxCELExpressionDepth   = 10         // Nesting limit
    MaxCELCostLimit         = 10000      // Evaluation cost units
)

Party Isolation

Sagas execute within a party scope, ensuring data isolation across organizational boundaries:

# Runtime injects immutable party scope
ctx.party_scope = PartyScope(
    party_id = "P001",                    # Executing party
    party_type = "ORGANIZATION",          # INDIVIDUAL, ORGANIZATION
    visible_parties = ["P001", "P002"],   # Party + authorized children
)

# All lookups are automatically scoped
positions = position_keeping.list(party_scope=ctx.party_scope)  # Only visible positions

Enforcement rules:

Runtime resolves party tree from Party Service before execution
ctx.party_scope is immutable (attempts to modify raise error)
Cross-party lookups require explicit authorized_lookups declaration in saga definition
Audit log captures party_id and visible_parties for every execution

Durable Execution via Replay

Sagas survive pod death through checkpoint-and-replay (mini-Temporal pattern):

-- Service-local tables (not in Reference Data)
CREATE TABLE saga_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    saga_definition_id UUID NOT NULL,
    saga_name VARCHAR(64) NOT NULL,
    saga_version INTEGER NOT NULL,
    input_snapshot JSONB NOT NULL,
    party_id UUID NOT NULL,
    knowledge_at TIMESTAMPTZ,

    -- Lease-based claiming (race condition prevention)
    claimed_by_pod VARCHAR(128),
    claimed_at TIMESTAMPTZ,
    lease_expires_at TIMESTAMPTZ,

    -- Progress
    current_step_index INTEGER NOT NULL DEFAULT 0,
    replay_count INTEGER NOT NULL DEFAULT 0,
    status VARCHAR(32) NOT NULL DEFAULT 'PENDING',
    -- PENDING, RUNNING, COMPLETED, COMPENSATING, COMPENSATED, FAILED, FAILED_MANUAL_INTERVENTION

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    error_message TEXT
);

-- Index for efficient orphaned saga detection (frequent query in multi-pod deployments)
CREATE INDEX idx_saga_instances_lease_expires
    ON saga_instances(lease_expires_at)
    WHERE status IN ('RUNNING', 'PENDING', 'COMPENSATING');

CREATE TABLE saga_step_results (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    saga_instance_id UUID NOT NULL REFERENCES saga_instances(id),
    step_index INTEGER NOT NULL,
    step_name VARCHAR(64) NOT NULL,
    idempotency_key VARCHAR(128) NOT NULL,  -- "saga_{id}_step_{index}"
    output_snapshot JSONB,
    status VARCHAR(16) NOT NULL,
    causation_id UUID NOT NULL,
    UNIQUE(saga_instance_id, step_index),
    UNIQUE(idempotency_key)
);

Recovery flow:

Pod B detects orphaned sagas: WHERE lease_expires_at < NOW()
Claims with FOR UPDATE SKIP LOCKED (no race conditions)
Replays Starlark from start; completed steps return cached results
Continues from first incomplete step

Transaction affinity: Step result INSERT and current_step_index UPDATE must be atomic.

Determinism Requirements

Saga execution must be 100% deterministic for replay safety:

Constraint	Enforcement
No `time.now()`	Runtime blocks; use `ctx.knowledge_at`
No random numbers	Runtime blocks; use `ctx.new_uuid()`
No external I/O	Only via registered step handlers
Idempotent handlers	External integrations must declare capability

# WRONG: Non-deterministic
timestamp = time.now()  # Error: time access not allowed

# RIGHT: Deterministic
timestamp = ctx.knowledge_at  # Injected, stable across replays
ref_id = ctx.new_uuid()       # Version 5 UUID (namespace=saga_id, name=step:call)

External Integration Guardrails

Side-Effect Idempotency: External integrations (payment gateways, etc.) must declare idempotency support:

type StepHandler struct {
    Name        string
    Idempotency IdempotencyCapability  // INTERNAL, EXTERNAL_SUPPORTED, EXTERNAL_NOT_SUPPORTED
    Execute     func(ctx, params) (result, error)
}

EXTERNAL_NOT_SUPPORTED handlers require "Pre-Step Check" pattern (query before mutation)
Runtime provides verify_external_state(handler, check_fn) builtin
ACTIVATION fails if non-idempotent handler used without Pre-Step Check
Logic/Physics Linter enforces Pre-Step Check pattern

Zombie Saga Detection & Hot-Fixing: Sagas stuck in retry loops are detected and escalated:

replay_count incremented on each replay attempt
If replay_count > MAX_REPLAYS (default: 5) → status = FAILED_MANUAL_INTERVENTION
High-severity (P1) alert triggered for operator intervention
Hot-fix via definition re-pointing (bi-temporal compatible):
1. Deploy fixed definition as new version (v2)
2. Update instance's saga_definition_id to v2, reset replay_count
3. On resume: cached step results are respected, failed step executes with v2 logic
4. Audit trail preserves: which version used, when each step executed

Dry-Run Testing: Validate Starlark logic before deployment:

// ExecuteDryRun runs saga with mocked handlers, returns execution plan
func (r *Runtime) ExecuteDryRun(ctx, sagaName, input) (*ExecutionPlan, error)

No database writes
Returns intended step sequence with parameters
Validates attribute references against instrument schema

Real Service Integration

Service Binding Architecture

Starlark handlers bridge the dynamic scripting interface with strongly-typed gRPC services. Each service provides a RegisterStarlarkHandlers function that registers its operations with the saga runtime.

Pattern: Service bindings live in services/{service-name}/client/starlark.go alongside the gRPC client:

// services/current-account/client/starlark.go
func RegisterStarlarkHandlers(registry *saga.HandlerRegistry, client *Client) error {
    handlers := map[string]struct {
        handler  saga.Handler
        metadata saga.HandlerMetadata
    }{
        "current_account.create_lien": {
            handler: createLienHandler(client),
            metadata: saga.HandlerMetadata{
                Category:            saga.HandlerCategorySettlement,
                ProducesInstruments: []string{"USD", "EUR", "GBP", "NZD"},
            },
        },
    }

    for name, h := range handlers {
        if err := registry.RegisterWithMetadata(name, h.handler, &h.metadata); err != nil {
            return fmt.Errorf("failed to register %s: %w", name, err)
        }
    }
    return nil
}

func createLienHandler(client *Client) saga.Handler {
    return func(ctx *saga.StarlarkContext, params map[string]any) (any, error) {
        // 1. Parse Starlark params (map[string]any)
        accountID, err := saga.RequireStringParam(params, "account_id")
        amount, err := saga.RequireDecimalParam(params, "amount")

        // 2. Prepare context with saga metadata (idempotency, tracing)
        clientCtx := prepareClientContext(ctx)

        // 3. Build gRPC request
        req := &currentaccountv1.InitiateLienRequest{...}

        // 4. Call REAL gRPC client (not a mock!)
        resp, err := client.InitiateLien(clientCtx, req)

        // 5. Convert response to Starlark format
        return map[string]any{
            "lien_id": resp.GetLien().GetLienId(),
            "status":  "ACTIVE",
        }, nil
    }
}

Dependency Injection Over Global Registry

Services explicitly declare their saga handler dependencies during initialization:

Before (global registry - problematic):

// OLD PATTERN - DEPRECATED
executor := saga.NewExecutor(saga.ExecutorConfig{
    Handlers: saga.DefaultRegistry(), // ❌ Global state, hidden dependencies
})

After (explicit registration - current):

// NEW PATTERN - RECOMMENDED
func main() {
    // Initialize service clients
    currentAccountClient, cleanup, _ := currentaccountclient.New(...)
    defer cleanup()

    posKeepingClient, cleanup2, _ := positionkeepingclient.New(...)
    defer cleanup2()

    // Create handler registry
    handlerRegistry := saga.NewHandlerRegistry()

    // Explicitly register service bindings
    currentaccountclient.RegisterStarlarkHandlers(handlerRegistry, currentAccountClient)
    positionkeepingclient.RegisterStarlarkHandlers(handlerRegistry, posKeepingClient)

    // Create saga runner with explicit dependencies
    sagaRunner := saga.NewStarlarkSagaRunner(saga.StarlarkSagaRunnerConfig{
        Handlers: handlerRegistry, // ✅ Explicit dependencies
        Logger:   logger,
    })
}

Benefits:

Clear dependency graph (this service orchestrates these services)
Easy to test (inject mock clients)
No global state
Graceful degradation (warn on missing services, don't fail)

Conservation of Dimension Rule

The Conservation of Dimension Rule enforces type safety for financial instruments at the handler level:

Handlers must declare ProducesInstruments metadata matching the instrument types they actually create in position-keeping. A handler that produces USD positions cannot declare EUR, preventing runtime type mismatches.

Rationale: Financial systems must enforce dimensional consistency. Just as you cannot add meters to kilograms in physics, you cannot mix currencies or asset types without explicit conversion. This rule catches type errors at handler registration time rather than in production.

Example validation failure:

// BAD - Declaration doesn't match implementation
metadata: saga.HandlerMetadata{
    ProducesInstruments: []string{"USD"},  // Declared: USD only
}

// In handler implementation:
req := &positionkeepingv1.InitiateLogRequest{
    Currency: "EUR",  // MISMATCH! Creates EUR but declared USD
}

// Result: Validation error at registration time:
// "handler current_account.create_lien produced EUR but only declared [USD]"

Handler categories:

HandlerCategoryIngestion - Creates positions from external data (meter readings, market prices)
- Example: ProducesInstruments: []string{"KWH", "GAS", "WATER"}
HandlerCategoryValuation - Computes derived values (mark-to-market, accruals)
- Example: ProducesInstruments: []string{} (usually empty, updates existing)
HandlerCategorySettlement - Executes financial operations (debits, credits, transfers)
- Example: ProducesInstruments: []string{"USD", "EUR", "GBP"}

The validator enforces:

Declared instruments must match what's actually created in position-keeping
No instrument type mismatches between declaration and implementation
Multi-currency handlers must declare all supported currencies upfront

This ensures financial instrument type safety at compile time (handler registration), not at runtime when data corruption could occur.

Positive Consequences

Tenant flexibility: Different posting patterns without code deployment
1:N posting support: One position entry generates dynamic number of postings
Separation of concerns: Business logic in Starlark, financial math in CEL
Hot-reloadable: Saga changes without service restart
Auditability: Versioned definitions with full history in reference data
Testability: Unit test posting rules independently; dry-run API for validation
Performance: CEL (~100ns) for high-frequency calculations
Safety: Both languages sandboxed, deterministic, guaranteed termination
Bi-temporal replay: Deterministic execution enables valuation replay
Durable execution: Sagas survive pod death via checkpoint-and-replay
Party isolation: Cross-party data access enforced at runtime
Financial grade: External idempotency enforcement, zombie detection, transaction affinity

Negative Consequences

Learning curve: Teams must learn both Starlark and CEL syntax
Debugging complexity: Two-language stack requires tooling investment
Runtime dependency: Saga execution requires reference data service availability
Migration effort: Existing Go orchestrators must be ported to Starlark
Version management: Saga versions add operational complexity

Notes

Migration Path

Phase dispositions assessed 2026-03-06 against current implementation in shared/pkg/saga/.

Phase 1: Foundation — COMPLETED. Registry (handlers.go), runtime (runtime.go, starlark_runner.go), schema (schema/), CEL evaluator (cel_evaluator.go), caching (validation/cache.go).
Phase 2: Prove the Pattern — PLANNED. No Go orchestrators have been migrated to Starlark yet. The runtime and service bindings are ready, but no production saga scripts exist.
Phase 3: Expand — PLANNED. Depends on Phase 2 completion.
Phase 4: Enable Tenants — PLANNED. Admin handler exists (admin_handler.go) but tenant self-service saga authoring is not yet exposed.
Phase 5: Party Isolation — COMPLETED. party_scope.go implements PartyScope, PartyScopeResolver, PartyHierarchyClient with immutable scope enforcement.
Phase 6: Bi-Temporal — PARTIALLY COMPLETED. Replay implemented (replay.go) with deterministic UUID generation (uuid_generator.go). verify_execution() audit builtin not yet implemented.
Phase 7: Attribute Validation — COMPLETED. Validator (validator.go) with AST extraction, handler schema validation (validation/validator.go), cached validation (validation/cached_validator.go).
Phase 8: Durable Execution — COMPLETED. Saga instances with lease-based claiming (claiming.go), orphan detection (orphan_watcher.go), step execution persistence (step_execution.go), migrations (migrations.go), lease renewal (lease_renewer.go), suspend/resume (suspend.go).
Phase 9: Hardening — PARTIALLY COMPLETED. Logic/Physics Linter implemented (linter.go). valuate_batch() builtin not yet implemented. External integration guardrails (Pre-Step Check pattern) enforced via linter.

See the Implementation PRDs above for detailed tasks organized into parallel work streams.

Ken's "Policy and Procedure" Framing

For non-technical stakeholders:

CEL is the "Policy": "This is the formula for how we calculate the price." Policies are rigid, fast, and auditable.
Starlark is the "Procedure": "This is the sequence of steps we take to settle a transaction." Procedures are flexible and handle real-world complexity.

Value proposition: The system is Flexible (procedures can change without deployment) but Rock-Solid (financial math is enforced by a mathematical engine with guaranteed termination).

Reconsidering This Decision

Revisit if:

Starlark execution becomes a performance bottleneck (>5ms per saga)
Debugging complexity outweighs flexibility benefits
Tenant-specific saga customization proves unnecessary
A superior embedded scripting solution emerges

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

28. Starlark Saga Orchestration with CEL Valuation

Status

Context

Current State: Hardcoded 1:1 Coupling

The Multi-Tenant Challenge

Why CEL Alone Is Insufficient

The "Golden Stack" Solution

Decision Drivers

Considered Options

Decision Outcome

Architecture Overview

Saga Definition Schema

Validation Phases

Starlark Saga Example: Tenant-Configurable Withdrawal

CEL Valuation Expressions

Go Runtime Implementation

Test Strategy

Security Constraints

Party Isolation

Durable Execution via Replay

Determinism Requirements

External Integration Guardrails

Real Service Integration

Service Binding Architecture

Dependency Injection Over Global Registry

Conservation of Dimension Rule

Positive Consequences

Negative Consequences

Links

Implementation PRDs (Parallel Work Streams)

Implementation Guides

Related ADRs

External References

Notes

Migration Path

Ken's "Policy and Procedure" Framing

Reconsidering This Decision

FilesExpand file tree

0028-starlark-saga-cel-valuation.md

Latest commit

History

0028-starlark-saga-cel-valuation.md

File metadata and controls

28. Starlark Saga Orchestration with CEL Valuation

Status

Context

Current State: Hardcoded 1:1 Coupling

The Multi-Tenant Challenge

Why CEL Alone Is Insufficient

The "Golden Stack" Solution

Decision Drivers

Considered Options

Decision Outcome

Architecture Overview

Saga Definition Schema

Validation Phases

Starlark Saga Example: Tenant-Configurable Withdrawal

CEL Valuation Expressions

Go Runtime Implementation

Test Strategy

Security Constraints

Party Isolation

Durable Execution via Replay

Determinism Requirements

External Integration Guardrails

Real Service Integration

Service Binding Architecture

Dependency Injection Over Global Registry

Conservation of Dimension Rule

Positive Consequences

Negative Consequences

Links

Implementation PRDs (Parallel Work Streams)

Implementation Guides

Related ADRs

External References

Notes

Migration Path

Ken's "Policy and Procedure" Framing

Reconsidering This Decision