Skip to content

Commit a236ca7

Browse files
authored
refactor(audit): final DRY optimization and documentation pass (#425)
* docs: add comprehensive audit implementation analysis findings Document findings from systematic analysis of async audit implementation across 6 services. Key findings include: - Payment-order service has full local audit implementation (DRY violation) - Migration schema inconsistencies (record_id types, JSON storage types) - Context key management duplication between models and audit package - Redundant type aliases in multiple services Prioritized recommendations for subsequent cleanup subtasks. * feat(audit): add RecordUpdateManual helper for map-based updates Add exported RecordUpdateManual function to the shared audit library for use when GORM hooks cannot be used. This supports the pattern where repositories use map-based updates with optimistic locking, which bypasses GORM struct hooks. The function accepts old and new entity values and records the UPDATE operation in the audit trail, maintaining consistency with the hook-based approach used for model updates. * refactor(payment-order): migrate audit to shared library Remove duplicated audit infrastructure from payment-order service and use the shared audit library instead. This eliminates ~100 lines of duplicated code including: - Local AuditOutbox struct definition - Local recordAudit function - Local getUserIDFromContext function - Local ErrNilTransaction error The PaymentOrderEntity now implements the audit.Auditable interface and uses audit.RecordCreate/RecordDelete for hook-based auditing. The repository uses the new audit.RecordUpdateManual for its map-based update pattern that bypasses GORM hooks. This migration brings payment-order in line with other services (tenant, party, financial-accounting) that already use the shared audit library, and enables Kafka-first publishing with outbox fallback. * refactor(tenant): remove redundant TenantAuditOutbox and TenantAuditLog types Remove the locally-defined TenantAuditOutbox and TenantAuditLog structs from the tenant service. These types were functionally identical to the shared audit.AuditOutbox and audit.AuditLog types (both use string RecordID to accommodate tenant's string IDs). The shared types already support both UUID and string record IDs, making these local definitions unnecessary duplication. The service continues to use the existing type aliases (AuditOutbox, systemUser) for backward compatibility with tests. * test(payment-order): update audit tests for shared library types Update audit tests to work with the shared audit library types: - Change record_id column from UUID to VARCHAR(50) to match the shared AuditOutbox struct which uses string to support both UUID and string IDs - Change old_values/new_values columns from JSONB to TEXT to match the shared struct and handle empty strings (JSONB rejects empty string "") - Update assertions from pointer checks (!=nil) to string checks (Empty/NotEmpty) - Update JSON unmarshal calls to use string values directly instead of dereferencing pointers These changes align the test infrastructure with the shared audit library types while maintaining full test coverage of audit functionality. * feat(audit): add adaptive polling and configurable worker options Implement performance optimizations for the audit worker: - Add functional options pattern (WithBatchSize, WithPollInterval, WithMaxRetries, WithAdaptivePolling) for flexible worker configuration - Implement adaptive polling that reduces database load during idle periods by exponentially backing off poll interval when outbox is empty, and immediately responding when entries arrive - Add new metrics: batch_size histogram, poll_interval_seconds gauge, and empty_polls_consecutive gauge for monitoring adaptive behavior - Add processBatchWithCount internal method to support adaptive polling decisions based on actual work performed These changes are backward compatible - default behavior is unchanged unless new options are explicitly used. * refactor(audit): centralize status constants and error types - Create status.go with exported status constants (StatusPending, StatusProcessing, StatusFailed, StatusCompleted) - Add operation constants (OperationInsert, OperationUpdate, OperationDelete) - Add table name constants (TableAuditOutbox, TableAuditLog) - Create errors.go consolidating all audit-related errors from hooks.go, worker.go, and publisher.go into a single file - Update worker.go, hooks.go, publisher.go to use centralized constants - Update worker_test.go to use exported status constants This improves code consistency by providing a single source of truth for status values and error types used across the audit package and services. Constants can now be used in SQL migrations and by consumers of the audit package. * refactor(audit): remove redundant type aliases and consolidate context helpers - Remove AuditOutbox and AuditLog type aliases from service persistence packages (tenant, party, financial-accounting, payment-order) - Update all test files to use audit.AuditOutbox directly instead of service-local aliases - Consolidate getUserIDFromContext in shared/domain/models/base.go to use audit.GetUserFromContext(), reducing code duplication - Update SystemUser constant to use audit.DefaultAuditUser This cleanup removes ~40 lines of redundant type alias definitions across 4 services while maintaining backward compatibility. Tests now reference the canonical shared audit types directly. * docs(audit): add comprehensive documentation for async audit system - Add package-level godoc in doc.go with architecture overview, usage examples, and performance characteristics - Enhance README.md with configuration options, adaptive polling documentation, and troubleshooting guide - Create operations/audit-monitoring.md with Prometheus metrics reference, alert thresholds, Grafana queries, and failure modes - Create development/audit-adding-new-service.md with step-by-step guide including migration templates, entity implementation, and test examples The documentation covers the complete async audit system including the dual-path Kafka/outbox architecture per ADR-0009. * fix(party): update lifecycle tests to use shared audit package Update lifecycle_integration_test.go to use audit.AuditOutbox from the shared platform audit package instead of the removed service-specific persistence.PartyAuditOutbox type. * fix(audit): use TEXT instead of JSONB for audit outbox values Change OldValues and NewValues columns in AuditOutbox and AuditLog structs from JSONB to TEXT to match the migration compatibility fixes. This prevents SQLSTATE 22P02 errors when tests run with GORM auto-migration which creates columns based on struct definitions. The JSONB type rejects empty strings as invalid JSON, but the audit system may write empty strings for null values. TEXT columns are more permissive and align with the migration schema changes. * fix(payment-order): update test schemas to use TEXT for audit values Update audit_outbox table creation in repository and integration tests to use TEXT columns for old_values/new_values instead of JSONB. Also change record_id from UUID to VARCHAR(50) to match shared audit infra. This aligns with the shared audit package and migration compatibility fixes that use TEXT to handle empty strings which JSONB rejects. * docs(audit): align migration template with shared audit types Update the audit adding new service guide to use TEXT columns and VARCHAR(50) for record_id instead of JSONB/UUID. This aligns with the shared audit package which uses TEXT to handle empty strings that JSONB would reject as invalid JSON. Also update the change_summary view to safely cast TEXT to JSONB only when the content is valid JSON format. * fix(audit): address CodeRabbit review feedback - Align record_id type with shared package (VARCHAR(50) instead of UUID) in party service test schemas for consistency - Add validation to WithAdaptivePolling to swap inverted min/max intervals - Add TODO for future improvement: use updated_at instead of created_at for more accurate stuck entry detection (requires schema migration) --------- Co-authored-by: Ben Coombs <[email protected]>
1 parent fa860fd commit a236ca7

29 files changed

Lines changed: 1963 additions & 431 deletions

docs/audit-analysis-findings.md

Lines changed: 314 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,314 @@
1+
# Async Audit Implementation Analysis Findings
2+
3+
**Date**: 2024-12-24
4+
**Task**: 19.1 - Analysis phase for final optimization and DRY review
5+
**Scope**: 6 services (tenant, payment-order, party, position-keeping, financial-accounting, current-account)
6+
7+
## Executive Summary
8+
9+
The async audit implementation is largely well-structured, with a shared library in
10+
`shared/platform/audit/` that most services correctly use. However, there is one significant
11+
duplication issue (payment-order service), several schema inconsistencies across migrations,
12+
and minor code patterns that could be consolidated.
13+
14+
---
15+
16+
## 1. Code Duplication Findings
17+
18+
### 1.1 Critical: Payment-Order Service Has Full Local Implementation
19+
20+
**Location**: `services/payment-order/adapters/persistence/payment_order_entity.go`
21+
22+
The payment-order service maintains a **complete local copy** of the audit infrastructure instead of using the shared library:
23+
24+
| Component | Shared Library | Payment-Order Local |
25+
|-----------|----------------|---------------------|
26+
| `ErrNilTransaction` | `shared/platform/audit/hooks.go:19` | `payment_order_entity.go:84` |
27+
| `AuditOutbox` struct | `shared/platform/audit/hooks.go:38-54` | `payment_order_entity.go:94-109` |
28+
| `recordAudit()` function | `shared/platform/audit/hooks.go:252-295` | `payment_order_entity.go:130-184` |
29+
| `getUserIDFromContext()` | `shared/platform/audit/context.go:19` | `payment_order_entity.go:188-204` |
30+
31+
**Impact**:
32+
33+
- Maintenance burden (changes must be made in two places)
34+
- Inconsistent behavior (local version uses `uuid.UUID` for RecordID, shared uses `string`)
35+
- Missed Kafka integration (local version writes directly to outbox, shared version attempts Kafka first)
36+
37+
**Recommendation**: Migrate payment-order to use shared audit library. This requires:
38+
39+
1. Implementing `audit.Auditable` interface on `PaymentOrderEntity`
40+
2. Replacing local `recordAudit()` calls with `audit.RecordCreate/Update/Delete`
41+
3. Removing local `AuditOutbox`, `ErrNilTransaction`, `getUserIDFromContext`
42+
43+
### 1.2 Context Key Management Duplication
44+
45+
Two implementations of `getUserIDFromContext()` exist:
46+
47+
1. **shared/domain/models/base.go:105-122** - Uses `auth.UserIDContextKey` directly
48+
2. **shared/platform/audit/context.go:19-30** - Uses `auth.GetUserIDFromContext()`
49+
50+
Both achieve the same goal but via different methods. The audit library's version is more abstracted.
51+
52+
**Recommendation**: Consolidate to use `audit.GetUserFromContext()` everywhere, or extract to a single shared utility.
53+
54+
### 1.3 Error Type Duplication
55+
56+
`ErrNilTransaction` is defined in multiple places:
57+
58+
| Location | Package |
59+
|----------|---------|
60+
| `shared/platform/audit/hooks.go:19` | audit |
61+
| `shared/platform/events/outbox.go:35` | events |
62+
| `services/payment-order/adapters/persistence/payment_order_entity.go:84` | persistence |
63+
64+
**Recommendation**: Consider a shared errors package for common transaction-related errors, or
65+
document that each package intentionally defines its own error for proper `errors.Is()` matching.
66+
67+
---
68+
69+
## 2. Migration Schema Inconsistencies
70+
71+
### 2.1 Record ID Type Inconsistency
72+
73+
| Service | `record_id` Type in audit_log/audit_outbox |
74+
|---------|-------------------------------------------|
75+
| tenant | `VARCHAR(50)` (string IDs) |
76+
| position-keeping | `UUID` |
77+
| party | `UUID` |
78+
| current-account | `UUID` |
79+
| financial-accounting | `UUID` |
80+
| payment-order | `UUID` |
81+
82+
**Note**: Tenant uses string IDs intentionally (tenant IDs are alphanumeric). The shared
83+
`audit.AuditOutbox` struct uses `string` for `RecordID` to accommodate both.
84+
85+
### 2.2 JSON Storage Type Inconsistency
86+
87+
| Service | `old_values`/`new_values` Type |
88+
|---------|-------------------------------|
89+
| party | `TEXT` (contains JSON strings) |
90+
| All others | `JSONB` |
91+
92+
**Location**: `services/party/migrations/20251217000001_audit_system.sql:22-24`
93+
94+
**Impact**:
95+
96+
- Party service stores JSON as text, requiring cast to JSONB for queries
97+
- The `change_summary` view handles this with explicit `::jsonb` casts
98+
- Shared `audit.AuditLog` struct uses `string` type which works with both
99+
100+
**Recommendation**: Consider a migration to standardize party to JSONB for consistency, though functionally equivalent.
101+
102+
### 2.3 Client IP Type Inconsistency
103+
104+
| Service | `client_ip` Type |
105+
|---------|------------------|
106+
| tenant | `INET` |
107+
| position-keeping | `INET` |
108+
| current-account | `INET` |
109+
| financial-accounting | `INET` |
110+
| party | `VARCHAR(45)` |
111+
| payment-order | `VARCHAR(45)` |
112+
113+
**Impact**: Minor - both work for storing IP addresses. `INET` provides validation and supports IPv6.
114+
115+
### 2.4 Status Constraint Inconsistency (RESOLVED)
116+
117+
Current-account originally had a constraint missing `'completed'` status:
118+
119+
- Original: `CHECK (status IN ('pending', 'processing', 'failed'))`
120+
- Fixed in: `20251217000001_fix_audit_status_constraint.sql`
121+
122+
All services now correctly include `'completed'` status.
123+
124+
---
125+
126+
## 3. Service Integration Analysis
127+
128+
### 3.1 Services Using Shared Audit Library Correctly
129+
130+
| Service | Entity | Uses Shared Library |
131+
|---------|--------|---------------------|
132+
| tenant | TenantEntity | Yes - `audit.RecordCreate/Update/Delete` |
133+
| party | PartyEntity | Yes - `audit.RecordCreate/Update/Delete` |
134+
| financial-accounting | FinancialBookingLogEntity | Yes - `audit.RecordCreate/Update/Delete` |
135+
| financial-accounting | LedgerPostingEntity | Yes - `audit.RecordCreate/Update/Delete` |
136+
137+
### 3.2 Services with Custom/Mixed Implementation
138+
139+
| Service | Entity | Implementation |
140+
|---------|--------|----------------|
141+
| payment-order | PaymentOrderEntity | **Local** - full local copy |
142+
| position-keeping | N/A | Uses `audit.GetUserFromContext()` but has custom audit logic in repository |
143+
| current-account | N/A | Uses `audit.GetUserFromContext()` but has custom audit logic in repository |
144+
145+
### 3.3 Shared Domain Models with Audit Hooks
146+
147+
The following models in `shared/domain/models/` implement audit hooks using the shared library:
148+
149+
- `FinancialPositionLog`
150+
- `TransactionLogEntry`
151+
- `TransactionLineage`
152+
- `AuditTrailEntry`
153+
- `Customer`
154+
- `Account`
155+
156+
These all correctly use `audit.RecordCreate/Update/Delete`.
157+
158+
---
159+
160+
## 4. Performance Bottleneck Analysis
161+
162+
### 4.1 Worker Metrics Review
163+
164+
`shared/platform/audit/metrics.go` provides comprehensive metrics:
165+
166+
**Outbox Processing Metrics:**
167+
168+
- `meridian_audit_worker_outbox_depth` (gauge by schema)
169+
- `meridian_audit_worker_outbox_processed_total` (counter)
170+
- `meridian_audit_worker_outbox_failed_total` (counter)
171+
- `meridian_audit_worker_processing_duration_seconds` (histogram)
172+
- `meridian_audit_worker_entry_age_seconds` (histogram)
173+
174+
**Kafka Metrics:**
175+
176+
- `meridian_audit_kafka_events_published_total` (counter)
177+
- `meridian_audit_kafka_events_consumed_total` (counter)
178+
- `meridian_audit_kafka_publish_duration_seconds` (histogram)
179+
- `meridian_audit_kafka_consume_duration_seconds` (histogram)
180+
- `meridian_audit_kafka_consumer_lag_messages` (gauge)
181+
- `meridian_audit_kafka_fallback_used_total` (counter by reason)
182+
- `meridian_audit_kafka_dlq_messages_total` (counter)
183+
184+
**Potential Bottleneck Indicators:**
185+
186+
1. `kafkaPublishDuration` histogram buckets extend to 5s - may need alerting on p99
187+
2. `kafkaFallbackUsed` counter tracks fallback reasons - good for diagnosing issues
188+
3. No explicit batch size metrics for consumer throughput
189+
190+
**Recommendation**: Add `batch_size` label to `processing_duration_seconds` histogram to correlate
191+
processing time with batch sizes.
192+
193+
### 4.2 Worker Configuration
194+
195+
Default worker configuration in `shared/platform/audit/worker.go:28-33`:
196+
197+
- `defaultBatchSize = 100`
198+
- `defaultPollInterval = 5 * time.Second`
199+
- `defaultMaxRetries = 3`
200+
- `defaultProcessingAge = 5 * time.Minute` (stuck entry threshold)
201+
202+
These defaults appear reasonable. No configuration exposure mechanism exists for tuning per-service.
203+
204+
**Recommendation**: Consider exposing these as configurable options in `NewAuditWorker()`.
205+
206+
---
207+
208+
## 5. Dead Code Candidates
209+
210+
### 5.1 Unused Type Aliases
211+
212+
Several services define type aliases for backward compatibility:
213+
214+
```go
215+
// services/tenant/adapters/persistence/audit.go:99-101
216+
const systemUser = audit.DefaultAuditUser
217+
type AuditOutbox = audit.AuditOutbox
218+
219+
// services/party/adapters/persistence/party_entity.go:90
220+
type PartyAuditOutbox = audit.AuditOutbox
221+
222+
// services/financial-accounting/adapters/persistence/booking_log_entity.go:61-65
223+
type AuditOutbox = audit.AuditOutbox
224+
type AuditLog = audit.AuditLog
225+
```
226+
227+
**Recommendation**: Audit tests to determine if these aliases are still needed. If only for test
228+
compatibility, consider updating tests to use the shared types directly.
229+
230+
### 5.2 Tenant-Specific Audit Types (Potentially Redundant)
231+
232+
`services/tenant/adapters/persistence/audit.go` defines:
233+
234+
- `TenantAuditOutbox` - nearly identical to `audit.AuditOutbox`
235+
- `TenantAuditLog` - nearly identical to `audit.AuditLog`
236+
237+
The only difference is these use `string` for RecordID (tenant compatibility). However, the shared
238+
`audit.AuditOutbox` **already uses `string`** for RecordID.
239+
240+
**Recommendation**: These types appear redundant and can likely be removed.
241+
242+
---
243+
244+
## 6. TODO/FIXME Comments (Non-Audit Related)
245+
246+
No TODO/FIXME comments exist in `shared/platform/audit/` directory.
247+
248+
Audit-adjacent TODOs found in services (for reference):
249+
250+
- `services/financial-accounting/service/financial_accounting_service.go:1176` - "TODO: Extract from auth context when available"
251+
252+
---
253+
254+
## 7. Prioritized Recommendations
255+
256+
### High Priority (DRY Violations)
257+
258+
1. **Migrate payment-order to shared audit library** (Est: 2-3 hours)
259+
- Implement `Auditable` interface on `PaymentOrderEntity`
260+
- Update hooks to use `audit.RecordCreate/Update/Delete`
261+
- Remove local `recordAudit()`, `AuditOutbox`, `ErrNilTransaction`
262+
- Update repository audit handling to use shared patterns
263+
264+
### Medium Priority (Consistency)
265+
266+
1. **Standardize migration schemas** (Est: 1-2 hours)
267+
- Create migrations to align `old_values`/`new_values` types (TEXT -> JSONB for party)
268+
- Consider aligning `client_ip` types (VARCHAR -> INET)
269+
270+
2. **Remove redundant tenant audit types** (Est: 30 min)
271+
- Remove `TenantAuditOutbox` and `TenantAuditLog` if truly unused
272+
- Update any tests using these types
273+
274+
### Low Priority (Nice to Have)
275+
276+
1. **Consolidate `getUserIDFromContext`** (Est: 30 min)
277+
- Update `shared/domain/models/base.go` to use `audit.GetUserFromContext()`
278+
279+
2. **Add worker configuration options** (Est: 1 hour)
280+
- Allow customizing batch size, poll interval, max retries per-service
281+
282+
3. **Add batch size metrics** (Est: 30 min)
283+
- Add label to processing duration histogram
284+
285+
---
286+
287+
## 8. Files Reference
288+
289+
### Shared Library
290+
291+
- `/shared/platform/audit/hooks.go` - Core audit recording functions
292+
- `/shared/platform/audit/context.go` - User context extraction
293+
- `/shared/platform/audit/worker.go` - Background processing
294+
- `/shared/platform/audit/metrics.go` - Prometheus metrics
295+
- `/shared/platform/audit/publisher.go` - Kafka publishing
296+
- `/shared/platform/audit/consumer.go` - Kafka consumption
297+
298+
### Service Implementations
299+
300+
- `/services/tenant/adapters/persistence/audit.go`
301+
- `/services/party/adapters/persistence/party_entity.go`
302+
- `/services/payment-order/adapters/persistence/payment_order_entity.go` (needs migration)
303+
- `/services/financial-accounting/adapters/persistence/booking_log_entity.go`
304+
- `/services/financial-accounting/adapters/persistence/ledger_posting_entity.go`
305+
306+
### Migration Files
307+
308+
- `/services/tenant/migrations/20251216000001_initial.sql`
309+
- `/services/position-keeping/migrations/20251217000001_audit_system.sql`
310+
- `/services/party/migrations/20251217000001_audit_system.sql`
311+
- `/services/current-account/migrations/20251216000002_audit_system.sql`
312+
- `/services/current-account/migrations/20251217000001_fix_audit_status_constraint.sql`
313+
- `/services/financial-accounting/migrations/20251217000001_audit_system.sql`
314+
- `/services/payment-order/migrations/20251217000001_audit_system.sql`

0 commit comments

Comments
 (0)