A highly scalable, production-grade Transaction Reconciliation Engine built to detect discrepancies between internal system records and external exchange reports.
Designed for memory-efficient, high-throughput processing, fault tolerance, and secure infrastructure-as-code, the engine uses native Node.js streaming APIs and a distributed, modular architecture.
The system runs entirely on serverless AWS infrastructure, providing zero-downtime scaling and pay-per-use cost efficiency.
Parsing a CSV file by loading it entirely into memory can cause OutOfMemory (OOM) crashes under heavy load. This engine instead uses a `for await...of` iteration pipeline over native Node.js streams: it reads one chunk at a time, normalizes records on the fly, and releases each chunk as soon as it has been processed. This lets the engine process multi-gigabyte transaction datasets within the strict memory limits of an AWS Lambda environment.
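A minimal sketch of a chunked `for await` pipeline, assuming a simple unquoted CSV layout (`transactionId,amount,currency,timestamp`) with a header row. The names `streamRecords`, `normalizeRow`, and `NormalizedTxn` are hypothetical, and the naive comma split stands in for a real CSV parser that handles quoted fields. Any `AsyncIterable<string>` works as input; a Node.js `Readable` created with `fs.createReadStream(path, { encoding: "utf8" })` is one such source.

```typescript
// Hypothetical normalized record; field names mirror the Transaction schema.
interface NormalizedTxn {
  transactionId: string;
  amount: number;
  currency: string;
  timestamp: Date;
}

// Parse one row of an assumed unquoted layout:
// transactionId,amount,currency,timestamp
function normalizeRow(line: string): NormalizedTxn | null {
  const [id, rawAmount, currency, ts] = line.split(",").map((f) => f.trim());
  if (!id || !rawAmount || !currency || !ts) return null; // skip blank or short rows
  const amount = Number(rawAmount);
  const timestamp = new Date(ts);
  if (Number.isNaN(amount) || Number.isNaN(timestamp.getTime())) return null;
  return {
    transactionId: id,
    amount: Math.round(amount * 100) / 100, // round to 2 decimal places
    currency: currency.toUpperCase(),
    timestamp,
  };
}

// Read text chunks with for-await; only the current chunk plus one partial
// line are held in memory, and each record is yielded as soon as it is parsed.
async function* streamRecords(
  chunks: AsyncIterable<string>,
  hasHeader = true,
): AsyncGenerator<NormalizedTxn> {
  let carry = ""; // incomplete trailing line from the previous chunk
  let skipHeader = hasHeader;
  for await (const chunk of chunks) {
    const lines = (carry + chunk).split(/\r?\n/);
    carry = lines.pop() ?? "";
    for (const line of lines) {
      if (skipHeader) {
        skipHeader = false;
        continue;
      }
      const record = normalizeRow(line);
      if (record) yield record;
    }
  }
  // Flush a final line that had no trailing newline.
  if (carry && !skipHeader) {
    const record = normalizeRow(carry);
    if (record) yield record;
  }
}
```

Because the generator yields one record at a time, the downstream consumer (for example, a batched database insert) decides how many records are held in memory at once.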
Instead of naive O(N^2) pairwise cross-referencing, the matching engine uses a layered, deterministic, hash-based approach:
- Pass 1 (Exact Match): Average O(1) hash-map lookups keyed on a globally unique composite key (`TransactionID + Amount + Date`).
- Pass 2 (Fuzzy Date Match): Sliding-window temporal matching for transactions whose timestamps are delayed across midnight.
- Pass 3 (Orphan Detection): Remainder processing that flags `MISSING_IN_INTERNAL` and `MISSING_IN_EXCHANGE` discrepancies.
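The three passes can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: `reconcile`, `Txn`, and the one-day window are hypothetical, Pass 1 keys on the UTC calendar date, and a simple timestamp tolerance check stands in for the sliding window.

```typescript
// Hypothetical shapes for illustration; they mirror the Transaction fields.
interface Txn {
  transactionId: string;
  amount: number;
  timestamp: Date;
}

type MatchStatus = "MATCHED" | "MISSING_IN_INTERNAL" | "MISSING_IN_EXCHANGE";

interface MatchResult {
  status: MatchStatus;
  internalTransactionId?: string;
  exchangeTransactionId?: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// UTC calendar date (YYYY-MM-DD), the "Date" part of the Pass 1 key.
const dateKey = (d: Date): string => d.toISOString().slice(0, 10);

// Append a value to the list stored under key, creating the list if needed.
function push<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const list = map.get(key);
  if (list) list.push(value);
  else map.set(key, [value]);
}

function reconcile(internal: Txn[], exchange: Txn[], windowMs = DAY_MS): MatchResult[] {
  const results: MatchResult[] = [];
  const pair = (i: Txn, e: Txn) =>
    results.push({ status: "MATCHED", internalTransactionId: i.transactionId, exchangeTransactionId: e.transactionId });

  // Pass 1: exact match on TransactionID + Amount + Date via a hash map.
  const exact = new Map<string, Txn[]>();
  for (const e of exchange) push(exact, `${e.transactionId}|${e.amount}|${dateKey(e.timestamp)}`, e);
  const leftoverInternal: Txn[] = [];
  for (const i of internal) {
    const e = exact.get(`${i.transactionId}|${i.amount}|${dateKey(i.timestamp)}`)?.pop();
    if (e) pair(i, e);
    else leftoverInternal.push(i);
  }

  // Pass 2: re-key unclaimed exchange records on TransactionID + Amount and
  // accept the first candidate whose timestamp lies within windowMs.
  const loose = new Map<string, Txn[]>();
  for (const bucket of exact.values()) {
    for (const e of bucket) push(loose, `${e.transactionId}|${e.amount}`, e);
  }
  for (const i of leftoverInternal) {
    const bucket = loose.get(`${i.transactionId}|${i.amount}`) ?? [];
    const idx = bucket.findIndex(
      (e) => Math.abs(e.timestamp.getTime() - i.timestamp.getTime()) <= windowMs,
    );
    if (idx >= 0) {
      pair(i, bucket.splice(idx, 1)[0]!);
    } else {
      // Pass 3a: no exchange counterpart at all.
      results.push({ status: "MISSING_IN_EXCHANGE", internalTransactionId: i.transactionId });
    }
  }

  // Pass 3b: exchange records that no internal record claimed.
  for (const bucket of loose.values()) {
    for (const e of bucket) results.push({ status: "MISSING_IN_INTERNAL", exchangeTransactionId: e.transactionId });
  }
  return results;
}
```

With the default one-day window, an internal record stamped 23:50 and its exchange counterpart stamped 00:10 the next day miss Pass 1 because their date keys differ, but pair up in Pass 2. Map operations are average O(1) per record, and Pass 2 scans only the small bucket that shares a TransactionID and Amount.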
The deployment pipeline (`ci.yml`, `cd.yml`) uses Turborepo to cache linting, type-checking, and test results. Turborepo hashes each task's inputs (source files, configuration, and the workspace dependency graph) and skips any task whose hash matches a previous run, replaying the cached output instead. We consolidated the CI and CD steps into a single optimized build-and-deploy job, cutting deployment times by over 60%.
Infrastructure provisioning is strictly controlled via granular IAM policies. The Serverless deployment executes under an IAM role explicitly restricted to:
- Resource-constrained `iam:PassRole` permissions.
- Deterministic deployment bucket constraints (`reconciliation-engine-deploy-*`).
- Scoped `cloudformation`, `lambda`, and `apigateway` ARNs.
- Strict `TagResource` validation for organizational compliance.
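As an illustration of these constraints, the following builds an IAM policy document in TypeScript. The account ID, region, service prefix, `Project`/`Stage` tag keys, and the exact action lists are placeholders, and the `lambdaRole` suffix assumes the Serverless Framework's default execution-role naming. A working deployer role typically needs additional actions, such as CloudWatch Logs permissions for the functions' log groups.

```typescript
// Subset of the IAM policy grammar used below.
interface Statement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
  Condition?: Record<string, Record<string, string | string[]>>;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: Statement[];
}

// Placeholders: substitute the real account, region, and service prefix.
const ACCOUNT = "123456789012";
const REGION = "us-east-1";
const SERVICE = "reconciliation-engine";

function deployerPolicy(): PolicyDocument {
  const fnArn = `arn:aws:lambda:${REGION}:${ACCOUNT}:function:${SERVICE}-*`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // Only this service's execution roles, and only when passed to Lambda.
        Sid: "ResourceConstrainedPassRole",
        Effect: "Allow",
        Action: ["iam:PassRole"],
        Resource: [`arn:aws:iam::${ACCOUNT}:role/${SERVICE}-*-lambdaRole`],
        Condition: { StringEquals: { "iam:PassedToService": "lambda.amazonaws.com" } },
      },
      {
        // Deterministic deployment bucket name pattern.
        Sid: "DeploymentBucketOnly",
        Effect: "Allow",
        Action: ["s3:CreateBucket", "s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        Resource: [`arn:aws:s3:::${SERVICE}-deploy-*`, `arn:aws:s3:::${SERVICE}-deploy-*/*`],
      },
      {
        // CloudFormation and Lambda limited to this service's stacks and functions.
        Sid: "ScopedStacksAndFunctions",
        Effect: "Allow",
        Action: [
          "cloudformation:CreateStack", "cloudformation:UpdateStack",
          "cloudformation:DescribeStacks", "cloudformation:DescribeStackEvents",
          "lambda:CreateFunction", "lambda:UpdateFunctionCode",
          "lambda:UpdateFunctionConfiguration", "lambda:GetFunction",
        ],
        Resource: [`arn:aws:cloudformation:${REGION}:${ACCOUNT}:stack/${SERVICE}-*/*`, fnArn],
      },
      {
        // API Gateway REST APIs in the target region.
        Sid: "ScopedApiGateway",
        Effect: "Allow",
        Action: ["apigateway:GET", "apigateway:POST", "apigateway:PUT", "apigateway:PATCH", "apigateway:DELETE"],
        Resource: [`arn:aws:apigateway:${REGION}::/restapis`, `arn:aws:apigateway:${REGION}::/restapis/*`],
      },
      {
        // Tagging allowed only with the required Project tag and approved keys.
        Sid: "ValidatedTagResource",
        Effect: "Allow",
        Action: ["lambda:TagResource"],
        Resource: [fnArn],
        Condition: {
          StringEquals: { "aws:RequestTag/Project": SERVICE },
          "ForAllValues:StringEquals": { "aws:TagKeys": ["Project", "Stage"] },
        },
      },
    ],
  };
}
```

`JSON.stringify(deployerPolicy(), null, 2)` produces the JSON document to attach to the deployment role. Keeping the policy in code makes the ARN patterns reviewable alongside the rest of the infrastructure.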
The engine uses MongoDB for persistence, with indexed collections that support fast reporting queries and aggregate data generation.
The `ReconciliationRun` model tracks the metadata, status, and aggregate statistics of a single reconciliation execution.
{
_id: ObjectId,
status: 'PENDING' | 'PROCESSING' | 'COMPLETED' | 'FAILED',
startTime: Date,
endTime: Date,
summary: {
totalInternalRecords: Number,
totalExchangeRecords: Number,
matchedCount: Number,
discrepancyCount: Number,
}
}
- Indexes: `{ status: 1, startTime: -1 }` (optimized for dashboard polling)
The `Transaction` model stores the normalized, atomic transactions from both sources.
{
_id: ObjectId,
runId: ObjectId, // Ref: ReconciliationRun
source: 'INTERNAL' | 'EXCHANGE',
transactionId: String,
amount: Number,
currency: String,
timestamp: Date,
rawMetadata: Object // Schema-less blob for audit trails
}
- Indexes:
  - `{ runId: 1, transactionId: 1 }`: compound unique index that prevents duplicate ingestion within a run.
  - `{ runId: 1, source: 1 }`: optimized for filtering by source during the matching phase.
The discrepancy ledger stores one result per matched pair, discrepancy, or orphaned transaction.
{
_id: ObjectId,
runId: ObjectId, // Ref: ReconciliationRun
status: 'MATCHED' | 'DISCREPANCY' | 'MISSING_IN_INTERNAL' | 'MISSING_IN_EXCHANGE',
internalTransactionId: String, // (Optional)
exchangeTransactionId: String, // (Optional)
discrepancyReasons: [String], // e.g., ["AMOUNT_MISMATCH", "DATE_MISMATCH"]
deltaAmount: Number
}
- Indexes: `{ runId: 1, status: 1 }` (critical for fast report generation and pagination)
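A hypothetical helper that turns a paired (or orphaned) transaction into a ledger entry shaped like the schema above. The sign convention for `deltaAmount` (exchange minus internal, treating a missing side as zero), the cent-level amount comparison, and the one-hour `maxSkewMs` threshold for `DATE_MISMATCH` are assumptions for illustration; `buildEntry` and its types are not part of the documented codebase.

```typescript
type ReconStatus = "MATCHED" | "DISCREPANCY" | "MISSING_IN_INTERNAL" | "MISSING_IN_EXCHANGE";

// Hypothetical input shape, a subset of the Transaction fields.
interface TxnSide {
  transactionId: string;
  amount: number;
  timestamp: Date;
}

// A ledger entry without _id and runId.
interface LedgerEntry {
  status: ReconStatus;
  internalTransactionId?: string;
  exchangeTransactionId?: string;
  discrepancyReasons: string[];
  deltaAmount: number;
}

// Round to cents to avoid floating-point noise in deltas.
const cents = (x: number): number => Math.round(x * 100) / 100;

function buildEntry(
  internal: TxnSide | undefined,
  exchange: TxnSide | undefined,
  maxSkewMs = 60 * 60 * 1000, // hypothetical DATE_MISMATCH threshold
): LedgerEntry {
  if (!internal) {
    if (!exchange) throw new Error("at least one side is required");
    // Exchange-only orphan: delta is the full exchange amount.
    return {
      status: "MISSING_IN_INTERNAL",
      exchangeTransactionId: exchange.transactionId,
      discrepancyReasons: [],
      deltaAmount: cents(exchange.amount),
    };
  }
  if (!exchange) {
    // Internal-only orphan: delta is the negated internal amount.
    return {
      status: "MISSING_IN_EXCHANGE",
      internalTransactionId: internal.transactionId,
      discrepancyReasons: [],
      deltaAmount: cents(-internal.amount),
    };
  }
  // Both sides present: delta = exchange - internal.
  const deltaAmount = cents(exchange.amount - internal.amount);
  const reasons: string[] = [];
  if (deltaAmount !== 0) reasons.push("AMOUNT_MISMATCH");
  const skew = Math.abs(exchange.timestamp.getTime() - internal.timestamp.getTime());
  if (skew > maxSkewMs) reasons.push("DATE_MISMATCH");
  return {
    status: reasons.length > 0 ? "DISCREPANCY" : "MATCHED",
    internalTransactionId: internal.transactionId,
    exchangeTransactionId: exchange.transactionId,
    discrepancyReasons: reasons,
    deltaAmount,
  };
}
```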
Discrepancy reports are generated with native MongoDB aggregation pipelines rather than memory-heavy application-side mapping:
- `$match`: Filters by `runId` and `status != 'MATCHED'`.
- `$group`: Groups by `discrepancyReasons` to generate analytical health dashboards.
- `$lookup` (optional): Left-joins the `Transaction` collection to hydrate raw audit data into the final payload.
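The stages listed above can be expressed as plain pipeline objects and passed to Mongoose's `Model.aggregate()` or the driver's `collection.aggregate()`. A sketch with hypothetical function names: `transactions` assumes Mongoose's default pluralized collection name for a `Transaction` model, and `runId` must already be an `ObjectId` because Mongoose does not cast aggregation stages. Two details go beyond the list: `$unwind` precedes `$group`, because grouping directly on the `discrepancyReasons` array would bucket documents by the whole array rather than by each reason, and the detail query paginates before `$lookup` so that only one page of results is hydrated.

```typescript
// Aggregation stages as plain objects; no driver import is needed to build them.
type Stage = Record<string, unknown>;

// Dashboard view: count non-matched ledger entries per status and reason.
function discrepancyBreakdown(runId: unknown): Stage[] {
  return [
    // Served by the { runId: 1, status: 1 } index.
    { $match: { runId, status: { $ne: "MATCHED" } } },
    // One document per reason; entries with an empty array are kept.
    { $unwind: { path: "$discrepancyReasons", preserveNullAndEmptyArrays: true } },
    { $group: { _id: { status: "$status", reason: "$discrepancyReasons" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } },
  ];
}

// Report view: paginate first, then left-join raw transactions for each side.
function discrepancyDetails(runId: unknown, limit = 100, skip = 0): Stage[] {
  const hydrate = (field: string, source: string, alias: string): Stage => ({
    $lookup: {
      from: "transactions", // assumed collection name for the Transaction model
      let: { rid: "$runId", tid: `$${field}` },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$runId", "$$rid"] },
                { $eq: ["$source", source] },
                { $eq: ["$transactionId", "$$tid"] },
              ],
            },
          },
        },
        { $project: { _id: 0, amount: 1, currency: 1, timestamp: 1, rawMetadata: 1 } },
      ],
      as: alias,
    },
  });
  return [
    { $match: { runId, status: { $ne: "MATCHED" } } },
    { $sort: { _id: 1 } }, // stable order for pagination
    { $skip: skip },
    { $limit: limit },
    hydrate("internalTransactionId", "INTERNAL", "internal"),
    hydrate("exchangeTransactionId", "EXCHANGE", "exchange"),
  ];
}
```

Matching on `runId` and `source` inside the `$lookup` sub-pipeline keeps the join from picking up a transaction with the same ID from another run or from the opposite source.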
reconciliation-engine/
├── apps/
│ └── api/ # Express.js REST API & Serverless Framework config
├── packages/
│ ├── engine/ # Core reconciliation logic & streaming parsers
│ ├── database/ # Mongoose models, schemas, and DB connection logic
│ ├── logger/ # Structured Pino logging utility
│ └── types/ # Shared TypeScript definitions (DTOs)
├── .github/
│ └── workflows/ # CI/CD GitHub Actions pipelines
├── turbo.json # Turborepo task runner configuration
└── package.json # Root workspace definitions (pnpm)
- Install dependencies: `pnpm install`
- Start infrastructure (MongoDB): `docker-compose up -d`
- Run the development server: `pnpm turbo run dev`
- Run the test suite: `pnpm turbo run test`
Designed and engineered for financial-grade data integrity and uncompromised cloud security.