Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 1 addition & 4 deletions src/routes/sync.schemas.ts
Original file line number Diff line number Diff line change
Expand Up @@ -42,13 +42,10 @@ export const pushChangeSchema = z.object({
tableName: sqlIdentifier,
rowPks: z.string(), // JSON string
columnName: sqlIdentifier.nullable(),
hlcTimestamp: z.string(),
hlcTimestamp: z.string(), // Transaction-scope HLC; shared by every change from one sender-side tx

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate hlcTimestamp structurally instead of accepting any string.

On Line 45, hlcTimestamp: z.string() is too permissive now that HLC is the primary transaction/ordering key. Malformed values can break ordering assumptions in push conflict handling.

Suggested hardening
 export const pushChangeSchema = z.object({
   tableName: sqlIdentifier,
   rowPks: z.string(), // JSON string
   columnName: sqlIdentifier.nullable(),
-  hlcTimestamp: z.string(), // Transaction-scope HLC; shared by every change from one sender-side tx
+  hlcTimestamp: z
+    .string()
+    .min(1)
+    // Replace with your canonical HLC regex/parser if available.
+    .regex(/^[^\\s]+$/, 'invalid hlcTimestamp'),
   deviceId: z.string().optional(),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
hlcTimestamp: z.string(), // Transaction-scope HLC; shared by every change from one sender-side tx
export const pushChangeSchema = z.object({
tableName: sqlIdentifier,
rowPks: z.string(), // JSON string
columnName: sqlIdentifier.nullable(),
hlcTimestamp: z
.string()
.min(1)
// Replace with your canonical HLC regex/parser if available.
.regex(/^[^\s]+$/, 'invalid hlcTimestamp'),
deviceId: z.string().optional(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/sync.schemas.ts` at line 45, The hlcTimestamp field currently uses
hlcTimestamp: z.string() which is too permissive; replace it with a structural
validator—either z.string().regex(...) with the canonical HLC pattern or
z.string().refine(value => isValidHlc(value), { message: 'invalid HLC timestamp'
}) and add a small helper isValidHlc or parseHlcTimestamp that enforces the HLC
format/components (e.g., logical counter, wall-time, node id) so malformed
values are rejected by the schema; update the schema entry for hlcTimestamp to
use that validator and return a clear error message on failure.

deviceId: z.string().optional(),
encryptedValue: z.string().nullable(),
nonce: z.string().nullable(),
batchId: z.string().optional(), // UUID for grouping related changes
batchSeq: z.number().int().positive().optional(), // 1-based sequence within batch
batchTotal: z.number().int().positive().optional(), // Total changes in this batch
// MLS epoch (for shared spaces with epoch-key encryption)
epoch: z.number().int().nonnegative().optional().nullable(),
// Space-specific fields (required for space pushes, ignored for personal vaults)
Expand Down
32 changes: 12 additions & 20 deletions src/routes/sync.ts
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,8 @@ import { getServerIdentity } from '../services/serverIdentity'
import vaultRoutes from './sync.vaults'
import { broadcastToSpace } from './ws'

import { validateBatches } from '../utils/syncUtils'

// Re-export sync utilities
export { validateBatches, type SyncChange, type BatchValidationError } from '../utils/syncUtils'
export { type SyncChange } from '../utils/syncUtils'

const sync = new Hono()

Expand All @@ -38,9 +36,13 @@ function getCallerDid(c: any): string | null {

/**
* POST /sync/push
* Push CRDT changes to server with unencrypted metadata for deduplication
* Uses INSERT ... ON CONFLICT DO UPDATE to keep only latest value per cell
* Validates batch completeness if batchId/batchSeq/batchTotal are provided
* Push CRDT changes to server with unencrypted metadata for deduplication.
* Uses INSERT ... ON CONFLICT DO UPDATE to keep only latest value per cell.
*
* HLC atomicity: every change in a push shares its sender-side transaction HLC,
* and all changes in one request are persisted in a single db.transaction() —
* so an HLC-group is never split across partial inserts. The client chunks
* at HLC boundaries, the server never splits what the client sends.
*/
sync.post('/push', zValidator('json', pushChangesSchema), async (c) => {
const { spaceId, changes: rawChanges } = c.req.valid('json')
Expand Down Expand Up @@ -121,20 +123,10 @@ sync.post('/push', zValidator('json', pushChangesSchema), async (c) => {
spaceAuthenticatedPublicKey = authenticatedPublicKey
}

// Validate batch completeness + duplicate-sequence detection.
//
// Delegates to validateBatches() in utils/syncUtils.ts — the previous
// inline duplicate of this logic checked completeness before duplicates,
// so a batch like seq=[1,1,3] with batchTotal=3 reported "Incomplete
// batch (missing 2)" instead of "Duplicate sequence numbers", masking
// the real client bug. The shared util has the correct check order.
const batchError = validateBatches(changes)
if (batchError) {
return c.json(batchError, 400)
}

// All batches are complete - apply changes atomically in a transaction
// This ensures either ALL changes are applied or NONE (on error/constraint violation)
// Apply changes atomically in a transaction so either ALL changes are
// applied or NONE. HLC-group atomicity is guaranteed by the client
// chunking at HLC boundaries before calling this endpoint — the server
// trusts the sender's grouping and does no completeness verification.
Comment on lines +126 to +129

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Enforce the single-HLC request invariant server-side.

Line 126-Line 129 explicitly trusts client chunking, but there is no guard against mixed hlcTimestamp values in one request. That can violate the documented contract and make lastHlc ambiguous.

Suggested guard before transaction
   const changes = rawChanges as PushChange[]
+  const uniqueHlc = new Set(changes.map((c) => c.hlcTimestamp))
+  if (uniqueHlc.size > 1) {
+    return c.json({ error: 'All changes in one push request must share the same hlcTimestamp' }, 400)
+  }

   try {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/sync.ts` around lines 126 - 129, Before starting the transaction,
validate that all incoming records in the sync request have the same
hlcTimestamp to enforce the single-HLC invariant: iterate the array payload
(e.g., request body field that contains changes/ops), collect the distinct
hlcTimestamp values and if more than one is present return a 400 error (do not
start the transaction) with an explanatory message; this check should be placed
in the sync route handler just before the transaction block that uses lastHlc so
lastHlc remains unambiguous.

//
// PostgreSQL has a limit of 65534 parameters per query.
// Each change has ~9 parameters, so we can safely insert ~5000 changes per query.
Expand Down
78 changes: 6 additions & 72 deletions src/utils/syncUtils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,12 @@
*/

/**
* Type for sync change validation
* Type for sync change wire format.
*
* Every change carries its sender-side transaction HLC. Changes sharing the
* same `hlcTimestamp` originated in the same local transaction and must be
* applied together on the receiver; the client guarantees atomic delivery
* by chunking push requests at HLC boundaries.
*/
export type SyncChange = {
tableName: string
Expand All @@ -14,75 +19,4 @@ export type SyncChange = {
deviceId?: string
encryptedValue?: string | null
nonce?: string | null
batchId?: string
batchSeq?: number
batchTotal?: number
}

/**
* Batch validation error type
*/
export type BatchValidationError = {
error: string
batchId: string
missingSequences?: number[]
expected?: number
received?: number
}

/**
* Validate batch completeness for sync changes.
* Returns null if all batches are valid, or an error object if validation fails.
*
* Validates:
* - All sequence numbers from 1 to batchTotal are present
* - No duplicate sequence numbers within a batch
*/
export function validateBatches(changes: SyncChange[]): BatchValidationError | null {
const batchMap = new Map<string, SyncChange[]>()

// Group changes by batchId
for (const change of changes) {
if (change.batchId && change.batchSeq && change.batchTotal) {
if (!batchMap.has(change.batchId)) {
batchMap.set(change.batchId, [])
}
batchMap.get(change.batchId)!.push(change)
}
}

// Validate each batch is complete
for (const [batchId, batchChanges] of batchMap.entries()) {
const batchTotal = batchChanges[0]?.batchTotal
if (!batchTotal) continue

// Check for duplicate sequence numbers first
const sequences = new Set(batchChanges.map(c => c.batchSeq))
if (sequences.size !== batchChanges.length) {
return {
error: 'Duplicate sequence numbers in batch',
batchId,
}
}

// Check we have all sequence numbers from 1 to batchTotal
const missingSeqs: number[] = []
for (let i = 1; i <= batchTotal; i++) {
if (!sequences.has(i)) {
missingSeqs.push(i)
}
}

if (missingSeqs.length > 0) {
return {
error: 'Incomplete batch',
batchId,
missingSequences: missingSeqs,
expected: batchTotal,
received: batchChanges.length,
}
}
}

return null
}
141 changes: 0 additions & 141 deletions tests/syncBatchValidation.test.ts

This file was deleted.

Loading