Summary
The NDD adapter in src/anonymizer/engine/ndd/adapter.py wraps Data Designer calls but has two paths where records are dropped silently. Additionally, there is no retry mechanism for failed records despite the adapter already tracking them via WorkflowRunResult.failed_records.
Silent failure paths
1. Preview mode returns empty DataFrame on None dataset
In NddAdapter.run_workflow(), if preview_results.dataset is None, the adapter returns an empty DataFrame (workflow_input_df.iloc[0:0].copy()) with no log message explaining why or how many records were lost.
2. Missing RECORD_ID_COLUMN causes detection bypass
_detect_missing_records() only flags omissions if RECORD_ID_COLUMN is present in the output. If Data Designer strips this column, the method short-circuits silently:
if RECORD_ID_COLUMN not in output_df.columns:
    return []  # all missing records go unreported
Missing retry logic
The adapter already has the infrastructure needed for retries:
WorkflowRunResult.failed_records tracks every record that didn't appear in the output, with record_id, step, and reason
_compute_record_id() generates deterministic UUID5s so failed rows can be precisely identified and re-submitted
rewrite_workflow.py already accumulates all_failed across steps
But nothing uses this to retry. Failed records are logged and discarded.
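Deterministic IDs are what make re-submission possible: the same input row always hashes to the same UUID5, so a failed record's ID maps back to exactly one source row. A sketch of the idea (the namespace and field canonicalization here are assumptions, not the adapter's actual _compute_record_id()):

```python
import uuid

# Assumed namespace; the real adapter defines its own.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "anonymizer.ndd")

def compute_record_id(row: dict) -> str:
    # UUID5 is a deterministic hash: identical input -> identical ID,
    # so a failed record's ID pinpoints the exact source row to retry.
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return str(uuid.uuid5(_NAMESPACE, canonical))
```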
Proposed fix
Visibility:
Log an error when preview_results.dataset is None instead of returning an empty DataFrame silently, including how many records were dropped
When RECORD_ID_COLUMN is missing from the output, warn that missing-record detection is disabled
Retry loop in NddAdapter.run_workflow() (or caller in rewrite_workflow.py):
After each run_workflow() call, check result.failed_records
If records failed and retry_attempt < max_retries, extract the failed rows from the original input DataFrame using their record IDs and re-submit them
Records that still fail after the final attempt are added to all_failed with reason "max_retries_exceeded"
Expose max_retries as a config parameter (suggested default: 2)
Log "Retrying %d failed records (attempt %d/%d)" on each retry attempt
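A sketch of such a retry loop under stated assumptions: run_workflow(), failed_records, and the record-ID column come from the descriptions above, while the output_df attribute, the backoff_s parameter, and the fixed-sleep backoff are illustrative choices:

```python
import time

import pandas as pd

RECORD_ID_COLUMN = "record_id"  # assumed name for the ID column

def run_with_retries(adapter, input_df, max_retries=2, backoff_s=5.0):
    # Sketch of a retry loop: re-submit only the rows whose record IDs
    # appear in failed_records, up to max_retries attempts.
    result = adapter.run_workflow(input_df)
    output, failed = result.output_df, list(result.failed_records)
    for attempt in range(1, max_retries + 1):
        if not failed:
            break
        retry_ids = {f["record_id"] for f in failed}
        retry_df = input_df[input_df[RECORD_ID_COLUMN].isin(retry_ids)]
        print(f"Retrying {len(retry_df)} failed records (attempt {attempt}/{max_retries})")
        time.sleep(backoff_s)  # crude fixed backoff; exponential would also work
        result = adapter.run_workflow(retry_df)
        output = pd.concat([output, result.output_df], ignore_index=True)
        failed = list(result.failed_records)
    for f in failed:
        f["reason"] = "max_retries_exceeded"
    return output, failed
```

Keeping the loop in the caller (rewrite_workflow.py) rather than inside run_workflow() keeps the adapter a thin wrapper and lets the caller merge leftover failures into all_failed.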
Notes
The RAT-Bench 33-record run lost 4 records due to nvidia/nemotron-3-nano-30b-a3b internal server errors with 3–6 hour hangs — a retry with backoff would have recovered these automatically.