👔 Replace {mrn}_P IDs with r_id IDs #22

Merged
shnizzedy merged 17 commits into main from r_id on Apr 23, 2026
Conversation

@shnizzedy
Collaborator

Description

Sets up webhook-driven real-time processing using REDCap Data Entry Triggers, adds FastAPI webhook endpoints for both REDCap-to-REDCap and REDCap-to-Curious pipelines, restructures the Terraform infrastructure, overhauls the caching system with composite keys and content hashing, and fixes several data processing bugs.

Changes

Architecture Migration

  • Webhook-driven processing: REDCap-to-REDCap and REDCap-to-Curious pipelines now use FastAPI webhook endpoints triggered by REDCap Data Entry Triggers instead of timer-based polling. Services run as always-on uvicorn servers (ports 8001/8002).
  • New FastAPI endpoints:
    • POST /webhook/redcap-to-intake (port 8001): Processes intake_ready=1 triggers from PID 247→625
    • POST /webhook/redcap-to-curious (port 8002): Processes ready_to_send_to_curious=1 triggers from PID 625→Curious
    • GET /health: Health check endpoints on both services
  • Background task processing: Webhook handlers accept triggers synchronously and process records asynchronously via BackgroundTasks.
  • No caching for webhook pushes: REDCap push operations triggered by webhooks operate directly without caching.

New Dependencies

  • fastapi>=0.136.0: Webhook endpoint framework
  • uvicorn>=0.44.0: ASGI server for webhook services
  • httpx>=0.28.1: HTTP client for testing
  • python-multipart>=0.0.26: Form data parsing for webhook payloads

Cache System Overhaul

  • Composite cache keys: New create_composite_cache_key() function joins multiple components with : separator (e.g., "12345:3:1" for record_id:status:has_response).
  • Content hashing: compute_content_hash() and compute_dataframe_hash() for deduplication by data content rather than just record ID.
  • Alert cache keys: create_alert_cache_key() combines alert ID + message hash.
  • Instrument cache keys: create_instrument_cache_key() combines instrument name + file hash + row count.
  • Ripple cache keys: create_ripple_record_cache_key() combines MRN + email hash + date.
  • Invitation cache keys: create_invitation_cache_key() combines record_id + status + has_response.
  • Helper utilities: filter_by_cache(), add_cache_keys(), log_cache_statistics() for standardized cache operations across pipelines.
  • Cache rewrite: DataCache class simplified. It now uses time.time() timestamps and explicit TTL-based cleanup, and no longer imports Config at module level.
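
For reviewers, a minimal sketch of the ideas in this section: joining components into a composite key, hashing content independent of key order, and expiring entries by timestamp. The function names match the ones listed above, but the signatures, the choice of SHA-256, and the `TTLCache` class are assumptions for illustration, not the code in this PR.

```python
import hashlib
import json
import time
from typing import Dict, Optional


def create_composite_cache_key(*components: object, separator: str = ":") -> str:
    """Join components into one key, e.g. record_id:status:has_response."""
    return separator.join(str(component) for component in components)


def compute_content_hash(content: dict) -> str:
    """Hash a mapping so that key order does not change the result."""
    canonical = json.dumps(content, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Hypothetical stand-in for a timestamp-based cache with explicit cleanup."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._seen: Dict[str, float] = {}

    def add(self, key: str, now: Optional[float] = None) -> None:
        # Store the time the key was seen; `now` is injectable for testing.
        self._seen[key] = time.time() if now is None else now

    def cleanup(self, now: Optional[float] = None) -> int:
        """Drop entries older than the TTL and return how many were removed."""
        current = time.time() if now is None else now
        expired = [k for k, ts in self._seen.items() if current - ts > self.ttl_seconds]
        for key in expired:
            del self._seen[key]
        return len(expired)

    def __contains__(self, key: str) -> bool:
        return key in self._seen


key = create_composite_cache_key(12345, 3, 1)  # "12345:3:1"
```

Because status and has_response are part of the key, a record whose state changes produces a new key and is processed again, which is the behavior described in the v1.10.4 fix.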

Bug Fixes

  • (v1.10.6) Column misalignment in Ripple-to-REDCap: Fixed np.nan → float('nan') for missing emails to prevent pd.NA type issues.
  • (v1.10.5) RedcapRecord Pydantic model: Added field_validator to convert float redcap_repeat_instance values to int.
  • (v1.10.4) Cache now incorporates full state (composite keys), not just record ID. Fixed parent_involvement set→sorted list conversion before JSON serialization.
  • (v1.10.3) Known missing REDCap fields no longer trigger errors.
  • (v1.10.2) Restored minute-by-minute batch jobs pending AWS permission update.
  • (v1.10.1) WebSocket re-authenticates on token expiry (401). Fixed Curious user creation after project split.
  • (v1.10.0) Added PID 879 auth, new fields for PID 625→Curious, UTC timestamps for Curious API.
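
The parent_involvement fix in v1.10.4 can be illustrated with a small sketch. `json.dumps` raises `TypeError` on a `set`, and iteration order of a set is not stable, so converting to a sorted list gives output that is both serializable and deterministic. The helper name `to_json_safe` and the sample values are hypothetical, and the sketch assumes the set members are mutually comparable (for example, all strings).

```python
import json


def to_json_safe(record: dict) -> dict:
    """Return a copy of the record with set values replaced by sorted lists."""
    return {
        name: sorted(value) if isinstance(value, (set, frozenset)) else value
        for name, value in record.items()
    }


record = {"parent_involvement": {"mother", "father"}}

try:
    json.dumps(record)
    raw_serializable = True
except TypeError:
    # Sets are not JSON serializable by default.
    raw_serializable = False

payload = json.dumps(to_json_safe(record))
```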

Invitation Processing Split

  • Responder vs Child accounts: invitations_to_redcap.py now processes responder and child accounts separately with different instruments (curious_account_created_responder, curious_account_created_child).
  • MRN lookup: New lookup_mrn_from_r_id() function maps responder secret IDs to MRNs via PID 625.
  • Field suffixes: Child account fields get _c suffix to avoid naming conflicts.
  • Metadata column cleanup: Internal fields (instrument, account_context, respondent_id) are stripped before REDCap push and deduplication.
  • Target PID routing: main() accepts target_pid parameter (625 or 891) for future PID 879 integration.
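
A rough sketch of the suffixing and metadata cleanup described above, using plain dicts rather than DataFrames. The function name, the `record_id` key, and the `invite_status` field are hypothetical; only the three metadata field names and the `_c` suffix come from this description.

```python
# Internal fields named in this PR description; they must not reach REDCap.
METADATA_FIELDS = frozenset({"instrument", "account_context", "respondent_id"})


def prepare_invitation_record(record: dict, id_field: str = "record_id") -> dict:
    """Strip metadata fields and add the _c suffix for child accounts.

    The identifier field is left unsuffixed so both account types map
    onto the same REDCap record (an assumption of this sketch).
    """
    suffix = "_c" if record.get("account_context") == "child" else ""
    prepared = {}
    for name, value in record.items():
        if name in METADATA_FIELDS:
            continue
        prepared[name if name == id_field else f"{name}{suffix}"] = value
    return prepared


child = prepare_invitation_record(
    {
        "record_id": "12345",
        "invite_status": "3",
        "account_context": "child",
        "instrument": "curious_account_created_child",
        "respondent_id": "abc",
    }
)
# child == {"record_id": "12345", "invite_status_c": "3"}
```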

REDCap-to-Curious Refactoring

  • process_record_for_curious(): New function processes individual records triggered by webhook.
  • clear_ready_flag(): Clears trigger flag after successful processing.
  • event_map(): Builds field→event mapping from EAV data for correct longitudinal updates.
  • _prepare_curious_data(): Sets accountType for child (full/limited) and parent (full) DataFrames.
  • _push_to_curious(): Orchestrates validation, push, and REDCap update.
  • Removed caching: send_to_curious() no longer uses DataCache (webhook-driven = no polling).
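
To show why a field→event mapping matters for longitudinal projects: a REDCap import into a longitudinal project has to name the event each value belongs to, and an EAV export carries that event on every row. The sketch below is one possible shape for such a mapping; the function name `build_event_map`, the event names, and the first-event-wins rule are assumptions, not the `event_map()` implementation in this PR.

```python
from typing import Dict, Iterable, Mapping


def build_event_map(eav_rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each field name to the first event it appears under.

    Each row is expected to have the EAV export columns
    'field_name' and 'redcap_event_name'.
    """
    mapping: Dict[str, str] = {}
    for row in eav_rows:
        # setdefault keeps the first event seen for a field.
        mapping.setdefault(row["field_name"], row["redcap_event_name"])
    return mapping


rows = [
    {"record": "1", "redcap_event_name": "intake_arm_1", "field_name": "r_id", "value": "abc"},
    {"record": "1", "redcap_event_name": "curious_arm_1", "field_name": "ready_to_send_to_curious", "value": "1"},
    {"record": "2", "redcap_event_name": "intake_arm_1", "field_name": "r_id", "value": "def"},
]
events = build_event_map(rows)
```

With a mapping like this, an update such as clearing ready_to_send_to_curious can be sent with the event the field actually lives on.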

REDCap-to-REDCap Refactoring

  • process_record_for_redcap_operations(): Processes individual records from webhook trigger.
  • format_data_for_redcap_operations(): Extracted formatting logic from main().
  • push_to_intake_redcap(): Extracted push logic.
  • update_source_redcap_status(): Updates intake_ready flag with correct event name.
  • Responder tracking functions moved: transform_redcap_data_for_responder_tracking(), build_responders_df(), and related functions moved to from_redcap.py.

Ripple-to-REDCap Improvements

  • lastModified injection: Source DataFrame gets lastModified column before transformation for cache key generation.
  • Composite cache keys: Uses MRN + email hash + date instead of just MRN.
  • Helper column cleanup: cache_key and lastModified dropped before downstream processing.
  • float('nan') for missing emails: Prevents pd.NA type contamination.
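
A sketch of how a Ripple cache key could combine MRN, an email hash, and a date while tolerating a missing email stored as float('nan'). The function name matches the one listed under Cache System Overhaul, but the signature, the `noemail` placeholder, the lowercasing, and the truncated SHA-256 digest are assumptions for illustration.

```python
import hashlib
import math
from typing import Optional, Union


def is_missing(value: object) -> bool:
    """True for None or a float NaN (the marker used for missing emails)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def create_ripple_record_cache_key(
    mrn: Union[str, int],
    email: Optional[Union[str, float]],
    last_modified: str,
) -> str:
    """Build an MRN:email-hash:date key; a missing email gets a fixed placeholder."""
    if is_missing(email):
        email_part = "noemail"
    else:
        normalized = str(email).strip().lower()
        email_part = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    return ":".join([str(mrn), email_part, last_modified])


missing_key = create_ripple_record_cache_key(123, float("nan"), "2026-04-01")
# missing_key == "123:noemail:2026-04-01"
```

Hashing the email keeps the address itself out of the cache file while still changing the key when the address changes.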

Infrastructure

  • Terraform simplified: Removed null_resource provisioners; generates service files only (no deployment via Terraform).
  • New service templates: redcap-to-redcap-batch.service.tpl and redcap-to-curious-batch.service.tpl for timer-triggered batch processing.
  • Webhook service templates: redcap-to-redcap.service.tpl and redcap-to-curious.service.tpl changed to Type=simple uvicorn services with Restart=always.
  • hbn-sync.service.tpl: Updated Wants= to reference batch service names.
  • Renamed: curious-accouts-to-redcap.service.tpl → curious-accounts-to-redcap.service.tpl.
  • Testing: New tests/ directory with test-terraform.sh, .tflint.hcl, .checkov.yaml, install-tools.sh, Makefile.
  • CI: New .github/workflows/terraform-test.yaml for PR/push validation.
  • user_data.sh: EC2 bootstrap script for system setup.
  • README rewritten: Architecture documentation with Mermaid sequence diagrams for both webhook flows.

New Types & Models

  • RedcapRecord: Pydantic model for REDCap Data Entry Trigger payload with float→int redcap_repeat_instance conversion.
  • RedcapTriggerPayload: Extended payload models in to_curious.py and to_redcap.py with trigger-specific fields.
  • Record TypeAlias: dict[str, str | int | list | set | tuple] for Curious account records.
  • ColumnRenameMapping: Base class for column rename mapping configurations.
  • AccountContext: Literal type "responder" | "child" for invitation processing.
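
As an illustration of the float→int conversion on redcap_repeat_instance, here is a sketch that parses a form-encoded Data Entry Trigger body. It uses a standard-library dataclass as a stand-in for the Pydantic model, so the class name `TriggerRecord`, the helper names, and the chosen subset of fields are assumptions rather than the `RedcapRecord` definition in this PR.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs


@dataclass(frozen=True)
class TriggerRecord:
    """Subset of the fields REDCap posts in a Data Entry Trigger."""

    project_id: int
    record: str
    instrument: str
    redcap_event_name: Optional[str] = None
    redcap_repeat_instance: Optional[int] = None


def to_int_instance(raw: Optional[str]) -> Optional[int]:
    """Convert values such as '2' or '2.0' to 2; reject non-integral numbers."""
    if raw is None or raw == "":
        return None
    number = float(raw)
    if not number.is_integer():
        raise ValueError(f"repeat instance must be integral, got {raw!r}")
    return int(number)


def parse_trigger_body(body: str) -> TriggerRecord:
    """Parse an application/x-www-form-urlencoded trigger body."""
    # parse_qs returns lists of values; keep the first value for each field.
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    return TriggerRecord(
        project_id=int(fields["project_id"]),
        record=fields["record"],
        instrument=fields.get("instrument", ""),
        redcap_event_name=fields.get("redcap_event_name"),
        redcap_repeat_instance=to_int_instance(fields.get("redcap_repeat_instance")),
    )


trigger = parse_trigger_body(
    "project_id=625&record=12345&instrument=intake&redcap_repeat_instance=2.0"
)
```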

Configuration Changes

  • Fields.export_operations: New field list for PID 625 exports including curious_email_child, curious_password_child, r_id.
  • Fields.rename.redcap_operations_to_curious: New rename mapping for PID 625→Curious (deprecates redcap247_to_curious).
  • Values.PID625: Added curious_account_created_responder_complete and curious_account_created_child_complete.
  • Tokens.pid879: REDCap PID 879 token for responder tracking.
  • Tokens.pid891: REDCap PID 891 token for Curious outputs (optional/nullable).
  • TypeScript logging: tsx() now logs stdout and stderr from TypeScript subprocess.

Testing

  • New test_cache.py (359 lines): Tests for composite keys, content hashing, DataFrame cache keys, cache filtering, logging, and invitation cache key workflow.
  • Expanded test_cache_recovery.py: Added TestDataCache class with 15+ tests for persistence, expiry, corruption, etc.
  • New webhook tests in test_redcap.py: TestToCuriousWebhook, TestToRedcapWebhook using FastAPI TestClient.
  • TestProcessRecordForCurious / TestProcessRecordForRedcapOperations: Replace old TestMain class.
  • TestClearReadyFlagCurious / TestClearReadyFlagIntake: Tests for flag clearing.
  • TestPrepareCuriousData / TestPushToCurious: Tests for new orchestration functions.
  • TestRedcapRecord / TestRedcapRepeatInstanceConversion: Pydantic model validation tests.
  • Ripple tests: TestRippleCacheKeys, TestEmailNanInsteadOfPdNA, TestLastModifiedFlowsThroughSetRedcapColumns, TestHelperColumnsDroppedBeforeDownstream, TestCacheIntegrationWithNewFlow.
  • Invitation tests: Rewritten for responder/child split, MRN lookup mocking, metadata field cleanup verification.
  • Alert tests: TestAlertCacheKeys, websocket re-auth tests (test_main_with_reconnect_reauths_on_401_then_succeeds, test_main_with_reconnect_reauth_failure_raises).
  • Utility tests: TestNewCuriousAccount (parent_involvement set→list, account types, error handling).

Checklist

  • Python tests pass (pytest)
  • Linting passes (ruff check)
  • Type checking passes (mypy)
  • Terraform tests pass (./tests/test-terraform.sh)
  • Webhook endpoints respond to health checks (curl http://localhost:8001/health)
  • REDCap Data Entry Triggers configured and tested
  • Webhook processes records correctly on trigger
  • Ready flags cleared after successful processing
  • Batch timer services still function as fallback
  • WebSocket re-authenticates on 401 errors
  • Composite cache keys distinguish different record states
  • parent_involvement serializes correctly (set→sorted list)
  • redcap_repeat_instance float values converted to int
  • Ripple missing emails produce float('nan') not pd.NA
  • Metadata fields stripped before REDCap push and deduplication
  • Responder and child invitation records have correct field suffixes

Comment thread python_jobs/src/hbnmigration/from_curious/invitations_to_redcap.py Dismissed
Comment thread python_jobs/src/hbnmigration/from_curious/invitations_to_redcap.py Dismissed
Comment thread python_jobs/src/hbnmigration/from_curious/invitations_to_redcap.py Dismissed
Comment thread python_jobs/src/hbnmigration/from_curious/invitations_to_redcap.py Dismissed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/utility_functions/logging.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Potential fix for pull request finding 'CodeQL / Log Injection'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_curious.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Comment thread python_jobs/src/hbnmigration/from_redcap/to_redcap.py Fixed
Potential fix for pull request finding 'CodeQL / Log Injection'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@shnizzedy shnizzedy merged commit 21836cf into main Apr 23, 2026
3 checks passed
@shnizzedy shnizzedy deleted the r_id branch April 23, 2026 19:23
2 participants