Co-authored-by: Gabriel Schubiner <gabriel.schubiner@childmind.org> Co-authored-by: Alex Franco <eng.franco@gmail.com>
Potential fix for pull request finding 'CodeQL / Log Injection' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Description
Sets up webhook-driven real-time processing using REDCap Data Entry Triggers, adds FastAPI webhook endpoints for both REDCap-to-REDCap and REDCap-to-Curious pipelines, restructures the Terraform infrastructure, overhauls the caching system with composite keys and content hashing, and fixes several data processing bugs.
Changes
Architecture Migration
- `POST /webhook/redcap-to-intake` (port 8001): Processes `intake_ready=1` triggers from PID 247→625
- `POST /webhook/redcap-to-curious` (port 8002): Processes `ready_to_send_to_curious=1` triggers from PID 625→Curious
- `GET /health`: Health check endpoints on both services
- `BackgroundTasks`.

New Dependencies
- `fastapi>=0.136.0`: Webhook endpoint framework
- `uvicorn>=0.44.0`: ASGI server for webhook services
- `httpx>=0.28.1`: HTTP client for testing
- `python-multipart>=0.0.26`: Form data parsing for webhook payloads

Cache System Overhaul
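As a rough illustration of the composite-key and content-hash helpers named in this section, the sketch below shows one way they could fit together. The function names come from this PR, but the signatures, the SHA-256 choice, and the truncated hash length are assumptions made for the example, not the actual implementation.

```python
import hashlib
import json
from typing import Any


def create_composite_cache_key(*components: Any, separator: str = ":") -> str:
    """Join components into one key, e.g. (12345, 3, 1) -> "12345:3:1"."""
    return separator.join(str(component) for component in components)


def compute_content_hash(content: Any, length: int = 16) -> str:
    """Hash JSON-serializable content (assumed: SHA-256, truncated hex digest).

    Keys are sorted before hashing so dict ordering does not change the hash.
    """
    canonical = json.dumps(content, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def create_alert_cache_key(alert_id: str, message: str) -> str:
    """Combine an alert ID with a hash of its message (assumed signature)."""
    return create_composite_cache_key(alert_id, compute_content_hash(message))


# record_id:status:has_response, as in the invitation key example
invitation_key = create_composite_cache_key(12345, 3, 1)
```

Keying on content rather than on record ID alone means a record whose data changes yields a new key and is processed again, while an unchanged record is skipped.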
- `create_composite_cache_key()` function joins multiple components with `:` separator (e.g., `"12345:3:1"` for record_id:status:has_response).
- `compute_content_hash()` and `compute_dataframe_hash()` for deduplication by data content rather than just record ID.
- `create_alert_cache_key()` combines alert ID + message hash.
- `create_instrument_cache_key()` combines instrument name + file hash + row count.
- `create_ripple_record_cache_key()` combines MRN + email hash + date.
- `create_invitation_cache_key()` combines record_id + status + has_response.
- `filter_by_cache()`, `add_cache_keys()`, `log_cache_statistics()` for standardized cache operations across pipelines.
- `DataCache` class simplified: uses `time.time()` timestamps, explicit TTL-based cleanup, removed dependency on `Config` import at module level.

Bug Fixes
- `np.nan` → `float('nan')` for missing emails to prevent `pd.NA` type issues.
- `RedcapRecord` Pydantic model: Added `field_validator` to convert float `redcap_repeat_instance` values to int.
- `parent_involvement` set→sorted list conversion before JSON serialization.

Invitation Processing Split
- `invitations_to_redcap.py` now processes responder and child accounts separately with different instruments (`curious_account_created_responder`, `curious_account_created_child`).
- `lookup_mrn_from_r_id()` function maps responder secret IDs to MRNs via PID 625.
- `_c` suffix to avoid naming conflicts.
- `instrument`, `account_context`, `respondent_id` are stripped before REDCap push and deduplication.
- `main()` accepts `target_pid` parameter (625 or 891) for future PID 879 integration.

REDCap-to-Curious Refactoring
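To make the field→event mapping behind `event_map()` in this section more concrete, here is a minimal sketch. It assumes EAV rows shaped like a REDCap EAV export (`record`, `redcap_event_name`, `field_name`, `value`). The names `build_event_map` and `rows_for_update` are hypothetical, and the "first non-empty value wins" rule is an assumption rather than the PR's confirmed behavior.

```python
from typing import Iterable, Mapping


def build_event_map(eav_rows: Iterable[Mapping[str, str]]) -> dict[str, str]:
    """Map each field name to the first event in which it has a non-empty value."""
    mapping: dict[str, str] = {}
    for row in eav_rows:
        field = row.get("field_name", "")
        event = row.get("redcap_event_name", "")
        value = row.get("value", "")
        if field and event and value != "":
            mapping.setdefault(field, event)  # keep the first event seen
    return mapping


def rows_for_update(
    record_id: str, updates: Mapping[str, str], event_map: Mapping[str, str]
) -> list[dict[str, str]]:
    """Build one EAV update row per field, addressed to that field's event.

    Fields without a known event are skipped rather than guessed.
    """
    return [
        {
            "record": record_id,
            "redcap_event_name": event_map[field],
            "field_name": field,
            "value": value,
        }
        for field, value in updates.items()
        if field in event_map
    ]
```

In a longitudinal project, an update that omits the event name (or names the wrong event) may write to a different event than the one that holds the field, which is why the update rows carry the event looked up from the export.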
- `process_record_for_curious()`: New function processes individual records triggered by webhook.
- `clear_ready_flag()`: Clears trigger flag after successful processing.
- `event_map()`: Builds field→event mapping from EAV data for correct longitudinal updates.
- `_prepare_curious_data()`: Sets `accountType` for child (full/limited) and parent (full) DataFrames.
- `_push_to_curious()`: Orchestrates validation, push, and REDCap update.
- `send_to_curious()` no longer uses `DataCache` (webhook-driven = no polling).

REDCap-to-REDCap Refactoring
- `process_record_for_redcap_operations()`: Processes individual records from webhook trigger.
- `format_data_for_redcap_operations()`: Extracted formatting logic from `main()`.
- `push_to_intake_redcap()`: Extracted push logic.
- `update_source_redcap_status()`: Updates `intake_ready` flag with correct event name.
- `transform_redcap_data_for_responder_tracking()`, `build_responders_df()`, and related functions moved to `from_redcap.py`.

Ripple-to-REDCap Improvements
- `lastModified` injection: Source DataFrame gets `lastModified` column before transformation for cache key generation.
- `cache_key` and `lastModified` dropped before downstream processing.
- `float('nan')` for missing emails: Prevents `pd.NA` type contamination.

Infrastructure
- `null_resource` provisioners; generates service files only (no deployment via Terraform).
- `redcap-to-redcap-batch.service.tpl` and `redcap-to-curious-batch.service.tpl` for timer-triggered batch processing.
- `redcap-to-redcap.service.tpl` and `redcap-to-curious.service.tpl` changed to `Type=simple` uvicorn services with `Restart=always`.
- `hbn-sync.service.tpl`: Updated `Wants=` to reference batch service names.
- `curious-accouts-to-redcap.service.tpl` → `curious-accounts-to-redcap.service.tpl`.
- `tests/` directory with `test-terraform.sh`, `.tflint.hcl`, `.checkov.yaml`, `install-tools.sh`, `Makefile`.
- `.github/workflows/terraform-test.yaml` for PR/push validation.
- `user_data.sh`: EC2 bootstrap script for system setup.

New Types & Models
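The float→int handling of `redcap_repeat_instance` on the trigger payload can be sketched without any third-party packages. This example deliberately swaps the PR's Pydantic `field_validator` for a plain dataclass and helper function so it runs on the standard library alone. `TriggerRecord` and `coerce_repeat_instance` are hypothetical names, the chosen subset of Data Entry Trigger fields is an assumption, and treating NaN as "no instance" is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import parse_qs


def coerce_repeat_instance(value: Union[str, int, float, None]) -> Optional[int]:
    """Return an int for values such as 2, 2.0 or "2.0"; None for empty or NaN."""
    if value is None or value == "":
        return None
    number = float(value)
    if math.isnan(number):
        return None  # assumption: NaN means the instrument is not repeating
    if not number.is_integer():
        raise ValueError(f"redcap_repeat_instance must be whole, got {value!r}")
    return int(number)


@dataclass
class TriggerRecord:
    """Stand-in for a trigger payload model (not the PR's RedcapRecord)."""

    project_id: str
    record: str
    instrument: str
    redcap_event_name: Optional[str] = None
    redcap_repeat_instance: Optional[int] = None

    @classmethod
    def from_form_body(cls, body: str) -> "TriggerRecord":
        """Parse a form-encoded POST body into a typed record."""
        fields = {key: values[0] for key, values in parse_qs(body).items()}
        return cls(
            project_id=fields["project_id"],
            record=fields["record"],
            instrument=fields["instrument"],
            redcap_event_name=fields.get("redcap_event_name"),
            redcap_repeat_instance=coerce_repeat_instance(
                fields.get("redcap_repeat_instance")
            ),
        )
```

A value that arrives as `2.0` (for example after a round trip through a float-typed DataFrame column) becomes the int `2`, while a fractional value such as `1.5` is rejected instead of being silently truncated.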
- `RedcapRecord`: Pydantic model for REDCap Data Entry Trigger payload with float→int `redcap_repeat_instance` conversion.
- `RedcapTriggerPayload`: Extended payload models in `to_curious.py` and `to_redcap.py` with trigger-specific fields.
- RecordTypeAlias: `dict[str, str | int | list | set | tuple]` for Curious account records.
- `ColumnRenameMapping`: Base class for column rename mapping configurations.
- `AccountContext`: Literal type `"responder" | "child"` for invitation processing.

Configuration Changes
- `Fields.export_operations`: New field list for PID 625 exports including `curious_email_child`, `curious_password_child`, `r_id`.
- `Fields.rename.redcap_operations_to_curious`: New rename mapping for PID 625→Curious (deprecates `redcap247_to_curious`).
- `Values.PID625`: Added `curious_account_created_responder_complete` and `curious_account_created_child_complete`.
- `Tokens.pid879`: REDCap PID 879 token for responder tracking.
- `Tokens.pid891`: REDCap PID 891 token for Curious outputs (optional/nullable).
- `tsx()` now logs stdout and stderr from TypeScript subprocess.

Testing
- `test_cache.py` (359 lines): Tests for composite keys, content hashing, DataFrame cache keys, cache filtering, logging, and invitation cache key workflow.
- `test_cache_recovery.py`: Added `TestDataCache` class with 15+ tests for persistence, expiry, corruption, etc.
- `test_redcap.py`: `TestToCuriousWebhook`, `TestToRedcapWebhook` using FastAPI `TestClient`.
- `TestProcessRecordForCurious` / `TestProcessRecordForRedcapOperations`: Replace old `TestMain` class.
- `TestClearReadyFlagCurious` / `TestClearReadyFlagIntake`: Tests for flag clearing.
- `TestPrepareCuriousData` / `TestPushToCurious`: Tests for new orchestration functions.
- `TestRedcapRecord` / `TestRedcapRepeatInstanceConversion`: Pydantic model validation tests.
- `TestRippleCacheKeys`, `TestEmailNanInsteadOfPdNA`, `TestLastModifiedFlowsThroughSetRedcapColumns`, `TestHelperColumnsDroppedBeforeDownstream`, `TestCacheIntegrationWithNewFlow`.
- `TestAlertCacheKeys`, websocket re-auth tests (`test_main_with_reconnect_reauths_on_401_then_succeeds`, `test_main_with_reconnect_reauth_failure_raises`).
- `TestNewCuriousAccount` (`parent_involvement` set→list, account types, error handling).

Checklist
- `pytest`
- `ruff check`
- `mypy`
- `./tests/test-terraform.sh`
- `curl http://localhost:8001/health`
- `parent_involvement` serializes correctly (set→sorted list)
- `redcap_repeat_instance` float values converted to int
- `float('nan')` not `pd.NA`
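The `parent_involvement` check above (set→sorted list) exists because Python's `json` module cannot serialize a `set`. A minimal sketch of the conversion follows. `normalize_for_json` is a hypothetical helper and the sample record is invented for illustration; it assumes the set's elements are mutually comparable (for example, all strings).

```python
import json
from typing import Any


def normalize_for_json(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy in which set values are replaced by sorted lists.

    Sorting makes the output deterministic, so identical data always
    produces identical JSON (useful for content hashing and test assertions).
    """
    return {
        key: sorted(value) if isinstance(value, (set, frozenset)) else value
        for key, value in record.items()
    }


# Invented sample record; only parent_involvement is taken from the PR.
account = {"record_id": "12345", "parent_involvement": {"mother", "father"}}
payload = json.dumps(normalize_for_json(account))
# payload == '{"record_id": "12345", "parent_involvement": ["father", "mother"]}'
```

Calling `json.dumps(account)` directly would raise `TypeError`, since sets are not JSON serializable.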