# Troubleshooting Failing Integration Tests

## Malformed data in the raw layer

If the raw-to-stage Glue job fails with a JSON parsing error such as:

```
Exception Error within function parse_json: Expecting property name enclosed in double quotes
```

then at least one record has invalid JSON in one of the parsed columns (`extensions`, `user`, `txma`, or `restricted`).
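
The error text matches the message raised by Python's `json` module, for example when an object uses single-quoted keys. The following is a minimal sketch for checking a suspect value locally. It assumes that `parse_json` wraps `json.loads`, which is not confirmed here, and that `python3` is on your `PATH`. `check_json` is a hypothetical helper name.

```sh
# Hypothetical helper: prints nothing and returns 0 for valid JSON;
# otherwise prints Python's json error message and returns 1.
# Assumes python3 is on PATH.
check_json() {
  # Discard the pretty-printed output; send the error text (stderr) to stdout.
  printf '%s' "$1" | python3 -m json.tool 2>&1 >/dev/null
}
```

For example, `check_json "{'a': 1}"` prints a message starting with `Expecting property name enclosed in double quotes`, while `check_json '{"a": 1}'` prints nothing.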

### Finding the malformed data

Run this query in Athena (workgroup: `{env}-dap-txma-processing`):

```sql
SELECT event_id, event_name, extensions, "user", txma, restricted
FROM "{env}-txma-raw"."txma-refactored"
-- Rebuild YYYYMMDD from datecreated (the substr offsets match a year=YYYY/month=MM/day=DD value)
WHERE cast(concat(substr(datecreated, 6, 4), substr(datecreated, 17, 2), substr(datecreated, 24, 2)) as int) >= {YYYYMMDD}
  -- Only records newer than the start of the window being processed
  AND cast(timestamp as int) > {unix_timestamp}
  -- Flag any non-empty value that json_parse cannot parse
  AND (
    (extensions IS NOT NULL AND extensions != '' AND try(json_parse(extensions)) IS NULL)
    OR ("user" IS NOT NULL AND "user" != '' AND try(json_parse("user")) IS NULL)
    OR (txma IS NOT NULL AND txma != '' AND try(json_parse(txma)) IS NULL)
    OR (restricted IS NOT NULL AND restricted != '' AND try(json_parse(restricted)) IS NULL)
  )
LIMIT 10
```

Replace `{env}` with the environment (e.g. `build`, `dev`). Set `{YYYYMMDD}` and `{unix_timestamp}` to match the window the Glue job is processing; you can find these values in the Glue job output logs.
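
Both filter values can be derived from the start of the processing window. The following is a minimal sketch under three assumptions: GNU `date` is available, the window start is given in UTC, and the `timestamp` column holds Unix seconds (the query casts it to `int`). `filter_values` is a hypothetical helper name.

```sh
# Hypothetical helper: print "<YYYYMMDD> <unix_seconds>" for a UTC window start,
# ready to substitute into {YYYYMMDD} and {unix_timestamp}.
# Assumes GNU date (-d) and that the timestamp column stores seconds.
filter_values() {
  local window_start="$1"   # e.g. "2026-05-20 00:00:00", read as UTC because of -u
  printf '%s %s\n' \
    "$(date -u -d "$window_start" +%Y%m%d)" \
    "$(date -u -d "$window_start" +%s)"
}
```

For example, `filter_values '2026-05-20 00:00:00'` prints `20260520 1779235200`.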

### Deleting the malformed data

The raw layer is backed by S3 and Athena cannot delete from it, so delete the file directly from S3.

If you don't know the exact date partition, find the file first:

```sh
aws s3api list-objects-v2 \
  --bucket {env}-dap-raw-layer \
  --prefix "txma-refactored/year={YYYY}/month={MM}" \
  --query "Contents[?contains(Key, '{event_id}')].Key" \
  --profile {profile}
```

Then delete it:

```sh
aws s3api delete-object \
  --bucket {env}-dap-raw-layer \
  --key "txma-refactored/year={YYYY}/month={MM}/day={DD}/{event_id}.json.gz" \
  --profile {profile}
```

For example, a full key looks like `txma-refactored/year=2026/month=05/day=20/6485c18c-0d0a-4900-9c75-933508a9e3c4.json.gz`.

### Common cause

This is typically caused by the unhappy-path integration tests (`invalid-json.spec.ts`), which intentionally write malformed JSON to test error handling. If a test run is interrupted before cleanup completes, the malformed record persists. The cleanup only deletes from today's date partition, so stale data from previous days is not removed automatically.

The global teardown now cleans up the last 7 days of test data to mitigate this, but stale data older than that may still need to be removed manually.
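
Because the teardown only covers the last 7 days, older leftovers have to be found by hand. The following is a minimal sketch, not the teardown's actual implementation, that prints the day-partition prefixes for a longer window. It assumes GNU `date`, partition dates in UTC, and the `txma-refactored/year=/month=/day=` layout shown earlier. `partition_prefixes` is a hypothetical name.

```sh
# Hypothetical helper: print day-partition prefixes for the last N days (UTC),
# newest first. An optional second argument sets the reference date (default: today).
# Assumes GNU date (-d and @seconds) and the year=/month=/day= layout shown earlier.
partition_prefixes() {
  local days="$1" ref="${2:-today}" base i
  base=$(date -u -d "$ref" +%s)
  i=0
  while [ "$i" -lt "$days" ]; do
    # Step back whole days from the reference date's epoch seconds.
    date -u -d "@$((base - i * 86400))" +'txma-refactored/year=%Y/month=%m/day=%d'
    i=$((i + 1))
  done
}
```

Each printed prefix can be passed as `--prefix` to the `list-objects-v2` command from the previous section. For example, `partition_prefixes 30` covers the last 30 days.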