Commit 7c673d1

Chore: Add a doc explaining how to fix integration tests (#1431)
Sometimes a failing integration test leaves dirty data in the Raw layer, which then causes other tests to fail. This doc explains how to find the bad data and remove it.
1 parent c0b86ad commit 7c673d1

1 file changed

Lines changed: 60 additions & 0 deletions

docs/failing-integration-tests.md

@@ -0,0 +1,60 @@
# Troubleshooting Failing Integration Tests

## Malformed data in the raw layer

If the raw-to-stage Glue job fails with a JSON parsing error like:

```
Exception Error within function parse_json: Expecting property name enclosed in double quotes
```

This means there is a record with invalid JSON in one of the parsed columns (`extensions`, `user`, `txma`, or `restricted`).
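
The error text matches the message Python's built-in `json` module raises when an object key is not wrapped in double quotes, so a suspect column value can be checked locally. This is a minimal sketch, assuming the Glue job's `parse_json` ultimately calls `json.loads` (this doc does not confirm that); `json_error` is a hypothetical helper name.

```python
import json


def json_error(value):
    """Return None if value parses as JSON, otherwise the parser's message."""
    try:
        json.loads(value)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


# Single-quoted keys are valid Python dict syntax but not valid JSON.
print(json_error('{"device_id": "abc"}'))
print(json_error("{'device_id': 'abc'}"))
```

Pasting a column value from the Athena results into `json_error` shows whether that column is the one the job is failing on.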

### Finding the malformed data

Run this query in Athena (workgroup: `{env}-dap-txma-processing`):

```sql
SELECT event_id, event_name, extensions, "user", txma, restricted
FROM "{env}-txma-raw"."txma-refactored"
WHERE cast(concat(substr(datecreated, 6, 4), substr(datecreated, 17, 2), substr(datecreated, 24, 2)) as int) >= {YYYYMMDD}
  AND cast(timestamp as int) > {unix_timestamp}
  AND (
    (extensions IS NOT NULL AND extensions != '' AND try(json_parse(extensions)) IS NULL)
    OR ("user" IS NOT NULL AND "user" != '' AND try(json_parse("user")) IS NULL)
    OR (txma IS NOT NULL AND txma != '' AND try(json_parse(txma)) IS NULL)
    OR (restricted IS NOT NULL AND restricted != '' AND try(json_parse(restricted)) IS NULL)
  )
LIMIT 10
```

Replace `{env}` with the environment (e.g. `build`, `dev`), and set `{YYYYMMDD}` and `{unix_timestamp}` to match the window the Glue job is processing. You can find these values in the Glue job output logs.
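
The `substr` offsets in the date filter only line up if `datecreated` holds a partition string shaped like `year=YYYY/month=MM/day=DD`. That shape is inferred from the offsets and the S3 key layout, not stated in this doc. Under that assumption, and assuming the `timestamp` column holds Unix seconds, the sketch below mirrors the expression in Python and derives both filter values from a chosen start time. The function names are hypothetical.

```python
from datetime import datetime, timezone


def partition_to_int(datecreated):
    """Mirror the query's substr() calls (1-based in SQL, 0-based slices here)."""
    year = datecreated[5:9]     # substr(datecreated, 6, 4)
    month = datecreated[16:18]  # substr(datecreated, 17, 2)
    day = datecreated[23:25]    # substr(datecreated, 24, 2)
    return int(year + month + day)


def filter_values(start):
    """Return ({YYYYMMDD}, {unix_timestamp}) for a timezone-aware start time."""
    start_utc = start.astimezone(timezone.utc)
    return int(start_utc.strftime("%Y%m%d")), int(start_utc.timestamp())


print(partition_to_int("year=2026/month=05/day=20"))
print(filter_values(datetime(2026, 5, 20, tzinfo=timezone.utc)))
```

The values in the Glue job output logs remain the authoritative source. This only helps when widening or narrowing the window by hand.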

### Deleting the malformed data

The raw layer is S3-backed, so you cannot delete records via Athena. Delete the file directly from S3.

If you don't know the exact date partition, find the file first:

```sh
aws s3api list-objects-v2 \
  --bucket {env}-dap-raw-layer \
  --prefix "txma-refactored/year={YYYY}/month={MM}" \
  --query "Contents[?contains(Key, '{event_id}')]" \
  --profile {profile}
```

Then delete it:

```sh
aws s3api delete-object \
  --bucket {env}-dap-raw-layer \
  --key "txma-refactored/year={YYYY}/month={MM}/day={DD}/{event_id}.json.gz" \
  --profile {profile}
```

For example, a complete key looks like `txma-refactored/year=2026/month=05/day=20/6485c18c-0d0a-4900-9c75-933508a9e3c4.json.gz`.
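
Hand-typed keys are easy to get wrong, for instance by dropping the zero padding on the month or day. As a small sketch, the key can be built from the event ID and a date. `raw_object_key` is a hypothetical helper, and the layout is taken from the example key in this doc.

```python
from datetime import date


def raw_object_key(event_id, day):
    """Build the raw-layer key for one event, zero-padding month and day."""
    return (
        f"txma-refactored/year={day:%Y}/month={day:%m}/day={day:%d}/"
        f"{event_id}.json.gz"
    )


print(raw_object_key("6485c18c-0d0a-4900-9c75-933508a9e3c4", date(2026, 5, 20)))
```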

### Common cause

This is typically caused by the unhappy-path integration tests (`invalid-json.spec.ts`), which intentionally write malformed JSON to test error handling. If a test run is interrupted before cleanup completes, the malformed record persists. The cleanup only deletes from today's date partition, so stale data from previous days won't be removed automatically.

The global teardown now cleans up the last 7 days of test data to mitigate this, but older stale data may still need manual removal.
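
To judge whether a stale record falls outside the teardown window, it can help to list the day partitions the window covers. The sketch below assumes "the last 7 days" means today plus the six days before it. The teardown's actual implementation is not shown in this doc, and both function names are hypothetical.

```python
from datetime import date, timedelta


def teardown_prefixes(today, days=7):
    """List the day-partition prefixes inside the cleanup window, newest first."""
    return [
        f"txma-refactored/year={d:%Y}/month={d:%m}/day={d:%d}/"
        for d in (today - timedelta(days=n) for n in range(days))
    ]


def needs_manual_removal(partition_day, today, days=7):
    """True when a partition is older than the window the teardown covers."""
    return partition_day <= today - timedelta(days=days)


for prefix in teardown_prefixes(date(2026, 5, 20)):
    print(prefix)
```

Any malformed record whose partition makes `needs_manual_removal` return `True` would have to be deleted by hand with the S3 commands above.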
