Commit ced9536

Mention functionality to delete non-current croissant files and links from S3 and Synapse table
1 parent aa14674 commit ced9536

1 file changed (+2, -0 lines)

dags/synapse_dataset_to_croissant_minimal.py

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
 - (S3, Authenticated) For each dataset, upload the minimal JSON-LD file to the `synapse-croissant-metadata-minimal` public S3 bucket in the `org-sagebase-dpe-prod` AWS account
 - (Synapse, Authenticated) For each dataset, query the Synapse table to check if a link to the S3 object already exists
 - (Synapse, Authenticated) Store or update the S3 object URL in the Synapse table for each dataset
+- (S3, Authenticated) Delete non-current croissant files from S3 that no longer correspond to datasets in the data catalog
+- (Synapse, Authenticated) Delete non-current croissant file links from the Synapse table that no longer correspond to datasets in the data catalog


 This DAG addresses the issue where Google has difficulty indexing Croissant JSON embedded in portal pages.
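
The two deletion steps added to the docstring both come down to a set difference: anything stored that no longer maps to a dataset in the data catalog is non-current. The following is a minimal sketch of that selection logic only, not the DAG's actual implementation. The function names and the object-key scheme (`<dataset_id>_croissant.minimal.jsonld`) are assumptions made for illustration, and the S3 and Synapse calls that would perform the deletions are deliberately left out.

```python
from typing import Callable, Iterable, List, Set


def default_key_for_dataset(dataset_id: str) -> str:
    """Hypothetical object-key scheme; the real DAG may name files differently."""
    return f"{dataset_id}_croissant.minimal.jsonld"


def find_non_current_keys(
    existing_keys: Iterable[str],
    current_dataset_ids: Iterable[str],
    key_for_dataset: Callable[[str], str] = default_key_for_dataset,
) -> List[str]:
    """Return stored keys that match no dataset in the current catalog.

    `existing_keys` stands in for the object keys listed from the bucket
    (or the keys referenced by rows in the Synapse table).
    `current_dataset_ids` stands in for the dataset IDs in the data catalog.
    """
    current_keys: Set[str] = {key_for_dataset(d) for d in current_dataset_ids}
    # Sort so the result is deterministic regardless of input order.
    return sorted(k for k in existing_keys if k not in current_keys)


if __name__ == "__main__":
    stored = [
        "syn1_croissant.minimal.jsonld",
        "syn2_croissant.minimal.jsonld",
    ]
    # Only syn1 is still in the catalog, so the syn2 file is non-current.
    print(find_non_current_keys(stored, ["syn1"]))
```

The same list of non-current keys could drive both cleanup steps, so that the S3 objects and the Synapse table rows that link to them are removed together.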
