Commit 2e1c7bd

botanical and Jennifer Tran authored
feat: create notebook that publishes collection and item ingest for most collections (#111)
* fix: update old links in metadata reconciliation notebook
* fix: metadata change for cell description
* fix: update cell descriptions
* fix: update based on feedback to use domain names, remove extraneous code, clean up
---------
Co-authored-by: Jennifer Tran <[email protected]>
1 parent 867148e commit 2e1c7bd

2 files changed: +280 -2 lines changed

@@ -0,0 +1,278 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Notebook to Publish Items and Collections"
8+
]
9+
},
10+
{
11+
"cell_type": "markdown",
12+
"metadata": {},
13+
"source": [
14+
"This notebook publishes the collections in `/ingestion-data/collections` excluding:\n",
15+
"- 'hls-l30-002-ej-reprocessed'\n",
16+
"- 'hls-s30-002-ej-reprocessed'\n",
17+
"- 'ls8-covid-19-example-data'\n",
18+
"- 'landsat-c2l2-sr-antarctic-glaciers-pine-island'\n",
19+
"- 'landsat-c2l2-sr-lakes-aral-sea'\n",
20+
"- 'landsat-c2l2-sr-lakes-tonle-sap'\n",
21+
"- 'landsat-c2l2-sr-lakes-lake-balaton'\n",
22+
"- 'landsat-c2l2-sr-lakes-vanern'\n",
23+
"- 'landsat-c2l2-sr-antarctic-glaciers-thwaites'\n",
24+
"- 'landsat-c2l2-sr-lakes-lake-biwa'\n",
25+
"- 'combined_CMIP6_daily_GISS-E2-1-G_tas_kerchunk_DEMO'"
26+
]
27+
},
28+
{
29+
"cell_type": "code",
30+
"execution_count": null,
31+
"metadata": {},
32+
"outputs": [],
33+
"source": [
34+
"import glob\n",
35+
"import json\n",
36+
"import requests\n",
37+
"from cognito_client import CognitoClient"
38+
]
39+
},
40+
{
41+
"cell_type": "markdown",
42+
"metadata": {},
43+
"source": [
44+
"The following cell retrieves the collection JSON files from `/ingestion-data/collections/`, filters out the excluded collections, and saves each remaining file path and collection ID to a list."
45+
]
46+
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": null,
50+
"metadata": {},
51+
"outputs": [],
52+
"source": [
53+
"excluded_collections = [\n",
54+
"    \"hls-l30-002-ej-reprocessed\",\n",
55+
"    \"hls-s30-002-ej-reprocessed\",\n",
56+
"    \"ls8-covid-19-example-data\",\n",
57+
"    \"landsat-c2l2-sr-antarctic-glaciers-pine-island\",\n",
58+
"    \"landsat-c2l2-sr-lakes-aral-sea\",\n",
59+
"    \"landsat-c2l2-sr-lakes-tonle-sap\",\n",
60+
"    \"landsat-c2l2-sr-lakes-lake-balaton\",\n",
61+
"    \"landsat-c2l2-sr-lakes-vanern\",\n",
62+
"    \"landsat-c2l2-sr-antarctic-glaciers-thwaites\",\n",
63+
"    \"landsat-c2l2-sr-lakes-lake-biwa\",\n",
64+
"    \"combined_CMIP6_daily_GISS-E2-1-G_tas_kerchunk_DEMO\",\n",
65+
"]\n",
66+
"\n",
67+
"json_file_paths = glob.glob(\"../ingestion-data/collections/*.json\")\n",
68+
"filtered_list = [\n",
69+
"    item\n",
70+
"    for item in json_file_paths\n",
71+
"    if all(\n",
72+
"        excluded_collection not in item\n",
73+
"        for excluded_collection in excluded_collections\n",
74+
"    )\n",
75+
"]\n",
76+
"\n",
77+
"file_paths_and_collection_ids = [\n",
78+
"    {\"filePath\": file_path, \"collectionId\": data[\"id\"]}\n",
79+
"    for file_path in filtered_list\n",
80+
"    if \"id\" in (data := json.load(open(file_path, \"r\")))\n",
81+
"]"
82+
]
83+
},
84+
{
85+
"cell_type": "markdown",
86+
"metadata": {},
87+
"source": [
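88+
"Set the testing mode to `True` when testing and `False` otherwise. When the testing mode is `True`, the notebook runs against the `test` endpoint (`https://test.openveda.cloud`) and publishes only the first collection; otherwise it runs against the production endpoint."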
89+
]
90+
},
91+
{
92+
"cell_type": "code",
93+
"execution_count": null,
94+
"metadata": {},
95+
"outputs": [],
96+
"source": [
97+
"testing_mode = True"
98+
]
99+
},
100+
{
101+
"cell_type": "markdown",
102+
"metadata": {},
103+
"source": [
104+
"Have your Cognito `username` and `password` ready. The following cell sets up the Cognito client and logs in to retrieve a token for accessing the STAC Ingestor API. Replace each `CHANGE ME` placeholder with the values for your environment."
105+
]
106+
},
107+
{
108+
"cell_type": "code",
109+
"execution_count": null,
110+
"metadata": {},
111+
"outputs": [],
112+
"source": [
113+
"test_endpoint = \"https://test.openveda.cloud\"\n",
114+
"test_client_id = \"CHANGE ME\"\n",
115+
"test_user_pool_id = \"CHANGE ME\"\n",
116+
"test_identity_pool_id = \"CHANGE ME\"\n",
117+
"\n",
118+
"mcp_prod_endpoint = \"https://openveda.cloud\"\n",
119+
"mcp_prod_client_id = \"CHANGE ME\"\n",
120+
"mcp_prod_user_pool_id = \"CHANGE ME\"\n",
121+
"mcp_prod_identity_pool_id = \"CHANGE ME\"\n",
122+
"\n",
123+
"if testing_mode:\n",
124+
"    STAC_INGESTOR_API = f\"{test_endpoint}/api/ingest/\"\n",
125+
"    VEDA_STAC_API = f\"{test_endpoint}/api/stac/\"\n",
126+
"else:\n",
127+
"    STAC_INGESTOR_API = f\"{mcp_prod_endpoint}/api/ingest/\"\n",
128+
"    VEDA_STAC_API = f\"{mcp_prod_endpoint}/api/stac/\"\n",
129+
"\n",
130+
"client = CognitoClient(\n",
131+
"    client_id=test_client_id if testing_mode else mcp_prod_client_id,\n",
132+
"    user_pool_id=test_user_pool_id if testing_mode else mcp_prod_user_pool_id,\n",
133+
"    identity_pool_id=test_identity_pool_id\n",
134+
"    if testing_mode\n",
135+
"    else mcp_prod_identity_pool_id,\n",
136+
")\n",
137+
"_ = client.login()"
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"metadata": {},
143+
"source": [
144+
"The following cell sets up headers for requests."
145+
]
146+
},
147+
{
148+
"cell_type": "code",
149+
"execution_count": null,
150+
"metadata": {},
151+
"outputs": [],
152+
"source": [
153+
"TOKEN = client.access_token\n",
154+
"\n",
155+
"authorization_header = f\"Bearer {TOKEN}\"\n",
156+
"headers = {\n",
157+
"    \"Authorization\": authorization_header,\n",
158+
"    \"content-type\": \"application/json\",\n",
159+
"    \"accept\": \"application/json\",\n",
160+
"}"
161+
]
162+
},
163+
{
164+
"cell_type": "markdown",
165+
"metadata": {},
166+
"source": [
167+
"The following cell defines the function that will post the collection."
168+
]
169+
},
170+
{
171+
"cell_type": "code",
172+
"execution_count": null,
173+
"metadata": {},
174+
"outputs": [],
175+
"source": [
176+
"def post_collection(collection, collection_id):\n",
177+
"    collection_url = f\"{VEDA_STAC_API}collections/{collection_id}\"\n",
178+
"    ingest_url = f\"{STAC_INGESTOR_API}collections\"\n",
179+
"\n",
180+
"    try:\n",
181+
"        response = requests.post(ingest_url, json=collection, headers=headers)\n",
182+
"        response.raise_for_status()\n",
183+
"        if response.status_code == 201:\n",
184+
"            print(\n",
185+
"                f\"Request was successful. Find the updated collection at {collection_url}\"\n",
186+
"            )\n",
187+
"        else:\n",
188+
"            print(\n",
189+
"                f\"Updating {collection_id} failed. Request failed with status code: {response.status_code}\"\n",
190+
"            )\n",
191+
"    except requests.RequestException as e:\n",
192+
"        print(\n",
193+
"            f\"Updating {collection_id} failed. An error occurred during the request: {e}\"\n",
194+
"        )\n",
195+
"    except Exception as e:\n",
196+
"        print(\n",
197+
"            f\"An unexpected error occurred while trying to update {collection_id}: {e}\"\n",
198+
"        )"
199+
]
200+
},
201+
{
202+
"cell_type": "markdown",
203+
"metadata": {},
204+
"source": [
205+
"If `testing_mode` is enabled, publish only a test list that contains the first collection:"
206+
]
207+
},
208+
{
209+
"cell_type": "code",
210+
"execution_count": null,
211+
"metadata": {},
212+
"outputs": [],
213+
"source": [
214+
"test_file_paths_and_collection_ids = [file_paths_and_collection_ids[0]]\n",
215+
"print(test_file_paths_and_collection_ids)\n",
216+
"print(VEDA_STAC_API)\n",
217+
"\n",
218+
"\n",
219+
"file_paths_and_collection_ids = (\n",
220+
"    test_file_paths_and_collection_ids\n",
221+
"    if testing_mode\n",
222+
"    else file_paths_and_collection_ids\n",
223+
")"
224+
]
225+
},
226+
{
227+
"cell_type": "markdown",
228+
"metadata": {},
229+
"source": [
230+
"The following cell publishes each collection in the list to the target ingestor `api/ingest/collections` endpoint."
231+
]
232+
},
233+
{
234+
"cell_type": "code",
235+
"execution_count": null,
236+
"metadata": {},
237+
"outputs": [],
238+
"source": [
239+
"for item in file_paths_and_collection_ids:\n",
240+
"    collection_id = item[\"collectionId\"]\n",
241+
"    file_path = item[\"filePath\"]\n",
242+
"\n",
243+
"    try:\n",
244+
"        with open(file_path, \"r\", encoding=\"utf-8\") as file:\n",
245+
"            collection = json.load(file)\n",
246+
"\n",
247+
"        # Publish the collection to the target ingestor `api/ingest/collections` endpoint\n",
248+
"        post_collection(collection, collection_id)\n",
249+
"\n",
250+
"    except requests.RequestException as e:\n",
251+
"        print(f\"An error occurred for collectionId {collection_id}: {e}\")\n",
252+
"    except Exception as e:\n",
253+
"        print(f\"An unexpected error occurred for collectionId {collection_id}: {e}\")"
254+
]
255+
}
256+
],
257+
"metadata": {
258+
"kernelspec": {
259+
"display_name": "venv",
260+
"language": "python",
261+
"name": "python3"
262+
},
263+
"language_info": {
264+
"codemirror_mode": {
265+
"name": "ipython",
266+
"version": 3
267+
},
268+
"file_extension": ".py",
269+
"mimetype": "text/x-python",
270+
"name": "python",
271+
"nbconvert_exporter": "python",
272+
"pygments_lexer": "ipython3",
273+
"version": "3.11.7"
274+
}
275+
},
276+
"nbformat": 4,
277+
"nbformat_minor": 2
278+
}
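
The exclusion filter in the notebook above checks whether each excluded ID is a substring of the file path, so a collection whose path merely contains an excluded ID would be skipped as well. A minimal sketch of an exact-match alternative follows, assuming every collection file holds a JSON object with an `id` field. `load_publishable_collections` is a hypothetical helper name used for illustration and is not part of this commit.

```python
import json
from pathlib import Path


def load_publishable_collections(collections_dir, excluded_ids):
    """Return [{"filePath": ..., "collectionId": ...}] for non-excluded collections.

    Hypothetical helper: compares each collection's own "id" field against the
    exclusion list instead of substring-matching the file path.
    """
    excluded = set(excluded_ids)
    results = []
    for path in sorted(Path(collections_dir).glob("*.json")):
        # A context manager closes the file handle as soon as parsing is done.
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
        # Skip files that are not JSON objects or that carry no "id".
        if not isinstance(data, dict) or "id" not in data:
            continue
        if data["id"] in excluded:
            continue
        results.append({"filePath": str(path), "collectionId": data["id"]})
    return results
```

In the notebook this could be called as `load_publishable_collections("../ingestion-data/collections", excluded_collections)`. The result has the same shape as `file_paths_and_collection_ids`, so the later cells would not need to change.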

transformation-scripts/collection-metadata-reconciliation.ipynb

+2 -2
@@ -77,7 +77,7 @@
77 77
"metadata": {},
78 78
"outputs": [],
79 79
"source": [
80-
"dev_endpoint = \"https://dev.delta-backend.com/\"\n",
80+
"dev_endpoint = \"https://dev.openveda.cloud\"\n",
81 81
"dev_client_id = \"CHANGE ME\"\n",
82 82
"dev_user_pool_id = \"CHANGE ME\"\n",
83 83
"dev_identity_pool_id = \"CHANGE ME\"\n",
@@ -88,7 +88,7 @@
88 88
"staging_identity_pool_id = \"CHANGE ME\"\n",
89 89
"\n",
90 90
"ingestor_staging_url = \"https://ig9v64uky8.execute-api.us-west-2.amazonaws.com/staging/\"\n",
91-
"ingestor_dev_url = \"https://dev.delta-backend.com/\"\n",
91+
"ingestor_dev_url = \"https://dev.openveda.cloud\"\n",
92 92
"\n",
93 93
"if testing_mode:\n",
94 94
"    STAC_INGESTOR_API = ingestor_dev_url\n",

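The hunks above drop the trailing slash from `dev_endpoint` and `ingestor_dev_url`, while the staging URL keeps one. How the reconciliation notebook appends paths to these values is not shown here, so the following is only a sketch of one way to make URL building independent of trailing slashes. `join_url` is a hypothetical helper, and the `api/ingest` and `collections` segments are borrowed from the new publishing notebook in this commit as illustration.

```python
def join_url(base, *parts):
    """Join a base URL and path segments with exactly one slash between them.

    Hypothetical helper: tolerates bases with or without a trailing slash and
    segments with or without leading/trailing slashes.
    """
    segments = [base.rstrip("/")]
    # Drop empty segments so stray "/" arguments do not create "//".
    segments.extend(part.strip("/") for part in parts if part.strip("/"))
    return "/".join(segments)


# Both spellings of the base produce the same URL.
with_slash = join_url("https://dev.openveda.cloud/", "api/ingest", "collections")
without_slash = join_url("https://dev.openveda.cloud", "/api/ingest/", "collections")
print(with_slash == without_slash)
```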