Commit 567ee5d

Update steps order
Signed-off-by: Andy Kwok <[email protected]>
1 parent 247d7ba commit 567ee5d

1 file changed

Lines changed: 25 additions & 44 deletions

notebooks/import_s3_table_embedding_demo.ipynb

@@ -176,11 +176,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Upload dataset\n",
+"### Upload Dataset and Register in Athena\n",
 "\n",
-"Once the embedding column has been added, the enriched dataset is uploaded to Amazon S3.\n",
+"After the embedding column is added, the enriched dataset is uploaded to Amazon S3.\n",
 "\n",
-"This S3 object serves as the input data source for subsequent data lake projection and import into Neptune Analytics."
+"An external table is then created in Amazon Athena over the uploaded CSV, exposing both the original attributes and the embedding array for SQL-based access."
 ]
 },
 {
@@ -192,30 +192,6 @@
 "# Push to s3\n",
 "empty_s3_bucket(s3_location_data_lake)\n",
 "push_to_s3(data_w_embedding_path, _clean_s3_path(s3_location_data_lake),\"styles_embedding.csv\")\n",
-"\n",
-"print(\"DataLake preparation completed.\")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Data Projection\n",
-"\n",
-"Once the data source has been uploaded successfully, two Amazon Athena queries are executed:\n",
-"\n",
-"1. Create an external table over the uploaded dataset\n",
-"2. Run a SQL projection that selects a subset of columns, including the embedding vector\n",
-"\n",
-"This process produces a projected .csv file that is compatible with Amazon Neptune Analytics import requirements, supporting both node property data and embedding vectors in a single file.\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
 "\n",
 "# Create external data\n",
 "create_csv_table_stmt = f\"\"\"\n",
@@ -242,6 +218,27 @@
 "\n",
 "_execute_athena_query(athena_client, create_csv_table_stmt, s3_location_log, database=s3_tables_database)\n",
 "\n",
+"print(\"DataLake preparation completed.\")"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Import Data into Neptune Analytics and Perform Similarity Search\n",
+"\n",
+"A projection query is executed in Athena to select the required columns, map Neptune-compatible headers, and flatten the embedding array into a vector format.\n",
+"\n",
+"The resulting CSV is compatible with Amazon Neptune Analytics import requirements and can be ingested directly to enable vector similarity search on the graph.\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Clear import directory\n",
 "empty_s3_bucket(s3_location_import)\n",
 "\n",
 "# Projection\n",
@@ -259,24 +256,8 @@
 "_execute_athena_query(athena_client, create_csv_table_stmt, s3_location_import, database=s3_tables_database)\n",
 "\n",
 "# Remove unnecessary .csv.metadata file generated by Athena. \n",
-"empty_s3_bucket(s3_location_import, file_extension=\".csv.metadata\")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Import Data into Neptune Analytics and Perform Similarity Search\n",
+"empty_s3_bucket(s3_location_import, file_extension=\".csv.metadata\")\n",
 "\n",
-"Once the compatible import file has been generated, the import process can be triggered to load the dataset into Amazon Neptune Analytics.\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
 "task_id = await instance_management.import_csv_from_s3(\n",
 " NeptuneGraph.from_config(set_config_graph_id(graph_id)),\n",
 " s3_location_import,\n",

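Note on the cleanup step: the diff keeps the call that removes the `.csv.metadata` file Athena writes next to the query result, and after this commit that call runs in the same cell as the import. The sketch below only illustrates the kind of suffix filter such a cleanup could apply. `keys_to_delete` and the key names are hypothetical and are not taken from the repository's `empty_s3_bucket` implementation, which this diff does not show.

```python
from typing import Iterable, List, Optional


def keys_to_delete(keys: Iterable[str], file_extension: Optional[str] = None) -> List[str]:
    """Return the object keys a cleanup pass would remove.

    Hypothetical helper (not the notebook's code): with no extension every
    key is selected; otherwise only keys ending with the given suffix.
    """
    if file_extension is None:
        return list(keys)
    return [key for key in keys if key.endswith(file_extension)]


# Made-up listing of an Athena output prefix: one result file, one metadata file.
listing = [
    "import/1234abcd.csv",
    "import/1234abcd.csv.metadata",
]

# Only the metadata sidecar is selected, so the projected CSV stays in place.
print(keys_to_delete(listing, file_extension=".csv.metadata"))

# Without a suffix, everything under the prefix is selected.
print(keys_to_delete(listing))
```

Filtering on the full `.csv.metadata` suffix rather than `.metadata` alone keeps the match narrow, which matters if the same prefix later holds other files.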