Update steps order

andy-k-improving · andy-k-improving · commit 567ee5d987af · 2026-01-23T11:34:11.000-08:00
Signed-off-by: Andy Kwok <andy.kwok@improving.com>
diff --git a/notebooks/import_s3_table_embedding_demo.ipynb b/notebooks/import_s3_table_embedding_demo.ipynb
@@ -176,11 +176,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Upload dataset\n",
+    "### Upload Dataset and Register in Athena\n",
     "\n",
-    "Once the embedding column has been added, the enriched dataset is uploaded to Amazon S3.\n",
+    "After the embedding column is added, the enriched dataset is uploaded to Amazon S3.\n",
     "\n",
-    "This S3 object serves as the input data source for subsequent data lake projection and import into Neptune Analytics."
+    "An external table is then created in Amazon Athena over the uploaded CSV, exposing both the original attributes and the embedding array for SQL-based access."
    ]
   },
   {
@@ -192,30 +192,6 @@
     "# Push to s3\n",
     "empty_s3_bucket(s3_location_data_lake)\n",
     "push_to_s3(data_w_embedding_path, _clean_s3_path(s3_location_data_lake),\"styles_embedding.csv\")\n",
-    "\n",
-    "print(\"DataLake preparation completed.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Data Projection\n",
-    "\n",
-    "Once the data source has been uploaded successfully, two Amazon Athena queries are executed:\n",
-    "\n",
-    "1. Create an external table over the uploaded dataset\n",
-    "2. Run a SQL projection that selects a subset of columns, including the embedding vector\n",
-    "\n",
-    "This process produces a projected .csv file that is compatible with Amazon Neptune Analytics import requirements, supporting both node property data and embedding vectors in a single file.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
     "\n",
     "# Create external data\n",
     "create_csv_table_stmt = f\"\"\"\n",
@@ -242,6 +218,27 @@
     "\n",
     "_execute_athena_query(athena_client, create_csv_table_stmt, s3_location_log, database=s3_tables_database)\n",
     "\n",
+    "print(\"DataLake preparation completed.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Import Data into Neptune Analytics and Perform Similarity Search\n",
+    "\n",
+    "A projection query is executed in Athena to select the required columns, map Neptune-compatible headers, and flatten the embedding array into a vector format.\n",
+    "\n",
+    "The resulting CSV is compatible with Amazon Neptune Analytics import requirements and can be ingested directly to enable vector similarity search on the graph.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Clear import directory\n",
     "empty_s3_bucket(s3_location_import)\n",
     "\n",
     "# Projection\n",
@@ -259,24 +256,8 @@
     "_execute_athena_query(athena_client, create_csv_table_stmt, s3_location_import, database=s3_tables_database)\n",
     "\n",
     "# Remove unnecessary .csv.metadata file generated by Athena. \n",
-    "empty_s3_bucket(s3_location_import, file_extension=\".csv.metadata\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Import Data into Neptune Analytics and Perform Similarity Search\n",
+    "empty_s3_bucket(s3_location_import, file_extension=\".csv.metadata\")\n",
     "\n",
-    "Once the compatible import file has been generated, the import process can be triggered to load the dataset into Amazon Neptune Analytics.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
     "task_id = await instance_management.import_csv_from_s3(\n",
     "        NeptuneGraph.from_config(set_config_graph_id(graph_id)),\n",
     "        s3_location_import,\n",