
Commit 75332d9

Rebase
1 parent b49be7e commit 75332d9

3 files changed

Lines changed: 114 additions & 92 deletions

File tree

notebooks/import_s3_table_demo.ipynb

Lines changed: 82 additions & 82 deletions
@@ -1,8 +1,9 @@
 {
  "cells": [
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "0",
+   "metadata": {},
    "source": [
     "# Neptune Analytics Instance Management With S3 Table Projections\n",
     "\n",
@@ -13,24 +14,24 @@
     "2. Import the projection into Neptune Analytics.\n",
     "3. Run Louvain algorithm on the provisioned instance to create communities.\n",
     "4. Export the graph back into S3 Tables bucket."
-   ],
-   "id": "daa071d5474f0439"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "1",
+   "metadata": {},
    "source": [
     "## Setup\n",
     "\n",
     "Import the necessary libraries and set up logging."
-   ],
-   "id": "8db98f850ec409ac"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "2",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# Check the Python version:\n",
     "import sys\n",
@@ -49,14 +50,14 @@
     "dotenv.load_dotenv()\n",
     "\n",
     "from nx_neptune.session_manager import SessionManager"
-   ],
-   "id": "8e270bbf456a8256"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "3",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# Configure logging to see detailed information about the instance creation process\n",
     "logging.basicConfig(\n",
@@ -72,24 +73,24 @@
     "]:\n",
     " logging.getLogger(logger_name).setLevel(logging.INFO)\n",
     "logger = logging.getLogger(__name__)"
-   ],
-   "id": "6d97092f7c0e10cd"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "4",
+   "metadata": {},
    "source": [
     "## Configuration\n",
     "\n",
     "Check for environment variables necessary for the notebook."
-   ],
-   "id": "76d2a19a4f7b6d1"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "5",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "def check_env_vars(var_names):\n",
     " values = {}\n",
@@ -119,12 +120,12 @@
     "s3_tables_database = os.getenv('NETWORKX_S3_TABLES_DATABASE')\n",
     "s3_tables_tablename = os.getenv('NETWORKX_S3_TABLES_TABLENAME')\n",
     "session_name = \"nx-athena-test-full\""
-   ],
-   "id": "9d582064efdee720"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "6",
+   "metadata": {},
    "source": [
     "## Data Setup\n",
     "\n",
@@ -133,14 +134,14 @@
     "Data should be uploaded to an S3 bucket, and an athena table created for that bucket.\n",
     "\n",
     "The PaySim dataset includes a simulated mobile money dataset, that involves transactions between client actors and banks. We can use this dataset to detect fraudulent activities in the simulated data."
-   ],
-   "id": "3d8c75a19b287ff4"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "7",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "paysim_s3_bucket = 'nx-fraud-detection'\n",
     "paysim_s3_bucket_path = 'data/'\n",
@@ -170,14 +171,14 @@
     " paysim_s3_bucket,\n",
     " f\"{paysim_s3_bucket_path}{file_path.name}\"\n",
     " )"
-   ],
-   "id": "45e778be855ef9ca"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "8",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "def _execute_create_table(stmt, catalog, database, s3_logs_location):\n",
     " athena_client = boto3.client('athena')\n",
@@ -250,62 +251,62 @@
     "\"\"\"\n",
     "\n",
     "_execute_create_table(create_s3_table_stmt, s3_tables_catalog, s3_tables_database, f\"s3://{paysim_s3_bucket}\")"
-   ],
-   "id": "49de921896e76113"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "9",
+   "metadata": {},
    "source": [
     "## Create a New/Get existing Neptune Analytics Instance\n",
     "\n",
     "Provision a new Neptune Analytics instance on demand, or retrieve an existing neptune-graph. Creating a new instance may take several minutes to complete."
-   ],
-   "id": "4a417c3dbf24e35"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "10",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "session = SessionManager.session(session_name)\n",
     "graph_list = session.list_graphs()\n",
     "print(\"The following graphs are available:\")\n",
     "for g in graph_list:\n",
     " print(g)"
-   ],
-   "id": "9c73c5cb345e6278"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "11",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "session = SessionManager.session(session_name)\n",
     "graph = await session.get_or_create_graph(config={\"provisionedMemory\": 32})\n",
     "print(f\"Retrieved graph: {graph}\")"
-   ],
-   "id": "375d3da5e264214a"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "12",
+   "metadata": {},
    "source": [
     "## Import Data from S3\n",
     "\n",
     "Import data from S3 into the Neptune Analytics graph and wait for the operation to complete. <br>\n",
     "IAM permisisons required for import: <br>\n",
     " - s3:GetObject, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey"
-   ],
-   "id": "f0d93a49706c24b1"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "13",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "SOURCE_AND_DESTINATION_BANK_CUSTOMERS = f\"\"\"\n",
     "SELECT DISTINCT \"~id\", 'customer' AS \"~label\"\n",
@@ -342,12 +343,12 @@
     " catalog=s3_tables_catalog,\n",
     " database=s3_tables_database\n",
     ")"
-   ],
-   "id": "f8579278ec4cb534"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "14",
+   "metadata": {},
    "source": [
     "## Execute Louvain Algorithm\n",
     "\n",
@@ -356,52 +357,52 @@
     "We will run the Louvain Community Detection Algorithm and mutate the graph storing the results of the vertex community in the \"community\" property\n",
     "\n",
     "Note: This runs the `mutate` algorithm, that only returns a success/failure in the result."
-   ],
-   "id": "badd0dc0eecd5042"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "15",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# sanity check: print out 10 vertices and edges from the Neptune Analytics graph\n",
     "all_nodes = graph.execute_query(\"MATCH (n) RETURN n LIMIT 10\")\n",
     "print(f\"all nodes: {all_nodes}\")\n",
     "\n",
     "all_edges = graph.execute_query(\"MATCH ()-[r]-() RETURN r LIMIT 10\")\n",
     "print(f\"all edges: {all_edges}\")"
-   ],
-   "id": "3942d5d8d1236a27"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "16",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# using Neptune Analytics, run the Louvain Community Detection Algorithm and mutate\n",
     "# the graph storing the results of the vertex community in the \"community\" property\n",
     "louvain_result = graph.execute_query('CALL neptune.algo.louvain.mutate({iterationTolerance:1e-07, writeProperty:\"community\"}) YIELD success AS success RETURN success')\n",
     "print(f\"Louvain result: {louvain_result}\")"
-   ],
-   "id": "b8b4544be8fb3120"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "17",
+   "metadata": {},
    "source": [
     "## Export the Neptune Analytics data and add it to S3 Tables as an Iceberg table\n",
     "\n",
     "Export the Neptune Analytics graph and a CSV export, and convert it to Iceberg format. Use Athena to add it to S3 Tables Bucket."
-   ],
-   "id": "a4b96f01915deea1"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "18",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# for the CSV table\n",
     "csv_catalog = 'AwsDataCatalog'\n",
@@ -425,23 +426,23 @@
     " iceberg_catalog,\n",
     " iceberg_database\n",
     ")"
-   ],
-   "id": "3b9bbe3559ac819c"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "19",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "# destroy the session graphs\n",
     "session.destroy_all_graphs()"
-   ],
-   "id": "eec63f4abb4936bf"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "20",
+   "metadata": {},
    "source": [
     "## Conclusion\n",
     "\n",
@@ -453,26 +454,26 @@
     "4. **Deletion**: We exported the updated data back into the datalake into an iceberg table\n",
     "\n",
     "The session manager (`SessionManager`) provides an easy mechanism to execute general datalake functionality."
-   ],
-   "id": "80fd1b3c3b7f68db"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
+   "id": "21",
+   "metadata": {},
    "source": [
     "## Execute Louvain Communities Algorithm using NetworkX\n",
     "\n",
     "This library integrates well as a backend for the NetworkX library (see: https://networkx.org/documentation/latest/backends.html).\n",
     "\n",
     "Using the following examples shows how to configure the graph id for networkx, and run louvain_communities through the NetworkX API. This will mutate the graph in Neptune Analytics. You can get the full result from Neptune Analytics by removing the `write_property` - which will not run the `mutate` algorithm variant."
-   ],
-   "id": "452ef7757e62c1db"
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "22",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "import networkx as nx\n",
     "\n",
@@ -485,8 +486,7 @@
     "# the graph itself\n",
     "result = nx.community.louvain_communities(nx.Graph(), backend=\"neptune\", write_property=\"community\")\n",
     "print(f\"louvain result: \\n{result}\")"
-   ],
-   "id": "e5330d3d7027c7a"
+   ]
   }
  ],
 "metadata": {},
