Commit eeee254

Neo4j Graph Analytics for Snowflake - Quickstart (#2068)
* Neo4j Fraud Detection Quick Start
* Corrected asset file name
* set default app name in the example code
* Update Louvain-Community-Detection.md: Updated with recommendations from Snowflake Dev-Rel team
* renamed file: renamed visualization file.
1 parent aeb4817 commit eeee254

5 files changed

+259
-0
lines changed

@@ -0,0 +1,259 @@
1+
author: corydon baylor
2+
id: finding-fraudulent-communities-with-neo4j
3+
summary: How to find communities affected by fraud using Louvain in Neo4j Graph Analytics for Snowflake
4+
categories: getting-started,partner-integrations
5+
environments: web
6+
status: Published
7+
feedback link: https://github.com/Snowflake-Labs/sfguides/issues
8+
tags: Getting Started, Data Science, Data Engineering, Twitter
9+
10+
# Finding Fraudulent Communities With Neo4j
11+
<!-- ------------------------ -->
12+
## Overview
13+
Duration: 2
14+
15+
### What Is Neo4j Graph Analytics For Snowflake?
16+
17+
Neo4j helps organizations find hidden relationships and patterns across billions of data connections deeply, easily, and quickly. **Neo4j Graph Analytics for Snowflake** brings the power of graph directly to Snowflake, allowing users to run 65+ ready-to-use algorithms on their data, all without leaving Snowflake!
18+
19+
### Discovering Communities In P2P Fraud
20+
P2P fraud losses are skyrocketing: 8% of banking customers reported being victims of P2P scams in the past year, and the average loss to these scams was $176.
21+
22+
Finding different communities within P2P transactions is the first step towards identifying and ultimately ending P2P fraud.
23+
24+
### Prerequisites
25+
- The Native App [Neo4j Graph Analytics](https://app.snowflake.com/marketplace/listing/GZTDZH40B6/neo4j-neo4j-graph-analytics) for Snowflake
26+
27+
### What You Will Need
28+
- A [Snowflake account](https://signup.snowflake.com/?utm_cta=quickstarts) with appropriate access to databases and schemas.
29+
- The Neo4j Graph Analytics application installed from the Snowflake Marketplace. Access the Marketplace via the menu bar on the left-hand side of your screen, as seen below:
30+
![image](assets/marketplace.png)
31+
32+
### What You Will Build
33+
- A method to identify communities that are at high risk of fraud in P2P networks
34+
35+
### What You Will Learn
36+
- How to prepare and project your data for graph analytics
37+
- How to use community detection to identify fraud
38+
- How to read and write directly from and to your Snowflake tables
39+
40+
<!-- ------------------------ -->
41+
## Loading The Data
42+
Duration: 5
43+
44+
Dataset overview: this dataset models a peer-to-peer transaction network so that we can identify fraudulent activity using graph analytics.
45+
![image](assets/datamodel.png)
46+
47+
48+
Let's name our database `P2P_DEMO`. Using the CSVs found [here](https://drive.google.com/drive/u/1/folders/1BnAnRSEfuwDvc4eQH8IRvy3tUkwOeaNf), we are going to add two new tables:
49+
50+
- One called `P2P_TRANSACTIONS` based on the p2p_transactions.csv
51+
- One called `P2P_USERS` based on p2p_users.csv
52+
53+
Follow the steps found [here](https://docs.snowflake.com/en/user-guide/data-load-web-ui) to load in your data.
54+
55+
<!-- ------------------------ -->
56+
## Setting Up
57+
58+
### Import The Notebook
59+
- We’ve provided a Colab notebook to walk you through each SQL and Python step—no local setup required!
60+
- Download the .ipynb found [here](https://github.com/neo4j-product-examples/snowflake-graph-analytics/tree/main/QuickStarts/Python%20Notebooks), and import the notebook into Snowflake.
61+
![image](assets/followalong.png)
62+
- Don't forget to install the `streamlit` and `python` packages before you run the notebook.
63+
64+
### Permissions
65+
Before we run our algorithms, we need to set the proper permissions. Creating and granting roles requires the `accountadmin` role, so let's switch to it now:
66+
67+
```sql
68+
-- you must be accountadmin to create role and grant permissions
69+
use role accountadmin;
70+
```
71+
72+
Next, let's set up the roles, permissions, and resource access that Graph Analytics needs to operate on data within the `p2p_demo.public` schema. The script below creates consumer roles for users and administrators (`gds_user_role` and `gds_admin_role`), grants the Neo4j Graph Analytics application access to read from and write to tables and views, and ensures that future tables are accessible.
73+
74+
It also provides the application with access to the required compute pool and warehouse resources needed to run graph algorithms at scale.
75+
76+
```sql
77+
USE SCHEMA P2P_DEMO.PUBLIC;
78+
79+
-- Create a consumer role for users and admins of the Neo4j Graph Analytics application
80+
CREATE ROLE IF NOT EXISTS gds_user_role;
81+
CREATE ROLE IF NOT EXISTS gds_admin_role;
82+
GRANT APPLICATION ROLE neo4j_graph_analytics.app_user TO ROLE gds_user_role;
83+
GRANT APPLICATION ROLE neo4j_graph_analytics.app_admin TO ROLE gds_admin_role;
84+
85+
CREATE DATABASE ROLE IF NOT EXISTS gds_db_role;
86+
GRANT DATABASE ROLE gds_db_role TO ROLE gds_user_role;
87+
GRANT DATABASE ROLE gds_db_role TO APPLICATION neo4j_graph_analytics;
88+
89+
-- Grant access to consumer data
90+
GRANT USAGE ON DATABASE P2P_DEMO TO ROLE gds_user_role;
91+
GRANT USAGE ON SCHEMA P2P_DEMO.PUBLIC TO ROLE gds_user_role;
92+
93+
-- Required to read tabular data into a graph
94+
GRANT SELECT ON ALL TABLES IN DATABASE P2P_DEMO TO DATABASE ROLE gds_db_role;
95+
96+
-- Ensure the consumer role has access to created tables/views
97+
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
98+
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
99+
GRANT CREATE TABLE ON SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
100+
GRANT CREATE VIEW ON SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
101+
GRANT ALL PRIVILEGES ON FUTURE VIEWS IN SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
102+
GRANT ALL PRIVILEGES ON ALL VIEWS IN SCHEMA P2P_DEMO.PUBLIC TO DATABASE ROLE gds_db_role;
103+
104+
-- Compute and warehouse access
105+
GRANT USAGE ON WAREHOUSE GDSONSNOWFLAKE TO APPLICATION neo4j_graph_analytics;
106+
```
107+
108+
Now we will switch to the role we just created:
109+
110+
```sql
111+
use role gds_user_role;
112+
```
113+
114+
115+
<!-- ------------------------ -->
116+
## Cleaning Our Data
117+
Duration: 5
118+
119+
We need our data to be in a particular format in order to work with Graph Analytics. In general it should be like so:
120+
121+
### For The Table Representing Nodes:
122+
123+
The first column should be called `nodeId`, which holds the ID of each node in our graph.
124+
125+
### For The Table Representing Relationships:
126+
127+
We need to have columns called `sourceNodeId` and `targetNodeId`. These will tell Graph Analytics the direction of the transaction, which in this case means:
128+
- Who sent the money (sourceNodeId) and
129+
- Who received it (targetNodeId)
130+
- We also include a `total_amount` column that acts as the weight of each relationship
131+
132+
We are going to use aggregated transactions for our relationships. Let's create that table now:
133+
134+
```sql
135+
CREATE OR REPLACE TABLE p2p_demo.public.P2P_AGG_TRANSACTIONS (
136+
SOURCENODEID NUMBER(38,0),
137+
TARGETNODEID NUMBER(38,0),
138+
TOTAL_AMOUNT FLOAT
139+
) AS
140+
SELECT sourceNodeId, targetNodeId, SUM(transaction_amount) AS total_amount
141+
FROM p2p_demo.public.P2P_TRANSACTIONS
142+
GROUP BY sourceNodeId, targetNodeId;
143+
SELECT * FROM p2p_demo.public.P2P_AGG_TRANSACTIONS;
144+
```
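To make the aggregation step concrete, here is a minimal Python sketch of what the `GROUP BY` and `SUM` above compute: one total per directed (sender, receiver) pair. The rows are made-up toy values, not taken from the dataset, and the function name `aggregate_transactions` is hypothetical.

```python
from collections import defaultdict

# Toy transactions: (sourceNodeId, targetNodeId, transaction_amount).
# These rows are invented for illustration only.
transactions = [
    (1, 2, 10.0),
    (1, 2, 5.0),
    (2, 3, 7.5),
    (2, 1, 2.0),
]


def aggregate_transactions(rows):
    """Sum amounts per directed (source, target) pair, like GROUP BY + SUM."""
    totals = defaultdict(float)
    for source, target, amount in rows:
        totals[(source, target)] += amount
    return dict(totals)


agg = aggregate_transactions(transactions)
for (source, target), total in sorted(agg.items()):
    print(source, target, total)
```

Note that (1, 2) and (2, 1) stay separate rows because direction matters: the first is money sent from user 1 to user 2, the second is the reverse.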
145+
146+
147+
We are also going to create a view that contains just the unique `nodeId`s from the `P2P_USERS` table and use it as the node table when we project the graph in the next step:
148+
149+
```sql
150+
CREATE OR REPLACE VIEW p2p_users_vw (nodeId) AS
151+
SELECT DISTINCT p2p_demo.public.p2p_users.NODEID as nodeid
152+
FROM p2p_users;
153+
```
154+
<!-- ------------------------ -->
155+
156+
## Running Your Algorithms
157+
Duration: 10
158+
159+
Now we are finally at the step where we create a projection, run our algorithms, and write back to Snowflake. We will run Louvain to determine communities within our data. Louvain identifies communities by grouping together nodes that have more connections to each other than to nodes outside the group.
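Louvain scores a grouping with a quantity called modularity, which is higher when more relationship weight falls inside communities than would be expected by chance. The sketch below is not the Neo4j implementation; it only computes weighted modularity for a fixed partition of a toy graph, and it assumes relationships are treated as undirected. The graph and the function name `modularity` are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Weighted modularity of a partition, treating edges as undirected.

    edges: list of (u, v, weight); community_of: dict of node -> community id.
    """
    total_weight = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Two triangles (1-2-3 and 4-5-6) joined by a single bridge edge 3-4.
edges = [
    (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
    (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
    (3, 4, 1.0),
]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in range(1, 7)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two > q_one)
```

Splitting the graph at the bridge scores higher than lumping every node together, which is the kind of grouping Louvain searches for.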
160+
161+
You can find more information about writing this function in our [documentation](https://neo4j.com/docs/snowflake-graph-analytics/current/getting-started/).
162+
163+
You can use this code block as an outline of what you need to fill in:
164+
165+
```sql
166+
CALL neo4j_graph_analytics.graph.louvain('COMPUTE_POOL', {
167+
'project': {
168+
'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODES'],
169+
'relationshipTables': {
170+
'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS': {
171+
'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
172+
'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
173+
'orientation': 'NATURAL'
174+
}
175+
}
176+
},
177+
'compute': { 'consecutiveIds': true },
178+
'write': [{
179+
'nodeLabel': 'NODES',
180+
'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS'
181+
}]
182+
});
183+
```
184+
185+
But broadly, you will need a few things:
186+
| Name | Description | Our Value |
187+
|-------------------------------------------|-------------------------------------------------------------|------------------------------------------------|
188+
| `EXAMPLE_DB.DATA_SCHEMA.NODES` | A table for nodes | `p2p_demo.public.p2p_users_vw` |
189+
| `EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS` | A table for relationships | `p2p_demo.public.P2P_AGG_TRANSACTIONS` |
190+
| `COMPUTE_POOL` | The size of the compute pool you would like to use| `CPU_X64_XS` |
191+
| `EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS`| A table to output results | `p2p_demo.public.p2p_users_vw_lou` |
192+
| `NODES` | A node label for our nodes | `p2p_users_vw` |
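The mapping in the table can also be sketched as a small templating helper that fills the outline with concrete names. This is a hypothetical convenience function (`build_louvain_call` is not part of any Neo4j or Snowflake library); it only renders the statement as text. It assumes the node label is the unqualified node table name, as in this guide's values, and that no name contains a quote character.

```python
import json


def build_louvain_call(compute_pool, node_table, rel_table, output_table,
                       weight_column=None):
    """Render the CALL statement from the outline with concrete names."""
    # Assumption: the node label is the unqualified table name.
    node_label = node_table.split(".")[-1]
    compute = {"consecutiveIds": True}
    if weight_column is not None:
        compute["relationshipWeightProperty"] = weight_column
    config = {
        "project": {
            "nodeTables": [node_table],
            "relationshipTables": {
                rel_table: {
                    "sourceTable": node_table,
                    "targetTable": node_table,
                    "orientation": "NATURAL",
                }
            },
        },
        "compute": compute,
        "write": [{"nodeLabel": node_label, "outputTable": output_table}],
    }
    # JSON uses double quotes; the outline uses single-quoted strings.
    # This naive swap assumes no name contains a quote character.
    literal = json.dumps(config).replace('"', "'")
    return f"CALL neo4j_graph_analytics.graph.louvain('{compute_pool}', {literal});"


sql = build_louvain_call(
    "CPU_X64_XS",
    "p2p_demo.public.p2p_users_vw",
    "p2p_demo.public.P2P_AGG_TRANSACTIONS",
    "p2p_demo.public.p2p_users_vw_lou",
    weight_column="TOTAL_AMOUNT",
)
print(sql)
```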
193+
194+
195+
```sql
196+
CALL neo4j_graph_analytics.graph.louvain('CPU_X64_XS', {
197+
'project': {
198+
'nodeTables': ['p2p_demo.public.p2p_users_vw'],
199+
'relationshipTables': {
200+
'p2p_demo.public.P2P_AGG_TRANSACTIONS': {
201+
'sourceTable': 'p2p_demo.public.p2p_users_vw',
202+
'targetTable': 'p2p_demo.public.p2p_users_vw',
203+
'orientation': 'NATURAL'
204+
}
205+
}
206+
},
207+
'compute': { 'consecutiveIds': true, 'relationshipWeightProperty':'TOTAL_AMOUNT'},
208+
'write': [{
209+
'nodeLabel': 'p2p_users_vw',
210+
'outputTable': 'p2p_demo.public.p2p_users_vw_lou'
211+
}]
212+
});
213+
```
214+
Our resulting table assigns a community ID to each node based on its connections to other nodes in the graph.
215+
216+
```sql
217+
SELECT community, COUNT(*) AS community_size
218+
FROM p2p_demo.public.p2p_users_vw_lou
219+
GROUP BY community
220+
ORDER BY community_size DESC;
221+
```
222+
223+
We can then add the `fraud_transfer_flag` (which was provided by the vendor) to our communities to see if users in that community are at greater risk for fraud:
224+
225+
```sql
226+
SELECT
227+
l.community,
228+
COUNT(*) AS community_size,
229+
SUM(n.fraud_transfer_flag) AS fraud_count
230+
FROM
231+
p2p_users_vw_lou l
232+
JOIN
233+
p2p_users n
234+
ON
235+
l.nodeId = n.nodeId
236+
GROUP BY
237+
l.community
238+
ORDER BY
239+
community_size DESC, fraud_count DESC;
240+
```
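Raw fraud counts favor large communities, so one possible follow-up is to rank communities by the share of flagged users instead. The sketch below illustrates that idea on made-up rows; it assumes `fraud_transfer_flag` is 0 or 1, and the function name `fraud_rate_by_community` is hypothetical.

```python
from collections import defaultdict

# Toy rows: (nodeId, community, fraud_transfer_flag). Invented values.
rows = [
    (1, 10, 1), (2, 10, 1), (3, 10, 0),
    (4, 20, 0), (5, 20, 0), (6, 20, 0), (7, 20, 1),
]


def fraud_rate_by_community(rows):
    """Return (community, size, fraud_count, fraud_rate), highest rate first."""
    size = defaultdict(int)
    fraud = defaultdict(int)
    for _, community, flag in rows:
        size[community] += 1
        fraud[community] += flag
    stats = [(c, size[c], fraud[c], fraud[c] / size[c]) for c in size]
    return sorted(stats, key=lambda s: s[3], reverse=True)


ranked = fraud_rate_by_community(rows)
for community, community_size, fraud_count, fraud_rate in ranked:
    print(community, community_size, fraud_count, round(fraud_rate, 2))
```

Here the smaller community ranks first because two of its three users are flagged, even though it would sort second by size.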
241+
You can use Plotly as a visualization package to explore the results further. Nodes that cluster closely represent communities of highly interconnected users. You can immediately spot the tight clusters (possible fraud rings) versus the loosely connected periphery. Find more on exploratory analysis and demos [here](https://github.com/neo4j-product-examples/snowflake-graph-analytics).
242+
![image](assets/communities_visualization.png)
243+
244+
## Conclusions And Resources
245+
Duration: 2
246+
247+
In this quickstart, you learned how to bring the power of graph insights into Snowflake using Neo4j Graph Analytics.
248+
249+
### What You Learned
250+
By working with a P2P transaction dataset, you were able to:
251+
252+
1. Set up the [Neo4j Graph Analytics](https://app.snowflake.com/marketplace/listing/GZTDZH40CN/neo4j-neo4j-graph-analytics) application within Snowflake.
253+
2. Prepare and project your data into a graph model (users as nodes, transactions as relationships).
254+
3. Run Louvain community detection to identify clusters of users with high internal interaction.
255+
256+
### Resources
257+
258+
- [Neo4j Graph Analytics Documentation](https://neo4j.com/docs/snowflake-graph-analytics/)
259+
- [Installing Neo4j Graph Analytics on SPCS](https://neo4j.com/docs/snowflake-graph-analytics/installation/)