Skip to content

Latest commit

 

History

History
227 lines (164 loc) · 7.94 KB

File metadata and controls

227 lines (164 loc) · 7.94 KB

Lab 1: Neo4j Aura Setup and Exploration

In this lab, you will set up your Neo4j Aura database, load a financial knowledge graph using Cypher, and explore your graph visually.

Prerequisites

  • Completed Lab 0 (environment setup)
  • A valid email address

Part 1: Neo4j Aura Signup

Follow the Neo4j Aura Free Signup guide to create your free Aura account and database instance.

Part 2: Load the Knowledge Graph

SEC 10-K Financial Data Model

After your Aura instance is running, open Query from the left sidebar in the Aura Console and run the following Cypher statements in order.

Step 1: Create Constraints

These ensure each node has a unique identifier and speed up MERGE lookups during loading.

CREATE CONSTRAINT companyId IF NOT EXISTS
FOR (c:Company) REQUIRE c.companyId IS UNIQUE;

CREATE CONSTRAINT companyName IF NOT EXISTS
FOR (c:Company) REQUIRE c.name IS UNIQUE;

CREATE CONSTRAINT productId IF NOT EXISTS
FOR (p:Product) REQUIRE p.productId IS UNIQUE;

CREATE CONSTRAINT riskId IF NOT EXISTS
FOR (r:RiskFactor) REQUIRE r.riskId IS UNIQUE;

CREATE CONSTRAINT managerId IF NOT EXISTS
FOR (m:AssetManager) REQUIRE m.managerId IS UNIQUE;

CREATE CONSTRAINT documentId IF NOT EXISTS
FOR (d:Document) REQUIRE d.documentId IS UNIQUE;

CREATE CONSTRAINT metricId IF NOT EXISTS
FOR (fm:FinancialMetric) REQUIRE fm.metricId IS UNIQUE;

Step 2: Load Nodes

Each statement reads a CSV file and creates (or updates) the corresponding nodes: Companies, Products, Risk Factors, Asset Managers, Documents, and Financial Metrics.

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/companies.csv' AS row
MERGE (c:Company {companyId: row.companyId})
SET c.name = row.name, c.ticker = row.ticker,
    c.cik = row.cik, c.cusip = row.cusip;

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/products.csv' AS row
MERGE (p:Product {productId: row.productId})
SET p.name = row.name, p.description = row.description;

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/risk_factors.csv' AS row
MERGE (r:RiskFactor {riskId: row.riskId})
SET r.name = row.name, r.description = row.description;

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/asset_managers.csv' AS row
MERGE (m:AssetManager {managerId: row.managerId})
SET m.name = row.name;

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/documents.csv' AS row
MERGE (d:Document {documentId: row.documentId})
SET d.accessionNumber = row.accessionNumber, d.filingType = row.filingType,
    d.source = row.source;

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/financial_metrics.csv' AS row
MERGE (fm:FinancialMetric {metricId: row.metricId})
SET fm.name = row.name, fm.value = row.value, fm.period = row.period;

Step 3: Load Relationships

Creates relationships between nodes: OFFERS (Company→Product), FACES_RISK (Company→RiskFactor), OWNS (AssetManager→Company), COMPETES_WITH and PARTNERS_WITH (Company→Company), FILED (Company→Document), and REPORTS (Company→FinancialMetric). The competitor and partner loads use MERGE on the target company name, which creates new Company nodes for names not already in companies.csv.

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_products.csv' AS row
MATCH (c:Company {companyId: row.companyId})
MATCH (p:Product {productId: row.productId})
MERGE (c)-[:OFFERS]->(p);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_risk_factors.csv' AS row
MATCH (c:Company {companyId: row.companyId})
MATCH (r:RiskFactor {riskId: row.riskId})
MERGE (c)-[:FACES_RISK]->(r);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/asset_manager_companies.csv' AS row
MATCH (m:AssetManager {managerId: row.managerId})
MATCH (c:Company {companyId: row.companyId})
MERGE (m)-[:OWNS {shares: toInteger(row.shares)}]->(c);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_competitors.csv' AS row
MATCH (a:Company {companyId: row.sourceCompanyId})
MERGE (b:Company {name: row.targetCompanyName})
MERGE (a)-[:COMPETES_WITH]->(b);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_partners.csv' AS row
MATCH (a:Company {companyId: row.sourceCompanyId})
MERGE (b:Company {name: row.targetCompanyName})
MERGE (a)-[:PARTNERS_WITH]->(b);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_documents.csv' AS row
MATCH (c:Company {companyId: row.companyId})
MATCH (d:Document {documentId: row.documentId})
MERGE (c)-[:FILED]->(d);

LOAD CSV WITH HEADERS FROM 'https://dhoj7jltw73ew.cloudfront.net/sec-filings/company_financial_metrics.csv' AS row
MATCH (c:Company {companyId: row.companyId})
MATCH (fm:FinancialMetric {metricId: row.metricId})
MERGE (c)-[:REPORTS]->(fm);

Step 4: Create Fulltext Index

This enables keyword search across entity names and descriptions.

CREATE FULLTEXT INDEX search_entities IF NOT EXISTS
FOR (n:Company|Product|RiskFactor)
ON EACH [n.name, n.description];

Step 5: Verify the Load

Run this query to confirm your node and relationship counts:

MATCH (n)
WITH labels(n)[0] AS label, count(n) AS count
RETURN label, count ORDER BY label;

You should see approximately:

Label Count
AssetManager 15
Company ~71
Document 7
FinancialMetric 874
Product 303
RiskFactor 883

Note: The Company count is higher than 6 because company_competitors.csv and company_partners.csv contain competitor and partner names (e.g., Google, Samsung, OpenAI) that aren't in companies.csv. The relationship loads use MERGE (b:Company {name: row.targetCompanyName}), which creates new Company nodes for these names. The resulting nodes have only a name property — no companyId, ticker, or other identifiers. The 6 filing companies can be found with:

MATCH (c:Company) WHERE c.companyId IS NOT NULL RETURN c.name, c.ticker ORDER BY c.name;

Step 6: Try Some Queries

Now that the graph is loaded, try these queries to explore the data.

What products does NVIDIA offer?

MATCH (c:Company {ticker: 'NVDA'})-[:OFFERS]->(p:Product)
RETURN p.name ORDER BY p.name LIMIT 10;

Which risk factors are shared across multiple companies?

MATCH (c:Company)-[:FACES_RISK]->(r:RiskFactor)
WITH r, collect(c.ticker) AS companies, count(c) AS cnt
WHERE cnt > 1
RETURN r.name, companies, cnt
ORDER BY cnt DESC LIMIT 5;

Who are the top asset managers by number of holdings?

MATCH (am:AssetManager)-[o:OWNS]->(c:Company)
WITH am, count(c) AS holdings, sum(o.shares) AS total_shares
RETURN am.name, holdings, total_shares
ORDER BY holdings DESC LIMIT 5;

Who does Microsoft compete with?

MATCH (c:Company {ticker: 'MSFT'})-[:COMPETES_WITH]->(comp)
RETURN comp.name ORDER BY comp.name;

Which risk factors expose an asset manager's portfolio across multiple companies?

MATCH (am:AssetManager)-[:OWNS]->(c:Company)-[:FACES_RISK]->(r:RiskFactor)
WITH am, r, count(DISTINCT c) AS exposed
WHERE exposed > 1
RETURN am.name, r.name, exposed
ORDER BY exposed DESC, am.name LIMIT 5;

Who are NVIDIA's supply chain partners?

MATCH (c:Company {ticker: 'NVDA'})-[:PARTNERS_WITH]->(p)
RETURN p.name ORDER BY p.name;

Part 3: Explore the Knowledge Graph

Follow EXPLORE.md to:

  1. Use Neo4j Explore to visually navigate your graph
  2. Search for patterns between asset managers, companies, and risk factors
  3. Apply graph algorithms like Degree Centrality
  4. Identify key entities through visual analysis

Next Steps

After completing this lab, continue to Lab 2 - Aura Agents to build an AI-powered agent using the Neo4j Aura Agent no-code platform.