Skip to content

Conversation

@IlyaasK
Copy link
Contributor

@IlyaasK IlyaasK commented Dec 23, 2025

Summary

This PR implements 9 missing AWS resource modules to significantly improve Cartography's AWS coverage. It adds full support for fetching, transforming, and loading the following resources into Neo4j:

  • AWS Backup (Plans & Vaults)
  • AWS WAFv2 (WebACLs & RuleGroups)
  • AWS CloudFront (Distributions)
  • AWS CloudFormation (Stacks)
  • AWS Athena (WorkGroups)
  • AWS Glue (Databases)
  • AWS EventBridge (EventBuses) & Event Schemas (Registries)
  • AWS SSM (Documents)
  • AWS EC2 VPC Endpoint Services

Each module includes proper data models (NodeSchemas), sync logic, and unit tests for data transformation.

Related issues or links

Include links to relevant issues or other pages.

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

  • Update/add unit or integration tests.
  • Include a screenshot showing what the graph looked like before and after your changes.
  • Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

If you are implementing a new intel module:

  • Use the NodeSchema data model.
  • Confirm that the linter actually passes (submitting a PR where the linter fails shows reviewers that you did not test your code and will delay your review).

Detailed Changes

New Intel Modules & Sync Logic:

  • cartography/intel/aws/backup.py: Implements sync for AWSBackupPlan and AWSBackupVault.
  • cartography/intel/aws/wafv2.py: Implements sync for AWSWAFv2WebACL and AWSWAFv2RuleGroup.
  • cartography/intel/aws/cloudfront.py: Implements sync for AWSCloudFrontDistribution.
  • cartography/intel/aws/cloudformation.py: Implements sync for AWSCloudFormationStack.
  • cartography/intel/aws/athena.py: Implements sync for AWSAthenaWorkGroup.
  • cartography/intel/aws/schemas.py: Implements sync for AWSEventSchemasRegistry.
  • cartography/intel/aws/ec2/vpc_endpoint_services.py: Implements sync for AWSEc2VpcEndpointService.
  • cartography/intel/aws/glue.py: Updated to include AWSGlueDatabase sync.
  • cartography/intel/aws/eventbridge.py: Updated to include AWSEventsEventBus sync.
  • cartography/intel/aws/ssm.py: Updated to include AWSSSMDocument sync.

New Data Models:

  • Defined NodeSchemas and RelationshipProperties for all above resources in cartography/models/aws/.

Configuration:

  • Registered all new modules in cartography/intel/aws/resources.py to be included in the default AWS sync.

Testing:

  • Added unit tests for data transformation/logic in:
    • tests/unit/cartography/intel/aws/test_backup.py
    • tests/unit/cartography/intel/aws/test_wafv2.py
    • tests/unit/cartography/intel/aws/test_cloudfront.py
    • tests/unit/cartography/intel/aws/test_cloudformation.py
    • tests/unit/cartography/intel/aws/test_athena_glue.py

@IlyaasK IlyaasK force-pushed the feat/aws-missing-resources branch from 76bc963 to a4a27b4 Compare December 23, 2025 02:36
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 issues found across 41 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="cartography/models/aws/eventbridge/schema_registry.py">

<violation number="1" location="cartography/models/aws/eventbridge/schema_registry.py:13">
P1: The `id` should use `RegistryArn` instead of `RegistryName` for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment &quot;No ARN returned typically&quot; is also incorrect since `RegistryArn` IS mapped on line 16. This is inconsistent with the similar `event_bus.py` model which correctly uses `Arn` as the id.</violation>
</file>

<file name="cartography/models/aws/ssm/document.py">

<violation number="1" location="cartography/models/aws/ssm/document.py:13">
P1: Using `Name` alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., `{region}:{name}`) or using the ARN format `arn:aws:ssm:{region}:{account}:document/{name}` to ensure uniqueness across regions.</violation>
</file>

<file name="cartography/models/aws/wafv2/web_acl.py">

<violation number="1" location="cartography/models/aws/wafv2/web_acl.py:13">
P1: The `id` property should use the ARN (globally unique) instead of AWS&#39;s `Id` field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.</violation>
</file>

<file name="cartography/models/aws/cloudfront/distribution.py">

<violation number="1" location="cartography/models/aws/cloudfront/distribution.py:14">
P2: Missing `extra_index=True` on the `arn` property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.</violation>
</file>

<file name="cartography/models/aws/glue/database.py">

<violation number="1" location="cartography/models/aws/glue/database.py:13">
P1: Using `Name` as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in `transform_glue_databases()` (format: `arn:aws:glue:{region}:{catalog_id}:database/{name}`) and using it as the id, similar to how `AWSBackupVault` uses `BackupVaultArn`.</violation>
</file>

<file name="cartography/cli.py">

<violation number="1" location="cartography/cli.py:1004">
P2: Missing error handling for int conversion of `NEO4J_MAX_CONNECTION_IDLE_TIME` environment variable. If the value is not a valid integer, this will raise a `ValueError` with an unclear error message. Consider wrapping in try/except with a descriptive error message.</violation>
</file>

<file name="cartography/models/aws/cloudformation/stack.py">

<violation number="1" location="cartography/models/aws/cloudformation/stack.py:15">
P1: Bug: `PropertyRef(&quot;id&quot;)` will look for `item.id` in the source data, but boto3 returns `StackId`, not `id`. This should be `PropertyRef(&quot;StackId&quot;)` to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR


@dataclass(frozen=True)
class AWSEventSchemasRegistryNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("RegistryName") # No ARN returned typically, name is unique per account/region?
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The id should use RegistryArn instead of RegistryName for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment "No ARN returned typically" is also incorrect since RegistryArn IS mapped on line 16. This is inconsistent with the similar event_bus.py model which correctly uses Arn as the id.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/eventbridge/schema_registry.py, line 13:

<comment>The `id` should use `RegistryArn` instead of `RegistryName` for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment &quot;No ARN returned typically&quot; is also incorrect since `RegistryArn` IS mapped on line 16. This is inconsistent with the similar `event_bus.py` model which correctly uses `Arn` as the id.</comment>

<file context>
@@ -0,0 +1,43 @@
+
+@dataclass(frozen=True)
+class AWSEventSchemasRegistryNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;RegistryName&quot;) # No ARN returned typically, name is unique per account/region?
+    # RegistryArn is in response.
+    name: PropertyRef = PropertyRef(&quot;RegistryName&quot;)
</file context>
Fix with Cubic


@dataclass(frozen=True)
class AWSSSMDocumentNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("Name") # Name is unique in account/region combo for SSM Documents?
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Using Name alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., {region}:{name}) or using the ARN format arn:aws:ssm:{region}:{account}:document/{name} to ensure uniqueness across regions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/ssm/document.py, line 13:

<comment>Using `Name` alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., `{region}:{name}`) or using the ARN format `arn:aws:ssm:{region}:{account}:document/{name}` to ensure uniqueness across regions.</comment>

<file context>
@@ -0,0 +1,48 @@
+
+@dataclass(frozen=True)
+class AWSSSMDocumentNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;Name&quot;) # Name is unique in account/region combo for SSM Documents?
+    # Actually SSM Documents can be shared. And Name is unique per account.
+    # ARN is not always returned in list. Let&#39;s use Name as ID.
</file context>
Fix with Cubic


@dataclass(frozen=True)
class AWSWAFv2WebACLNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("Id")
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The id property should use the ARN (globally unique) instead of AWS's Id field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/wafv2/web_acl.py, line 13:

<comment>The `id` property should use the ARN (globally unique) instead of AWS&#39;s `Id` field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.</comment>

<file context>
@@ -0,0 +1,41 @@
+
+@dataclass(frozen=True)
+class AWSWAFv2WebACLNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;Id&quot;)
+    name: PropertyRef = PropertyRef(&quot;Name&quot;)
+    arn: PropertyRef = PropertyRef(&quot;ARN&quot;)
</file context>
Fix with Cubic

@dataclass(frozen=True)
class AWSCloudFrontDistributionNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("Id")
arn: PropertyRef = PropertyRef("ARN")
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing extra_index=True on the arn property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/cloudfront/distribution.py, line 14:

<comment>Missing `extra_index=True` on the `arn` property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.</comment>

<file context>
@@ -0,0 +1,45 @@
+@dataclass(frozen=True)
+class AWSCloudFrontDistributionNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;Id&quot;)
+    arn: PropertyRef = PropertyRef(&quot;ARN&quot;)
+    domain_name: PropertyRef = PropertyRef(&quot;DomainName&quot;)
+    status: PropertyRef = PropertyRef(&quot;Status&quot;)
</file context>
Fix with Cubic


@dataclass(frozen=True)
class AWSGlueDatabaseNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("Name") # Name is ID for database in account/region context
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Using Name as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in transform_glue_databases() (format: arn:aws:glue:{region}:{catalog_id}:database/{name}) and using it as the id, similar to how AWSBackupVault uses BackupVaultArn.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/glue/database.py, line 13:

<comment>Using `Name` as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in `transform_glue_databases()` (format: `arn:aws:glue:{region}:{catalog_id}:database/{name}`) and using it as the id, similar to how `AWSBackupVault` uses `BackupVaultArn`.</comment>

<file context>
@@ -0,0 +1,43 @@
+
+@dataclass(frozen=True)
+class AWSGlueDatabaseNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;Name&quot;) # Name is ID for database in account/region context
+    name: PropertyRef = PropertyRef(&quot;Name&quot;)
+    description: PropertyRef = PropertyRef(&quot;Description&quot;)
</file context>
Fix with Cubic

Comment on lines 1004 to 1006
config.neo4j_max_connection_idle_time = int(
env_neo4j_max_connection_idle_time
)
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing error handling for int conversion of NEO4J_MAX_CONNECTION_IDLE_TIME environment variable. If the value is not a valid integer, this will raise a ValueError with an unclear error message. Consider wrapping in try/except with a descriptive error message.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/cli.py, line 1004:

<comment>Missing error handling for int conversion of `NEO4J_MAX_CONNECTION_IDLE_TIME` environment variable. If the value is not a valid integer, this will raise a `ValueError` with an unclear error message. Consider wrapping in try/except with a descriptive error message.</comment>

<file context>
@@ -984,6 +993,18 @@ def main(self, argv: str) -&gt; int:
+            config.neo4j_max_connection_idle_time is None
+            and env_neo4j_max_connection_idle_time
+        ):
+            config.neo4j_max_connection_idle_time = int(
+                env_neo4j_max_connection_idle_time
+            )
</file context>
Suggested change
config.neo4j_max_connection_idle_time = int(
env_neo4j_max_connection_idle_time
)
try:
config.neo4j_max_connection_idle_time = int(
env_neo4j_max_connection_idle_time
)
except ValueError:
raise ValueError(
f"NEO4J_MAX_CONNECTION_IDLE_TIME environment variable must be an integer, "
f"got: {env_neo4j_max_connection_idle_time}"
)
Fix with Cubic

class AWSCloudFormationStackNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("StackId")
name: PropertyRef = PropertyRef("StackName")
arn: PropertyRef = PropertyRef("id") # StackId serves as ARN usually? No, StackId IS the ARN in CloudFormation.
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Bug: PropertyRef("id") will look for item.id in the source data, but boto3 returns StackId, not id. This should be PropertyRef("StackId") to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/cloudformation/stack.py, line 15:

<comment>Bug: `PropertyRef(&quot;id&quot;)` will look for `item.id` in the source data, but boto3 returns `StackId`, not `id`. This should be `PropertyRef(&quot;StackId&quot;)` to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.</comment>

<file context>
@@ -0,0 +1,46 @@
+class AWSCloudFormationStackNodeProperties(CartographyNodeProperties):
+    id: PropertyRef = PropertyRef(&quot;StackId&quot;)
+    name: PropertyRef = PropertyRef(&quot;StackName&quot;)
+    arn: PropertyRef = PropertyRef(&quot;id&quot;) # StackId serves as ARN usually? No, StackId IS the ARN in CloudFormation.
+    # Wait, boto3 returns StackId as ARN. Let&#39;s use it for id and arn.
+    creation_time: PropertyRef = PropertyRef(&quot;CreationTime&quot;)
</file context>
Fix with Cubic

…udFormation, Athena, Glue, EventBridge, SSM, VPC Endpoint Services)

Signed-off-by: IlyaasK <[email protected]>
@IlyaasK IlyaasK force-pushed the feat/aws-missing-resources branch from a4a27b4 to d336a22 Compare December 23, 2025 03:00
@kunaals
Copy link
Collaborator

kunaals commented Jan 7, 2026

Hi @IlyaasK mind breaking this up into 1 PR per resource type? Also mind including screenshots showing that you tested each one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing Assets/Resource

2 participants