-
Notifications
You must be signed in to change notification settings - Fork 466
Feat/aws missing resources #2178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feat/aws missing resources #2178
Conversation
76bc963 to
a4a27b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 issues found across 41 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="cartography/models/aws/eventbridge/schema_registry.py">
<violation number="1" location="cartography/models/aws/eventbridge/schema_registry.py:13">
P1: The `id` should use `RegistryArn` instead of `RegistryName` for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment "No ARN returned typically" is also incorrect since `RegistryArn` IS mapped on line 16. This is inconsistent with the similar `event_bus.py` model which correctly uses `Arn` as the id.</violation>
</file>
<file name="cartography/models/aws/ssm/document.py">
<violation number="1" location="cartography/models/aws/ssm/document.py:13">
P1: Using `Name` alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., `{region}:{name}`) or using the ARN format `arn:aws:ssm:{region}:{account}:document/{name}` to ensure uniqueness across regions.</violation>
</file>
<file name="cartography/models/aws/wafv2/web_acl.py">
<violation number="1" location="cartography/models/aws/wafv2/web_acl.py:13">
P1: The `id` property should use the ARN (globally unique) instead of AWS's `Id` field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.</violation>
</file>
<file name="cartography/models/aws/cloudfront/distribution.py">
<violation number="1" location="cartography/models/aws/cloudfront/distribution.py:14">
P2: Missing `extra_index=True` on the `arn` property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.</violation>
</file>
<file name="cartography/models/aws/glue/database.py">
<violation number="1" location="cartography/models/aws/glue/database.py:13">
P1: Using `Name` as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in `transform_glue_databases()` (format: `arn:aws:glue:{region}:{catalog_id}:database/{name}`) and using it as the id, similar to how `AWSBackupVault` uses `BackupVaultArn`.</violation>
</file>
<file name="cartography/cli.py">
<violation number="1" location="cartography/cli.py:1004">
P2: Missing error handling for int conversion of `NEO4J_MAX_CONNECTION_IDLE_TIME` environment variable. If the value is not a valid integer, this will raise a `ValueError` with an unclear error message. Consider wrapping in try/except with a descriptive error message.</violation>
</file>
<file name="cartography/models/aws/cloudformation/stack.py">
<violation number="1" location="cartography/models/aws/cloudformation/stack.py:15">
P1: Bug: `PropertyRef("id")` will look for `item.id` in the source data, but boto3 returns `StackId`, not `id`. This should be `PropertyRef("StackId")` to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
|
|
||
| @dataclass(frozen=True) | ||
| class AWSEventSchemasRegistryNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("RegistryName") # No ARN returned typically, name is unique per account/region? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: The id should use RegistryArn instead of RegistryName for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment "No ARN returned typically" is also incorrect since RegistryArn IS mapped on line 16. This is inconsistent with the similar event_bus.py model which correctly uses Arn as the id.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/eventbridge/schema_registry.py, line 13:
<comment>The `id` should use `RegistryArn` instead of `RegistryName` for global uniqueness. Registry names are only unique within a region, so syncing the same registry name from multiple regions would cause conflicts. The comment "No ARN returned typically" is also incorrect since `RegistryArn` IS mapped on line 16. This is inconsistent with the similar `event_bus.py` model which correctly uses `Arn` as the id.</comment>
<file context>
@@ -0,0 +1,43 @@
+
+@dataclass(frozen=True)
+class AWSEventSchemasRegistryNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("RegistryName") # No ARN returned typically, name is unique per account/region?
+ # RegistryArn is in response.
+ name: PropertyRef = PropertyRef("RegistryName")
</file context>
|
|
||
| @dataclass(frozen=True) | ||
| class AWSSSMDocumentNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("Name") # Name is unique in account/region combo for SSM Documents? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Using Name alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., {region}:{name}) or using the ARN format arn:aws:ssm:{region}:{account}:document/{name} to ensure uniqueness across regions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/ssm/document.py, line 13:
<comment>Using `Name` alone as the node ID will cause collisions when syncing SSM Documents from multiple regions. SSM Document names are unique per account+region (not per account alone). Consider constructing a composite ID in the transform step (e.g., `{region}:{name}`) or using the ARN format `arn:aws:ssm:{region}:{account}:document/{name}` to ensure uniqueness across regions.</comment>
<file context>
@@ -0,0 +1,48 @@
+
+@dataclass(frozen=True)
+class AWSSSMDocumentNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("Name") # Name is unique in account/region combo for SSM Documents?
+ # Actually SSM Documents can be shared. And Name is unique per account.
+ # ARN is not always returned in list. Let's use Name as ID.
</file context>
|
|
||
| @dataclass(frozen=True) | ||
| class AWSWAFv2WebACLNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("Id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: The id property should use the ARN (globally unique) instead of AWS's Id field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/wafv2/web_acl.py, line 13:
<comment>The `id` property should use the ARN (globally unique) instead of AWS's `Id` field (only unique within scope). This follows the established Cartography pattern seen in ACM Certificate and Backup Vault models, and prevents potential node collisions in Neo4j.</comment>
<file context>
@@ -0,0 +1,41 @@
+
+@dataclass(frozen=True)
+class AWSWAFv2WebACLNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("Id")
+ name: PropertyRef = PropertyRef("Name")
+ arn: PropertyRef = PropertyRef("ARN")
</file context>
| @dataclass(frozen=True) | ||
| class AWSCloudFrontDistributionNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("Id") | ||
| arn: PropertyRef = PropertyRef("ARN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Missing extra_index=True on the arn property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/cloudfront/distribution.py, line 14:
<comment>Missing `extra_index=True` on the `arn` property. ARN fields are commonly used for lookups and should be indexed for query performance, following the established pattern in other AWS models.</comment>
<file context>
@@ -0,0 +1,45 @@
+@dataclass(frozen=True)
+class AWSCloudFrontDistributionNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("Id")
+ arn: PropertyRef = PropertyRef("ARN")
+ domain_name: PropertyRef = PropertyRef("DomainName")
+ status: PropertyRef = PropertyRef("Status")
</file context>
|
|
||
| @dataclass(frozen=True) | ||
| class AWSGlueDatabaseNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("Name") # Name is ID for database in account/region context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Using Name as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in transform_glue_databases() (format: arn:aws:glue:{region}:{catalog_id}:database/{name}) and using it as the id, similar to how AWSBackupVault uses BackupVaultArn.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/glue/database.py, line 13:
<comment>Using `Name` as the node id will cause collisions when syncing multiple AWS accounts with databases of the same name. The id should be globally unique across all accounts/regions. Consider constructing the ARN in `transform_glue_databases()` (format: `arn:aws:glue:{region}:{catalog_id}:database/{name}`) and using it as the id, similar to how `AWSBackupVault` uses `BackupVaultArn`.</comment>
<file context>
@@ -0,0 +1,43 @@
+
+@dataclass(frozen=True)
+class AWSGlueDatabaseNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("Name") # Name is ID for database in account/region context
+ name: PropertyRef = PropertyRef("Name")
+ description: PropertyRef = PropertyRef("Description")
</file context>
cartography/cli.py
Outdated
| config.neo4j_max_connection_idle_time = int( | ||
| env_neo4j_max_connection_idle_time | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Missing error handling for int conversion of NEO4J_MAX_CONNECTION_IDLE_TIME environment variable. If the value is not a valid integer, this will raise a ValueError with an unclear error message. Consider wrapping in try/except with a descriptive error message.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/cli.py, line 1004:
<comment>Missing error handling for int conversion of `NEO4J_MAX_CONNECTION_IDLE_TIME` environment variable. If the value is not a valid integer, this will raise a `ValueError` with an unclear error message. Consider wrapping in try/except with a descriptive error message.</comment>
<file context>
@@ -984,6 +993,18 @@ def main(self, argv: str) -> int:
+ config.neo4j_max_connection_idle_time is None
+ and env_neo4j_max_connection_idle_time
+ ):
+ config.neo4j_max_connection_idle_time = int(
+ env_neo4j_max_connection_idle_time
+ )
</file context>
| config.neo4j_max_connection_idle_time = int( | |
| env_neo4j_max_connection_idle_time | |
| ) | |
| try: | |
| config.neo4j_max_connection_idle_time = int( | |
| env_neo4j_max_connection_idle_time | |
| ) | |
| except ValueError: | |
| raise ValueError( | |
| f"NEO4J_MAX_CONNECTION_IDLE_TIME environment variable must be an integer, " | |
| f"got: {env_neo4j_max_connection_idle_time}" | |
| ) |
| class AWSCloudFormationStackNodeProperties(CartographyNodeProperties): | ||
| id: PropertyRef = PropertyRef("StackId") | ||
| name: PropertyRef = PropertyRef("StackName") | ||
| arn: PropertyRef = PropertyRef("id") # StackId serves as ARN usually? No, StackId IS the ARN in CloudFormation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Bug: PropertyRef("id") will look for item.id in the source data, but boto3 returns StackId, not id. This should be PropertyRef("StackId") to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cartography/models/aws/cloudformation/stack.py, line 15:
<comment>Bug: `PropertyRef("id")` will look for `item.id` in the source data, but boto3 returns `StackId`, not `id`. This should be `PropertyRef("StackId")` to match the id property. See backup models for the correct pattern where both id and arn reference the same source field directly.</comment>
<file context>
@@ -0,0 +1,46 @@
+class AWSCloudFormationStackNodeProperties(CartographyNodeProperties):
+ id: PropertyRef = PropertyRef("StackId")
+ name: PropertyRef = PropertyRef("StackName")
+ arn: PropertyRef = PropertyRef("id") # StackId serves as ARN usually? No, StackId IS the ARN in CloudFormation.
+ # Wait, boto3 returns StackId as ARN. Let's use it for id and arn.
+ creation_time: PropertyRef = PropertyRef("CreationTime")
</file context>
…udFormation, Athena, Glue, EventBridge, SSM, VPC Endpoint Services) Signed-off-by: IlyaasK <[email protected]>
a4a27b4 to
d336a22
Compare
|
Hi @IlyaasK mind breaking this up into 1 PR per resource type? Also mind including screenshots showing that you tested each one? |
Summary
This PR implements 9 missing AWS resource modules to significantly improve Cartography's AWS coverage. It adds full support for fetching, transforming, and loading the following resources into Neo4j:
Each module includes proper data models (NodeSchemas), sync logic, and unit tests for data transformation.
Related issues or links
Checklist
Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:
If you are changing a node or relationship:
If you are implementing a new intel module:
Detailed Changes
New Intel Modules & Sync Logic:
AWSBackupPlanandAWSBackupVault.AWSWAFv2WebACLandAWSWAFv2RuleGroup.New Data Models:
cartography/models/aws/.Configuration:
cartography/intel/aws/resources.pyto be included in the default AWS sync.Testing:
tests/unit/cartography/intel/aws/test_backup.pytests/unit/cartography/intel/aws/test_wafv2.pytests/unit/cartography/intel/aws/test_cloudfront.pytests/unit/cartography/intel/aws/test_cloudformation.pytests/unit/cartography/intel/aws/test_athena_glue.py