Feature Request: Filtering Options for Scanning (Tags, Extensions, Paths, Size) #1205

@pgagnidze

Description

Add a filter property to the cdk-serverless-clamscan construct so that S3 objects are scanned only when they match criteria based on tags, file extensions, S3 paths, and object size. Each bucket should be able to carry its own filter, with a configurable logic operator (ANY or ALL) both for the overall criteria and for tag matching. The same filters should also be configurable when adding buckets dynamically through the addSourceBucket method.

Proposed filter Property

The filter property will be an object applied per bucket, with the following sections:

  1. Tags: Check if the object is tagged with specific key-value pairs, with a configurable logic operator to determine matching criteria.
  2. File Extensions: Specific file types to scan.
  3. S3 Paths: Targeted S3 prefixes or paths.
  4. Object Size: Conditions to scan objects larger or smaller than specified sizes.
  5. Logic Operator: How the specified filters are combined, either ANY or ALL (default: ALL).
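To make the shape of the proposal concrete, here is a rough sketch of how the filter property could be typed. None of these names (ScanFilter, TagFilter, ObjectSizeFilter, resolveFilter) exist in cdk-serverless-clamscan today; they are hypothetical and only mirror the sections listed above. The check on contradictory size bounds is an extra assumption, not part of the request.

```typescript
// Hypothetical types mirroring the proposed `filter` property.
type LogicOperator = 'ANY' | 'ALL';

interface TagFilter {
  criteria: Record<string, string>; // tag key -> expected value
  logicOperator?: LogicOperator;    // proposed default: 'ANY'
}

interface ObjectSizeFilter {
  greaterThanBytes?: number;
  lessThanBytes?: number;
}

interface ScanFilter {
  tags?: TagFilter;
  extensions?: string[];
  paths?: string[];
  objectSize?: ObjectSizeFilter;
  logicOperator?: LogicOperator;    // proposed default: 'ALL'
}

// Fills in the proposed default operators and rejects size bounds that
// no object could satisfy (assumed validation, not part of the request).
function resolveFilter(filter: ScanFilter): ScanFilter {
  const { greaterThanBytes, lessThanBytes } = filter.objectSize ?? {};
  if (
    greaterThanBytes !== undefined &&
    lessThanBytes !== undefined &&
    greaterThanBytes >= lessThanBytes
  ) {
    throw new Error('objectSize: greaterThanBytes must be below lessThanBytes');
  }
  const resolved: ScanFilter = {
    ...filter,
    logicOperator: filter.logicOperator ?? 'ALL',
  };
  if (filter.tags) {
    resolved.tags = {
      ...filter.tags,
      logicOperator: filter.tags.logicOperator ?? 'ANY',
    };
  }
  return resolved;
}
```

Keeping every section optional means a bucket can specify only the criteria it cares about, as bucket_2 does in the configuration example.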

Configuration Example

The following example shows the filter property configured per bucket:

new ServerlessClamscan(this, 'rClamscan', {
  buckets: [
    {
      bucket: bucket_1,
      filter: {
        tags: {
          criteria: { 
            "ScanRequired": "true",
            "Priority": "high"
          },
          logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
        },
        extensions: ['.mp4', '.jpeg', '.png'],
        paths: ['uploads/images/', 'uploads/videos/'],
        objectSize: {
          greaterThanBytes: 1024, // 1 KB, optional
          lessThanBytes: 10485760 // 10 MB, optional
        },
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    },
    {
      bucket: bucket_2,
      filter: {
        extensions: ['.exe', '.zip'],
        logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
      }
    }
  ]
});

// Separate example: adding a source bucket with filters dynamically.
// (If both snippets live in the same scope, the construct IDs must differ.)
const sc = new ServerlessClamscan(this, 'rClamscan', { /* initial configuration */ });
sc.addSourceBucket(bucket_3, {
  filter: {
    tags: {
      criteria: { 
        "ScanRequired": "true"
      },
      logicOperator: 'ANY' // Can be 'ANY' or 'ALL' (default: ANY)
    },
    extensions: ['.docx', '.pdf'],
    paths: ['uploads/documents/'],
    objectSize: {
      lessThanBytes: 5242880 // 5 MB, optional
    },
    logicOperator: 'ALL' // Can be 'ANY' or 'ALL' (default: ALL)
  }
});

Scanning Behavior

  • Overall Logic Operator (default: ALL): If set to ALL, only objects meeting all specified criteria will be scanned. If set to ANY, an object meeting any of the specified criteria will be scanned.
  • Tag Logic Operator (default: ANY): If set to ANY, an object qualifies when at least one of the specified tags matches. If set to ALL, every specified tag must match.
  • Object Size Conditions: Users can specify either greaterThanBytes or lessThanBytes, or both, depending on their needs.

This feature maintains backward compatibility by ensuring that if no filter is specified, all objects are scanned.
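The scanning behavior described above could be evaluated roughly as follows. This is only a sketch under assumptions: ObjectInfo and shouldScan are hypothetical names, extension matching is assumed to be case-insensitive, paths are treated as key prefixes, and size bounds are treated as exclusive. Where this decision would run (for example inside the scanning Lambda) is left open.

```typescript
type LogicOperator = 'ANY' | 'ALL';

// Hypothetical shape of the proposed filter (not part of the library today).
interface ScanFilter {
  tags?: { criteria: Record<string, string>; logicOperator?: LogicOperator };
  extensions?: string[];
  paths?: string[];
  objectSize?: { greaterThanBytes?: number; lessThanBytes?: number };
  logicOperator?: LogicOperator;
}

// Hypothetical description of an uploaded object.
interface ObjectInfo {
  key: string;
  sizeBytes: number;
  tags: Record<string, string>;
}

function shouldScan(obj: ObjectInfo, filter?: ScanFilter): boolean {
  if (!filter) return true; // backward compatible: no filter, scan everything

  const results: boolean[] = []; // one entry per criterion that was specified

  if (filter.tags) {
    const matches = Object.entries(filter.tags.criteria).map(
      ([k, v]) => obj.tags[k] === v,
    );
    results.push(
      (filter.tags.logicOperator ?? 'ANY') === 'ALL'
        ? matches.every(Boolean)
        : matches.some(Boolean),
    );
  }
  if (filter.extensions) {
    const key = obj.key.toLowerCase();
    results.push(filter.extensions.some((ext) => key.endsWith(ext.toLowerCase())));
  }
  if (filter.paths) {
    results.push(filter.paths.some((prefix) => obj.key.startsWith(prefix)));
  }
  if (filter.objectSize) {
    const { greaterThanBytes, lessThanBytes } = filter.objectSize;
    results.push(
      (greaterThanBytes === undefined || obj.sizeBytes > greaterThanBytes) &&
        (lessThanBytes === undefined || obj.sizeBytes < lessThanBytes),
    );
  }

  if (results.length === 0) return true; // an empty filter restricts nothing
  return (filter.logicOperator ?? 'ALL') === 'ALL'
    ? results.every(Boolean)
    : results.some(Boolean);
}

// Usage with values taken from the bucket_3 example.
const docFilter: ScanFilter = {
  extensions: ['.docx', '.pdf'],
  paths: ['uploads/documents/'],
  objectSize: { lessThanBytes: 5242880 },
};
const report: ObjectInfo = {
  key: 'uploads/documents/report.pdf',
  sizeBytes: 2048,
  tags: { ScanRequired: 'true' },
};
const scanReport = shouldScan(report, docFilter);                           // true
const scanElsewhere = shouldScan({ ...report, key: 'tmp/a.pdf' }, docFilter); // false under ALL
```

Only the criteria that are actually present contribute to the result, so a filter that lists just extensions behaves the same under ANY and ALL.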

Benefits

  • Cost Efficiency: Lower Lambda invocation costs by skipping unnecessary scans.
  • Flexibility: Multiple filters to meet diverse needs, all within a single, unified configuration.
  • Targeted Security: An organization can focus on scanning only certain paths where sensitive documents are uploaded.

Looking forward to your feedback and thank you for considering this feature request!
