filter_aws: Adds resource entity to PLE calls in cloudwatch logs plugin for dataplane and host logs #7

Open
wants to merge 15 commits into
base: 1.9.10


@nathalapooja nathalapooja commented Dec 30, 2024

  • Modified aws filter plugin to extract additional resource entity attributes
  • Modified cloudwatch logs output plugin to add resource entity in PLE calls

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
        dataplane-log.conf: |
          [INPUT]
            Name                systemd
            Tag                 dataplane.systemd.*
            Systemd_Filter      _SYSTEMD_UNIT=docker.service
            Systemd_Filter      _SYSTEMD_UNIT=containerd.service
            Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
            DB                  /var/fluent-bit/state/systemd.db
            Path                /var/log/journal
            Read_From_Tail      ${READ_FROM_TAIL}

          [INPUT]
            Name                tail
            Tag                 dataplane.tail.*
            Path                /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
            multiline.parser    docker, cri
            DB                  /var/fluent-bit/state/flb_dataplane_tail.db
            Mem_Buf_Limit       50MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Rotate_Wait         30
            storage.type        filesystem
            Read_from_Head      ${READ_FROM_HEAD}

          [FILTER]
            Name                modify
            Match               dataplane.systemd.*
            Rename              _HOSTNAME                   hostname
            Rename              _SYSTEMD_UNIT               systemd_unit
            Rename              MESSAGE                     message
            Remove_regex        ^((?!hostname|systemd_unit|message).)*$

          [FILTER]
            Name                aws
            Match               dataplane.*
            imds_version        v2
            enable_entity       true
            entity_type         resource

          [OUTPUT]
            Name                cloudwatch_logs
            Match               dataplane.*
            region              ${AWS_REGION}
            log_group_name      /aws/containerinsights/${CLUSTER_NAME}/dataplane
            log_stream_prefix   ${HOST_NAME}-
            auto_create_group   true
            extra_user_agent    container-insights
            entity_type         resource
            add_entity          true
        host-log.conf: |
          [INPUT]
            Name                tail
            Tag                 host.dmesg
            Path                /var/log/dmesg
            Key                 message
            DB                  /var/fluent-bit/state/flb_dmesg.db
            Mem_Buf_Limit       5MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Read_from_Head      ${READ_FROM_HEAD}

          [INPUT]
            Name                tail
            Tag                 host.messages
            Path                /var/log/messages
            Parser              syslog
            DB                  /var/fluent-bit/state/flb_messages.db
            Mem_Buf_Limit       5MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Read_from_Head      ${READ_FROM_HEAD}

          [INPUT]
            Name                tail
            Tag                 host.secure
            Path                /var/log/secure
            Parser              syslog
            DB                  /var/fluent-bit/state/flb_secure.db
            Mem_Buf_Limit       5MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Read_from_Head      ${READ_FROM_HEAD}

          [FILTER]
            Name                aws
            Match               host.*
            imds_version        v2
            enable_entity       true
            entity_type         resource

          [OUTPUT]
            Name                cloudwatch_logs
            Match               host.*
            region              ${AWS_REGION}
            log_group_name      /aws/containerinsights/${CLUSTER_NAME}/host
            log_stream_prefix   ${HOST_NAME}.
            auto_create_group   true
            extra_user_agent    container-insights
            entity_type         resource
            add_entity          true
  • Debug log output from testing the change
    For Dataplane logs on EKS cluster
[2025/03/10 15:20:27] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] Sending payload={"logGroupName":"/aws/containerinsights/test-agent/dataplane","logStreamName":"ip-192-168-28-20.ec2.internal-dataplane.systemd.kubelet.service","entity":{"keyAttributes":{"Type":"AWS::Resource","ResourceType":"AWS::EKS::Cluster","Identifier":"test-agent","AwsAccountId":"9576888xxxxx"},"attributes":{"EC2.InstanceId":"i-0f1346635d4142476"}},"logEvents":[{"timestamp":1741620026570,"message":"{\"systemd_unit\":\"kubelet.service\",\"hostname\":\"ip-192-168-28-20.ec2.internal\",\"message\":\"E0310 15:20:26.569894    2598 pod_workers.go:965] \\\"Error syncing pod, skipping\\\" err=\\\"failed to \\\\\\\"StartContainer\\\\\\\" for \\\\\\\"opentelemetry-auto-instrumentation-nodejs\\\\\\\" with CrashLoopBackOff: \\\\\\\"back-off 5m0s restarting failed container=opentelemetry-auto-instrumentation-nodejs pod=nutrition-service-nodejs-77bcf76bc9-dz5kk_default(126e1ad2-79a8-4048-89bd-9dccee24b0ab)\\\\\\\"\\\" pod=\\\"default/nutrition-service-nodejs-77bcf76bc9-dz5kk\\\" podUID=126e1ad2-79a8-4048-89bd-9dccee24b0ab\",\"az\":\"us-east-1a\",\"ec2_instance_id\":\"i-0f1346635d4142476\"}"}]}

Verifying entity by calling ListEntitiesForLogGroup

dev-dsk-poojardy-1d-9f6133ab % /apollo/env/envImprovement/bin/awscurl \
    --request POST \
    --header 'Content-Encoding: amz-1.0' \
    --header 'Content-Type: application/json' \
    --region=us-east-1 \
    --service logs \
    --header 'x-Amz-Target: com.amazonaws.logs.v20140328.Logs_20140328.ListEntitiesForLogGroup' \
    --data '{"logGroupIdentifier":"/aws/containerinsights/test-agent/dataplane"}' \
    https://logs.us-east-1.amazonaws.com
{"entities":[{"attributes":{"AWS.Resource.ARN":"arn:aws:eks:us-east-1:9576888xxxxx:cluster/test-agent"},"keyAttributes":{"Identifier":"test-agent","ResourceType":"AWS::EKS::Cluster","Type":"AWS::Resource"}}]}

Log event in the CloudWatch console

{
    "systemd_unit": "kubelet.service",
    "hostname": "ip-192-168-28-20.ec2.internal",
    "message": "E0310 15:20:26.569894    2598 pod_workers.go:965] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"opentelemetry-auto-instrumentation-nodejs\\\" with CrashLoopBackOff: \\\"back-off 5m0s restarting failed container=opentelemetry-auto-instrumentation-nodejs pod=nutrition-service-nodejs-77bcf76bc9-dz5kk_default(126e1ad2-79a8-4048-89bd-9dccee24b0ab)\\\"\" pod=\"default/nutrition-service-nodejs-77bcf76bc9-dz5kk\" podUID=126e1ad2-79a8-4048-89bd-9dccee24b0ab",
    "az": "us-east-1a",
    "ec2_instance_id": "i-0f1346635d4142476"
}

For Host logs in EKS cluster

[2025/03/10 15:20:27] [ info] [output:cloudwatch_logs:cloudwatch_logs.2] Sending payload={"logGroupName":"/aws/containerinsights/test-agent/host","logStreamName":"ip-192-168-28-20.ec2.internal.host.messages","entity":{"keyAttributes":{"Type":"AWS::Resource","ResourceType":"AWS::EKS::Cluster","Identifier":"test-agent","AwsAccountId":"9576888xxxxx"},"attributes":{"EC2.InstanceId":"i-0f1346635d4142476"}},"logEvents":[{"timestamp":1741620026000,"message":"{\"host\":\"ip-192-168-28-20\",\"ident\":\"kubelet\",\"message\":\"E0310 15:20:26.569894    2598 pod_workers.go:965] \\\"Error syncing pod, skipping\\\" err=\\\"failed to \\\\\\\"StartContainer\\\\\\\" for \\\\\\\"opentelemetry-auto-instrumentation-nodejs\\\\\\\" with CrashLoopBackOff: \\\\\\\"back-off 5m0s restarting failed container=opentelemetry-auto-instrumentation-nodejs pod=nutrition-service-nodejs-77bcf76bc9-dz5kk_default(126e1ad2-79a8-4048-89bd-9dccee24b0ab)\\\\\\\"\\\" pod=\\\"default/nutrition-service-nodejs-77bcf76bc9-dz5kk\\\" podUID=126e1ad2-79a8-4048-89bd-9dccee24b0ab\",\"az\":\"us-east-1a\",\"ec2_instance_id\":\"i-0f1346635d4142476\"}"}]}

Verifying entity by calling ListEntitiesForLogGroup

dev-dsk-poojardy-1d-9f6133ab % /apollo/env/envImprovement/bin/awscurl \
    --request POST \
    --header 'Content-Encoding: amz-1.0' \
    --header 'Content-Type: application/json' \
    --region=us-east-1 \
    --service logs \
    --header 'x-Amz-Target: com.amazonaws.logs.v20140328.Logs_20140328.ListEntitiesForLogGroup' \
    --data '{"logGroupIdentifier":"/aws/containerinsights/test-agent/host"}' \
    https://logs.us-east-1.amazonaws.com
{"entities":[{"attributes":{"AWS.Resource.ARN":"arn:aws:eks:us-east-1:9576888xxxxx:cluster/test-agent"},"keyAttributes":{"Identifier":"test-agent","ResourceType":"AWS::EKS::Cluster","Type":"AWS::Resource"}}]}

Log event in the CloudWatch console

{
    "host": "ip-192-168-28-20",
    "ident": "kubelet",
    "message": "E0310 15:20:26.569894    2598 pod_workers.go:965] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"opentelemetry-auto-instrumentation-nodejs\\\" with CrashLoopBackOff: \\\"back-off 5m0s restarting failed container=opentelemetry-auto-instrumentation-nodejs pod=nutrition-service-nodejs-77bcf76bc9-dz5kk_default(126e1ad2-79a8-4048-89bd-9dccee24b0ab)\\\"\" pod=\"default/nutrition-service-nodejs-77bcf76bc9-dz5kk\" podUID=126e1ad2-79a8-4048-89bd-9dccee24b0ab",
    "az": "us-east-1a",
    "ec2_instance_id": "i-0f1346635d4142476"
}
  • Attached Valgrind output that shows no leaks or memory corruption was found
    For cloudwatch logs output plugin: flb-rt-out_cloudwatch
SUCCESS: All unit tests have passed.
==25282== 
==25282== HEAP SUMMARY:
==25282==     in use at exit: 0 bytes in 0 blocks
==25282==   total heap usage: 2 allocs, 2 frees, 1,200 bytes allocated
==25282== 
==25282== All heap blocks were freed -- no leaks are possible
==25282== 
==25282== For lists of detected and suppressed errors, rerun with: -s
==25282== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@nathalapooja nathalapooja changed the title adds resource entity to PLE calls in cloudwatch logs plugin for dataplane and host logs Adds resource entity to PLE calls in cloudwatch logs plugin for dataplane and host logs Dec 30, 2024
@nathalapooja nathalapooja changed the title Adds resource entity to PLE calls in cloudwatch logs plugin for dataplane and host logs filter_aws: Adds resource entity to PLE calls in cloudwatch logs plugin for dataplane and host logs Dec 30, 2024

zhihonl commented Jan 7, 2025

In the entity, Type should be AWS::Resource instead of Resource for AWS resources like EKS
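
For illustration, the entity shape under discussion can be sketched as a small C helper. This is a minimal sketch, not the PR's implementation: `build_resource_entity` and its parameters are hypothetical names, and the field layout simply mirrors the `entity` object visible in the debug payload earlier in this PR. It assumes the inputs need no JSON escaping.

```c
#include <stdio.h>
#include <string.h>

/* Type value expected for AWS resources such as an EKS cluster */
#define ENTITY_TYPE_AWS_RESOURCE "AWS::Resource"

/*
 * Hypothetical helper: writes the entity JSON into buf.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if an argument is NULL or buf is too small.
 * Inputs are assumed not to need JSON escaping.
 */
static int build_resource_entity(char *buf, size_t size,
                                 const char *cluster_name,
                                 const char *account_id,
                                 const char *instance_id)
{
    int len;

    if (!buf || !cluster_name || !account_id || !instance_id) {
        return -1;
    }

    len = snprintf(buf, size,
                   "{\"keyAttributes\":{\"Type\":\"%s\","
                   "\"ResourceType\":\"AWS::EKS::Cluster\","
                   "\"Identifier\":\"%s\",\"AwsAccountId\":\"%s\"},"
                   "\"attributes\":{\"EC2.InstanceId\":\"%s\"}}",
                   ENTITY_TYPE_AWS_RESOURCE, cluster_name,
                   account_id, instance_id);

    /* snprintf reports the length it wanted; treat truncation as failure */
    if (len < 0 || (size_t) len >= size) {
        return -1;
    }
    return len;
}
```

Keeping the `Type` string in a single macro makes it harder for a bare `Resource` value to slip into one of the call sites.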

}

/* Create an Upstream context */
ctx->kubernetes_upstream = flb_upstream_create(config,

Is there any scenario where we don't need TLS to communicate with the Kubernetes API server? I see the original code uses a TCP connection instead of TLS by default, so I'm just curious.

Author


Although the default is TCP, the code has the following definitions for the Kubernetes API server:
/* Kubernetes API server info */
#define FLB_API_HOST "kubernetes.default.svc"
#define FLB_API_PORT 443
#define FLB_API_TLS FLB_TRUE

FLB_API_TLS is true in this case, so we pass the FLB_IO_TLS flag when creating the upstream.
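
The relationship described in this reply can be sketched as a stand-alone snippet. It is illustrative only and not the PR's code: the `FLB_API_*` definitions are the ones quoted in this reply, while `FLB_TRUE`/`FLB_FALSE`, the `FLB_IO_*` values, and the `select_io_flags` and `print_upstream_target` helpers are stand-ins defined locally so the snippet compiles without the Fluent Bit headers.

```c
#include <stdio.h>

/* Stand-ins for definitions that normally come from the Fluent Bit headers */
#define FLB_FALSE   0
#define FLB_TRUE    1
#define FLB_IO_TCP  1   /* plain TCP (value assumed for illustration) */
#define FLB_IO_TLS  2   /* TLS (value assumed for illustration) */

/* Kubernetes API server info (as quoted in the review thread) */
#define FLB_API_HOST "kubernetes.default.svc"
#define FLB_API_PORT 443
#define FLB_API_TLS  FLB_TRUE

/* Hypothetical helper: pick the upstream I/O flag from the TLS setting */
static int select_io_flags(int use_tls)
{
    return (use_tls == FLB_TRUE) ? FLB_IO_TLS : FLB_IO_TCP;
}

/* Hypothetical helper: describe the connection that would be opened */
static void print_upstream_target(void)
{
    int flags = select_io_flags(FLB_API_TLS);

    printf("%s:%d via %s\n", FLB_API_HOST, FLB_API_PORT,
           flags == FLB_IO_TLS ? "TLS" : "TCP");
}
```

Deriving the flag from `FLB_API_TLS` in one place, rather than hard-coding it at the call site, keeps the upstream consistent with the port if the definitions ever change.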

@@ -165,6 +190,12 @@ static int cb_aws_init(struct flb_filter_instance *f_ins,
/* Remove async flag from upstream */
ctx->ec2_upstream->flags &= ~(FLB_IO_ASYNC);

/*Create kubernetes upstream to query k8s api to define the platform type*/
if (ctx->enable_entity && strncmp(ctx->entity_type, FLB_FILTER_ENTITY_TYPE_RESOURCE, FLB_FILTER_ENTITY_TYPE_RESOURCE_LEN) == 0) {
create_kubernetes_upstream(ctx, config);

Is the enable_entity flag only applicable to Kubernetes? What would happen here if someone enabled this flag but is not using Kubernetes?
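
One possible way to make this safe outside Kubernetes is to gate the upstream creation on an environment check. The sketch below is an assumption offered for discussion, not what the PR does: `should_create_k8s_upstream` is a hypothetical helper, the string `"resource"` stands in for `FLB_FILTER_ENTITY_TYPE_RESOURCE` (value assumed from the example configuration), and relying on the `KUBERNETES_SERVICE_HOST` variable that Kubernetes injects into pods is only one possible detection strategy.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for FLB_FILTER_ENTITY_TYPE_RESOURCE (value assumed from the config) */
#define ENTITY_TYPE_RESOURCE "resource"

/*
 * Hypothetical guard: returns 1 only when entities are enabled, the type is
 * exactly "resource", and the process appears to run inside a Kubernetes pod.
 */
static int should_create_k8s_upstream(int enable_entity, const char *entity_type)
{
    const char *k8s_host;

    if (!enable_entity || entity_type == NULL) {
        return 0;
    }

    /* strcmp (not a length-limited compare) so "resources" does not match */
    if (strcmp(entity_type, ENTITY_TYPE_RESOURCE) != 0) {
        return 0;
    }

    /* Kubernetes sets this variable in every pod's environment */
    k8s_host = getenv("KUBERNETES_SERVICE_HOST");
    return (k8s_host != NULL && k8s_host[0] != '\0');
}
```

With a guard like this, enabling the flag on a plain EC2 host would skip the Kubernetes upstream instead of attempting to reach an API server that does not exist.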


3 participants