Unexpected timeout from process #1758

@maciejsobocinski

Hi,

We have dotnet-monitor set up on ECS Fargate, running in Listen mode and collecting metrics every X. Our setup is a single dotnet-monitor sidecar inside each launched task, with many tasks being launched. On some tasks it stops working after a few hours with the following error:

{
    "Timestamp": "2022-04-15T06:50:03.0758604Z",
    "EventId": 52,
    "LogLevel": "Warning",
    "Category": "Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource",
    "Message": "Unexpected timeout from process 6. Process will no longer be monitored.",
    "State": {
        "Message": "Unexpected timeout from process 6. Process will no longer be monitored.",
        "processId": "6",
        "{OriginalFormat}": "Unexpected timeout from process {processId}. Process will no longer be monitored."
    },
    "Scopes": []
}
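
Once this fires, dotnet-monitor drops the process from its tracked list (per the message), so it presumably no longer shows up in the /processes endpoint either. A sketch of that check, assuming the default API port 52323 and our API key (-k because the certificate is self-issued):

curl -k -H "Authorization: Bearer $MONITOR_API_TOKEN" https://localhost:52323/processes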

All subsequent requests then fail with this error:

{
    "Timestamp": "2022-04-15T06:55:01.6363199Z",
    "EventId": 1,
    "LogLevel": "Error",
    "Category": "Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController",
    "Message": "Request failed.",
    "Exception": "System.ArgumentException: Unable to discover a target process.    at Microsoft.Diagnostics.Monitoring.WebApi.DiagnosticServices.GetProcessAsync(DiagProcessFilter processFilterConfig, CancellationToken token) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/DiagnosticServices.cs:line 100    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.<>c__DisplayClass33_0`1.<<InvokeForProcess>b__0>d.MoveNext() in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagController.cs:line 713 --- End of stack trace from previous location ---    at Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagControllerExtensions.InvokeService[T](ControllerBase controller, Func`1 serviceCall, ILogger logger) in /_/src/Microsoft.Diagnostics.Monitoring.WebApi/Controllers/DiagControllerExtensions.cs:line 91",
    "State": {
        "Message": "Request failed.",
        "{OriginalFormat}": "Request failed."
    },
    "Scopes": [
        {
            "Message": "SpanId:5f73f4ec6a4c2a06, TraceId:6e3bec22534dca3eed9ae13c8150dc0c, ParentId:0d6726492bd0e999",
            "SpanId": "5f73f4ec6a4c2a06",
            "TraceId": "6e3bec22534dca3eed9ae13c8150dc0c",
            "ParentId": "0d6726492bd0e999"
        },
        {
            "Message": "ConnectionId:0HMGU731FOFDF",
            "ConnectionId": "0HMGU731FOFDF"
        },
        {
            "Message": "RequestPath:/livemetrics RequestId:0HMGU731FOFDF:00000002",
            "RequestId": "0HMGU731FOFDF:00000002",
            "RequestPath": "/livemetrics"
        },
        {
            "Message": "Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.CaptureMetrics (Microsoft.Diagnostics.Monitoring.WebApi)",
            "ActionId": "cc79e4d4-794e-481f-8083-fb3f3c7b5ca5",
            "ActionName": "Microsoft.Diagnostics.Monitoring.WebApi.Controllers.DiagController.CaptureMetrics (Microsoft.Diagnostics.Monitoring.WebApi)"
        },
        {
            "Message": "ArtifactType:livemetrics",
            "ArtifactType": "livemetrics"
        }
    ]
}

Note that the main container itself keeps working just fine and continues to process requests without any issues. Based on the metrics captured before the error, I do not see any abnormal memory/CPU/etc. usage compared to the other tasks where dotnet-monitor keeps working.
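
For reference, the metric collection itself is just an authenticated GET against the sidecar, along these lines (host and port are placeholders for how we reach the container; durationSeconds is the standard query parameter):

curl -k -H "Authorization: Bearer $MONITOR_API_TOKEN" \
  "https://localhost:52323/livemetrics?durationSeconds=30"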

Here is our ECS task definition (the dotnet-monitor configuration values are under Environment):

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Cpu: !Ref TaskCpu
      Memory: !Ref TaskMemory
      NetworkMode: awsvpc
      ExecutionRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/ecsTaskExecutionRole"
      TaskRoleArn: !ImportValue AppServicesEcsTaskRoleArn
      RequiresCompatibilities:
        - FARGATE
      Volumes:
        - Name: tmp
      ContainerDefinitions:
        - Essential: true
          Name: appservices
          Image:
            !Sub
              - "${repository}:${image}"
              - repository: !ImportValue AppServicesEcrRepository
                image: !Ref TaskEcrImageTag
          Ulimits:
            - Name: nofile
              HardLimit: 65535
              SoftLimit: 65535
          PortMappings:
            - ContainerPort: 44392
              Protocol: tcp
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !ImportValue AppServicesEcsLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref EnvironmentName
          LinuxParameters:
            InitProcessEnabled: true
            Capabilities:
              Add:
                - SYS_PTRACE
          StopTimeout: 120
          MountPoints:
            - ContainerPath: /tmp
              SourceVolume: tmp
          Environment:
            - Name: DOTNET_DiagnosticPorts
              Value: /tmp/port
          DependsOn:
            - ContainerName: dotnet-monitor
              Condition: START
        - Essential: true
          Name: dotnet-monitor
          Image:
            !Sub
              - "${repository}:${image}-dotnetmonitor"
              - repository: !ImportValue AppServicesEcrRepository
                image: !Ref TaskEcrImageTag
          MountPoints:
            - ContainerPath: /tmp
              SourceVolume: tmp
          Environment:
            - Name: Kestrel__Certificates__Default__Path
              Value: /tmp/cert.pfx
            - Name: DotnetMonitor_S3Bucket
              Value: !Sub '{{resolve:ssm:/appservices/${EnvironmentName}/integration.bulk.s3.bucket:1}}'
            - Name: DotnetMonitor_DefaultProcess__Filters__0__Key
              Value: ProcessName
            - Name: DotnetMonitor_DefaultProcess__Filters__0__Value
              Value: dotnet
            - Name: DotnetMonitor_DiagnosticPort__ConnectionMode
              Value: Listen
            - Name: DotnetMonitor_DiagnosticPort__EndpointName
              Value: /tmp/port
            - Name: DotnetMonitor_Storage__DumpTempFolder
              Value: /tmp
            - Name: DotnetMonitor_Egress__FileSystem__file__directoryPath
              Value: /tmp/gcdump
            - Name: DotnetMonitor_Egress__FileSystem__file__intermediateDirectoryPath
              Value: /tmp/gcdumptmp
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Trigger__Type
              Value: EventCounter
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Trigger__Settings__ProviderName
              Value: System.Runtime
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Trigger__Settings__CounterName
              Value: working-set
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Trigger__Settings__GreaterThan
              Value: !Ref TaskMemoryAutoGCDump
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Trigger__Settings__SlidingWindowDuration
              Value: 00:00:05
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__0__Type
              Value: CollectGCDump
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__0__Name
              Value: GCDump
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__0__Settings__Egress
              Value: file
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__1__Type
              Value: Execute
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__1__Settings__Path
              Value: /bin/sh
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Actions__1__Settings__Arguments
              Value: /app/gcdump.sh $(Actions.GCDump.EgressPath)
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Limits__ActionCount
              Value: 1
            - Name: DotnetMonitor_CollectionRules__HighMemoryRule__Limits__ActionCountSlidingWindowDuration
              Value: 03:00:00
          Secrets:
            - Name: DotnetMonitor_Authentication__MonitorApiKey__Subject
              ValueFrom: !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/appservices/${EnvironmentName}/dotnetmonitor.subject"
            - Name: DotnetMonitor_Authentication__MonitorApiKey__PublicKey
              ValueFrom: !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/appservices/${EnvironmentName}/dotnetmonitor.publickey"
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !ImportValue AppServicesEcsLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref EnvironmentName
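
For readability, the DotnetMonitor_* environment entries above map to roughly this settings.json (a hand-translation via the standard double-underscore key convention; the custom S3Bucket entry and the MonitorApiKey secrets are omitted, and GreaterThan comes from the TaskMemoryAutoGCDump parameter):

{
  "DefaultProcess": {
    "Filters": [{ "Key": "ProcessName", "Value": "dotnet" }]
  },
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "EndpointName": "/tmp/port"
  },
  "Storage": { "DumpTempFolder": "/tmp" },
  "Egress": {
    "FileSystem": {
      "file": {
        "directoryPath": "/tmp/gcdump",
        "intermediateDirectoryPath": "/tmp/gcdumptmp"
      }
    }
  },
  "CollectionRules": {
    "HighMemoryRule": {
      "Trigger": {
        "Type": "EventCounter",
        "Settings": {
          "ProviderName": "System.Runtime",
          "CounterName": "working-set",
          "GreaterThan": "<TaskMemoryAutoGCDump>",
          "SlidingWindowDuration": "00:00:05"
        }
      },
      "Actions": [
        { "Type": "CollectGCDump", "Name": "GCDump", "Settings": { "Egress": "file" } },
        {
          "Type": "Execute",
          "Settings": { "Path": "/bin/sh", "Arguments": "/app/gcdump.sh $(Actions.GCDump.EgressPath)" }
        }
      ],
      "Limits": { "ActionCount": 1, "ActionCountSlidingWindowDuration": "03:00:00" }
    }
  }
}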

And the Dockerfile used to customize the default dotnet-monitor image:

FROM mcr.microsoft.com/dotnet/monitor:6

# Tools used by the Execute action's gcdump.sh: curl and jq for scripting,
# aws-cli for the S3 upload, dos2unix to normalize the script's line endings.
RUN apk add --no-cache curl jq aws-cli dos2unix

# Create a dedicated non-root user that owns /app.
RUN adduser -s /bin/true -u 1000 -D -h /app app \
  && chown -R "app" "/app"

COPY --chown=app:app --chmod=500 gcdump.sh /app/gcdump.sh
RUN dos2unix /app/gcdump.sh

USER app
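
gcdump.sh itself is along these lines; this is a minimal sketch rather than the exact script, assuming the DotnetMonitor_S3Bucket variable from the task definition:

#!/bin/sh
# Sketch of gcdump.sh (not the exact script). $1 is the egressed gcdump path,
# passed in by the collection rule as $(Actions.GCDump.EgressPath).
set -eu
DUMP_PATH="$1"
# Upload the dump to the bucket configured via DotnetMonitor_S3Bucket, then
# delete the local copy so /tmp/gcdump does not fill up.
aws s3 cp "$DUMP_PATH" "s3://${DotnetMonitor_S3Bucket}/$(hostname)-$(basename "$DUMP_PATH")"
rm -f "$DUMP_PATH"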
