RESOURCE_EXHAUSTED: Maximum length exceeded #493

Description

@pieterjanpintens

We are seeing errors like the following in our fluentd logs.

2022-09-18 21:50:00.944377087 +0000 fluent.warn: {"error":"3:Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760. debug_error_string:{\"created\":\"@1663537800.943751927\",\"description\":\"Error received from peer ipv4:216.239.34.174:443\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":905,\"grpc_message\":\"Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760\",\"grpc_status\":3}","error_code":"3","message":"Dropping 4805 log message(s) error=\"3:Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760. debug_error_string:{\\\"created\\\":\\\"@1663537800.943751927\\\",\\\"description\\\":\\\"Error received from peer ipv4:216.239.34.174:443\\\",\\\"file\\\":\\\"src/core/lib/surface/call.cc\\\",\\\"file_line\\\":905,\\\"grpc_message\\\":\\\"Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760\\\",\\\"grpc_status\\\":3}\" error_code=\"3\""}

Our setup is a batch-like system that processes large log files from S3.
Our config is shown below. We tried setting buffer_chunk_limit low, but it does not help.

<match **>
    @type google_cloud
    @log_level debug
    # Prevents errors in the logs; it will fail anyway.
    use_metadata_service false
    label_map {
      "environment": "environment",
      "project": "project",
      "branch": "branch",
      "function": "function",
      "program": "program",
      "stream": "log"
    }
    # Set the chunk limit conservatively to avoid exceeding the recommended
    # chunk size of 10MB per write request. The API request size can be a few
    # times bigger than the raw log size.
    buffer_chunk_limit 512KB
    # Flush logs every 5 seconds, even if the buffer is not full.
    flush_interval 5s
    # Enforce some limit on the number of retries.
    disable_retry_limit false
    # After 3 retries, a given chunk will be discarded.
    retry_limit 3
    # Wait 10 seconds before the first retry. The wait interval will be doubled on
    # each following retry (20s, 40s...) until it hits the retry limit.
    retry_wait 10
    # Never wait longer than 5 minutes between retries. If the wait interval
    # reaches this limit, the exponentiation stops.
    # Given the default config, this limit should never be reached, but if
    # retry_limit and retry_wait are customized, this limit might take effect.
    max_retry_wait 300
    # Use multiple threads for processing.
    num_threads 8
    # Use the gRPC transport.
    use_grpc true
    # Try to limit the size of the uploaded data
    grpc_compression_algorithm gzip
    # If a request is a mix of valid log entries and invalid ones, ingest the
    # valid ones and drop the invalid ones instead of dropping everything.
    partial_success true
    <buffer>
      @type memory
      timekey 60
      timekey_wait 10
      overflow_action block
    </buffer>
</match>
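
To put the numbers from the error in context, here is a small standalone Ruby sketch (an illustration with a synthetic payload, not plugin code) showing how a request whose gzip form is well under 1 MB can still exceed the 10485760-byte decompressed limit reported by the server:

require 'zlib'
require 'json'

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # 10485760, the limit quoted in the error

# Stand-in payload: repetitive JSON lines compress very well, which is how a
# request that is ~750 KB on the wire can still pass 10 MiB once the server
# decompresses it (as in the error above).
line = { "environment" => "prod", "message" => "x" * 200 }.to_json + "\n"
payload = line * 60_000

compressed = Zlib.gzip(payload)

puts "uncompressed: #{payload.bytesize} bytes (limit #{MAX_DECOMPRESSED_BYTES})"
puts "gzip:         #{compressed.bytesize} bytes"
puts "over limit:   #{payload.bytesize > MAX_DECOMPRESSED_BYTES}"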

Looking further down the line, it seems you can specify a channel option on the gRPC channel: GRPC_ARG_MAX_SEND_MESSAGE_LENGTH. Reading about it, I wonder whether setting this option would solve the problem.
It is currently not exposed in the fluentd config, and by default it appears to be set to -1. I am not sure whether gRPC would split the message or whether it would just turn the server error into a client error...
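
For reference, here is a minimal Ruby sketch of how that channel argument could be passed when a gRPC stub is created. This only shows the mechanism, assuming the grpc gem and the generated LoggingServiceV2 stub; the require path, endpoint, credentials, and the 10 MiB value (chosen to mirror the limit in the error) are illustrative, not taken from the plugin:

require 'grpc'
require 'google/logging/v2/logging_services_pb'  # assumed require path for the generated stub

# 'grpc.max_send_message_length' is the channel-arg key behind
# GRPC_ARG_MAX_SEND_MESSAGE_LENGTH; the gRPC default is -1 (unlimited).
channel_args = { 'grpc.max_send_message_length' => 10 * 1024 * 1024 }

stub = Google::Logging::V2::LoggingServiceV2::Stub.new(
  'logging.googleapis.com:443',
  GRPC::Core::ChannelCredentials.new,  # default TLS roots; real use would add call credentials
  channel_args: channel_args
)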

We are looking for guidance on how we should proceed.
