We are seeing errors like this in our fluentd logs:
2022-09-18 21:50:00.944377087 +0000 fluent.warn: {"error":"3:Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760. debug_error_string:{\"created\":\"@1663537800.943751927\",\"description\":\"Error received from peer ipv4:216.239.34.174:443\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":905,\"grpc_message\":\"Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760\",\"grpc_status\":3}","error_code":"3","message":"Dropping 4805 log message(s) error=\"3:Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760. debug_error_string:{\\\"created\\\":\\\"@1663537800.943751927\\\",\\\"description\\\":\\\"Error received from peer ipv4:216.239.34.174:443\\\",\\\"file\\\":\\\"src/core/lib/surface/call.cc\\\",\\\"file_line\\\":905,\\\"grpc_message\\\":\\\"Data decompression failed with decompression status: RESOURCE_EXHAUSTED: Maximum length exceeded: 10485760; at byte 755395; at uncompressed byte 10485760\\\",\\\"grpc_status\\\":3}\" error_code=\"3\""}
Our setup is a batch-like system that processes big log files from S3.
Our config is below. We tried setting buffer_chunk_limit low, but it does not help.
<match **>
  @type google_cloud
  @log_level debug
  # Prevents errors in the logs; it will fail anyway.
  use_metadata_service false
  label_map {
    "environment": "environment",
    "project": "project",
    "branch": "branch",
    "function": "function",
    "program": "program",
    "stream": "log"
  }
  # Set the chunk limit conservatively to avoid exceeding the recommended
  # chunk size of 10MB per write request. The API request size can be a few
  # times bigger than the raw log size.
  buffer_chunk_limit 512KB
  # Flush logs every 5 seconds, even if the buffer is not full.
  flush_interval 5s
  # Enforce some limit on the number of retries.
  disable_retry_limit false
  # After 3 retries, a given chunk will be discarded.
  retry_limit 3
  # Wait 10 seconds before the first retry. The wait interval will be doubled on
  # each following retry (20s, 40s...) until it hits the retry limit.
  retry_wait 10
  # Never wait longer than 5 minutes between retries. If the wait interval
  # reaches this limit, the exponentiation stops.
  # Given the default config, this limit should never be reached, but if
  # retry_limit and retry_wait are customized, this limit might take effect.
  max_retry_wait 300
  # Use multiple threads for processing.
  num_threads 8
  # Use the gRPC transport.
  use_grpc true
  # Try to limit the size of the uploaded data.
  grpc_compression_algorithm gzip
  # If a request is a mix of valid log entries and invalid ones, ingest the
  # valid ones and drop the invalid ones instead of dropping everything.
  partial_success true
  <buffer>
    @type memory
    timekey 60
    timekey_wait 10
    overflow_action block
  </buffer>
</match>
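One thing we are unsure about (an assumption on our side, not verified against the plugin): buffer_chunk_limit is the legacy v0.12 parameter name, and since the config also declares a v1-style <buffer> section, Fluentd may be taking the chunk size from that section's defaults rather than from the top-level value. A minimal sketch of moving the cap into the <buffer> section, using the v1 parameter name chunk_limit_size:

  <buffer>
    @type memory
    timekey 60
    timekey_wait 10
    overflow_action block
    # v1 name for the per-chunk size cap; the limit applies to the buffered
    # payload, so the serialized gRPC request can still be somewhat larger.
    chunk_limit_size 512k
  </buffer>

If the chunks really were capped around 512KB and the server still saw more than 10 MiB of uncompressed data, that would suggest the blow-up happens between the buffer and the write request, and a smaller chunk limit alone would not be enough.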
Looking further down the line, it seems that you can specify a channel option on the gRPC channel: GRPC_ARG_MAX_SEND_MESSAGE_LENGTH. Reading about it, I wonder if setting this option would solve this problem?
It is currently not exposed in the fluentd config. By default it is set to -1? Not sure if gRPC would split the message or if it would just turn the server error into a client error...
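As far as I understand, gRPC does not split oversized messages, so a send-size cap would only change where the failure happens. Purely as an illustration (gRPC's Python API rather than the plugin's Ruby code, and a hypothetical target), GRPC_ARG_MAX_SEND_MESSAGE_LENGTH corresponds to the "grpc.max_send_message_length" channel option:

  import grpc

  # Hypothetical target, just to show the option; -1 (the default) means
  # no client-side limit on outgoing message size.
  channel = grpc.secure_channel(
      "logging.googleapis.com:443",
      grpc.ssl_channel_credentials(),
      options=[("grpc.max_send_message_length", 10 * 1024 * 1024)],
  )

With a cap like this, the client would fail the RPC locally with RESOURCE_EXHAUSTED instead of the server rejecting it after upload, which is easier to catch but would presumably still drop the entries unless the request is made smaller before sending.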
We are looking for guidance on how we should proceed.