Open
Description
Describe the bug
in_tail updates the last read offset of a log file to ffffffffffffffff
in var/log/fluentd-containers.log.pos
when a .log file is unwatched after not being updated for limit_recently_modified
.
This causes container logs to be re-read from the head when they are re-watched, because its last read offset in .log.pos is lost.
Below is an example .log.pos, where abcd.log with same inode is being watched, unwatched, and then re-watched.
/var/log/containers/abcd.log ffffffffffffffff 000000000010134e
/var/log/containers/abcd.log 0000000000014bfa 000000000010134e
To Reproduce
change limit_recently_modified
to 30m, and generate a log every 1h ( > 30m) in a container.
Expected behavior
in_tail should leave the offset in .log.pos unchanged for unwatched log files.
Your Environment
- Fluentd version:
- Package version:
- Operating system:
- Kernel version:
Your Configuration
# Prevent fluentd from handling records containing its own logs. Otherwise
# it can lead to an infinite loop, when error in sending one message generates
# another message which also fails to be sent and so on.
<match fluent.**>
@type null
</match>
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head "#{ENV['FLUENTD_INPUT_READ_FROM_HEAD'] || 'false'}"
limit_recently_modified "#{ENV['LIMIT_RECENTLY_MODIFIED'] || '120m'}"
rotate_wait 0
enable_stat_watcher false
<parse>
@type "#{ENV['K8S_FORMAT_PARSER'] || 'regexp' }"
expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[a-zA-Z]) (?<log>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%Z
time_key time
keep_time_key true
timezone "+00:00"
</parse>
</source>
Your Error Log
2025-02-26 16:53:45 +0900 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
2025-02-26 16:53:45 +0900 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-avro' version '1.1.1'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-dedot_filter' version '1.0.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.15'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-json-in-json-2' version '1.0.2'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-kafka' version '0.17.5'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-kinesis' version '3.4.2'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.9.5'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-parser-avro' version '0.3.1'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-s3' version '1.6.1'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-sampling-filter' version '1.1.0'
2025-02-26 16:53:45 +0900 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2025-02-26 16:53:45 +0900 [info]: gem 'fluentd' version '1.14.3'
2025-02-26 16:53:45 +0900 [warn]: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2025-02-26 16:53:45 +0900 [info]: using configuration file: <ROOT>
<system>
workers 1
@log_level info
</system>
<match fluent.**>
@type null
</match>
<source>
@type tail
@id in_tail_container_logs
path "/var/log/containers/*.log"
pos_file "/var/log/fluentd-containers.log.pos"
tag "kubernetes.*"
read_from_head false
limit_recently_modified 120m
rotate_wait 0
enable_stat_watcher false
<parse>
@type "regexp"
expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[a-zA-Z]) (?<log>.*)$/
time_format "%Y-%m-%dT%H:%M:%S.%N%Z"
time_key "time"
keep_time_key true
timezone "+00:00"
unmatched_lines
</parse>
</source>
<match kubernetes.**>
@type record_modifier
prepare_value "@tag_to_kubernetes_name_regexp = \'kubernetes\\\\.var\\\\.log\\\\.containers\\\\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\\\\.log$\';@tag_to_kubernetes_name_regexp_compiled = Regexp.compile(@tag_to_kubernetes_name_regexp)"
tag "kcp"
<record>
@timestamp ${Time.at(time).strftime('%Y-%m-%dT%H:%M:%S.%L+09:00')}
cluster_name redacted
hostname redacted
kubernetes ${tag.match(@tag_to_kubernetes_name_regexp_compiled).named_captures}
</record>
</match>
<filter **>
@type parser
key_name "log"
reserve_data true
emit_invalid_record_to_error false
remove_key_name_field true
<parse>
@type "none"
expression /^.*$/
types
keep_time_key true
estimate_current_event false
time_key "__time_field_in_log_json__"
time_format "%Y-%m-%d %H:%M:%S.%NZ"
message_key "log"
</parse>
</filter>
<match **>
@type copy
<store>
@type "elasticsearch"
@id out_es
include_tag_key true
suppress_type_name true
host "es.hostname"
port 9200
scheme http
reload_connections false
reload_on_failure true
reconnect_on_error true
index_name "${tag}-%Y.%m.%d"
type_name "fluentd"
id_key "${SecureRandom.uuid}"
<buffer tag,time>
@type "memory"
timekey 5
flush_mode interval
flush_at_shutdown true
flush_thread_count 8
flush_interval 5s
chunk_limit_size 32M
queue_limit_length 32
retry_max_interval 60
retry_forever true
</buffer>
</store>
</match>
</ROOT>
2025-02-26 16:53:45 +0900 [info]: starting fluentd-1.14.3 pid=8 ruby="2.6.9"
2025-02-26 16:53:45 +0900 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "--under-supervisor"]
2025-02-26 16:53:46 +0900 [warn]: #0 '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
2025-02-26 16:53:46 +0900 [info]: adding match pattern="fluent.**" type="null"
2025-02-26 16:53:46 +0900 [info]: adding match pattern="kubernetes.**" type="record_modifier"
2025-02-26 16:53:46 +0900 [info]: adding filter pattern="**" type="parser"
2025-02-26 16:53:46 +0900 [info]: adding match pattern="**" type="copy"
2025-02-26 16:53:46 +0900 [info]: adding source type="tail"
2025-02-26 16:53:46 +0900 [warn]: #0 define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2025-02-26 16:53:46 +0900 [warn]: parameter 'expression' in <parse>
@type "none"
expression /^.*$/
types
keep_time_key true
estimate_current_event false
time_key "__time_field_in_log_json__"
time_format "%Y-%m-%d %H:%M:%S.%NZ"
message_key "log"
</parse> is not used.
2025-02-26 16:53:46 +0900 [info]: #0 starting fluentd worker pid=13 ppid=8 worker=0
2025-02-26 16:53:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-02-26 16:53:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-02-26 16:53:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-02-26 16:53:46 +0900 [info]: #0 fluentd worker is now running worker=0
2025-02-27 09:57:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-02-28 09:57:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-03-01 09:57:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
2025-03-02 09:57:46 +0900 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/abcd.log
Additional context
No response
Metadata
Metadata
Assignees
Type
Projects
Status
To-Do