
Refactor whole_file overwrite detection and chunked output in LogFileReader #2577

Open
beartyson-tech wants to merge 3 commits into alibaba:main from beartyson-tech:fix-whole-file

@beartyson-tech
Collaborator

Background

In whole_file overwrite mode, the entire file is emitted as a single log record. Detection relied solely on
second-precision mtime, so an in-place rewrite within the same second went unnoticed. Oversized files were also
split at fixed byte counts, which cut through lines and multibyte characters and made the downstream output
unreadable.

Scope of Changes

  1. Overwrite detection: CheckFileSignatureAndOffset now treats the file as overwritten when any of the nanosecond
    mtime, the file size, or the first-1KB signature has changed. The signature is read only when mtime and size
    are both unchanged, and new mLastMTimeNs / mLastWholeFileSize members hold the baselines.
  2. Chunked output: Extracts DrainWholeFileChunk / GetWholeFileChunkSize / CountWholeFileChunks. Each chunk ends
    on a line boundary when one fits, falling back to a character boundary otherwise, and GBK content is converted
    to UTF-8 per chunk so that each chunk is independently readable.
  3. Oversize warning rate limit: The MaxWholeFileBytes warning in getNextReadSize is now throttled by
    logtail_alarm_interval (tracked in mWholeFileOversizeWarnTime) instead of firing on every read, so it no
    longer floods the log.
  4. Tests and docs: Adds TestChunkLineAlignment / TestChunkCharAlignment and fixes the test case's SetUp to set
    OVERWRITE mode. The docs now describe whole_file, FileWriteMode, and MaxWholeFileBytes, and correct the
    documented FlushTimeoutSecs default to 60.
  5. E2E case (host mode): The reader_whole_file_overwrite case now runs in the @host environment with the sls
    subscriber. It seeds the file with alpha, then overwrites it in place with beta via run command on
    loongcollector, verifying that the whole file is collected as a single record and that the overwrite
    triggers a full re-read.
  6. E2E host runner: Adds test/e2e/e2e_host_test.go (TestE2EOnHost) to run @host-tagged cases, filling a gap:
    the open-source test/e2e previously had only a @docker-compose runner. Also includes a minor adjustment to an
    indirect dependency in go.mod.

@CLAassistant

CLAassistant commented Jun 3, 2026

CLA assistant check
All committers have signed the CLA.

- Improved detection of file modifications in WHOLE_FILE overwrite mode by combining mtime, size, and signature checks.
- Added new methods for chunked draining of whole files, ensuring chunks end on line boundaries or character boundaries for better readability.
- Updated unit tests to validate chunk alignment for both line and character boundaries.
- Adjusted documentation for input configurations to include new parameters related to whole-file processing.

Co-authored-by: Cursor <cursoragent@cursor.com>
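
The line-first, character-aligned chunking described above can be sketched as follows. This is an illustration under stated assumptions, not DrainWholeFileChunk / GetWholeFileChunkSize themselves: NextChunkSize and CountChunks are hypothetical helpers, and the fallback handles UTF-8 only. GBK needs a different fallback, because its trail bytes (0x40 to 0xFE) overlap the ASCII range, so a boundary cannot be recognized from a single byte and must be tracked by scanning forward from a known boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns how many bytes the next chunk of `data` should take (hypothetical helper).
size_t NextChunkSize(std::string_view data, size_t maxBytes) {
    assert(maxBytes > 0);
    if (data.size() <= maxBytes) {
        return data.size();  // remainder fits: emit it whole
    }
    // 1) Line-first: end the chunk just after the last '\n' inside [0, maxBytes).
    size_t nl = data.rfind('\n', maxBytes - 1);
    if (nl != std::string_view::npos) {
        return nl + 1;
    }
    // 2) Fallback (UTF-8 only): move the cut left while the byte right after the
    //    cut is a continuation byte (10xxxxxx), so no character is split.
    size_t cut = maxBytes;
    while (cut > 0 && (static_cast<unsigned char>(data[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return cut > 0 ? cut : maxBytes;  // no boundary found: force a byte split
}

// Counts chunks the way a CountWholeFileChunks-style helper might (hypothetical).
size_t CountChunks(std::string_view data, size_t maxBytes) {
    size_t count = 0;
    while (!data.empty()) {
        data.remove_prefix(NextChunkSize(data, maxBytes));
        ++count;
    }
    return count;
}
```

For example, with a 4-byte limit "ab\ncd\nef" splits into "ab\n", "cd\n", and "ef", and with a 2-byte limit the UTF-8 string "aéé" splits into "a", "é", "é" instead of cutting an é in half. Every branch returns at least one byte, so the counting loop always terminates.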