@the-mann
Contributor

Problem

When log rotation occurs (e.g., via lumberjack), the auto_removal feature was deleting the new log file instead of the rotated one. This caused:

  • Loss of fresh log data
  • Repeated "no such file or directory" errors
  • Gaps in log collection

Root Cause

cleanUp() used ts.tailer.Filename (a string) to remove the file. After rotation:

  1. Lumberjack renames app.log → app.log.1 and creates a new app.log
  2. CW detects deletion, continues reading old inode via open FD
  3. CW reaches EOF and calls os.Remove("app.log")
  4. This deletes the new file, not the rotated one
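
The sequence above can be reproduced outside the agent. The following is a minimal sketch, not the agent's code: it assumes a Unix-like filesystem (where an open file can be renamed), and the names `reproduceBug` and `exists` are hypothetical helpers introduced only for illustration. It replays the four steps in a temporary directory and reports which files survive a filename-based `os.Remove`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// exists reports whether path can currently be stat'ed.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// reproduceBug (hypothetical helper) replays the rotation sequence and then
// removes the file by name, as the pre-fix cleanup did. It returns whether
// the new app.log and the rotated app.log.1 still exist afterwards.
func reproduceBug() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "rotation-bug")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	logPath := filepath.Join(dir, "app.log")
	rotatedPath := logPath + ".1"
	if err := os.WriteFile(logPath, []byte("old line\n"), 0o644); err != nil {
		return false, false, err
	}

	// The tailer keeps an open descriptor on the original file.
	f, err := os.Open(logPath)
	if err != nil {
		return false, false, err
	}
	defer f.Close()

	// Step 1: rotation renames the file and creates a fresh one.
	if err := os.Rename(logPath, rotatedPath); err != nil {
		return false, false, err
	}
	if err := os.WriteFile(logPath, []byte("new line\n"), 0o644); err != nil {
		return false, false, err
	}

	// Steps 3 and 4: cleanup by filename hits the new file.
	if err := os.Remove(logPath); err != nil {
		return false, false, err
	}
	return exists(logPath), exists(rotatedPath), nil
}

func main() {
	newExists, rotatedExists, err := reproduceBug()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("new file exists: %v, rotated file exists: %v\n", newExists, rotatedExists)
}
```

In this sketch the new file is gone and the rotated file is left behind, which matches the symptoms listed under Problem.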

Solution

Track the file by inode instead of filename:

  • Capture inode and device number when file is opened
  • At cleanup time, search for the file with that inode
  • Remove the actual rotated file (e.g., app.log.1)
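
The idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `fileID` type and the `identify`, `findByID`, and `removeTailedFile` functions are hypothetical names, and scanning only the log file's own directory is an assumed search strategy. It relies on `syscall.Stat_t`, so it builds only on Unix-like systems.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// fileID (hypothetical) is the identity captured when the file is opened.
type fileID struct {
	dev, ino uint64
}

// identify extracts device and inode numbers from a FileInfo.
// ok is false when the platform does not expose a *syscall.Stat_t.
func identify(info os.FileInfo) (fileID, bool) {
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return fileID{}, false
	}
	return fileID{dev: uint64(st.Dev), ino: uint64(st.Ino)}, true
}

// findByID scans dir for an entry whose device and inode match want.
func findByID(dir string, want fileID) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			continue // entry disappeared between ReadDir and Info
		}
		if id, ok := identify(info); ok && id == want {
			return filepath.Join(dir, e.Name()), nil
		}
	}
	return "", os.ErrNotExist
}

// removeTailedFile simulates a rotation, then removes the file that was
// originally opened. It returns the base name of the removed file and
// whether the new app.log still exists.
func removeTailedFile() (string, bool, error) {
	dir, err := os.MkdirTemp("", "rotation-fix")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	logPath := filepath.Join(dir, "app.log")
	if err := os.WriteFile(logPath, []byte("old line\n"), 0o644); err != nil {
		return "", false, err
	}
	f, err := os.Open(logPath)
	if err != nil {
		return "", false, err
	}
	defer f.Close()

	// Capture the identity at open time, before any rotation happens.
	info, err := f.Stat()
	if err != nil {
		return "", false, err
	}
	id, ok := identify(info)
	if !ok {
		return "", false, errors.New("inode information unavailable")
	}

	// Rotation: rename the tailed file and create a fresh app.log.
	if err := os.Rename(logPath, logPath+".1"); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(logPath, []byte("new line\n"), 0o644); err != nil {
		return "", false, err
	}

	// Cleanup: resolve the current path of the tailed inode, then remove it.
	target, err := findByID(dir, id)
	if err != nil {
		return "", false, err
	}
	if err := os.Remove(target); err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(logPath)
	return filepath.Base(target), statErr == nil, nil
}

func main() {
	removed, newExists, err := removeTailedFile()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("removed %s, new file exists: %v\n", removed, newExists)
}
```

Comparing both device and inode matters because inode numbers are unique only within a single filesystem. If no entry matches (for example, the rotated file was already deleted or moved to another directory), this sketch returns `os.ErrNotExist` and removes nothing.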

Testing

Added TestAutoRemovalWithLogRotation that:

  • Simulates log rotation
  • Verifies the rotated file is deleted
  • Verifies the new file remains intact

Test passes with the fix, fails without it.

Compatibility

  • Unix/Linux: Uses inode tracking
  • Windows: Falls back to filename (no change in behavior)
  • Existing TestLogsFileAutoRemoval still passes

…EFA retrans metrics

- Add container_efa_retrans_bytes, container_efa_retrans_pkts, container_efa_retrans_timeout_events, container_efa_impaired_remote_conn_events, container_efa_unresponsive_remote_events
- Add corresponding pod and node EFA retrans metrics
- Updates expected config to match recent EFA retrans metrics added in commit 30f0900
- Add container_efa_retrans_bytes, container_efa_retrans_pkts, container_efa_retrans_timeout_events, container_efa_impaired_remote_conn_events, container_efa_unresponsive_remote_events
- Add corresponding pod and node EFA retrans metrics to test expectations
- Fixes GenerateAwsEmfExporterConfigKubernetesWithHighFrequencyGPUMetrics test failure
- Aligns test with EFA retrans metrics added in commit 30f0900

When log rotation occurs (e.g., via lumberjack), auto_removal was
deleting the new log file instead of the rotated one. This happened
because cleanUp() used the filename string, which after rotation
points to the new file, not the original inode being tailed.

Fix:
- Track file inode and device at tailer creation
- Search for file by inode at cleanup time
- Remove the actual rotated file instead of the new one

Added test to verify correct behavior during log rotation.
@the-mann the-mann closed this Dec 17, 2025