[Feature][Connector-V2] Support tail file source #8795

Draft
wants to merge 1 commit into
base: dev

hailin0
Member

@hailin0 hailin0 commented Feb 24, 2025

Purpose of this pull request

[Connector-V2] Support tail file source

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new tail file source connector (Connector-V2) to support tailing log files and includes several new classes and improvements for file scanning, change detection, and event processing. In addition, small adjustments were made to CDC connectors to improve configuration and event handling.

Reviewed Changes

Copilot reviewed 24 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| FileMatcher.java | Adds file matching functionality with configurable scan depth and caching. |
| FileManager.java | Introduces file registry management and split assignment based on file changes. |
| FileHarvester.java | Implements file tailing with buffering, multiline support, and position management. |
| ChangedFiles.java | Provides utilities to detect added/removed files based on inode changes. |
| Utils.java | Adds utility methods including inode extraction and hostname/ip retrieval (note: one getParentDir variant returns null). |
| TailFileSourceState.java, TailFileSourceSplitEnumerator.java, TailFileSourceSplit.java, TailFileSourceReader.java, TailFileSourceFactory.java, TailFileSourceConfig.java, TailFileSource.java | New classes and configuration for the tail file source connector, handling split management, state snapshots, and source creation. |
| Various CDC connector files (SQL Server, Postgres, Debezium related) | Minor adjustments and enhancements for CDC event handling and configuration properties. |
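
For readers unfamiliar with inode-based change detection as summarized for ChangedFiles.java, the idea can be sketched roughly as follows. This is not the code from this PR: the class `InodeDiffSketch` and its methods are hypothetical names, and the sketch assumes the file identity comes from `BasicFileAttributes.fileKey()`, which on most Unix file systems encodes device and inode but may be null elsewhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the ChangedFiles class from this PR.
final class InodeDiffSketch {

    // Maps each file's identity key to its current path.
    static Map<Object, Path> snapshot(List<Path> files) throws IOException {
        Map<Object, Path> byKey = new HashMap<>();
        for (Path file : files) {
            Object key = Files.readAttributes(file, BasicFileAttributes.class).fileKey();
            // fileKey() may be null on platforms without inodes; fall back to the absolute path.
            byKey.put(key != null ? key : file.toAbsolutePath(), file);
        }
        return byKey;
    }

    // Entries whose key appears in the current snapshot but not in the previous one.
    static <K> Map<K, Path> added(Map<K, Path> previous, Map<K, Path> current) {
        Map<K, Path> result = new HashMap<>(current);
        result.keySet().removeAll(previous.keySet());
        return result;
    }

    // Entries whose key appeared in the previous snapshot but is gone from the current one.
    static <K> Map<K, Path> removed(Map<K, Path> previous, Map<K, Path> current) {
        return added(current, previous);
    }
}
```

Because the comparison is keyed on file identity rather than name, a rotated file (same inode, new name) shows up as neither added nor removed, while a fresh file created under the old name counts as added.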
Files not reviewed (2)
  • plugin-mapping.properties: Language not supported
  • seatunnel-connectors-v2/connector-tailfile/pom.xml: Language not supported
Comments suppressed due to low confidence (1)

seatunnel-connectors-v2/connector-tailfile/src/main/java/org/apache/seatunnel/connectors/tailfile/source/Utils.java:51

  • The method getParentDir(String... paths) is not implemented and always returns null, which may lead to NullPointerExceptions if invoked. Consider providing a proper implementation or removing the unused method.
public static File getParentDir(String... paths) { return null; }
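
If the method is kept rather than removed, one possible implementation is sketched below. The intended semantics are not stated in this PR, so the sketch assumes the varargs are path segments to be joined and that the parent of the joined path should be returned; the class name `ParentDirSketch` is hypothetical.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical illustration only; the real Utils.getParentDir contract is not defined in the PR.
final class ParentDirSketch {

    // Assumed semantics: join the segments into one path and return its parent directory.
    static File getParentDir(String... paths) {
        if (paths == null || paths.length == 0) {
            throw new IllegalArgumentException("at least one path segment is required");
        }
        String[] rest = Arrays.copyOfRange(paths, 1, paths.length);
        Path parent = Paths.get(paths[0], rest).toAbsolutePath().normalize().getParent();
        if (parent == null) {
            // A file system root has no parent; fail loudly instead of returning null.
            throw new IllegalArgumentException("path has no parent: " + String.join("/", paths));
        }
        return parent.toFile();
    }
}
```

Throwing on invalid input, instead of returning null, surfaces the problem at the call site rather than as a later NullPointerException.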

try {
    fileManager.tailFile(fileInode, output::collect);
} catch (Exception e) {
    log.error("Field to tail file {}", fileInode, e);
Copilot AI Mar 31, 2025

The error message contains a typo ('Field to tail file') which should be corrected to 'Failed to tail file' for clarity.

Suggested change
- log.error("Field to tail file {}", fileInode, e);
+ log.error("Failed to tail file {}", fileInode, e);

