Flume S3 Sink

A thin wrapper around Flume's native HDFSEventSink that makes it possible to use an empty inUseSuffix.

Problem description

HDFSEventSink was designed to write data into HDFS. When Flume writes into HDFS, a temporary suffix makes sense, because clients have to be able to distinguish final data from data that is still "in progress".

S3 doesn't support an "append" operation, so Flume follows this workflow:

1. It creates a temporary file on the agent machine and writes new events to it.
2. When the file is ready, Flume copies it to S3 with inUseSuffix appended to the name.
3. Finally, Flume renames the file by removing inUseSuffix.

Renaming a file on S3 is essentially two operations: "copy to a new file" and "remove the old one". With a non-empty inUseSuffix, every uploaded file therefore pays for an extra copy and delete. I tried to raise this on the Flume user mailing list, without success. Flume does not let you configure an empty inUseSuffix because of this check in FlumeConfiguration: https://github.com/apache/flume/blob/flume-1.6/flume-ng-configuration/src/main/java/org/apache/flume/conf/FlumeConfiguration.java#L155
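
As a rough illustration of why the suffix matters, the toy model below counts store requests for an upload with and without an in-use suffix. It is only a sketch: the class and method names (RenameCostSketch, upload) are hypothetical and are not part of Flume or of this sink, the store is an in-memory map rather than S3, and a real S3 client may issue additional requests (listings, metadata checks) that are not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an object store such as S3:
// objects can be written, copied and deleted, but not renamed in place.
class RenameCostSketch {
    private final Map<String, byte[]> objects = new HashMap<>();
    private int requests = 0;

    public void put(String key, byte[] data) {
        objects.put(key, data.clone());
        requests++;
    }

    public void copy(String from, String to) {
        byte[] data = objects.get(from);
        if (data == null) {
            throw new IllegalArgumentException("no such key: " + from);
        }
        objects.put(to, data.clone());
        requests++;
    }

    public void delete(String key) {
        objects.remove(key);
        requests++;
    }

    // "Rename" is emulated as copy to the new key plus delete of the old key.
    public void rename(String from, String to) {
        copy(from, to);
        delete(from);
    }

    public boolean exists(String key) {
        return objects.containsKey(key);
    }

    public int requests() {
        return requests;
    }

    // Uploads a finished local file and returns how many store requests it took.
    // With a non-empty suffix: put under key + suffix, then rename to key.
    // With an empty suffix: a single put under the final key.
    public static int upload(RenameCostSketch store, String key, String suffix, byte[] data) {
        int before = store.requests();
        if (suffix.isEmpty()) {
            store.put(key, data);
        } else {
            store.put(key + suffix, data);
            store.rename(key + suffix, key);
        }
        return store.requests() - before;
    }

    public static void main(String[] args) {
        byte[] data = "event-1\nevent-2\n".getBytes(StandardCharsets.UTF_8);
        RenameCostSketch store = new RenameCostSketch();
        System.out.println("with .tmp suffix:  " + upload(store, "logs/a.log", ".tmp", data) + " requests");
        System.out.println("with empty suffix: " + upload(store, "logs/b.log", "", data) + " requests");
    }
}
```

In this model the suffixed upload takes three requests (put, copy, delete) and the upload with an empty suffix takes one, and the copy step transfers the whole object a second time.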

Development

To build the jar file for the sink (tested with Gradle 2.2.1; the install command below assumes Homebrew on macOS):

brew install gradle
gradle build

Usage

To use the sink:

Place the jar in the plugins directory:

mkdir -p $FLUME_HOME/plugins.d/flume-s3-sink/lib
cp build/libs/flume-s3-sink-1.0.jar $FLUME_HOME/plugins.d/flume-s3-sink/lib

Configure the sink:

agent.sinks.my_s3sink.type = org.apache.flume.sink.s3.S3Sink
# Other options are the same as for https://flume.apache.org/FlumeUserGuide.html#hdfs-sink
