From 37fbf72c35eee9cff8f6480dd3967c2e1d566b22 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Tue, 10 May 2016 16:09:46 +0900
Subject: [PATCH 1/3] datasources: Add kafka descriptions

Related to #1.
---
 content/datasources/kafka.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 content/datasources/kafka.md

diff --git a/content/datasources/kafka.md b/content/datasources/kafka.md
new file mode 100644
index 00000000..7ebd2b6a
--- /dev/null
+++ b/content/datasources/kafka.md
@@ -0,0 +1,31 @@
+# Collecting Data from Kafka
+
+## Scenario
+
+[Kafka](http://kafka.apache.org) is a distributed messaging system.
+
+You run Kafka as a messaging system and now want to send its messages to various other systems.
+
+Fluentd can be set up to collect messages from Kafka. Applications include:
+
+1. Sending Kafka messages into HDFS for analysis
+2. Sending Kafka messages into Elasticsearch for analysis
+
+## Setup
+
+1. Download the latest [kafka-fluentd-consumer jar](https://github.com/treasure-data/kafka-fluentd-consumer/releases).
+
+2. Configure kafka-fluentd-consumer. (See [fluentd-consumer.properties](https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties) for an example.)
+
+3. Open your Fluentd configuration file and add the following lines:
+
+    ```
+    <source>
+      @type exec
+      command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-LATEST_VERSION-all.jar /path/to/config/fluentd-consumer.properties
+      tag dummy
+      format json
+    </source>
+    ```
+
+    With the above setup, Fluentd consumes Kafka messages which are specified topics in `fluentd-consumer.properties` via `in_exec` plugin.
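The `in_exec` source in step 3 of the first patch launches `command` as a child process and parses each line that process writes to standard output according to `format` (`json` here), tagging the resulting records with `tag`. The Python below is a minimal, hypothetical model of that stdout-to-record step, for illustration only: `run_exec_source` is our own name, not a Fluentd API, and the real plugin also handles other formats and process management.

```python
import json
import subprocess
import sys
import time


def run_exec_source(command, tag):
    """Hypothetical model of in_exec with `format json` (not Fluentd code).

    Runs `command`, reads its stdout line by line, parses each non-empty
    line as JSON, and yields (tag, unix_time, record) tuples.
    """
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of failing on them
            record = json.loads(line)
            yield tag, int(time.time()), record


if __name__ == "__main__":
    # Stand-in child process printing two JSON lines; it plays the role of
    # `command` in the <source> block. The fields are made-up sample data.
    child = [sys.executable, "-c",
             "print('{\"topic\": \"test\", \"message\": \"hello\"}'); print('{\"n\": 2}')"]
    for event in run_exec_source(child, "dummy"):
        print(event)
```

Running it prints one `(tag, time, record)` tuple per JSON line emitted by the stand-in child process.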
From e256e2d59cba0b86cf0c6946533cc5209edb0b1c Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Wed, 11 May 2016 16:58:51 +0900
Subject: [PATCH 2/3] datasource kafka: Add in_kafka plugin description

---
 content/datasources/kafka.md | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/content/datasources/kafka.md b/content/datasources/kafka.md
index 7ebd2b6a..b88a03ff 100644
--- a/content/datasources/kafka.md
+++ b/content/datasources/kafka.md
@@ -11,7 +11,32 @@ Fluentd can setup to collect messages from Kafka. Applications include:
 1. Sending Kafka messages into HDFS for analysis
 2. Sending Kafka messages into Elasticsearch for analysis
 
-## Setup
+There are two ways to do this: using the `in_kafka` plugin or using `kafka-fluentd-consumer`.
+
+## Setup: fluent-plugin-kafka
+
+1. Install the [Kafka input plugin](https://github.com/htgc/fluent-plugin-kafka) by running the following command:
+
+    ```
+    $ fluent-gem install fluent-plugin-kafka
+    ```
+
+2. Open your Fluentd configuration file and add the following lines:
+
+    ```
+    <source>
+      @type kafka
+      host
+      port
+      topics
+      format
+      message_key
+    </source>
+    ```
+
+    With the above setup, Fluentd consumes Kafka messages via the `in_kafka` plugin.
+
+## Setup: kafka-fluentd-consumer
 
 1. Download the latest [kafka-fluentd-consumer jar](https://github.com/treasure-data/kafka-fluentd-consumer/releases).
 
@@ -29,3 +54,8 @@ Fluentd can setup to collect messages from Kafka. Applications include:
     ```
 
     With the above setup, Fluentd consumes Kafka messages which are specified topics in `fluentd-consumer.properties` via `in_exec` plugin.
+
+### Note
+
+For simplification, you can use `in_kafka` plugin to retrive kafka messages.
+If you assume highly kafka traffic in production, we recommend to use `kafka-fluentd-consumer` instead of `in_kafka`. Because `in_kafka` has been reported high CPU usage when 1000req/sec environment. In more detail, please refer to [the issue](https://github.com/htgc/fluent-plugin-kafka/issues/16).

From 74938a0037dc6e5e4d33a5ce88bbade4aa6595ec Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Wed, 11 May 2016 17:04:17 +0900
Subject: [PATCH 3/3] Use "Kafka" notation

---
 content/datasources/kafka.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/datasources/kafka.md b/content/datasources/kafka.md
index b88a03ff..046c2bbd 100644
--- a/content/datasources/kafka.md
+++ b/content/datasources/kafka.md
@@ -57,5 +57,5 @@ There are two ways to do this: using the `in_kafka` plugin or using `kafka-fluen
 
 ### Note
 
-For simplification, you can use `in_kafka` plugin to retrive kafka messages.
-If you assume highly kafka traffic in production, we recommend to use `kafka-fluentd-consumer` instead of `in_kafka`. Because `in_kafka` has been reported high CPU usage when 1000req/sec environment. In more detail, please refer to [the issue](https://github.com/htgc/fluent-plugin-kafka/issues/16).
+For simplicity, you can use the `in_kafka` plugin to retrieve Kafka messages.
+If you expect heavy Kafka traffic in production, we recommend using `kafka-fluentd-consumer` instead of `in_kafka`, because `in_kafka` has been reported to cause high CPU usage in an environment handling 1000 requests/sec. For details, see [the issue](https://github.com/htgc/fluent-plugin-kafka/issues/16).
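In the `in_kafka` configuration from the second patch, `format` selects how each raw Kafka message value becomes a Fluentd record, and `message_key` matters only for plain text. The Python below is a minimal sketch of that mapping under our own assumptions about the plugin's options at the time: `json` parses the value as a JSON object, `ltsv` splits tab-separated `label:value` fields, and `text` stores the raw value under `message_key`, which we assume defaults to `message`. `parse_kafka_value` is a hypothetical helper, not part of fluent-plugin-kafka, and `msgpack` is left out to keep the sketch dependency-free; the plugin README is the authoritative reference for the options.

```python
import json


def parse_kafka_value(raw, fmt="json", message_key="message"):
    """Hypothetical model of in_kafka's `format` handling (not plugin code).

    `raw` is a message value as bytes; returns the record dict.
    This sketch defaults to `json`, an assumption about the plugin.
    """
    text = raw.decode("utf-8")
    if fmt == "json":
        return json.loads(text)
    if fmt == "ltsv":
        # LTSV: tab-separated fields, each "label:value"; split on the first ':'
        record = {}
        for field in text.rstrip("\n").split("\t"):
            label, sep, value = field.partition(":")
            if sep:
                record[label] = value
        return record
    if fmt == "text":
        # Plain text is kept whole under message_key (assumed default "message").
        return {message_key: text}
    raise ValueError(f"unsupported format in this sketch: {fmt}")


if __name__ == "__main__":
    print(parse_kafka_value(b'{"user": "alice", "status": 200}'))
    print(parse_kafka_value(b"user:alice\tstatus:200", fmt="ltsv"))
    print(parse_kafka_value(b"hello kafka", fmt="text"))
```

Note that LTSV values stay strings in this sketch, while JSON keeps its native types.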