Skip to content

Latest commit

 

History

History
119 lines (102 loc) · 3.66 KB

File metadata and controls

119 lines (102 loc) · 3.66 KB
id thrift
title Thrift

To use this Apache Druid extension, include druid-thrift-extensions in the extensions load list.

This extension enables Druid to ingest Thrift-encoded data from streaming sources such as Kafka and Kinesis.

You may want to use another version of thrift, change the dependency in pom and compile yourself.

Thrift input format

Thrift-encoded data for streaming ingestion (Kafka, Kinesis) can be ingested using the Thrift input format. It supports flattenSpec for extracting fields from nested Thrift structs using JSONPath expressions.

Field Type Description Required
type String Must be thrift yes
thriftClass String Fully qualified class name of the Thrift-generated TBase class to deserialize into. yes
thriftJar String Path to a JAR file containing the Thrift class. If not provided, the class is looked up from the classpath. no
flattenSpec JSON Object Specifies flattening of nested Thrift structs. See Flattening nested data for details. no

Example: Kafka ingestion

Consider the following Thrift schema definition:

namespace java com.example.druid

struct Author {
  1: string firstName;
  2: string lastName;
}

struct Book {
  1: string date;
  2: double price;
  3: string title;
  4: Author author;
}

Compile it to produce com.example.druid.Book (and com.example.druid.Author) and make the resulting JAR available on the classpath of your Druid processes, or reference it via thriftJar.

The following Kafka supervisor spec ingests compact-encoded Book messages, using a flattenSpec to extract the nested author.lastName field:

{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "books",
      "timestampSpec": {
        "column": "date",
        "format": "auto"
      },
      "dimensionsSpec": {
        "dimensions": [
          "title",
          "lastName"
        ]
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE"
      }
    },
    "tuningConfig": {
      "type": "kafka"
    },
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "topic": "books",
      "inputFormat": {
        "type": "thrift",
        "thriftClass": "com.example.druid.Book",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [
            {
              "type": "path",
              "name": "lastName",
              "expr": "$.author.lastName"
            }
          ]
        }
      },
      "taskCount": 1,
      "replicas": 1,
      "taskDuration": "PT1H"
    }
  }
}