Add log4j-iceberg appender module for writing logs to Apache Iceberg tables #4115

@gsoundar

Feature Request

Description

Add a new log4j-iceberg module that provides an IcebergAppender plugin, which writes log events as rows in Parquet data files belonging to an Apache Iceberg table. This gives structured, columnar log storage with time travel, schema evolution, and partition pruning out of the box.

Motivation

Modern observability pipelines increasingly rely on data lake formats (Iceberg, Delta, Hudi) for log analytics due to their advantages over flat files:

  • Columnar storage (Parquet) enables efficient analytical queries over large log volumes
  • Partition pruning by date allows fast time-range scans without full table reads
  • Schema evolution means log schemas can be extended without rewriting history
  • Time travel enables querying historical log state at any snapshot
  • Catalog integration (REST, Hive, AWS Glue) provides unified metadata management

Log4j already supports structured output to databases (JDBC, Cassandra, MongoDB) and message systems (Kafka, JMS). An Iceberg appender fills the gap for the data lake ecosystem.

Proposed Implementation

A new log4j-iceberg module with:

  • IcebergAppender — Log4j plugin (<Iceberg>) that buffers events and flushes them as Parquet data files
  • IcebergManager — Manages catalog lifecycle, table creation, buffered writes, and commit retry
  • Table partitioned by event_date (day granularity)
  • Schema validation on startup when loading existing tables
  • Configurable catalog properties for S3 credentials, REST auth, etc.
  • Exponential backoff retry on commit conflicts
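
To make the last bullet concrete, here is a minimal sketch of exponential backoff around a commit. It is not taken from the proposed module: the class name CommitRetry, the nested CommitConflictException, and all parameters are hypothetical stand-ins, and a real implementation would presumably catch whatever conflict exception the Iceberg catalog throws instead.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Log4j or Iceberg. */
final class CommitRetry {

    /** Stand-in for the exception a catalog raises on a conflicting commit (assumption). */
    static final class CommitConflictException extends RuntimeException {
        CommitConflictException(String message) {
            super(message);
        }
    }

    private CommitRetry() {}

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        // Cap the shift so the doubling cannot overflow for small base delays.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /** Runs the commit, retrying on conflicts up to maxAttempts total attempts. */
    static <T> T withRetry(Supplier<T> commit, int maxAttempts, long baseMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return commit.get();
            } catch (CommitConflictException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the last conflict
                }
                long delay = backoffMillis(attempt, baseMillis, maxMillis);
                // Jitter: sleep between delay/2 and delay so concurrent writers spread out.
                long jittered = delay / 2 + ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                try {
                    Thread.sleep(jittered);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry commit", ie);
                }
            }
        }
    }
}
```

The jitter is a design choice rather than a requirement of the proposal: several application instances appending to the same table are likely to conflict at the same moment, and randomizing the wait reduces the chance that they collide again on the retry.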

Configuration Example

<Iceberg name="IcebergAppender"
         catalogName="my_catalog"
         catalogImpl="rest"
         catalogUri="http://localhost:8181"
         catalogWarehouse="s3://my-bucket/warehouse"
         tableNamespace="logs"
         tableName="app_logs"
         batchSize="1000"
         flushIntervalSeconds="30">
  <CatalogProperties>
    <Property name="s3.access-key-id">AKIA...</Property>
    <Property name="s3.secret-access-key">secret</Property>
  </CatalogProperties>
</Iceberg>
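
One plausible reading of batchSize and flushIntervalSeconds is that a flush happens when either threshold is reached, whichever comes first. The sketch below illustrates that policy only; BatchBuffer, its sink callback, and the injected clock are hypothetical names, and the actual module may buffer differently (for example, with a background scheduler calling something like tick()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical size-or-time flush buffer; names are illustrative only. */
final class BatchBuffer<E> {
    private final int batchSize;
    private final long flushIntervalMillis;
    private final Consumer<List<E>> sink;   // e.g. "write a Parquet file and commit"
    private final LongSupplier clock;       // injected so the policy is testable
    private final List<E> pending = new ArrayList<>();
    private long lastFlushMillis;

    BatchBuffer(int batchSize, long flushIntervalSeconds,
                Consumer<List<E>> sink, LongSupplier clock) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.flushIntervalMillis = flushIntervalSeconds * 1000L;
        this.sink = sink;
        this.clock = clock;
        this.lastFlushMillis = clock.getAsLong();
    }

    /** Buffers one event and flushes immediately once batchSize is reached. */
    synchronized void append(E event) {
        pending.add(event);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Meant to be called periodically; flushes a partial batch once the interval has elapsed. */
    synchronized void tick() {
        if (!pending.isEmpty() && clock.getAsLong() - lastFlushMillis >= flushIntervalMillis) {
            flush();
        }
    }

    /** Hands the buffered events to the sink as one batch and resets the timer. */
    synchronized void flush() {
        lastFlushMillis = clock.getAsLong();
        if (pending.isEmpty()) {
            return;
        }
        List<E> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With the values from the configuration above, this would be constructed as new BatchBuffer<>(1000, 30, sink, System::currentTimeMillis). A larger batchSize yields fewer, larger data files per commit, while the interval bounds how long a quiet application holds events in memory before they become visible in the table.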

Dependencies

  • Apache Iceberg 1.10.1
  • Apache Parquet 1.16.0
  • Hadoop 3.4.1

Related PR
