Feature Request
Description
Add a new log4j-iceberg module that provides an IcebergAppender plugin for writing log events as Parquet-backed rows in an Apache Iceberg table. This enables structured, columnar log storage with time-travel, schema evolution, and partition pruning capabilities out of the box.
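To make the event-to-row mapping concrete, below is a minimal sketch of how a LogEvent could be converted into an Iceberg GenericRecord using Iceberg's generic data model. The column names, types, and the LogEventRecordMapper class are illustrative assumptions for this request, not part of any proposed API.

import org.apache.iceberg.Schema;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.types.Types;
import org.apache.logging.log4j.core.LogEvent;

import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class LogEventRecordMapper {

    // Assumed column layout, for illustration only.
    static final Schema SCHEMA = new Schema(
            Types.NestedField.required(1, "event_date", Types.DateType.get()),
            Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
            Types.NestedField.required(3, "level", Types.StringType.get()),
            Types.NestedField.required(4, "logger", Types.StringType.get()),
            Types.NestedField.required(5, "thread", Types.StringType.get()),
            Types.NestedField.required(6, "message", Types.StringType.get()));

    // Maps one Log4j event to one Iceberg row.
    static GenericRecord toRecord(LogEvent event) {
        Instant ts = Instant.ofEpochMilli(event.getTimeMillis());
        GenericRecord record = GenericRecord.create(SCHEMA);
        record.setField("event_date", LocalDate.ofInstant(ts, ZoneOffset.UTC));
        record.setField("event_time", OffsetDateTime.ofInstant(ts, ZoneOffset.UTC));
        record.setField("level", event.getLevel().name());
        record.setField("logger", event.getLoggerName());
        record.setField("thread", event.getThreadName());
        record.setField("message", event.getMessage().getFormattedMessage());
        return record;
    }
}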
Motivation
Modern observability pipelines increasingly rely on data lake formats (Iceberg, Delta, Hudi) for log analytics due to their advantages over flat files:
- Columnar storage (Parquet) enables efficient analytical queries over large log volumes
- Partition pruning by date allows fast time-range scans without full table reads
- Schema evolution means log schemas can be extended without rewriting history
- Time travel enables querying historical log state at any snapshot
- Catalog integration (REST, Hive, AWS Glue) provides unified metadata management
Log4j already supports structured output to databases (JDBC, Cassandra, MongoDB) and message systems (Kafka, JMS). An Iceberg appender fills the gap for the data lake ecosystem.
Proposed Implementation
A new log4j-iceberg module with:
- IcebergAppender — Log4j plugin (<Iceberg>) that buffers events and flushes them as Parquet data files
- IcebergManager — Manages catalog lifecycle, table creation, buffered writes, and commit retry
- Table partitioned by event_date (day granularity)
- Schema validation on startup when loading existing tables
- Configurable catalog properties for S3 credentials, REST auth, etc.
- Exponential backoff retry on commit conflicts
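A rough sketch of the table bootstrap and commit-retry path in IcebergManager follows, assuming the generic Iceberg catalog API. The helper names (ensureTable, commitWithRetry), the backoff parameters, and the omission of the startup schema validation are assumptions for illustration, not final API.

import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.CommitFailedException;

final class IcebergManagerSketch {

    // Loads the table if it already exists (schema validation omitted here),
    // otherwise creates it partitioned by the event_date column.
    static Table ensureTable(Catalog catalog, TableIdentifier id, Schema schema) {
        if (catalog.tableExists(id)) {
            return catalog.loadTable(id);
        }
        // Identity partitioning on a date column yields day-granularity partitions.
        PartitionSpec spec = PartitionSpec.builderFor(schema).identity("event_date").build();
        return catalog.createTable(id, schema, spec);
    }

    // Appends a completed Parquet data file, retrying with exponential backoff
    // when a concurrent commit causes a conflict.
    static void commitWithRetry(Table table, DataFile dataFile, int maxAttempts)
            throws InterruptedException {
        long backoffMillis = 100;
        for (int attempt = 1; ; attempt++) {
            try {
                AppendFiles append = table.newAppend();
                append.appendFile(dataFile);
                append.commit();
                return;
            } catch (CommitFailedException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                table.refresh();               // pick up the concurrent snapshot before retrying
                Thread.sleep(backoffMillis);
                backoffMillis = Math.min(backoffMillis * 2, 10_000);
            }
        }
    }
}

Iceberg's own commit.retry.* table properties already provide built-in commit retry; an appender-level loop as sketched here would simply make the backoff explicit and configurable from the Log4j side.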
Configuration Example
<Iceberg name="IcebergAppender"
         catalogName="my_catalog"
         catalogImpl="rest"
         catalogUri="http://localhost:8181"
         catalogWarehouse="s3://my-bucket/warehouse"
         tableNamespace="logs"
         tableName="app_logs"
         batchSize="1000"
         flushIntervalSeconds="30">
  <CatalogProperties>
    <Property name="s3.access-key-id">AKIA...</Property>
    <Property name="s3.secret-access-key">secret</Property>
  </CatalogProperties>
</Iceberg>
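For reference, the attributes and nested <CatalogProperties> above would presumably be collapsed into a single property map when the manager loads the catalog. Below is a minimal sketch assuming the built-in REST catalog; the CatalogBootstrapSketch class and the choice to read credentials from the environment instead of hard-coding them are illustrative assumptions.

import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.rest.RESTCatalog;

import java.util.HashMap;
import java.util.Map;

final class CatalogBootstrapSketch {

    // Builds the Iceberg catalog described by the appender configuration above.
    static Catalog loadCatalog() {
        Map<String, String> props = new HashMap<>();
        props.put("uri", "http://localhost:8181");            // catalogUri
        props.put("warehouse", "s3://my-bucket/warehouse");    // catalogWarehouse
        props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        // Values from <CatalogProperties>; taken from the environment here
        // rather than embedded in the configuration file.
        props.put("s3.access-key-id", System.getenv("AWS_ACCESS_KEY_ID"));
        props.put("s3.secret-access-key", System.getenv("AWS_SECRET_ACCESS_KEY"));

        RESTCatalog catalog = new RESTCatalog();    // catalogImpl="rest"
        catalog.initialize("my_catalog", props);    // catalogName
        return catalog;
    }
}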
Dependencies
- Apache Iceberg 1.10.1
- Apache Parquet 1.16.0
- Hadoop 3.4.1
Related PR