Iceberg Metadata Insights is a Streamlit-based application designed to provide comprehensive insights into Apache Iceberg table metadata. It allows users to analyze, optimize, and explore Iceberg table statistics, snapshots, file details, and more.
See demo video on LinkedIn: Iceberg Metadata Insights Demo
- Table Overview: View key statistics such as file counts, partition counts, row counts, and file size metrics.
- Snapshot Timeline: Visualize snapshot history with detailed operation data.
- File Size Distribution: Analyze file size distribution with histograms.
- Table Actions: Perform operations like table analysis, optimization, snapshot expiration, and orphan file removal.
- Detailed Metadata: Explore table DDL, properties, history, manifests, partitions, files, and more.
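Trino's Iceberg connector exposes most of the metadata listed above as hidden tables named `"<table>$<kind>"`. The sketch below shows how such queries could be built. The helper name `metadata_query` and the example schema and table names are hypothetical, and this is not necessarily how the app issues its queries.

```python
# Hypothetical helper; not taken from the project's source code.
# The kinds below are metadata tables documented for Trino's Iceberg connector.
METADATA_TABLES = ("properties", "history", "snapshots", "manifests", "partitions", "files")


def metadata_query(schema: str, table: str, kind: str, limit: int = 100) -> str:
    """Build a SELECT against an Iceberg metadata table such as "orders$snapshots"."""
    if kind not in METADATA_TABLES:
        raise ValueError(f"unknown metadata table: {kind}")
    # The `$` in the name means the identifier must be double-quoted in Trino SQL.
    # Assumes schema and table names come from a trusted source (no escaping is done).
    return f'SELECT * FROM "{schema}"."{table}${kind}" LIMIT {limit}'


# Example with made-up names: inspect the snapshot history of sales.orders.
print(metadata_query("sales", "orders", "snapshots"))
```

The table DDL is not a metadata table; in Trino it comes from `SHOW CREATE TABLE`.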
- Python 3.8 or higher
- Required Python libraries:
  - streamlit
  - trino
  - pandas
  - plotly
  - streamlit-extras
- Required environment:
  - A running Trino server with the Iceberg connector
  - Hive Metastore
  - Existing Iceberg tables
Create a `.env` file in the root directory and add your Trino connection details:
TRINO_HOST=your_trino_host
TRINO_PORT=your_trino_port
TRINO_USER=your_trino_user
TRINO_CATALOG=your_trino_catalog
TRINO_SCHEMA=your_trino_schema
# TRINO_PASSWORD # If applicable

To run the application in a Docker container, you can use the provided Dockerfile. This allows for easy deployment and isolation of dependencies.
docker build -t iceberg-metadata-insights .
docker run -p 8501:8501 -e TRINO_HOST=your_trino_host -e TRINO_PORT=your_trino_port -e TRINO_USER=your_trino_user iceberg-metadata-insights
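Whether the settings come from the `.env` file or from `-e` flags, the process sees them as environment variables. As a rough sketch of how they might be turned into connection arguments, the function below collects them into a dictionary. The name `trino_settings` is hypothetical, and treating `TRINO_CATALOG` and `TRINO_SCHEMA` as optional is an assumption based on the `docker run` example, which passes only host, port, and user.

```python
import os


def trino_settings(env=os.environ) -> dict:
    """Collect TRINO_* variables into keyword arguments for a Trino connection.

    Hypothetical helper; the app's own loading code may differ.
    """
    settings = {
        "host": env["TRINO_HOST"],       # required
        "port": int(env["TRINO_PORT"]),  # required; environment values are strings
        "user": env["TRINO_USER"],       # required
        "catalog": env.get("TRINO_CATALOG"),  # assumed optional
        "schema": env.get("TRINO_SCHEMA"),    # assumed optional
    }
    # Drop unset optional values so only real settings are passed on.
    return {key: value for key, value in settings.items() if value is not None}


# Example with placeholder values instead of the real environment.
example = trino_settings({"TRINO_HOST": "localhost", "TRINO_PORT": "8080", "TRINO_USER": "admin"})
print(example)
```

With the `trino` package installed, the result could be passed as `trino.dbapi.connect(**trino_settings())`. A password is not a plain keyword argument there; it would typically be wrapped in `trino.auth.BasicAuthentication` and used over HTTPS.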
Clone the repository:
git clone https://github.com/alaturqua/iceberg-metadata-insights.git
cd iceberg-metadata-insights

Install the required Python libraries:
python -m venv .venv
source .venv/bin/activate # On Windows use `.venv\Scripts\activate`
pip install .
Start the Streamlit app:
streamlit run ./src/app.py
Open the app in your browser at http://localhost:8501.
- Select a schema and table from the sidebar.
- View table statistics, snapshot timelines, and file size distributions.
- Perform table actions like optimization, snapshot expiration, and more.
- Explore detailed metadata through various tabs.
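The table actions mentioned above correspond to maintenance statements that Trino's Iceberg connector supports. The sketch below builds them as SQL strings. The helper name `maintenance_statements`, the example table, and the `7d` retention value are illustrative assumptions, not the app's actual code or defaults.

```python
def maintenance_statements(schema: str, table: str, retention: str = "7d") -> dict:
    """Return Trino SQL for common Iceberg maintenance actions on one table.

    Hypothetical helper; assumes schema and table names are trusted input.
    """
    target = f'"{schema}"."{table}"'
    return {
        # Collect table and column statistics for the query planner.
        "analyze": f"ANALYZE {target}",
        # Rewrite small data files into larger ones.
        "optimize": f"ALTER TABLE {target} EXECUTE optimize",
        # Drop snapshots older than the retention threshold.
        "expire_snapshots": (
            f"ALTER TABLE {target} EXECUTE expire_snapshots(retention_threshold => '{retention}')"
        ),
        # Delete files no longer referenced by any table metadata.
        "remove_orphan_files": (
            f"ALTER TABLE {target} EXECUTE remove_orphan_files(retention_threshold => '{retention}')"
        ),
    }


# Example with made-up names.
for name, sql in maintenance_statements("sales", "orders").items():
    print(f"{name}: {sql}")
```

Snapshot expiration and orphan file removal delete data permanently, so a retention threshold shorter than the one configured on the Trino server is rejected by default.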
This project is licensed under the GNU GPLv3 License. See the LICENSE file for details.
Contributions are welcome! If you have suggestions or improvements, please open an issue or submit a pull request.
