forked from gcpeixoto/PUMLE

Data engineering pipeline blueprint for geological carbon storage applications in Brazil.


About PUMLE

Beta Status

PUMLE (a play on the word "plume") is a project under the CO2SS Project by the TRIL Lab / CCS Team. Its primary goals are to:

  • Produce simulation data related to plume migration from numerical simulations generated by MRST software.
  • Feed physics-informed machine learning experiments with high-quality, consistent datasets.
  • Build an end-to-end ingestion/consumption data engineering pipeline for geological carbon storage applications in Brazilian reservoirs.

PUMLE consolidates simulation outputs, processes them into multidimensional (5D) arrays, and offers the ability to export data in various formats (NumPy, Zarr, MAT-files, CSV) as well as upload final results to cloud storage.


Process Overview

This flowchart summarizes PUMLE's purpose and high-level workflow:

flowchart TD
    n1(("Start")) --> n2("fa:fa-file Input simulation parameters")
    n2 --> n3["fa:fa-file-lines Process 'setup.ini'"]
    n3 --> n4["fa:fa-file-export Export Matlab structs"]
    n4 --> n5["fa:fa-file-import Load structs into m-file"]
    n5 --> n6["fa:fa-gear Run Matlab simulation"]
    n6 --> n7["fa:fa-database Store simulation results"]
    n7 --> n8{"API"}
    n8 -- CSE --> n9["fa:fa-arrow-up-right-dots Forward modeling"]
    n8 -- ML --> n10["fa:fa-brain Machine learning"]
    n9 --> n11["fa:fa-list-check Data quality assessment"]
    n10 --> n11
    n11 --> n12{"Consistency?"}
    n12 -- Not OK --> n2
    n12 -- OK --> n13(("End"))
    style n1 stroke:#00C853
    style n2 stroke:#2962FF
    style n3 stroke:#2962FF
    style n4 stroke:#2962FF
    style n5 stroke:#2962FF
    style n6 stroke:#2962FF
    style n7 stroke:#2962FF
    style n8 stroke:#FF6D00
    style n9 stroke:#2962FF
    style n10 stroke:#2962FF
    style n11 stroke:#2962FF
    style n12 stroke:#FF6D00
    style n13 stroke:#D50000

Features

  • Parameter Variation & Caching:
    Generate multiple simulation parameter combinations and cache them to avoid redundant simulation runs.

  • Simulation Management:
    Integrate with MRST software via MATLAB scripts to execute numerical simulations and process simulation outputs.

  • Data Consolidation:
    Consolidate simulation outputs into multidimensional arrays, transform them into a unified “golden” dataset, and support different output formats (NumPy, Zarr, MAT-files, CSV).

  • Cloud Storage Integration:
    Optionally upload consolidated outputs to cloud storage (e.g., Amazon S3) using built-in S3 upload functionality.

  • Metadata Handling:
    Process, validate, and export simulation metadata (bronze, silver, and golden layers) using Pandas and Pandera.

  • Tabular Conversion:
    Transform high-dimensional simulation data into tabular (CSV) format for further analysis or consumption by downstream applications.
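As a rough illustration of the tabular conversion idea, the sketch below flattens a small nested array into CSV rows, one row per cell, using only the standard library. The function names (shape_of, to_rows, to_csv), the axis names, and the toy data are hypothetical and not part of PUMLE's API; the actual axis layout of PUMLE's 5D arrays is not assumed here.

```python
import csv
import io
from itertools import product


def shape_of(nested):
    """Return the shape of a regular (non-ragged, non-empty) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def to_rows(nested, axis_names, value_name="value"):
    """Yield one dict per cell: the index along each axis plus the cell value."""
    shape = shape_of(nested)
    if len(shape) != len(axis_names):
        raise ValueError("one axis name is required per dimension")
    for index in product(*(range(n) for n in shape)):
        cell = nested
        for i in index:
            cell = cell[i]
        yield {**dict(zip(axis_names, index)), value_name: cell}


def to_csv(nested, axis_names, value_name="value"):
    """Serialize the flattened rows to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[*axis_names, value_name], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(to_rows(nested, axis_names, value_name))
    return buffer.getvalue()


# Toy 2 x 1 x 2 array (hypothetical axes: simulation, time step, cell)
toy = [[[0.0, 0.1]], [[0.2, 0.3]]]
csv_text = to_csv(toy, ["sim", "time", "cell"])
```

The same approach extends to five axes by passing five axis names; for large arrays a NumPy-based reshape would likely be far more efficient than this pure-Python loop.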


Installation

PUMLE is organized as a Python package. Once it is published on PyPI, it can be installed via pip:

pip install pumle

Alternatively, if you are developing or using it locally, clone the repository and install with:

pip install .

Additionally, create a conda environment using the provided environment file:

conda env create -f environment.yml -n pumle-env
conda activate pumle-env

Usage

A typical workflow involves configuring the pipeline via a configuration dictionary or setup.ini file, then running the pipeline to process simulation parameters, execute simulations, consolidate results, and (optionally) upload to cloud storage.
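To make the configuration step more concrete, here is a minimal sketch of reading INI-style settings into a plain dictionary with Python's standard configparser module. The section names, keys, and values are invented for illustration, and load_config is a hypothetical helper, not PUMLE's own loader; the real parameters are described in GLOSSARY.md.

```python
import configparser

# Hypothetical INI content; section and key names are illustrative only.
EXAMPLE_INI = """
[Paths]
results_dir = ./results

[Fluid]
pres_ref = 1.0
temp_ref = 94
"""


def load_config(text):
    """Parse INI text into a nested dict of {section: {key: string value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(EXAMPLE_INI)
# configparser returns strings, so numeric values need explicit conversion
temp_ref = float(config["Fluid"]["temp_ref"])
```

Converting to a dictionary early keeps the rest of the code independent of whether the settings came from a file or were built in Python.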

The pipeline is started by running the main script:

python main.py

Inside the script, the Pumle class is imported from the package, configured, and used to run the pipeline. The caching in the parameter variation module ensures that simulations with previously run parameter combinations are skipped.
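The skip-if-already-run behavior can be sketched as follows. This is an illustration under stated assumptions, not PUMLE's implementation: the helper names (parameter_grid, cache_key, run_pending), the parameter names, and the in-memory dictionary used as a cache are all hypothetical, and the simulate callable stands in for the MATLAB/MRST run.

```python
import hashlib
import json
from itertools import product


def parameter_grid(options):
    """Expand {name: [values]} into a list of dicts, one per combination."""
    names = sorted(options)
    return [
        dict(zip(names, values))
        for values in product(*(options[n] for n in names))
    ]


def cache_key(params):
    """Stable hash of a parameter dict (key order does not matter)."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pending(grid, cache, simulate):
    """Run `simulate` only for combinations whose key is not yet in `cache`."""
    executed = []
    for params in grid:
        key = cache_key(params)
        if key in cache:
            continue  # already simulated, skip
        cache[key] = simulate(params)
        executed.append(params)
    return executed


# Hypothetical parameter names; the lambda stands in for a real simulation
grid = parameter_grid({"porosity": [0.1, 0.2], "injection_rate": [1.0, 2.0]})
cache = {}
first = run_pending(grid, cache, simulate=lambda p: "done")
second = run_pending(grid, cache, simulate=lambda p: "done")
```

On the first call all four combinations are executed; on the second call none are, because every key is already in the cache. A persistent cache (for example a JSON file on disk) would be needed to keep this behavior across separate runs.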


Development & Contributing

Project Structure

pumle_project/
├── setup.py
├── pyproject.toml       # Optional: for modern packaging standards
├── README.md
├── LICENSE
├── requirements.txt
├── MANIFEST.in          # Optional: include additional files
└── src/
    └── pumle/           # Package code
        ├── __init__.py  # Contains __version__ and key imports
        ├── arrays.py
        ├── cloud_storage.py
        ├── ini.py
        ├── mat_files.py
        ├── metadata.py
        ├── parameters.py
        ├── parameters_variation.py
        ├── paths.py
        ├── sim_results_parser.py
        ├── tabular.py
        └── utils.py

License

PUMLE is released under the MIT License.


Acknowledgements

  • CO2SS Project – For inspiring the simulation use case.
  • TRIL Lab / CCS Team – For the foundational research and development.
  • Contributors: Gustavo Oliveira, Luiz Fernando Santos, Samuel Mendes

Remarks

  • Environment Configuration:
    Modify the prefix in environment.yml as needed for your local setup.
  • Further Documentation:
    Refer to GLOSSARY.md for detailed descriptions of configuration parameters.
