PUMLE (a play on the word "plume") is part of the CO2SS Project, developed by the TRIL Lab / CCS Team. Its primary goals are to:
- Produce plume-migration data from numerical simulations run with MRST (the MATLAB Reservoir Simulation Toolbox).
- Feed physics-informed machine learning experiments with high-quality, consistent datasets.
- Build an end-to-end ingestion/consumption data engineering pipeline for geological carbon storage applications in Brazilian reservoirs.
PUMLE consolidates simulation outputs, processes them into multidimensional (5D) arrays, exports the data in several formats (NumPy, Zarr, MAT-files, CSV), and can upload the final results to cloud storage.
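As a rough illustration of what a 5D array and its CSV export can look like, here is a minimal sketch using NumPy and the standard library. The axis order in `AXES`, the function names, and the toy data are assumptions made for this example; they are not taken from PUMLE's code.

```python
import csv
import io

import numpy as np

# Assumed axis order for illustration only; PUMLE's real layout may differ.
AXES = ("sim", "step", "i", "j", "k")


def array_to_rows(data):
    """Yield one (sim, step, i, j, k, value) tuple per array cell."""
    if data.ndim != len(AXES):
        raise ValueError(f"expected a {len(AXES)}D array, got {data.ndim}D")
    for index in np.ndindex(data.shape):
        yield (*index, float(data[index]))


def array_to_csv(data, value_name="value"):
    """Serialize the array to CSV text with one row per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([*AXES, value_name])
    writer.writerows(array_to_rows(data))
    return buffer.getvalue()


# Toy data: 2 simulations, 3 timesteps, a 2 x 2 x 1 grid.
data = np.arange(24, dtype=float).reshape(2, 3, 2, 2, 1)

# NumPy export to an in-memory buffer (a file path works the same way).
npy_buffer = io.BytesIO()
np.save(npy_buffer, data)
npy_buffer.seek(0)
restored = np.load(npy_buffer)

csv_text = array_to_csv(data)
lines = csv_text.splitlines()
print(lines[0])
print(lines[-1])
```

The CSV form holds one row per cell, so it grows quickly with grid size. The binary formats are usually the better choice for full-resolution results.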
This flowchart summarizes PUMLE's purpose and high-level workflow:
flowchart TD
n1(("Start")) --> n2("fa:fa-file Input simulation parameters")
n2 --> n3["fa:fa-file-lines Process 'setup.ini'"]
n3 --> n4["fa:fa-file-export Export Matlab structs"]
n4 --> n5["fa:fa-file-import Load structs into m-file"]
n5 --> n6["fa:fa-gear Run Matlab simulation"]
n6 --> n7["fa:fa-database Store simulation results"]
n7 --> n8{"API"}
n8 -- CSE --> n9["fa:fa-arrow-up-right-dots Forward modeling"]
n8 -- ML --> n10["fa:fa-brain Machine learning"]
n9 --> n11["fa:fa-list-check Data quality assessment"]
n10 --> n11
n11 --> n12{"Consistency?"}
n12 -- Not OK --> n2
n12 -- OK --> n13(("End"))
style n1 stroke:#00C853
style n2 stroke:#2962FF
style n3 stroke:#2962FF
style n4 stroke:#2962FF
style n5 stroke:#2962FF
style n6 stroke:#2962FF
style n7 stroke:#2962FF
style n8 stroke:#FF6D00
style n9 stroke:#2962FF
style n10 stroke:#2962FF
style n11 stroke:#2962FF
style n12 stroke:#FF6D00
style n13 stroke:#D50000
- Parameter Variation & Caching: Generate multiple simulation parameter combinations and cache them to avoid redundant simulation runs.
- Simulation Management: Integrate with MRST software via MATLAB scripts to execute numerical simulations and process simulation outputs.
- Data Consolidation: Consolidate simulation outputs into multidimensional arrays, transform them into a unified “golden” dataset, and support different output formats (NumPy, Zarr, MAT-files, CSV).
- Cloud Storage Integration: Optionally upload consolidated outputs to cloud storage (e.g., Amazon S3) using built-in S3 upload functionality.
- Metadata Handling: Process, validate, and export simulation metadata (bronze, silver, and golden layers) using Pandas and Pandera.
- Tabular Conversion: Transform high-dimensional simulation data into tabular (CSV) format for further analysis or consumption by downstream applications.
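The parameter variation and caching idea can be sketched with the standard library alone. This is a hypothetical illustration, not PUMLE's implementation: the function names, the parameter names in `grid`, and the in-memory `set` used as a cache are all assumptions. A real pipeline would likely persist the cache to disk.

```python
import hashlib
import itertools
import json


def parameter_combinations(grid):
    """Yield every combination of the values in a {name: [values]} grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def combination_key(params):
    """Build a stable hash for one parameter combination."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def pending_runs(grid, cache):
    """Return combinations not yet in the cache and record them as seen."""
    pending = []
    for params in parameter_combinations(grid):
        key = combination_key(params)
        if key not in cache:
            cache.add(key)
            pending.append(params)
    return pending


# Hypothetical parameter names, chosen only for this example.
grid = {"porosity": [0.1, 0.2], "injection_rate": [1.0, 2.0, 3.0]}
cache = set()

first = pending_runs(grid, cache)   # all combinations are new
second = pending_runs(grid, cache)  # everything is already cached
print(len(first), len(second))  # prints "6 0"
```

Hashing the sorted JSON form makes the key independent of dictionary ordering, so the same parameters always map to the same cache entry.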
PUMLE is organized as a Python package and is installable via pip. To install the package (once published on PyPI), run:
pip install pumle
Alternatively, if you are developing or using it locally, clone the repository and install with:
pip install .
Additionally, create a conda environment using the provided environment file:
conda env create -f environment.yml -n pumle-env
conda activate pumle-env
A typical workflow involves configuring the pipeline via a configuration dictionary or a setup.ini file, then running the pipeline to process simulation parameters, execute simulations, consolidate results, and (optionally) upload to cloud storage.
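To show how an INI file and a configuration dictionary relate, the sketch below reads INI text into a nested dictionary with Python's standard `configparser`. The section and key names are invented for the example and may not match PUMLE's actual setup.ini. See GLOSSARY.md for the real parameters.

```python
import configparser

# Hypothetical setup.ini content; the real sections and keys may differ.
SETUP_INI = """
[Paths]
results_dir = ./results

[Fluid]
pres_ref = 100
rhow = 1000
"""


def ini_to_dict(text):
    """Parse INI text into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = ini_to_dict(SETUP_INI)
print(config["Fluid"]["rhow"])  # values are strings: prints "1000"
```

`configparser` returns every value as a string, so numeric parameters need an explicit conversion (for example `float(config["Fluid"]["rhow"])`) before use.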
Here’s an example script demonstrating usage:
python main.py
This example shows that, after installation, a user simply imports the Pumle class from the package, configures it, and runs the pipeline. The caching in the parameter variation module ensures that simulations with previously run parameter combinations are skipped.
pumle_project/
├── setup.py
├── pyproject.toml          # Optional: for modern packaging standards
├── README.md
├── LICENSE
├── requirements.txt
├── MANIFEST.in             # Optional: include additional files
└── src/
    └── pumle/              # Your package code
        ├── __init__.py     # Contains __version__ and key imports
        ├── arrays.py
        ├── cloud_storage.py
        ├── ini.py
        ├── mat_files.py
        ├── metadata.py
        ├── parameters.py
        ├── parameters_variation.py
        ├── paths.py
        ├── sim_results_parser.py
        ├── tabular.py
        └── utils.py
PUMLE is released under the MIT License.
- CO2SS Project – For inspiring the simulation use case.
- TRIL Lab / CCS Team – For the foundational research and development.
- Contributors: Gustavo Oliveira, Luiz Fernando Santos, Samuel Mendes
- Environment Configuration: Modify the prefix in environment.yml as needed for your local setup.
- Further Documentation: Refer to GLOSSARY.md for detailed descriptions of configuration parameters.